Hi Community people!

I have a question regarding data observability. I am especially interested in noticing changes in record volume and applying this observability functionality in the process of a data delivery. This delivery is orchestrated with a tool, Jams in our case, starting with data retrieval and ending with the actual data delivery. Somewhere in this data flow orchestration I want to check for anomalous changes in record volume. The orchestration tool could then, for instance just before file creation, trigger an observability job to check the underlying database tables for record volume.

But when configuring data observability I only see a scheduler, which does not fit the idea of an orchestrated flow.

So question 1 is whether data observability can be triggered on demand.
Maybe in a scenario where the orchestration tool starts an Ataccama job via a command, like a workflow that starts the observability and reads the observability results and returns an OK or Not OK to the orchestration tool. Something like that. Any other suggestions are welcome of course.

Question 2 is how the observability results can be read. In the metadata model I found the entity observabilityIssue. Are there other relevant entities as well?

Thanks and kind regards,

Albert

Hello @Albert de Ruiter,

Thank you for your question. There is a Manual Run option under the three-dot menu, which triggers the following API call:

mutation RunObservability($gid: GID!) {
  runObservability(sourceId: $gid) {
    success
    __typename
  }
}

Variables:

{
  "gid": "5f463b19-0000-7000-0000-0000000518c7"
}
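
To call this mutation from an orchestration tool, a small script along the following lines might work. This is only a sketch under assumptions: the GraphQL endpoint URL, the bearer-token authentication, and the helper names (build_payload, http_post_json, run_observability) are hypothetical and need to be adapted to your environment.

```python
import json
import urllib.request

# The mutation shown above, without the optional __typename fields.
RUN_OBSERVABILITY = """
mutation RunObservability($gid: GID!) {
  runObservability(sourceId: $gid) {
    success
  }
}
"""


def build_payload(gid):
    """Return the JSON body for the RunObservability mutation."""
    return {
        "operationName": "RunObservability",
        "query": RUN_OBSERVABILITY,
        "variables": {"gid": gid},
    }


def http_post_json(url, token, payload):
    """POST a JSON payload and return the decoded JSON reply.

    Assumption: the endpoint accepts a bearer token; your setup may differ.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def run_observability(url, token, gid, post=http_post_json):
    """Trigger observability for one source; return the 'success' flag."""
    reply = post(url, token, build_payload(gid))
    if reply.get("errors"):
        raise RuntimeError(f"GraphQL errors: {reply['errors']}")
    return bool(reply["data"]["runObservability"]["success"])
```

The post argument is injectable so the logic can be tried without a live server. Note that success most likely only confirms that the run was started, not that it has finished.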

Checking the underlying jobs, I see that it triggers many imports and profilings in the background, so there is no single job you can monitor to see whether it is done. I would need to think about a solution here. If nothing else is running, you can simply query for running jobs and wait until none is running, but that is usually not the case. There is also an email notification option, which might be something you could incorporate into your process.

As for obtaining details, I found this query:

query SourceObservabilityOverview($gid: GID!, $from: Timestamp!, $to: Timestamp!) {
  source(gid: $gid) {
    publishedVersion {
      systemOverview(timeRange: {from: $from, to: $to}) {
        generalStatistics {
          connectionsCount
          catalogItemsCount
          __typename
        }
        domainStatistics {
          profiledCatalogItemsCount
          termsCount
          termInstancesCount
          catalogItemsWithTermsCount
          catalogItemsWithObservedTermsCount
          observedCatalogItemsWithRulesOnAttributesCount
          newTermsCount
          newTermInstancesCount
          unresolvedIssueCount
          __typename
        }
        dqStatistics {
          totalCount
          monitoredItemsCount
          results {
            id
            name
            count
            __typename
          }
          unresolvedIssueCount
          __typename
        }
        anomalyStatistics {
          anomaliesCount
          catalogItemsWithAnomaliesCount
          unresolvedIssueCount
          __typename
        }
        volumeAnomalyStatistics {
          anomaliesCount
          catalogItemsWithAnomaliesCount
          unresolvedIssueCount
          __typename
        }
        schemaChangeStatistics {
          unresolvedIssueCount
          addedAttributes
          addedCatalogItems
          changedAttributes
          deletedAttributes
          deletedCatalogItems
          __typename
        }
        freshnessStatistics {
          configuredCatalogItemCount
          executedCatalogItemCount
          unresolvedIssueCount
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
}

Variables:

{
  "gid": "5f463b19-0000-7000-0000-0000000518c7",
  "from": "2025-07-24T07:47:50.566Z",
  "to": "2025-07-31T07:47:50.566Z"
}
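
For the OK / Not OK decision towards the orchestration tool, the volumeAnomalyStatistics part of the reply could be turned into an exit code roughly like this. It is a sketch that assumes the JSON reply mirrors the query structure above and that the counts are integers; the function names and the max_anomalies threshold are hypothetical.

```python
def volume_gate(reply, max_anomalies=0):
    """Return (ok, detail) based on volumeAnomalyStatistics in the reply.

    Assumption: 'reply' is the decoded JSON answer to the
    SourceObservabilityOverview query, shaped like the query itself.
    """
    overview = reply["data"]["source"]["publishedVersion"]["systemOverview"]
    stats = overview["volumeAnomalyStatistics"]
    anomalies = stats["anomaliesCount"]
    ok = anomalies <= max_anomalies
    detail = (
        f"{anomalies} volume anomalies on "
        f"{stats['catalogItemsWithAnomaliesCount']} catalog items, "
        f"{stats['unresolvedIssueCount']} unresolved issues"
    )
    return ok, detail


def exit_code(reply, max_anomalies=0):
    """Print a one-line verdict and return 0 (OK) or 1 (Not OK)."""
    ok, detail = volume_gate(reply, max_anomalies)
    print(("OK: " if ok else "NOT OK: ") + detail)
    return 0 if ok else 1
```

A wrapper script could pass the result to sys.exit so that the orchestration tool stops the delivery on a non-zero code. Keep in mind that the counts cover the whole from/to window, so the window should be chosen to match the run you want to check.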

Let me know if this is something that can help you. You can always tweak the queries in the playground to find additional properties and options.

Kind regards,

Anna


Hi ​@anna.spakova !

Thanks for your answer, really nice input. So the main question is how to detect that all the underlying observability jobs have finished. You mention the idea of sending an e-mail, which is standard functionality. But an e-mail would not be picked up as a trigger by either an Ataccama workflow or the orchestration tool itself. So if an alternative solution comes to mind, that would be great.

Meanwhile we will dive into the details that you provided. I will be on holiday for a while now, so I will respond to further replies later. I will share the URL of this post with my team.

Kind regards,

Albert


Hello ​@Albert de Ruiter ,

Let me ask internally whether there is some other indication that observability has finished. I cannot see any way other than checking DPM (via API) for running jobs that were triggered as part of the observability run, and I am not even sure those can be easily identified other than by their timestamp. I will get back to you as soon as I have more information.

Kind regards,

Anna


Hi ​@Albert de Ruiter ,

I reviewed the requirement with our engineering team, and they confirmed that at the moment there is no “parent” job for the observability run itself that you could monitor. So the only solution is the one described in my comment above: monitor DPM as a whole and look at the jobs triggered around the time you called the observability run. This does not guarantee that those are only the observability jobs, and there might be a slight shift in the timestamps if the number of jobs and the amount of data is large, but it will be close and might be enough for your use case.
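
The timestamp-based waiting could be sketched as below. Everything here is an assumption for illustration: list_running_jobs stands for whatever call you use to ask DPM for running jobs and is expected to return dicts with a "started" datetime, which is a hypothetical shape and not the actual DPM API. The slack and quiet_polls parameters are also just one way to handle the timestamp shift and jobs that start in waves.

```python
import time
from datetime import datetime, timedelta, timezone


def wait_for_jobs(list_running_jobs, triggered_at,
                  slack=timedelta(seconds=30), quiet_polls=2,
                  poll_seconds=30.0, timeout_seconds=3600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Wait until no running job started at or after (triggered_at - slack).

    Returns True after 'quiet_polls' consecutive polls without such jobs,
    or False if 'timeout_seconds' elapse first.
    """
    cutoff = triggered_at - slack
    deadline = clock() + timeout_seconds
    quiet = 0
    while True:
        # Keep only jobs that started around or after the trigger time.
        pending = [job for job in list_running_jobs()
                   if job["started"] >= cutoff]
        quiet = 0 if pending else quiet + 1
        if quiet >= quiet_polls:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

Requiring several quiet polls in a row reduces the chance of stopping between the import and profiling phases, although it cannot rule it out. Unrelated jobs started in the same window will also be waited for, as mentioned above.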

I have created a feature request internally, and the engineering team will review it and consider it as an improvement.

Please let me know if you need further assistance with this.

Kind regards,

Anna


Hi ​@anna.spakova ,

Many thanks for your further investigation and for creating the internal feature request. I really believe that being able to stop a data delivery in case of a data observability anomaly would be helpful for other clients as well.

I think for now we have some experimenting to do, based on your input. If new questions come up, I will open a new post.

Thanks again and regards,

Albert

