Solved

Can Data Observability be triggered?

July 29, 2025

Albert de Ruiter
Rocket Pioneer L1

Hi Community people!

I have a question regarding data observability. I am especially interested in noticing changes in record volume and applying this observability functionality in the process of a data delivery. This delivery is orchestrated with a tool, Jams in our case, starting with data retrieval and ending with the actual data delivery. Somewhere in this data flow orchestration I want to check for anomalous changes in record volume. The orchestration tool could then, for instance just before file creation, trigger an observability job to check the underlying database tables for record volume.

But when configuring data observability I see a scheduler, which does not match with the idea of an orchestrated flow.

So question 1 is whether data observability can be triggered.
Maybe in a scenario where the orchestration tool starts an Ataccama job via a command, like a workflow that starts the observability and reads the observability results and returns an OK or Not OK to the orchestration tool. Something like that. Any other suggestions are welcome of course.

Question 2 is how the observability results can be read. In the metadata model I found entity observabilityIssue. Are there other relevant entities as well?

Thanks and kind regards,

Albert

Best answer by anna.spakova

Hello @Albert de Ruiter ,

Thank you for your question. There is a Manual Run option under the three-dots menu. It triggers the following API call:

mutation RunObservability($gid: GID!) {
  runObservability(sourceId: $gid) {
    success
    __typename
  }
}

variables:
{
  "gid": "5f463b19-0000-7000-0000-0000000518c7"
}
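
For an orchestration tool, the mutation above could be sent as an ordinary GraphQL-over-HTTP POST. The following is a minimal sketch, not a documented Ataccama client: the endpoint URL, the bearer-token authentication, and the helper names (`build_payload`, `http_post`, `run_observability`) are assumptions for illustration. Only the mutation text and the `gid` variable come from the answer above.

```python
import json
import urllib.request

# Mutation text taken from the answer above (__typename omitted, it is optional).
RUN_OBSERVABILITY = """
mutation RunObservability($gid: GID!) {
  runObservability(sourceId: $gid) {
    success
  }
}
"""


def build_payload(gid):
    """Build the JSON body for the mutation (query text plus variables)."""
    return {"query": RUN_OBSERVABILITY, "variables": {"gid": gid}}


def http_post(url, token, payload):
    """POST a JSON payload and return the decoded JSON response.

    Bearer-token auth is an assumption; use whatever your environment requires.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_observability(url, token, gid, post=http_post):
    """Trigger observability for a source; return True if the API reports success."""
    body = post(url, token, build_payload(gid))
    if body.get("errors"):
        raise RuntimeError(f"GraphQL errors: {body['errors']}")
    return bool(body["data"]["runObservability"]["success"])
```

The `post` parameter exists only so the transport can be swapped out, for example in a test or when the orchestration tool provides its own HTTP step. Note that a `True` result only means the run was accepted, not that the underlying jobs have finished.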

Checking the underlying jobs, I see that it triggers many imports and profiling jobs in the background, so there is not just one job that you can monitor for completion. I would need to think about a solution here. If nothing else is running, you can simply query the running jobs and wait until none is running, but that is usually not the case. There are also email notifications, which might be something you could incorporate into your process.
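
The "wait until none is running" idea can be sketched as a simple polling loop. This is only an illustration under assumptions: the thread does not show the API call for listing running jobs, so `count_running_jobs` is a hypothetical function that you would supply yourself, and the poll interval and timeout are arbitrary defaults.

```python
import time


def wait_until_idle(count_running_jobs, poll_seconds=30.0, timeout_seconds=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until no jobs are running; return True if idle, False on timeout.

    count_running_jobs is a caller-supplied (hypothetical) function that asks
    the platform how many jobs are currently running and returns an int.
    """
    deadline = clock() + timeout_seconds
    while True:
        if count_running_jobs() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

As noted above, this only works when nothing else is running on the platform. One more caveat, stated as an assumption rather than observed behaviour: if the jobs take a moment to appear after the trigger, the first poll could report zero too early, so a short delay before the first poll may be needed.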

As for obtaining details, I found this query:

query SourceObservabilityOverview($gid: GID!, $from: Timestamp!, $to: Timestamp!) {
  source(gid: $gid) {
    publishedVersion {
      systemOverview(timeRange: {from: $from, to: $to}) {
        generalStatistics {
          connectionsCount
          catalogItemsCount
          __typename
        }
        domainStatistics {
          profiledCatalogItemsCount
          termsCount
          termInstancesCount
          catalogItemsWithTermsCount
          catalogItemsWithObservedTermsCount
          observedCatalogItemsWithRulesOnAttributesCount
          newTermsCount
          newTermInstancesCount
          unresolvedIssueCount
          __typename
        }
        dqStatistics {
          totalCount
          monitoredItemsCount
          results {
            id
            name
            count
            __typename
          }
          unresolvedIssueCount
          __typename
        }
        anomalyStatistics {
          anomaliesCount
          catalogItemsWithAnomaliesCount
          unresolvedIssueCount
          __typename
        }
        volumeAnomalyStatistics {
          anomaliesCount
          catalogItemsWithAnomaliesCount
          unresolvedIssueCount
          __typename
        }
        schemaChangeStatistics {
          unresolvedIssueCount
          addedAttributes
          addedCatalogItems
          changedAttributes
          deletedAttributes
          deletedCatalogItems
          __typename
        }
        freshnessStatistics {
          configuredCatalogItemCount
          executedCatalogItemCount
          unresolvedIssueCount
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
}

Variables:

{
  "gid": "5f463b19-0000-7000-0000-0000000518c7",
  "from": "2025-07-24T07:47:50.566Z",
  "to": "2025-07-31T07:47:50.566Z"
}
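
To turn the overview response into the OK / Not OK signal asked for in the question, the orchestration step could inspect `volumeAnomalyStatistics`. The sketch below assumes the response has the standard GraphQL `data` envelope around the fields requested in the query above. The pass rule (zero anomalies and zero unresolved issues) and the helper names are assumptions, not Ataccama guidance.

```python
from datetime import datetime, timedelta, timezone


def _iso_millis(moment):
    """Format a UTC datetime like the example variables, e.g. 2025-07-31T07:47:50.566Z."""
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"


def time_window(days=7, now=None):
    """Return the from/to variables for a window ending now (UTC)."""
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = end - timedelta(days=days)
    return {"from": _iso_millis(start), "to": _iso_millis(end)}


def volume_check(response):
    """Return (ok, detail) based on volumeAnomalyStatistics in the response."""
    overview = response["data"]["source"]["publishedVersion"]["systemOverview"]
    stats = overview["volumeAnomalyStatistics"]
    anomalies = stats["anomaliesCount"]
    unresolved = stats["unresolvedIssueCount"]
    # Assumed policy: any volume anomaly or unresolved issue blocks the delivery.
    ok = anomalies == 0 and unresolved == 0
    return ok, f"volume anomalies={anomalies}, unresolved issues={unresolved}"
```

A wrapper script could then end with `sys.exit(0 if ok else 1)` so that the orchestration tool sees a plain exit code. Because the statistics are requested for a time range, a narrower `from`/`to` window than the seven days in the example may be more appropriate for a per-delivery check.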

Let me know if this is something that can help you. You can always tweak the queries using the playground to find additional properties and options.

Kind regards,

Anna

anna.spakova
Ataccamer
Answer
July 31, 2025

Albert de Ruiter
Author
Rocket Pioneer L1
July 31, 2025

Hi @anna.spakova !

Thanks for your answer, really nice input. The main open point is how to detect that all the underlying observability jobs have finished. You mention e-mail notifications, which are standard functionality, but an e-mail would not be picked up as a trigger by either an Ataccama workflow or the orchestration tool itself. If an alternative solution comes to mind, that would be great.

Meanwhile we will dive into the details that you provided. I will be on holiday for a while now, so I will respond to further reactions later. I will share the URL of this post within my team.

Kind regards,

Albert

anna.spakova
Ataccamer
August 6, 2025

Hello @Albert de Ruiter ,

Let me ask internally whether there is some other indication that observability has finished. I cannot see any way other than checking DPM (via API) for running jobs that were triggered as part of the observability run, and I am not even sure those can be easily identified other than by their timestamps. I will get back to you as soon as I have more information.

Kind regards,

Anna

anna.spakova
Ataccamer
August 6, 2025

Hi @Albert de Ruiter ,

I reviewed the requirement with our engineering team, and they confirmed that at the moment there is no “parent” job for the observability run itself that you could monitor. So the only solution is the one described in my comment above: monitor the whole DPM and the jobs triggered around the time when you called the observability run. This does not guarantee that those are only the observability jobs, and the timestamps may shift slightly if the number of jobs and the volume of data are large, but it will be close and might be enough for your use case.
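
The timestamp-based approach described above could look roughly like the sketch below. The shape of a job record (`submitted_at`, `status`), the state names, and the ten-minute window are all hypothetical, since the thread does not show what the DPM API returns. Map them to the real fields once you have inspected an actual response.

```python
from datetime import datetime, timedelta, timezone


def jobs_in_window(jobs, triggered_at, window=timedelta(minutes=10)):
    """Keep jobs submitted between the trigger time and trigger time + window.

    The window allows for the timestamp shift mentioned above; it may also
    catch unrelated jobs that happened to start in the same period.
    """
    upper = triggered_at + window
    return [job for job in jobs if triggered_at <= job["submitted_at"] <= upper]


def all_finished(jobs, active_states=("RUNNING", "QUEUED")):
    """True only if at least one job was found and none is still active.

    An empty list is treated as "not finished yet", because the jobs may
    simply not have been submitted at the time of the check.
    """
    return bool(jobs) and all(job["status"] not in active_states for job in jobs)
```

Recording `triggered_at` immediately before calling the mutation and then repeating `all_finished(jobs_in_window(...))` until it returns `True` gives an approximate completion signal, with the limitations already mentioned.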

I have created a feature request internally, and the engineering team will review it and consider it as an improvement.

Please let me know if you need further assistance with this.

Kind regards,

Anna

Albert de Ruiter
Author
Rocket Pioneer L1
August 29, 2025

Hi @anna.spakova ,

Many thanks for your further investigation and creating the internal feature request. I really believe that it would be helpful for other clients as well, being able to stop a data delivery in case of a data observability anomaly.

I think for now we have some experimenting to do, based on your input. If new questions come up, I will open a new post.

Thanks again and regards,

Albert
