Working with delta changes in Ataccama ONE

  • 17 July 2023
  • 0 replies
  • 130 views

Userlevel 5
Badge +8

Hello everyone, Anna here! I work as a projects consultant in Ataccama and today I put together a guide for you on how to work with delta changes in Ataccama ONE. Enjoy!

Introduction

As part of every project, some kind of integration with other tools is usually required. In most cases, the integration requires synchronization of some data between the tools on regular bases. When it comes to Ataccama ONE, synchronization of assets like business terms or DQ rules may be required – for example with Collibra. It is desired to synchronize only delta changes (changes that happened since the last synchronization) to keep the integration efficient.

This article explains how the delta changes can be detected in Ataccama ONE using graphQL queries. There is no simple out-of-the-box delta detection implemented, for that reason multiple graphQL queries are used in plans to achieve this goal.

Ataccama doesn’t store a last modified date or similar property within assets. Ataccama creates a History Change Number (HCN) for each change in the Ataccama platform. This HCN corresponds to a specific point in time (so there is a timestamp associated with each HCN) when a change happened.

To extract the delta changes (new or update) from Ataccama, following are the steps:

  1. Get the closest HCN for the given timestamp.
  2. Create filter for the assets using the HCN.
  3. Extract the assets using the filter with HCN.

Get the HCN number

Following is the graphQL query to get the HCN number closest to the given timestamp:

query {
mmdHcnTimestamps(
filter: "timestamp > '2022-05-01T17:00:00'"
versionSelector: { publishedVersion: true }
size: 1
) {
edges {
node {
publishedVersion {
hcn
timestamp
}
}
}
}
}

Closest because the HCN is created when some change happens, and it is very unlikely that a change happened exactly at the time of the given timestamp.

If you remove the size: 1 property, you will get all HCN numbers, that were created after the given timestamp. This property ensures, you will get the closest HCN to the timestamp.

Instead of the graphQL, you can use the ONE metadata reader step and use the timestamp filter under the Filter tab.

Example response:

Create filter for the delta changes

The format of the filter depends on the entities you are trying to get, resp. on the changes you consider to be important for the delta change. When it comes to terms, you may be interested only in the changes inside the term entity (e.g. new description), when it comes to changes in the catalog, changes that may be considered are for instance:

  • Change in a catalog item.
  • Change in an attribute of the catalog item.
  • Change in the assigned terms to an attribute in the catalog item.

As attribute is a separate entity (embedded inside catalogItem), change inside attribute will not be reflected as a change of the catalog item. The same for assigned terms, that are represented by entity termInstance, which is an embedded entity inside the attribute entity.

So simple filter can look like this:

$fromHcn > 6500

Where 6500 is some HCN number.

For the more complicated use-case with the catalog changes, the filter can look like this:

$fromHcn > 6500 or attributes.any($fromHcn > 6500 or termInstances.any($fromHcn > 6500))

This says: Give me all catalog items, that changed since HCN 6500 or their attributes changed or terms on the attributes changed.

Extract the entities with delta changes

Last step is to take the filter and extract the changed entities. Using graphQL, it can look like this:

catalogItems(versionSelector: {draftVersion: true},filter: "$fromHcn > 6500 or attributes.any($fromHcn > 6500 or termInstances.any($fromHcn > 6500))") {    
edges {
node {
gid
draftVersion {
name
}
attributes{
edges{
node{
gid
draftVersion{
name
termInstances{
edges{
node{
draftVersion{
displayName
target{
gid
}
}
}
}
}
}
}
}
}
}
}
}
}

Using the ONE metadata reader, simply put the filter in the Filter tab (the example contains already parametrized HCN number):

How to deal with deleted assets

The steps above can be used for only changed or new items. To get the deleted items, we have to follow these steps:

  • Get the HCN number.
  • Compare the entities between the HCN version and now.
  • Get only deleted entities.

The first step is the same as above. To compare the entities (e.g. terms), following query can be used (only graphQL):

query compareTerms {
compareterm(comparedNodes: {
leftVersionSelector: {versionAt: 7900},
rightVersionSelector: {publishedVersion: true}
}) {
sessionId
}
}

The versionAt is expecting the HCN number. The rightVersionSelector: {publishedVersion: true} refers to the current published version. This command will compare two versions of the database – the one at the time of the given version, and now.

This query will create the comparison but will return only its sessionID.

To get the actual assets, we need one more query:

query getcomparedTerms {
getCompareterm(sessionId: "d18b4768-c962-4d0d-ba48-302a9a8554ea", filter: "$changeType = 'LEFT_ONLY'") {
edges {
node {
left {
gid
storedVersion{
name
}
}
}
}
}
}

The filter LEFT_ONLY indicates we want only deleted changes (because publishedVersion is null for deleted items). Other options are RIGHT_ONLY (for new items), CHANGED (for changed items) or SAME (no changes). It is obvious that we could use this approach for all delta changes, so you can consider this as an alternative solution. For other change types, we have to add (or replace) into the query also right node (the query above contains only left, which is in case of RIGHT_ONLY empty):

query getcomparedTerms {
getCompareterm(sessionId: "d18b4768-c962-4d0d-ba48-302a9a8554ea", filter: "$changeType = 'RIGHT_ONLY'") {
edges {
node {
right {
gid
storedVersion{
name
}
}
}
}
}
}

Under storedVersion, you can obtain additional information about the deleted asset, e.g. name or other properties. You can use additional property compareType (inside node) to get the type of the change (SAME, LEFT_ONLY, …)

How to Try it Out

Attached to this article are example plans that you can use to see how to work with the JSON calls.

The examples contain plans to get changes in assigned terms in the catalog and deleted terms. Both plans within the last 30 days.

Resources


0 replies

Be the first to reply!

Reply