
I have a plan that reads a CSV file from my storage and writes the data to the Business Glossary page.

What I Want to Achieve:
I would like to enhance this plan so that it not only inserts new terms, but also checks for existing terms and updates them where necessary, to avoid duplicates:

  • If a term already exists and nothing has changed, it should be skipped.

  • If the term exists but the definition (or any attribute) has been modified, it should update the existing entry with the new information.

  • If the term does not exist in the glossary, it should be inserted as a new entry.

Challenge:
I noticed that some community solutions (for example, the community chat on bulk term update) rely on a unique ID for each term, but my business glossary terms currently do not have an ID field.

Question:
How can I structure the plan to perform the insert, update, and skip logic effectively?

What is the best practice for comparing and managing term updates in this scenario?

Below is a snippet of my current plan, which only writes new terms:

 

Hi @Susan24us,

First, a remark about the unique IDs. Every data asset in Ataccama (a term, a catalog item, etc.) has an ID. The ID is not shown on the screen itself, but you can see it in the page URL, for instance like this for a term: one-ata-pr.xxx.corp/glossary/term/5877cd9a-0000-7000-0000-00000a798840/.
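If you need that ID outside the application (for example, in a spreadsheet of term links), it can be pulled out of the URL. The sketch below is a minimal illustration in plain Python, not an Ataccama feature; the helper name `term_id_from_url` is hypothetical, and it assumes term URLs always follow the `.../glossary/term/<uuid>/` shape of the example.

```python
import re
from typing import Optional

# Assumption: term URLs follow the pattern in the example,
# i.e. .../glossary/term/<uuid>/ ; other URL shapes are not handled.
TERM_ID_PATTERN = re.compile(
    r"/glossary/term/([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
)

def term_id_from_url(url: str) -> Optional[str]:
    """Return the UUID-style term ID embedded in a glossary URL, or None."""
    match = TERM_ID_PATTERN.search(url)
    return match.group(1) if match else None

print(term_id_from_url(
    "one-ata-pr.xxx.corp/glossary/term/5877cd9a-0000-7000-0000-00000a798840/"
))
# prints 5877cd9a-0000-7000-0000-00000a798840
```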

To find existing terms, you can add a metadata reader that reads the business terms. You can then outer join the terms from the CSV file and the metadata reader on their names (consider lowercasing both names first). If the reader returns a matching name, you have found an existing term. You can then either compare all properties of the term to decide whether it was updated or is unchanged (and filter out the unchanged ones), or simply update all matched terms (the unchanged ones are then rewritten with the same details).
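The join-and-compare logic can be sketched outside the plan to make the three outcomes concrete. This is a minimal Python illustration, not Ataccama plan configuration: the record shape (dicts with `name`, `definition`, and an `id` on existing terms) and the function names are hypothetical. Because only rows from the CSV side drive insert, update, or skip, it behaves as a left outer join from the CSV onto the existing terms.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: each term is a dict with "name" and "definition";
# existing glossary terms also carry an "id". Real term attributes may differ.
Term = Dict[str, str]

def normalize(name: str) -> str:
    """Join key: trimmed, lowercased name, as suggested in the reply."""
    return name.strip().lower()

def classify_terms(
    csv_terms: List[Term],
    existing_terms: List[Term],
    compared_fields: Tuple[str, ...] = ("definition",),
) -> Tuple[List[Term], List[Term], List[Term]]:
    """Left-outer-join CSV terms onto existing terms by normalized name.

    Returns (to_insert, to_update, to_skip). If two existing terms share a
    normalized name, the last one wins in this sketch.
    """
    existing_by_key = {normalize(t["name"]): t for t in existing_terms}
    to_insert, to_update, to_skip = [], [], []
    for row in csv_terms:
        match = existing_by_key.get(normalize(row["name"]))
        if match is None:
            to_insert.append(row)  # no matching name: new term
        elif any(row.get(f) != match.get(f) for f in compared_fields):
            # carry the existing ID so the update targets that entry
            to_update.append({**row, "id": match.get("id", "")})
        else:
            to_skip.append(row)  # same name, same compared attributes
    return to_insert, to_update, to_skip

csv_rows = [
    {"name": "Customer", "definition": "A buyer"},
    {"name": "Order", "definition": "A purchase"},
    {"name": "Invoice", "definition": "A bill"},
]
glossary = [
    {"id": "a1", "name": "customer", "definition": "A buyer"},
    {"id": "b2", "name": "ORDER", "definition": "Old text"},
]
ins, upd, skip = classify_terms(csv_rows, glossary)
print([t["name"] for t in ins], [t["name"] for t in upd], [t["name"] for t in skip])
# prints ['Invoice'] ['Order'] ['Customer']
```

Adding more attribute names to `compared_fields` extends the "has anything changed" check; passing an empty tuple reproduces the simpler option of never skipping a term that has no compared differences, which in this sketch means all matches land in `to_skip`, so to update every match unconditionally you would instead route every non-`None` match to `to_update`.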

Kind regards,

Albert

