Exporting Catalog item along with the associated terms

Hello, is it possible to export a catalog item along with its associated terms? As you can see in the screenshot below, I temporarily set the attribute called “last_update_hostname” as confidential as an example, but when I export it to a file, I can only see the data values, not the terms associated with it.

Hi @Prithika ,

To export a catalog item and its related terms, you need to build a custom plan in ONE Desktop. To achieve this, use the metadata reader steps. For example, I’m using one CI, CUSTOMER_SOURCE. With the plan below, I’m extracting all the terms linked to CUSTOMER_SOURCE and to a particular attribute.

Use the catalog item and attribute metadata reader steps and join them using attribute_pid and catalogitem_id. Once you have all the catalog items and the attributes linked to them, use the terminstance metadata reader step and join it using attribute id and terminstance pid. Filter out the terminstances which are null, and if you only need a particular CI, you can filter for that too.

Hope this helps !

Regards

Srija Piratla

Hi @Prithika, I’m closing this thread for now. If you have any follow-up questions, please feel free to share them in the comments or create a new post.

Hello, thanks a lot for your response! I was not sure how to connect the catalog item to the ONE Metadata Reader as I could not find an option to select it.

I was also wondering how I would fill in these columns as part of the plan.

Please let me know if there is a way to fix these errors. Thanks in advance!

@Prithika were you looking to export the data itself, or the metadata?

I think you are after the actual data, but @srija piratla was showing how to export metadata (column names and associated terms).

How are you looking to present the data @Prithika? The terms associated with the data and the data itself are on different levels. Imagine you have 100k+ records in your data: where would the information about the associated term go in your extract?

@maykwok_hamilton Thanks a lot for your response. Currently, I’m looking to export just the terms associated with the data, as shown in the screenshot below.

Below is the plan that I have created using ONE Desktop, but I keep getting errors and I’m not sure how to connect the catalog item to the metadata reader. Do I have to connect it using the table ID, and if so, how? Please let me know if there is a way to fix these errors. Thanks in advance!

Oof. OK, we need to talk a bit about the metadata model.

Your first step is the ONE Metadata Reader. Its purpose is to let you retrieve metadata from the metadata model in ONE. You have told the ONE platform that you want metadata on the entity type “catalogItem”.

Your Ataccama instance will have some configuration defining which properties a catalogItem has. Among other things, every catalog item has a name and a description, can have multiple attributes attached, and can also have terms attached at the catalogItem level.

If you look at an attribute, it can also have a name, description, and terms attached:

When you configure the ONE Metadata Reader step, you will have to request the things that exist from your ONE instance.

I prefer to drag and drop the metadata node from the ONE Metadata Explorer. This way some things are already preconfigured for me:

See that these columns are properties that already exist on the platform. Making stuff up here will cause the plan to fail, because you will request something that does not exist.

The step will only auto-populate scalar properties (string-like or integer-like types). We need the complex embedded objects, so we need to click the “Map to Entity” button and find the complex properties inside. I want the termInstances and attributes.

Embedded entity streams will now appear:

Then double click into one of them, and specify which properties of the embedded object you need:

Now… as it is, this plan will export the information on ALL catalog items, and will give you also the information of ALL the attributes belonging to them, and ALL the terms that belong to these catalog items.

If you are only interested in that particular catalog item, you’ll want to do some filtering. (In some use cases you may want to actually get all the info on all catalog items into your plan and then transform them, but let’s consider you want to just find your one catalog item)

There are several ways to do this. I usually find the catalog item in the webapp, then get the ID from the URL:

Put it into filter like this:

My ONE Metadata Reader step now has 3 outputs: 1 at the root level (called “out”), and 1 each for termInstances and attributes, the 2 sub streams. I connect each of these to a Text File Writer to see what they give me:

Run the plan, and here are the results:

But where are the IDs? Surely there’s an ID for everything?

Yes. I forgot to add a column to allow the data stream to provide me with IDs. For these, I can give the ID column whatever name I want. The step will put the ID into the column configured:

You can only make up names in the bottom section here; you can’t make up names in the top section grid.

I configure the ID columns for each of the entities, and the extract now looks better.

Now notice that the attributes stream gives us the ID for each attribute, and it also tells us which parent catalog item it belongs to. The same goes for the term instance: it tells us that this particular catalog item (as a parent) has a term attached.

Note that the term attached here is only the term attached at the catalog item level, not at the attribute level.
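To make the two levels concrete, here is a minimal Python sketch of how I picture the model. This is only an illustration, not how ONE stores anything: the class names, the "Customer data" term and the customer_id attribute are made up, while CUSTOMER_SOURCE, last_update_hostname and the confidential term come from the posts above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TermInstance:
    term_name: str


@dataclass
class Attribute:
    name: str
    # Terms attached at the attribute level.
    term_instances: List[TermInstance] = field(default_factory=list)


@dataclass
class CatalogItem:
    name: str
    # Terms attached at the catalog item level.
    term_instances: List[TermInstance] = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)


# Hypothetical sample data for illustration only.
ci = CatalogItem(
    name="CUSTOMER_SOURCE",
    term_instances=[TermInstance("Customer data")],
    attributes=[
        Attribute("last_update_hostname", [TermInstance("Confidential")]),
        Attribute("customer_id"),
    ],
)

# Terms on the catalog item itself.
ci_level_terms = [t.term_name for t in ci.term_instances]

# Terms on each attribute, which live one level deeper.
attribute_level_terms = {
    a.name: [t.term_name for t in a.term_instances] for a in ci.attributes
}
```

Reading ci.term_instances never reaches the "Confidential" term on last_update_hostname, which is the same reason the catalog item level termInstances stream does not show attribute terms.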

To get the terms attached to the attribute, you need to get into the attribute substream of the step, click across to the “embedded entity streams” tab, and “load child streams”.

Fill that one in. In this case, because we have 2 streams that will have the same name (termInstances), the plan won’t like it, so you will need to slightly adjust the sub stream to have a different name for the 2nd termInstances output:

Connect another Text File Writer, and the output now looks like this:

Finally….

If you are just after the names of the attributes and the terms attached at the attribute level, then you don’t need the termInstances attached at the catalog item, so you can ignore that one. (I thought it’s worth clarifying as termInstances can really get attached almost anywhere in the platform)

Now that you have the catalog item information, the attributes, and the terms related to the attributes, you can use a variety of the Flow Control steps to shape the data into the way you want to present it. I recommend the Join plan in the tutorial to understand how the step works:

You would first join the “out” feed with the “attributes” feed, using the ci_id and attribute_parent_id as join keys.

Then take the attribute_termInstances feed, and join on attribute_term_instance_parent_id and attribute_id.
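If it helps to see the two joins outside ONE Desktop, here is a rough Python sketch of the same logic. The rows are hypothetical stand-ins for the three extracts, and inner_join is a helper written for this example, not anything from Ataccama. Only the join key names (ci_id, attribute_parent_id, attribute_id, attribute_term_instance_parent_id) are taken from the steps above; ci_name, attribute_name and term_name are assumed column names.

```python
# Hypothetical rows standing in for the "out", "attributes" and
# "attribute_termInstances" extracts.
out_rows = [{"ci_id": "ci-1", "ci_name": "CUSTOMER_SOURCE"}]
attribute_rows = [
    {"attribute_id": "a-1", "attribute_parent_id": "ci-1",
     "attribute_name": "last_update_hostname"},
    {"attribute_id": "a-2", "attribute_parent_id": "ci-1",
     "attribute_name": "customer_id"},
]
attribute_term_rows = [
    {"attribute_term_instance_parent_id": "a-1", "term_name": "Confidential"},
]


def inner_join(left, right, left_key, right_key):
    """Pair each left row with every right row whose key matches."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[left_key], [])
    ]


# Join 1: catalog item to its attributes.
ci_attributes = inner_join(out_rows, attribute_rows,
                           "ci_id", "attribute_parent_id")

# Join 2: attributes to the terms attached at the attribute level.
result = inner_join(ci_attributes, attribute_term_rows,
                    "attribute_id", "attribute_term_instance_parent_id")

for row in result:
    print(row["ci_name"], row["attribute_name"], row["term_name"])
```

An inner join like this drops attributes that have no terms (customer_id in the sample). If you want those kept with an empty term, you would need an outer-style join instead.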

Does this help? This is a lot to digest. Please ask if anything is unclear!

Official reading list:

Using ONE metadata reader: https://docs.ataccama.com/one-desktop/latest/work-with-ataccama-one/read-and-write-metadata.html
Filtering using AQL: https://docs.ataccama.com/one/latest/common-actions/aql-expressions.html#overview
