
Solved

Term suggestions overwhelming - How to tweak and make it stop.

June 13, 2023

Marnix Wisselaar
Star Blazer L2

Hi,

We've loaded 2000+ terms and assigned those with desktop as a migration from a data dictionary. When we now have new catalog items, Ataccama suggests 20+ terms for a CI Attribute. This is not usable.

How can we tweak it to:

Limit to only top 5
Stop when a term is already assigned.

And here's an idea: If the column header is the same or can be matched using a simple fuzzy match, that is a better indicator than most AI.

Another: Please skip some of the datatypes, a boolean … probably not a good idea.
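The four wishes above (top 5 only, stop when a term is already assigned, prefer column-name matches, skip some data types) could be sketched as a post-filter outside the platform. This is only an illustration of the idea, not how Ataccama works: the `Suggestion` record, the `filter_suggestions` function, the 0.85 threshold and the skipped-type list are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical record for one suggested term; not an Ataccama data structure.
@dataclass
class Suggestion:
    term: str
    confidence: float

# Assumption: data types considered too ambiguous to suggest terms for.
SKIPPED_TYPES = {"boolean"}


def name_similarity(column: str, term: str) -> float:
    """Simple fuzzy match in [0, 1], ignoring case, spaces and punctuation."""
    def norm(text: str) -> str:
        return "".join(ch for ch in text.lower() if ch.isalnum())
    return SequenceMatcher(None, norm(column), norm(term)).ratio()


def filter_suggestions(column, data_type, assigned_terms, suggestions,
                       top_n=5, name_threshold=0.85):
    """Return at most top_n suggestions, with column-name matches ranked first."""
    # Stop when a term is already assigned, or when the data type is skipped.
    if assigned_terms or data_type.lower() in SKIPPED_TYPES:
        return []
    ranked = sorted(
        suggestions,
        key=lambda s: (name_similarity(column, s.term) >= name_threshold,
                       s.confidence),
        reverse=True,
    )
    return ranked[:top_n]
```

For a column named `birth_date`, a term called "Birth date" would be ranked first here even if its confidence is lower than that of the other suggestions.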

Any ideas?

We are also wondering if we can turn it off.

Thank you in advance!

Marnix

Best answer by anna.spakova

anna.spakova
Ataccamer
2 years ago
June 14, 2023

Hi!

Thanks for your feedback, I will hand it over to our engineering team.

There are some settings you can try for the AI suggestions (if you have an on-premise deployment): https://support.ataccama.com/home/docs/aip/latest/installation-guides/one-platform-configuration-reference/configuring-term-suggestions-services/recommender-configuration#RecommenderConfiguration-TermSuggestionsAIParameters

Now, as for the amount: is it all AI, or do you also use detection rules? If a detection rule is too general, it can be incorrectly assigned to attributes that don’t contain the desired data. In this case we recommend removing it or tuning its configuration (either change the expression or change the threshold). As you mention, boolean values often cause an issue because they can be found in some lookups we use out of the box, e.g. for cities or names. In these cases AI is usually much better, but it needs to be trained.

Also, if you start rejecting or approving the suggestions, the AI will recompute them and some should disappear (though new ones can of course also appear). In general, the more you reject, the more conservative the AI should become, and it should start suggesting much less.

AI can be turned off for each term: just uncheck the AI enabled checkbox.

If you want to remove the detection rule, you need to unassign it from the term settings.

Lastly, regarding the headers → this is actually coming in one of the future releases (14.4 I believe): AI will consider additional metadata for terms, like attribute names, table names etc.

Please let me know if you have additional questions.

Kind regards,

Anna

Marnix Wisselaar
Star Blazer L2
2 years ago
June 14, 2023

Thank you so much!

PetrD
Ataccamer
2 years ago
June 15, 2023

Thanks @Marnix Wisselaar for the feedback and questions! @anna.spakova already gave a comprehensive answer, but let me add a few points. I will just emphasize that the AI is designed so that it starts very general and “dumb” and adapts to the particular terms and data of each customer. So it is really important not only to accept the correct suggestions but also to reject the incorrect ones, so as not to bias the algorithm. It might seem tedious in the beginning, but it should adapt fairly quickly and most of the wrong suggestions should disappear.
Keep in mind that it learns independently for each term, so even if it already works well for one term because a user has accepted and rejected tens of suggestions, for another term it might still not be optimal if the algorithm does not have enough feedback.
As @anna.spakova wrote, the algorithm also works with attribute names and other metadata since version 14.4. However, even in this version it does not take into consideration other terms already assigned to the attribute; this is something we still need to improve.
And one last point: in earlier versions the AI algorithm also learns from terms assigned by manual rules, which can sometimes be wrong, and this is often the main reason why it delivers obviously wrong suggestions. Since 14.4 this behavior is configurable and disabled by default, so it learns only from terms assigned explicitly by a user.

Marnix Wisselaar
Star Blazer L2
2 years ago
June 15, 2023

Hi PetrD,

There is always theory and practice. We migrated over 2000 terms from a data dictionary. We linked those to the CI Attributes. So to me Ataccama has the ability to ‘learn’.

Then two other points:

It is too tedious to go through way too many suggestions.
It becomes even more tedious if a user has to keep doing this when there is already a term linked to a CI Attribute.

But we also think we observe the AI missing some of the obvious:

An int might be a sequence-number, but it keeps coming up with other ints.
Birthdates and other dates, like loan start date or taxation data, have a certain period (min-max). But it keeps coming up with the idea that ‘a date is a date’.
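The point about date ranges could be illustrated with a small sketch: compare the observed minimum and maximum of a date column against an expected period per term. This is not something the platform is stated to do; the term names, the periods in `EXPECTED_RANGES` and the function `plausible_date_terms` are hypothetical and only show the kind of check meant here.

```python
from datetime import date

# Hypothetical expected periods per term (inclusive); the values are
# illustrative only, not taken from any real configuration.
EXPECTED_RANGES = {
    "Birth date": (date(1900, 1, 1), date(2023, 12, 31)),
    "Loan start date": (date(1990, 1, 1), date(2100, 1, 1)),
}


def plausible_date_terms(values, expected=EXPECTED_RANGES):
    """Return the terms whose expected period covers all observed dates."""
    lowest, highest = min(values), max(values)
    return [
        term
        for term, (start, end) in expected.items()
        if start <= lowest and highest <= end
    ]
```

A column holding dates between 1950 and 1985 would then only be plausible as "Birth date", because 1950 lies before the assumed start of the loan period.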

Is the AI single column or multiple columns? So does it use other columns in a table as context?

PetrD
Ataccamer
2 years ago
June 15, 2023

I agree with you that the algorithm is far from ideal, and I am trying to propose some ways to make it work better. One of them is to provide negative examples together with the positive ones. After the migration it now sees 2000 correctly assigned terms and 0 rejected ones, so it is overly confident in its suggestions and they are often wrong. I suggest you start rejecting the suggestions; after a few rejections you should see most of the false positive suggestions for a given term disappear (after they are recomputed, which might take from minutes to hours depending on the total number of attributes in the catalog).

An int might be a sequence-number, but it keeps coming up with other ints.

Birthdates and other dates, like loan start date or taxation data, have a certain period (min-max). But it keeps coming up with the idea that ‘a date is a date’.

The semantics of numbers is currently not understood by the algorithm, so it does not work well in these cases. However, even here it helps if it has not only positive examples, but also diverse negative examples where the term does not belong.

Is the AI single column or multiple columns? So does it use other columns in a table as context?

No, it currently does not use the other columns or metadata from the table, just the data content of a single column. From 14.4 it uses some metadata related to the attribute, catalog item or source where they are located, but it still does not combine the knowledge from multiple columns.

Marnix Wisselaar
Star Blazer L2
2 years ago
June 15, 2023

Can I automate the refusals? I see via the mm-reader that there are attributes. But when I delete the suggestion, it seems to come back with the same suggestion. So do I only set the attribute?

PetrD
Ataccamer
2 years ago
June 15, 2023

Yes, you can. There is a special GraphQL mutation for it:

mutation BulkTermSuggestionResolution($resolutions: [TermSuggestionResolution!]!) {
  bulkTermSuggestionResolution(termSuggestionResolutions: $resolutions) {
    nodePath
    gid
    __typename
  }
}

where you put the term suggestion GID and resolution into variables:

{
  "resolutions": [
    {
      "termSuggestionId": "660eef5e-0000-7000-0000-00000009c98b",
      "status": "REJECTED"
    }
  ]
}

Note that this does not publish the rejection, so you need to publish it explicitly afterwards:

mutation EntityPublishMutation($gid: GID!) {
  termSuggestionPublish(gid: $gid) {
    __typename
  }
}

variables:

{
  "gid": "660eef5e-0000-7000-0000-00000009c991",
  "draft_featuresContributions_size": 10
}
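To automate this from a script, the two mutations above could be wrapped roughly as follows. This is a minimal sketch using only the Python standard library: the mutation texts are taken from this thread, but the endpoint URL, the bearer-token authentication and the helper names (`build_reject_payload`, `build_publish_payload`, `post_graphql`) are assumptions for illustration and need to be adapted to your environment.

```python
import json
import urllib.request

REJECT_MUTATION = """
mutation BulkTermSuggestionResolution($resolutions: [TermSuggestionResolution!]!) {
  bulkTermSuggestionResolution(termSuggestionResolutions: $resolutions) {
    nodePath
    gid
    __typename
  }
}
"""

PUBLISH_MUTATION = """
mutation EntityPublishMutation($gid: GID!) {
  termSuggestionPublish(gid: $gid) {
    __typename
  }
}
"""


def build_reject_payload(suggestion_ids):
    """Build the request body that rejects every given term suggestion."""
    resolutions = [
        {"termSuggestionId": sid, "status": "REJECTED"}
        for sid in suggestion_ids
    ]
    return {"query": REJECT_MUTATION, "variables": {"resolutions": resolutions}}


def build_publish_payload(gid):
    """Build the request body that publishes one resolved suggestion."""
    return {"query": PUBLISH_MUTATION, "variables": {"gid": gid}}


def post_graphql(url, payload, token):
    """POST a GraphQL payload as JSON; URL and auth scheme are assumptions."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A script would first send `build_reject_payload([...])` and then send one `build_publish_payload(gid)` per rejection. Which GID to publish should be checked against the response of the first mutation, since the example values above use different GIDs for the two calls.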

Marnix Wisselaar
Star Blazer L2
1 year ago
September 4, 2023

Thank you for this. The GraphQL works fine, but the MetadataWriter will also do the trick.
