Solved

DQC match - slow incremental load

  • 27 February 2023
  • 6 replies
  • 82 views

Userlevel 2
Badge +2

The matching component is slow at the extended unifications step during an incremental load. When a record like ‘WALGREEN%’ is loaded and there are 20K records with a similar name on the instance, all of those records are pulled into the matching process. As a result, processing the increment is slow: even when the incremental load contains only 11 records, about 30K records on the instance table end up with their transaction_id and omni_modified_date updated to those of the work order. When I check the history, nothing changes on these records except omni_modified_date and transaction_id. Tweaking the union keys to create combinations of name and key factors, or adding separate groups for the high-count records, has not helped. Any thoughts or suggestions?


Best answer by AKislyakov 28 February 2023, 14:21


6 replies

Userlevel 4
Badge +4

Hi,

Usually, the solution is indeed to introduce additional components to the union keys to minimize the number of records fetched from the repository during the expansion phase.

  • Can you share the execution logs, preferably with an increased unification step logging level? (To increase the logging level of the unification step, set the unify.verbose=4 parameter.)
  • Also, you can check the cardinality of the resulting keys by querying the <REPO>_keys table, to verify that introducing new key factors actually reduced the cardinality (a minimal sketch follows this list).
  • Finally, what version are you using, and how many operations are there in the Extended Unification step?
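
As a rough illustration (not a definitive check), a query along these lines compares the number of distinct keys against the total number of rows per operation; it assumes the repository keys table <REPO>_keys with its op and ukey columns:

  -- a low distinct_keys relative to total_rows suggests the union keys
  -- are not selective enough and the candidate groups will be large
  SELECT op,
         COUNT(DISTINCT ukey) AS distinct_keys,
         COUNT(*)             AS total_rows
  FROM <REPO>_keys
  GROUP BY op;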
Userlevel 2
Badge +2

Appreciate your response. I did see improvement after adding a new union key.

I will check the cardinality and identify new keys.

We are using Omni 3.16 and DQS 12.6.3.

Userlevel 2
Badge +2

I tried to look into the repo keys table. When you say cardinality, do you mean the counts from a query like the following should be low?


  SELECT pk,
         op,
         LEFT([ukey], 3),
         COUNT(*)
  FROM [omnirepo].[dbo].[repos_providerpractice_keys]
  GROUP BY pk,
           op,
           LEFT([ukey], 3)
  HAVING COUNT(*) > 1

Is there a guide describing the different fields in the repo keys table?

Thank you again.

 

Userlevel 4
Badge +4

Unfortunately, there is no guide, but I find the table quite straightforward (take it with a grain of salt, I’m not the developer); see the example query after the list:

  • pk -- primary key of the record.
  • op -- operation number, in case you have multiple unify operations inside a single step.
  • ukey -- unification key: all parts of the union key combined into a single string.
  • gid -- the resulting Candidate Id of the record.
  • center -- flag indicating that the record was selected as the center of a group.
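
For example, to see the keys and the resulting candidate group generated for a single record (the pk value 123 is just a placeholder):

  -- all unification keys and the candidate group for one record
  SELECT pk, op, ukey, gid, center
  FROM <REPO>_keys
  WHERE pk = 123;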

When I said cardinality, I meant something like the two queries below.

The first one shows how many records are in your largest candidate groups, i.e., how many records might be affected if a new record arrives. The second one shows the most frequent keys, which is what you usually have to change to make the candidate groups smaller.

  SELECT gid, COUNT(*) AS group_rec_count
  FROM <REPO>_keys
  GROUP BY gid
  ORDER BY 2 DESC

  SELECT ukey, COUNT(*) AS key_rec_count
  FROM <REPO>_keys
  GROUP BY ukey
  ORDER BY 2 DESC
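
For the ‘WALGREEN%’ case specifically, a variation like this can show which key values that group collapses into (this assumes the name is the leading component of ukey, which depends on how your union key is composed):

  -- how many records share each key value for the problematic name
  SELECT ukey, COUNT(*) AS key_rec_count
  FROM <REPO>_keys
  WHERE ukey LIKE 'WALGREEN%'
  GROUP BY ukey
  ORDER BY 2 DESC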
Userlevel 2
Badge +2

Thank you for the detailed explanation, it really helps.

I added a union key, which helped with the performance. Since it was based on a new field, I had to adjust the matching rules accordingly. I am currently testing.

 

Userlevel 2
Badge +2

Thanks a ton, this was helpful. Separately, I am still trying to resolve a bundle deployment issue, which is a recurring problem. I will create a different topic to discuss it.
