Solved

DQS matching component results



DQS matching component results are not reflected when bundle is deployed and the entity is re-processed. I did re-build the repo with the new Extended unifier but no effect. What could be a reason?

Best answer by Ales

I'm not sure I follow now. If you are getting R roles, that means you have a bigger MATCH_CAN group than the number of MAX ITERATIONS in that Unify operation, i.e. you can actually have some records grouped together and some not, because they were rejected as Renegades.

It seems you are getting Renegades already from the first Matching operation. Could it be that you have much bigger groups than before? Could you please do a profile/frequency count on the pri_match_can_id?

 


18 replies

Ales
Ataccamer
  • 27 replies
  • June 13, 2024

Hi @sgilla , could you please share at least a screenshot of your DQS plan, or tell us which steps you are using specifically?

What changes do you expect to take effect?

Have you changed some matching rules? Can you share what exactly you changed?

Thanks

Ales


ivysakh
Space Explorer
  • 22 replies
  • June 13, 2024

Hi @sgilla ,

It could be that the rematch flag is not set appropriately when the reprocessing is triggered. As Ales mentioned, we would need more information to troubleshoot this. 


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • June 13, 2024

Thank you for the response. I am not sure what screenshots to provide, but I have this entity where the matching leveraged Name, Address, and a Type field. Name and Type are usually populated and Address was not, so I cut down the matching rules to focus more on Name and Type.

 

This should ensure that records with or without an address match together when the name and type are the same. I did a repo rebuild with this update, followed by a reprocess, but there was no change.

About the re-match flag: I think this is what needs to be commented out to ensure it grabs all the records, even if they are already mastered.

Which is good too. 


ivysakh
Space Explorer
  • 22 replies
  • June 13, 2024

Hi @sgilla , 

Could you attach the component here, so that I can take a look? 

Thanks,
Vysakh


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • June 17, 2024

Please check the zip file. I will ping you the password.


Ales
Ataccamer
  • 27 replies
  • June 18, 2024

Hi @sgilla , I was not able to open the zip file as it's encrypted (I can take a look later if you share the password with me). However, based on your screenshots it's obvious you are using the Extended Unification step.

If the goal is to adjust matching rules and reprocess all the data, you should be able to achieve this by just processing ALL the data using the updated rules.

I have a few additional questions:

  1. "I did a repo rebuild with this update, followed by reprocess" - What exactly did you do to rebuild the repo (read the data from the repo and write it back)? I guess by reprocess you mean you sent all the data through the step again.
  2. Do you need to keep the repository, or can you just drop it? This would mean you would lose your already calculated unification keys/IDs, of course.
  3. Have you tested your new matching rules with a completely empty repo? Just to be sure that your new setup works for newly incoming data.

Thanks,

Ales


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • June 18, 2024
  1. "I did a repo rebuild with this update, followed by reprocess" - What exactly did you do to rebuild the repo (read the data from the repo and write it back)? I guess by reprocess you mean you sent all the data through the step again. - YES
  2. Do you need to keep the repository, or can you just drop it? This would mean you would lose your already calculated unification keys/IDs, of course. - I did DROP and rebuilt the repo with the Extended Unifier.
  3. Have you tested your new matching rules with a completely empty repo? Just to be sure that your new setup works for newly incoming data. - I did DROP and rebuilt the repo with the Extended Unifier. I checked a few records in the repo and they looked like they had the same master_id. This was followed by a bundle update to OHD and a reprocess of the subject. When I debug or create a test plan on DQS and pass the data through the matching component, it works very well. But when I delete the records automatically by passing deleted records, so they are deleted on source/instance/repo completely, and then insert them again, the matching updates are not seen. Neither is the subject reprocess helping. Thank you for the help.

Thanks,


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • June 18, 2024

It seems the pri_match_can_id is the same on all records with the same name, while pri_master_id is different (it is supposed to be a single master_id) and match_role is 'R' (R: Renegades, i.e. records not similar to any center in a candidate group). How can I get these records to participate in the matching?


Ales
Ataccamer
  • 27 replies
  • June 19, 2024

I quickly checked your component and here are my observations:

  • Your Extended Unification step is set to NONINCREMENTAL mode, which means you always need to process all the records; the repo is never used. This seems to me the most important one. You mentioned rebuilding the repo, however your configuration does not use the repo at all. I believe if you switch to NORMAL mode you will get correct results with the repo.
  • you have two operations configured:
    • match Payer - contains most of the matching rules
    • payer final match - just two rules; seems like just a technical confirmation of the match. This one actually has the number of iterations set to 1, i.e. there can be only 1 matching group within the candidate group. It might be on purpose, however that's probably why you are getting the Unification Role R (Renegade) - "Records not belonging to the primary group are records having primary unification role N or possibly R". You can increase this parameter to allow more records into the matching groups.

 

"It seems the pri_match_can_id is the same on all records with the same name, while pri_master_id is different (supposed to be a single master_id) and match_role is 'R' (R: Renegades, or records not similar to any center in a candidate group). How to resolve these records to participate in the matching?"

 

If the match_can_id is the same on all records, it means that, based on your KEY groups, you matched all the records together. Then it goes to pri_master_id, using the matching rules. The relevant unification role for this UNIFY operation is pri_match_role.

The match_role is the result of the second Unify operation (payer final match), which has the number of iterations set to 1, hence the R records (only 1 matching group is allowed within the candidate group; the rest become "renegades").
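The iterations-vs-renegades mechanics Ales describes can be illustrated with a toy model. This is a simplified sketch, not Ataccama's actual matching engine; the field names and role letters (M/S/R) are taken from the examples in this thread:

```python
# Toy model (NOT the real DQS engine): each iteration picks one center,
# pulls in every record that matches it, and records beyond the allowed
# number of iterations end up with role 'R' (Renegade).

def unify(records, key, match, max_iterations):
    """Group records sharing `key` into candidate groups, then run up to
    `max_iterations` matching rounds per group; leftovers get role 'R'."""
    roles = {}
    groups = {}
    for r in records:                      # candidate groups by key value
        groups.setdefault(key(r), []).append(r)
    for members in groups.values():
        remaining = list(members)
        for _ in range(max_iterations):
            if not remaining:
                break
            center = remaining.pop(0)
            roles[center["id"]] = "M"      # center of a new master group
            still = []
            for r in remaining:
                if match(center, r):
                    roles[r["id"]] = "S"   # matched the center
                else:
                    still.append(r)
            remaining = still
        for r in remaining:                # ran out of iterations
            roles[r["id"]] = "R"
    return roles

records = [
    {"id": 1, "name": "ABC", "type": "1"},
    {"id": 2, "name": "ABC", "type": "1"},
    {"id": 3, "name": "ABC", "type": "2"},
    {"id": 4, "name": "ABC", "type": "2"},
]
# Key on name only -> one candidate group; matching requires the same type.
roles = unify(records, key=lambda r: r["name"],
              match=lambda a, b: a["type"] == b["type"],
              max_iterations=1)
```

With max_iterations=1, only the first master group forms and both type-2 records come out as 'R'; raising max_iterations to 2 lets a second group form.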

 

 


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • June 20, 2024

Thanks a ton Ales for your time.

  1. During the repo rebuild, I do not have it set to NONINCREMENTAL load. Instead it is set to RELOAD, and the repo is pointed to the right DB.
  2. I did try increasing max iterations on the 2nd unification, but it did not change anything. Yes, it was intentionally set to 1.
  3. With the details you mentioned, I have noticed something happening in all the cases:

     Name | Type | pri_match_can_id | pri_master_id | match_role
     ABC  | 1    | 12345            | 11111         | S
     ABC  | 1    | 12345            | 11111         | S
     ABC  | 2    | 12345            | 22222         | R
     ABC  | 2    | 12345            | 22223         | R

     The pri_match_can_id should be different: the keys always use the combination of name and type, but the match_can_id is the same even though the type value differs. And all the records with a different type are assigned match_role 'R'.

 

Would it be right to update the center selection on the 2nd unify operation?

//Prefer Center records to remain the same from the first round of Matching
case(
    pri_match_role is "M",
    5,
    pri_match_role is "I",
    4,
    pri_match_role is "S",
    3,
    pri_match_role is "R",
    3,
    0
)

This renders the expected result in the test, but I wonder if there is any downside to it, as I wonder why these 'R's were not included earlier. Will keep you posted.

@Ales Thank you again.


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • June 20, 2024

Update: the change renders results only when the repo rebuild is done with fewer payers, not with a full reload. Max iterations had been adjusted to a higher number too.


Ales
Ataccamer
  • 27 replies
  • Answer
  • June 28, 2024

I'm not sure I follow now. If you are getting R roles, that means you have a bigger MATCH_CAN group than the number of MAX ITERATIONS in that Unify operation, i.e. you can actually have some records grouped together and some not, because they were rejected as Renegades.

It seems you are getting Renegades already from the first Matching operation. Could it be that you have much bigger groups than before? Could you please do a profile/frequency count on the pri_match_can_id?
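If you can export the matched records to CSV, the frequency count Ales asks for can also be done outside DQS with a few lines of standard Python. This is a sketch; the file name and the `pri_match_can_id` column name are assumptions based on this thread:

```python
import csv
from collections import Counter

def can_group_sizes(path, column="pri_match_can_id"):
    """Count how many records fall into each candidate group in a CSV export."""
    with open(path, newline="") as f:
        return Counter(row[column] for row in csv.DictReader(f))

# Example (hypothetical file name):
# sizes = can_group_sizes("matched_records.csv")
# sizes.most_common(10)  # the 10 largest candidate groups
```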

 


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • July 1, 2024

@Ales Thank you. I used a higher number for max iterations, which did not help earlier, but raising it to 10000 helped. Is there any downside to increasing the max iterations?

Thank you for the help.


Ales
Ataccamer
  • 27 replies
  • July 1, 2024

The only worry now is that you probably have pretty large CANDIDATE groups - if you do a profile on the can_id, you can see the maximum size.

Large groups can impact the performance of the whole matching process: the more iterations the engine needs to do, the longer the process takes. It can also mean higher memory consumption.

In general, it's good practice not to have CAN groups bigger than 1000 records. 5k is already quite a big group; 10k might be a performance issue.
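Ales's 1000-record guideline can be turned into a quick sanity check over the profiled can_id values. This is a hypothetical helper, not part of DQS:

```python
from collections import Counter

def oversized_groups(can_ids, limit=1000):
    """Return {can_id: size} for candidate groups exceeding `limit` records."""
    sizes = Counter(can_ids)
    return {cid: n for cid, n in sizes.items() if n > limit}

# e.g. oversized_groups(ids) flags the groups most likely to slow matching down.
```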

Have you noticed any significant performance degradation with this new setup?


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • July 9, 2024

@Ales Thank you for the help. Sorry it took this long to monitor the enhancement. The performance is not bad for now, as the size of the data is small. Thank you.


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • August 9, 2024

@Ales You are right about performance. There is another entity where the candidate groups were not big but there were "R" records, and increasing max iterations does fix the issue, but performance degraded. Do you have any suggestions? Since it's on the same topic I added this comment here, but I can open a new one; let me know. Thank you again.


Ales
Ataccamer
  • 27 replies
  • August 9, 2024

Hi @sgilla , do you know how big the groups are? The acceptable number of iterations is typically somewhere around 1000, which means if you have a group with more than 1000 records you might already have an issue (assuming none of the records match any other).

The only solution for this is to

  1. Review your matching rules in terms of how the candidate group is created.
  2. Adjust the rules so that the candidate group size is at most 1000 records (5k is already a big group).
  3. HW and memory adjustments - this may require some deeper analysis of where the bottleneck is. However, #1 and #2 are the optimal way to go, as it's not typical to create such big candidate groups and then try to find matches within them.
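The effect of points 1 and 2 (tightening how candidate groups are keyed) can be sketched as follows: adding one more column to the grouping key splits one oversized group into several small ones. Toy data, not the actual payer records:

```python
from collections import Counter

def max_group_size(records, key):
    """Largest candidate-group size produced by a given grouping key."""
    return max(Counter(key(r) for r in records).values())

# 20 records sharing one name but spread over 5 type values.
records = [{"name": "ABC", "type": str(i % 5)} for i in range(20)]

# Keying on name alone puts all 20 records in one candidate group;
# a composite (name, type) key caps the groups at 4 records each.
by_name = max_group_size(records, key=lambda r: r["name"])
by_name_type = max_group_size(records, key=lambda r: (r["name"], r["type"])
)
```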

sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • August 19, 2024

@Ales Thank you. I am looking through it; it seems the records not getting mastered are the ones that were overridden in OGC a while ago, so the master_id is not getting updated. Since this opens up a new concern, I opened a new question, "Overrides on records in OGC".

