Solved

DQS matching component results



DQS matching component results are not reflected when bundle is deployed and the entity is re-processed. I did re-build the repo with the new Extended unifier but no effect. What could be a reason?

Best answer by Ales

I'm not sure I follow now. If you are getting R roles, that means you have a bigger MATCH_CAN group than the number of MAX ITERATIONS in that Unify operation, i.e. you can actually have some records grouped together and some not, because they were rejected as Renegades.

It seems you are getting Renegades already from the first Matching operation. Could it be that you have much bigger groups than before? Could you please do a profile/frequency count on the pri_match_can_id?

 


18 replies

Ales
Ataccamer
  • 27 replies
  • June 13, 2024

Hi @sgilla , could you please share at least a screenshot of your DQS plan, or tell us which steps you are using specifically?

What changes do you expect to take effect?

Have you changed some matching rules? Can you share what exactly you changed?

Thanks

Ales


ivysakh
Space Explorer
  • 22 replies
  • June 13, 2024

Hi @sgilla ,

It could be that the rematch flag is not set appropriately when the reprocessing is triggered. As Ales mentioned, we would need more information to troubleshoot this. 


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • June 13, 2024

Thank you for the response. I am not sure what screenshots to provide, but I have this entity where the matching leveraged Name, Address, and a Type field. Name and Type are usually populated and Address was not, so I cut down the matching rules to focus more on Name and Type.

 

This should ensure that records with or without an address match together when the name and type are the same. I did a repo rebuild with this update, followed by a reprocess, but there was no change.

About the re-match flag: I think this is what needs to be commented out to ensure it grabs all the records, even if they are already mastered.

Which is good too. 


ivysakh
Space Explorer
  • 22 replies
  • June 13, 2024

Hi @sgilla , 

Could you attach the component here, so that I can take a look? 

Thanks,
Vysakh


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • June 17, 2024

Please check the zip file. I will ping you the password.


Ales
Ataccamer
  • 27 replies
  • June 18, 2024

Hi @sgilla , I was not able to open the zip file as it's encrypted (I can take a look later if you share the password with me). However, based on your screenshots it's obvious you are using the Extended Unification step.

If the goal is to adjust matching rules and reprocess all the data, you should be able to achieve this by just processing ALL the data using the updated rules.

I have a few additional questions:

  1. "I did a repo rebuild with this update, followed by reprocess" - What exactly did you do to rebuild the repo (read the data from the repo and write it back)? I guess by reprocess you mean you sent all the data through the step again.
  2. Do you need to keep the repository, or can you just drop it? This would mean you would lose your already calculated unification keys/IDs, of course.
  3. Have you tested your new matching rules with a completely empty repo? Just to be sure that your new setup works for newly incoming data.

Thanks,

Ales


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • June 18, 2024
  1. "I did a repo rebuild with this update, followed by reprocess" - What exactly did you do to rebuild the repo (read the data from the repo and write it back)? I guess by reprocess you mean you sent all the data through the step again. - YES
  2. Do you need to keep the repository, or can you just drop it? This would mean you would lose your already calculated unification keys/IDs, of course. - I did DROP and rebuilt the repo with the Extended Unifier.
  3. Have you tested your new matching rules with a completely empty repo? Just to be sure that your new setup works for newly incoming data. - I did DROP and rebuilt the repo with the Extended Unifier. I checked a few records in the repo and they looked like they had the same master_id. This was followed by a bundle update to OHD and a reprocess of the subject. When I debug or create a test plan on DQS and pass the data through the matching component, it works very well. But when I delete the records automatically by passing deleted records, so they are deleted on source/instance/repo completely, and then insert them again, the matching updates are not seen. Neither is the subject reprocess helping. Thank you for the help.

Thanks,


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • June 18, 2024

It seems the pri_match_can_id is the same on all records with the same name, while pri_master_id is different (it is supposed to be a single master_id) and match_role is 'R' (R: Renegades, i.e. records not similar to any center in a candidate group). How can I get these records to participate in the matching?


Ales
Ataccamer
  • 27 replies
  • June 19, 2024

I quickly checked your component and here are my observations:

  • Your Extended Unification step is set to NONINCREMENTAL mode, which means you always need to process all the records; the repo is never used. This seems to me the most important one. You mentioned rebuilding the repo, however your configuration does not use the repo at all. I believe if you switch to NORMAL mode you will get correct results with the repo.
  • you have two operations configured:
    • match Payer - contains most of the matching rules
    • payer final match - just two rules; seems like just a technical confirmation of the match. This one actually has the number of iterations set to 1, i.e. there can be only 1 matching group within the candidate group. It might be on purpose, however that's probably why you are getting the Unification Role R (Renegade) - "Records not belonging to the primary group are records having primary unification role N or possibly R". You can increase this parameter to allow more records into the matching groups.

 

"It seems the pri_match_can_id is the same on all records with the same name, while pri_master_id is different (supposed to be a single master_id) and match_role is 'R' (R: Renegades, or records not similar to any center in a candidate group). How to resolve these records to participate in the matching?"

 

If the match_can_id is the same on all records, it means that, based on your KEY groups, you matched all the records together. Then it goes to pri_master_id, using the matching rules. The relevant unification role for this UNIFY operation is pri_match_role.

The match_role is the result of the second Unify operation (payer final match), which has the number of iterations set to 1, hence the R records (only 1 matching group is allowed within the candidate group; the rest become "renegades").
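The iterations-vs-renegades mechanics Ales describes can be illustrated with a toy model. This is a simplified sketch, not Ataccama's actual matching engine; the field names and role letters (M/S/R) are taken from the examples in this thread:

```python
# Toy model (NOT the real DQS engine): each iteration picks one center,
# pulls in every record that matches it, and records beyond the allowed
# number of iterations end up with role 'R' (Renegade).

def unify(records, key, match, max_iterations):
    """Group records sharing `key` into candidate groups, then run up to
    `max_iterations` matching rounds per group; leftovers get role 'R'."""
    roles = {}
    groups = {}
    for r in records:                      # candidate groups by key value
        groups.setdefault(key(r), []).append(r)
    for members in groups.values():
        remaining = list(members)
        for _ in range(max_iterations):
            if not remaining:
                break
            center = remaining.pop(0)
            roles[center["id"]] = "M"      # center of a new master group
            still = []
            for r in remaining:
                if match(center, r):
                    roles[r["id"]] = "S"   # matched the center
                else:
                    still.append(r)
            remaining = still
        for r in remaining:                # ran out of iterations
            roles[r["id"]] = "R"
    return roles

records = [
    {"id": 1, "name": "ABC", "type": "1"},
    {"id": 2, "name": "ABC", "type": "1"},
    {"id": 3, "name": "ABC", "type": "2"},
    {"id": 4, "name": "ABC", "type": "2"},
]
# Key on name only -> one candidate group; matching requires the same type.
roles = unify(records, key=lambda r: r["name"],
              match=lambda a, b: a["type"] == b["type"],
              max_iterations=1)
```

With max_iterations=1, only the first master group forms and both type-2 records come out as 'R'; raising max_iterations to 2 lets a second group form.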

 

 


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • June 20, 2024

Thanks a ton Ales for your time.

  1. During the repo rebuild, I do not have it set to NONINCREMENTAL load. Instead it is set to RELOAD, and the repo is pointed to the right DB.
  2. I did try increasing max iterations on the 2nd unification, but it did not change anything. Yes, it was intentionally set to 1.
  3. With the details you mentioned, I have noticed something happening in all the cases:

     Name | Type | pri_match_can_id | pri_master_id | match_role
     ABC  | 1    | 12345            | 11111         | S
     ABC  | 1    | 12345            | 11111         | S
     ABC  | 2    | 12345            | 22222         | R
     ABC  | 2    | 12345            | 22223         | R

     The pri_match_can_id should be different: the keys always use the combination of name and type, but the match_can_id is the same even though the type value differs. And all the records with a different type are assigned match_role 'R'.

 

Would it be right to update the center selection on the 2nd unify operation?

//Prefer Center records to remain the same from the first round of Matching
case(
    pri_match_role is "M",
    5,
    pri_match_role is "I",
    4,
    pri_match_role is "S",
    3,
    pri_match_role is "R",
    3,
    0
)

This renders the expected result in the test, but I wonder if there is any downside to it, as I wonder why these 'R's were not included earlier. Will keep you posted.

@Ales Thank you again.


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • June 20, 2024

Update: the change renders results only when the repo rebuild is done with fewer payers, not with a full reload. Max iterations had been adjusted to a higher number too.


Ales
Ataccamer
  • 27 replies
  • Answer
  • June 28, 2024

I'm not sure I follow now. If you are getting R roles, that means you have a bigger MATCH_CAN group than the number of MAX ITERATIONS in that Unify operation, i.e. you can actually have some records grouped together and some not, because they were rejected as Renegades.

It seems you are getting Renegades already from the first Matching operation. Could it be that you have much bigger groups than before? Could you please do a profile/frequency count on the pri_match_can_id?
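If you can export the matched records to CSV, the frequency count Ales asks for can also be done outside DQS with a few lines of standard Python. This is a sketch; the file name and the `pri_match_can_id` column name are assumptions based on this thread:

```python
import csv
from collections import Counter

def can_group_sizes(path, column="pri_match_can_id"):
    """Count how many records fall into each candidate group in a CSV export."""
    with open(path, newline="") as f:
        return Counter(row[column] for row in csv.DictReader(f))

# Example (hypothetical file name):
# sizes = can_group_sizes("matched_records.csv")
# sizes.most_common(10)  # the 10 largest candidate groups
```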

 


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • July 1, 2024

@Ales Thank you. I used a higher number for max iterations, which did not help earlier, but raising it to 10000 helped. Is there any downside to increasing the max iterations?

Thank you for the help.


Ales
Ataccamer
  • 27 replies
  • July 1, 2024

The only worry now is that you probably have pretty large CANDIDATE groups - if you do a profile on the can_id, you can see the maximum size.

Large groups can impact the performance of the whole matching process: the more iterations the engine needs to do, the longer the process takes. It can also mean higher memory consumption.

In general, it's good practice not to have CAN groups bigger than 1000 records. 5k is already quite a big group; 10k might be a performance issue.
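Ales's 1000-record guideline can be turned into a quick sanity check over the profiled can_id values. This is a hypothetical helper, not part of DQS:

```python
from collections import Counter

def oversized_groups(can_ids, limit=1000):
    """Return {can_id: size} for candidate groups exceeding `limit` records."""
    sizes = Counter(can_ids)
    return {cid: n for cid, n in sizes.items() if n > limit}

# e.g. oversized_groups(ids) flags the groups most likely to slow matching down.
```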

Have you noticed any significant performance degradation with this new setup?


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • July 9, 2024

@Ales Thank you for the help. Sorry it took this long to monitor the enhancement. The performance is not bad for now, as the size of the data is small. Thank you.


sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • August 9, 2024

@Ales You are right about performance. There is another entity where the candidate groups were not big but there were "R" records, and increasing max iterations does fix the issue, but performance degraded. Do you have any suggestions? Since it's on the same topic I added this comment here, but I can open a new one; let me know. Thank you again.


Ales
Ataccamer
  • 27 replies
  • August 9, 2024

Hi @sgilla , do you know how big the groups are? The acceptable number of iterations is typically somewhere around 1000, which means if you have a group with more than 1000 records you might already have an issue (assuming none of the records match any other).

The only solution for this is to

  1. Review your matching rules in terms of how the candidate group is created.
  2. Adjust the rules so that the candidate group size is at most 1000 records (5k is already a big group).
  3. HW and memory adjustments - this may require some deeper analysis of where the bottleneck is. However, #1 and #2 are the optimal way to go, as it's not typical to create such big candidate groups and then try to find matches within them.
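The effect of points 1 and 2 (tightening how candidate groups are keyed) can be sketched as follows: adding one more column to the grouping key splits one oversized group into several small ones. Toy data, not the actual payer records:

```python
from collections import Counter

def max_group_size(records, key):
    """Largest candidate-group size produced by a given grouping key."""
    return max(Counter(key(r) for r in records).values())

# 20 records sharing one name but spread over 5 type values.
records = [{"name": "ABC", "type": str(i % 5)} for i in range(20)]

# Keying on name alone puts all 20 records in one candidate group;
# a composite (name, type) key caps the groups at 4 records each.
by_name = max_group_size(records, key=lambda r: r["name"])
by_name_type = max_group_size(records, key=lambda r: (r["name"], r["type"])
)
```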

sgilla
  • Author
  • Data Pioneer
  • 36 replies
  • August 19, 2024

@Ales Thank you. I am looking through it; it seems the records not getting mastered are the ones that were overridden in OGC a while ago, so the master_id is not getting updated. Since this opens up a new concern, I opened a new question, "Overrides on records in OGC".

