Skip to main content
Solved

Duplicate source_id MDM


Can we have duplicate source_id column values in mdm full load?

Best answer by ivysakh

Hi @aish_TF ,

The party_source_id, being a FK in the contact entity should not cause you any duplicate errors in the load operation. However, in order to simplify the model complexity, you can do below:

  • Introduce an intermediate entity to represent the "Party-Location" relationship. This entity would have a 1:N relationship with both Party and Contact.

    • Party 1:N Party-Location 1:N Contact
  • Key Definition:

    • Party-Location: The primary key would be a combination of party_source_id and ph_location.
    • Contact: The primary key would be ph_no (assuming phone number is unique within a location).
  • Mapping:

    • Map party_source_id and ph_location from your source to the Party-Location entity.
    • Map ph_no, contact_type, and other contact details to the Contact entity.
  • This method povides a more accurate representation of the relationships in your data and helps avoid duplicates and ensures contacts are correctly linked to parties and locations.

 

Cheers!

Vysakh Indrasenan

View original
Did this topic help you find an answer to your question?

5 replies

  • Author
  • Star Blazer L1
  • 40 replies
  • October 3, 2024

Adding some more information,
I have a1:N relationship between party and contact entities. i.e. a party record has multiple contact values  associated with it in a source system.
Example

Party

party_id party_name
01 abc
02 fgh

 

Contact for party_id 01

party_id ph_id ph_no ph_location
01 01 9387782823 A
01 02 2738467389 B
01 03 9384677283 C



I want all the contact values to be a part of the master contact record(based on ph location- one for each location. And hence, I am defining contact_type to represent each location. However, this situation associates multiple party_source_id for each ph number record and hence party_source_id will have duplicates in load for contact.(see source column mapping SS above). I am concerned if this will break the load operation.

PS: The model defined has a 1:N relation between party and contact entities
 


ivysakh
Space Explorer
Forum|alt.badge.img+3
  • Space Explorer
  • 22 replies
  • Answer
  • October 7, 2024

Hi @aish_TF ,

The party_source_id, being a FK in the contact entity should not cause you any duplicate errors in the load operation. However, in order to simplify the model complexity, you can do below:

  • Introduce an intermediate entity to represent the "Party-Location" relationship. This entity would have a 1:N relationship with both Party and Contact.

    • Party 1:N Party-Location 1:N Contact
  • Key Definition:

    • Party-Location: The primary key would be a combination of party_source_id and ph_location.
    • Contact: The primary key would be ph_no (assuming phone number is unique within a location).
  • Mapping:

    • Map party_source_id and ph_location from your source to the Party-Location entity.
    • Map ph_no, contact_type, and other contact details to the Contact entity.
  • This method povides a more accurate representation of the relationships in your data and helps avoid duplicates and ensures contacts are correctly linked to parties and locations.

 

Cheers!

Vysakh Indrasenan


  • Author
  • Star Blazer L1
  • 40 replies
  • October 9, 2024

Hi @ivysakh 
I see what you are suggesting. This would give me a M:N relation between party and contact. What I was thinking of is this-
the contact entity can have contact_type as {location}_PHONE and contact_value as whatever is the value. As I mentioned, this would mean that my party_source_id would have duplicate values. 
There are two reasons for me to have it this way:

  1. The party entity already has the primary location, if I bring in a location entity, it might confuse the business users.
  2. The contact entity has multiple locations only in one source system, there are other source system which do not have a location or even duplicates for phone number values.

Please let me know if my approach makes sense, or is there any way apart from creating a M:N relationship between party and contact. In theory, I don’t think it is a 1:N relationship as a party has N contacts but the reverse cannot be true. 


ivysakh
Space Explorer
Forum|alt.badge.img+3
  • Space Explorer
  • 22 replies
  • October 21, 2024

Hi @aish_TF ,

I agree, if location isn't a major analytical dimension in your model, a separate entity might be overkill. It will keep the model simpler and potentially easier for business users to understand. Your approach of having the location information captured in the  contact_type. attribute would work, or rather add a new attribute called ph_location directly to the contact entity. This attribute would store the location code ("A", "B", "C", etc.) for each phone number. This would mean:

  • Explicit location information: The ph_location attribute clearly stores the location of each phone number.
  • Simpler than M:N: Avoids the complexity of an intermediate "Party-Location" entity.
  • More flexible for future analysis: You can easily query and analyze contacts based on their location.

Combining location and  contact_type might make the contact_type less intuitive and harder to manage.


  • Author
  • Star Blazer L1
  • 40 replies
  • October 22, 2024

Hi  @ivysakh 
Will try this approach.

Thank you :)


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings