Skip to main content

Can we have duplicate source_id column values in mdm full load?

Adding some more information,
I have a1:N relationship between party and contact entities. i.e. a party record has multiple contact values  associated with it in a source system.
Example

Party

party_id party_name
01 abc
02 fgh

 

Contact for party_id 01

party_id ph_id ph_no ph_location
01 01 9387782823 A
01 02 2738467389 B
01 03 9384677283 C



I want all the contact values to be a part of the master contact record(based on ph location- one for each location. And hence, I am defining contact_type to represent each location. However, this situation associates multiple party_source_id for each ph number record and hence party_source_id will have duplicates in load for contact.(see source column mapping SS above). I am concerned if this will break the load operation.

PS: The model defined has a 1:N relation between party and contact entities
 


Hi @aish_TF ,

The party_source_id, being a FK in the contact entity should not cause you any duplicate errors in the load operation. However, in order to simplify the model complexity, you can do below:

  • Introduce an intermediate entity to represent the "Party-Location" relationship. This entity would have a 1:N relationship with both Party and Contact.

    • Party 1:N Party-Location 1:N Contact
  • Key Definition:

    • Party-Location: The primary key would be a combination of party_source_id and ph_location.
    • Contact: The primary key would be ph_no (assuming phone number is unique within a location).
  • Mapping:

    • Map party_source_id and ph_location from your source to the Party-Location entity.
    • Map ph_no, contact_type, and other contact details to the Contact entity.
  • This method povides a more accurate representation of the relationships in your data and helps avoid duplicates and ensures contacts are correctly linked to parties and locations.

 

Cheers!

Vysakh Indrasenan


Hi @ivysakh 
I see what you are suggesting. This would give me a M:N relation between party and contact. What I was thinking of is this-
the contact entity can have contact_type as {location}_PHONE and contact_value as whatever is the value. As I mentioned, this would mean that my party_source_id would have duplicate values. 
There are two reasons for me to have it this way:

  1. The party entity already has the primary location, if I bring in a location entity, it might confuse the business users.
  2. The contact entity has multiple locations only in one source system, there are other source system which do not have a location or even duplicates for phone number values.

Please let me know if my approach makes sense, or is there any way apart from creating a M:N relationship between party and contact. In theory, I don’t think it is a 1:N relationship as a party has N contacts but the reverse cannot be true. 


Hi @aish_TF ,

I agree, if location isn't a major analytical dimension in your model, a separate entity might be overkill. It will keep the model simpler and potentially easier for business users to understand. Your approach of having the location information captured in the  contact_type. attribute would work, or rather add a new attribute called ph_location directly to the contact entity. This attribute would store the location code ("A", "B", "C", etc.) for each phone number. This would mean:

  • Explicit location information: The ph_location attribute clearly stores the location of each phone number.
  • Simpler than M:N: Avoids the complexity of an intermediate "Party-Location" entity.
  • More flexible for future analysis: You can easily query and analyze contacts based on their location.

Combining location and  contact_type might make the contact_type less intuitive and harder to manage.


Hi  @ivysakh 
Will try this approach.

Thank you :)


Reply