Can we have duplicate source_id column values in mdm full load?
Adding some more information,
I have a1:N relationship between party and contact entities. i.e. a party record has multiple contact values associated with it in a source system.
Example
Party
party_id | party_name |
---|---|
01 | abc |
02 | fgh |
Contact for party_id 01
party_id | ph_id | ph_no | ph_location |
---|---|---|---|
01 | 01 | 9387782823 | A |
01 | 02 | 2738467389 | B |
01 | 03 | 9384677283 | C |
I want all the contact values to be a part of the master contact record(based on ph location- one for each location. And hence, I am defining contact_type to represent each location. However, this situation associates multiple party_source_id for each ph number record and hence party_source_id will have duplicates in load for contact.(see source column mapping SS above). I am concerned if this will break the load operation.
PS: The model defined has a 1:N relation between party and contact entities
Hi
The party_source_id, being a FK in the contact entity should not cause you any duplicate errors in the load operation. However, in order to simplify the model complexity, you can do below:
-
Introduce an intermediate entity to represent the "Party-Location" relationship. This entity would have a 1:N relationship with both Party and Contact.
- Party 1:N Party-Location 1:N Contact
-
Key Definition:
- Party-Location: The primary key would be a combination of
party_source_id
andph_location
. - Contact: The primary key would be
ph_no
(assuming phone number is unique within a location).
- Party-Location: The primary key would be a combination of
-
Mapping:
- Map
party_source_id
andph_location
from your source to the Party-Location entity. - Map
ph_no
,contact_type
, and other contact details to the Contact entity.
- Map
- This method povides a more accurate representation of the relationships in your data and helps avoid duplicates and ensures contacts are correctly linked to parties and locations.
Cheers!
Vysakh Indrasenan
Hi
I see what you are suggesting. This would give me a M:N relation between party and contact. What I was thinking of is this-
the contact entity can have contact_type as {location}_PHONE and contact_value as whatever is the value. As I mentioned, this would mean that my party_source_id would have duplicate values.
There are two reasons for me to have it this way:
- The party entity already has the primary location, if I bring in a location entity, it might confuse the business users.
- The contact entity has multiple locations only in one source system, there are other source system which do not have a location or even duplicates for phone number values.
Please let me know if my approach makes sense, or is there any way apart from creating a M:N relationship between party and contact. In theory, I don’t think it is a 1:N relationship as a party has N contacts but the reverse cannot be true.
Hi
I agree, if location isn't a major analytical dimension in your model, a separate entity might be overkill. It will keep the model simpler and potentially easier for business users to understand. Your approach of having the location information captured in the contact_type
. attribute would work, or rather add a new attribute called ph_location
directly to the contact
entity. This attribute would store the location code ("A", "B", "C", etc.) for each phone number. This would mean:
- Explicit location information: The
ph_location
attribute clearly stores the location of each phone number. - Simpler than M:N: Avoids the complexity of an intermediate "Party-Location" entity.
- More flexible for future analysis: You can easily query and analyze contacts based on their location.
Combining location
and contact_type
might make the contact_type
less intuitive and harder to manage.
Hi
Will try this approach.
Thank you :)
Reply
Login to the Ataccama Community
No account yet? Create an account
For Ataccama Customers and Partners
Enter your E-mail address. We'll send you an e-mail with instructions to reset your password.