
Hi All,

I have a few observations regarding the source_id column in the matching step of an entity in an MDM project.

For context, source_id is a column with unique values (the primary key in most cases) that we define to uniquely identify each record in a source system connected to the MDM hub.

I was recently trying to test the match logic on OneDesktop using plan files. I used a very small subset of records from each source.

Load Component

In the project, the source_id column (defined in the load component) for one of these source systems is “SOURCE_SYSTEM_NAME~^~pk_id”,
where SOURCE_SYSTEM_NAME is a string value and pk_id is the primary key.

Match Component

When we define a match component in an MDM project, we do not define standalone bindings, but presumably “Id Column” would be mapped to the source_id column when testing on OneDesktop as a plan.

So, when I created a plan to test the match logic, I mapped “Id Column” to source_id (with source_id holding string values as described above). But when I ran the plan with this Matching step, I got an IllegalNumberFormat error, because source_id has a string value.

It looks like this happened because the Matching step converts the Id Column value to integer/long. However, we do not encounter this issue when the MDM hub handles the match component without standalone bindings.
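
For illustration only, here is a minimal Python sketch of the suspected cause. It is not Ataccama code: `build_source_id` and `to_long` are hypothetical helpers, "CRM" is a made-up source system name, and `to_long` merely stands in for whatever numeric conversion the Matching step may apply to the Id Column (that conversion is an assumption based on the error seen).

```python
# Separator used in the composite source_id pattern "SOURCE_SYSTEM_NAME~^~pk_id".
SEPARATOR = "~^~"


def build_source_id(source_system: str, pk_id: int) -> str:
    """Hypothetical helper: compose a source_id the way the load component does."""
    return f"{source_system}{SEPARATOR}{pk_id}"


def to_long(value: str) -> int:
    """Stand-in for a step that expects a purely numeric Id Column (assumption)."""
    return int(value)


source_id = build_source_id("CRM", 1001)  # "CRM" is a made-up system name

# The composite string cannot be parsed as a number, so the conversion fails.
try:
    to_long(source_id)
    parsed_ok = True
except ValueError:
    parsed_ok = False

# Only the numeric pk_id part parses, but pk_id alone may not be unique across sources.
pk_only = to_long(source_id.split(SEPARATOR)[1])
```

Here the composite value fails the numeric conversion while the bare pk_id succeeds, which matches the behaviour I am seeing with a string-valued source_id.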


Can someone share insights into why this happens? Is the MDM engine handling this?

Adding references to:

  1. Matching Step
  2. Standalone bindings

Hi @aish_TF,

When you create a test plan for matching components, map the standalone bindings in this fashion:
 

In our MDM system, each incoming record is assigned a unique 'Id' value. This 'Id' acts as the primary key within the instance layer, which is where we store and manage individual records.

It's important to differentiate 'Id' from 'source_id'. While both serve as identifiers, they have distinct purposes:

  • Id: A system-generated unique identifier for each record within our MDM system. It's essential for internal operations and data management.
  • source_id: The unique identifier assigned for records within the source system from which the record originated. This helps us track the record's origin but doesn't function as the primary key in our MDM database.

When you test the matching plan locally using the standalone bindings, make sure the Id column is mapped to a similarly unique column of the LONG data type to avoid data type mismatch issues.
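
As a rough sketch of that idea outside the tool (plain Python, not Ataccama expression syntax), the snippet below gives each test record a unique integer id while leaving source_id untouched. The function `add_surrogate_ids`, the field names, and the sample records are all hypothetical and only illustrate the principle.

```python
from itertools import count


def add_surrogate_ids(records, start=1):
    """Return copies of the records with a unique integer 'id' added.

    The original source_id stays as a separate attribute for traceability.
    """
    seq = count(start)
    return [{**rec, "id": next(seq)} for rec in records]


# Made-up test records from two hypothetical source systems.
records = [
    {"source_id": "CRM~^~1001", "name": "Ann Lee"},
    {"source_id": "CRM~^~1002", "name": "Anne Lee"},
    {"source_id": "ERP~^~1001", "name": "Bob Ray"},
]

with_ids = add_surrogate_ids(records)
```

In a test plan, the numeric `id` would be the column bound to “Id Column”, and source_id would simply travel along as an ordinary attribute.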


Hi @ivysakh 
Thanks for your response.

I did exactly what you mentioned here, using the sequence() function to generate a unique id for all the records.
This explanation clears up my doubt.

Once again, thank you for your explanation.

 

Regards
Aishwarya
 

