
Hello everyone, I am unable to understand how to use the Representative Creator step. Can you please explain it in detail with an example so that I am able to use it?

Thank you.

Hi @Yasharth Misra, the Representative Creator step in Ataccama is a key component of Master Data Management (MDM) processes, specifically data consolidation and deduplication. It creates a single, representative record from multiple duplicate records, ensuring data consistency and accuracy.

Key Functions of the Representative Creator Step

  1. Identify Duplicate Records:
    • The Representative Creator step works with data that has been flagged as potential duplicates. These duplicates are identified through matching rules defined in the MDM process and are usually grouped using a common field such as the master ID.
  2. Create Representative Records:
    • For each group of duplicate records, the Representative Creator step generates a single representative record. This record is intended to serve as the "best" version of the data, combining the most accurate and complete information from the duplicates.
  3. Rule-Based Attribute Selection:
    • The step uses predefined rules to determine which attributes from the duplicate records should be included in the representative record. These rules can prioritize attributes based on factors such as completeness, accuracy, and source reliability.
    • For example, if one record has a more complete address while another has a more accurate phone number, the representative record will include the complete address from the first and the accurate phone number from the second.
  4. Data Consolidation:
    • The Representative Creator step consolidates the selected attributes into a single, unified record. This process involves merging the data fields according to the rules and resolving any conflicts between duplicate records.
  5. Output Representative Records:
    • The final output of the Representative Creator step is a set of representative records that replace the original duplicates in the MDM system. These records are used for subsequent data operations, ensuring that the MDM repository maintains high-quality, deduplicated master data.
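
The functions above can be sketched outside the tool. The following Python snippet is only an illustration of the logic, not Ataccama's implementation or expression syntax: the field names (`master_id`, `address`, `phone`), the sample data, and the "longest non-empty value wins" rule are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical instance records; field names and values are illustrative only.
records = [
    {"id": 1, "master_id": "M1", "address": "12 High St", "phone": "+44 20 7946 0000"},
    {"id": 2, "master_id": "M1", "address": "12 High Street, London, N1 9AB", "phone": None},
    {"id": 3, "master_id": "M2", "address": "5 Main Rd", "phone": None},
]


def completeness(value):
    """Crude completeness measure (an assumption): longer non-empty values win."""
    return len(value) if value else 0


def build_representatives(rows, attributes=("address", "phone")):
    """Group rows by master_id and pick the most complete value per attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["master_id"]].append(row)

    result = {}
    for master_id, group in groups.items():
        representative = {"master_id": master_id}
        for attr in attributes:
            # Each attribute may come from a different record in the group.
            best = max(group, key=lambda r: completeness(r[attr]))
            representative[attr] = best[attr]
        result[master_id] = representative
    return result


representatives = build_representatives(records)
```

Here the representative for `M1` takes the address from record 2 and the phone number from record 1, which mirrors the address/phone example in point 3.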

Let’s look at an example from the CDI Example MDM project:
From all the attributes available for the party entity, a logical group of the attributes below is created under ‘Rules’:

 

We assign the cleansed version of each attribute to its corresponding mastered column. Expressions support aggregate functions such as max, min, first, last, and count.
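
To give a feel for what those aggregate functions do over one group of duplicates, here is a small Python sketch. It is not Ataccama expression syntax; the column names (`cleansed_name`, `cleansed_birth_date`) are hypothetical, and skipping missing values before aggregating is an assumption of this sketch rather than documented behaviour of the step.

```python
# One hypothetical group of duplicate records with cleansed columns.
group = [
    {"cleansed_name": "John Smith", "cleansed_birth_date": "1980-05-01"},
    {"cleansed_name": "Jon Smith", "cleansed_birth_date": "1980-05-01"},
    {"cleansed_name": "John Smith", "cleansed_birth_date": None},
]


def aggregate(rows, column):
    """Compute first/last/min/max/count over the non-missing values of a column."""
    values = [row[column] for row in rows if row[column] is not None]
    if not values:
        return {"first": None, "last": None, "min": None, "max": None, "count": 0}
    return {
        "first": values[0],    # value from the first record in group order
        "last": values[-1],    # value from the last record in group order
        "min": min(values),    # smallest value (lexicographic for strings)
        "max": max(values),    # largest value (lexicographic for strings)
        "count": len(values),  # number of non-missing values
    }


name_stats = aggregate(group, "cleansed_name")
birth_date_stats = aggregate(group, "cleansed_birth_date")
```

Note that `first` and `last` depend on the order of the records within the group, which is why the selection rules described next matter.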
Next, we move to the 'Selection Rules' tab, where we define the logical rules used to selectively choose the record values for these attributes:

 

Each expression will be resolved to a numerical value. The following is a brief explanation of the rules used in the above example:

  • Active instance records are prioritized: Records with eng_active = 1 are given preference, hence they are sorted in descending order.
  • Source system weighting: A numerical weight is assigned to each source system to prioritize values from certain sources. For example, 'crm' is given the highest rank, and records are sorted in ascending order based on these weights.
  • Data quality score: Each instance record has an associated score value indicating its data quality. A higher score value means lower data quality.
  • Instance record ID: The unique identifier of the instance record is also considered in the ordering.
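
The four rules above amount to a multi-level sort, which can be sketched in Python as a tuple sort key. This is an illustration only: `eng_active` and the 'crm' source come from the example, while the other source names, the weight values, the field names `source`, `score`, and `id`, and the ascending order for score and ID are assumptions.

```python
# Hypothetical source weights: a lower number means a higher priority.
SOURCE_WEIGHT = {"crm": 1, "billing": 2, "web": 3}


def selection_key(record):
    """Build a sort key; the record with the smallest key is preferred."""
    return (
        -record["eng_active"],                     # active (1) before inactive (0): descending
        SOURCE_WEIGHT.get(record["source"], 99),   # source weight: ascending, unknown sources last
        record["score"],                           # lower score = better quality (assumed ascending)
        record["id"],                              # ID as the final tie-breaker (assumed ascending)
    )


def pick_best(group):
    """Return the record that ranks first under the selection rules."""
    return min(group, key=selection_key)


group = [
    {"id": 10, "eng_active": 0, "source": "crm", "score": 0},
    {"id": 11, "eng_active": 1, "source": "web", "score": 0},
    {"id": 12, "eng_active": 1, "source": "crm", "score": 500},
    {"id": 13, "eng_active": 1, "source": "crm", "score": 100},
]

best = pick_best(group)
ranked_ids = [r["id"] for r in sorted(group, key=selection_key)]
```

In this sample, record 13 wins: it is active, comes from 'crm', and has a lower score than record 12. Record 10 ranks last despite its perfect score because the active flag is evaluated first.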

 

Grouping of Duplicate Records:

Duplicate records are grouped based on a common field, typically the master ID, as part of the merge plan in MDM.

 

Benefits of Using the Representative Creator Step

  • Improved Data Quality: Ensures that the most accurate and complete data is retained in the MDM repository.
  • Efficiency: Automates the process of deduplication and data consolidation, saving time and reducing manual effort.
  • Consistency: Maintains consistency across the master data by eliminating duplicates and standardizing records.
  • Customizability: Allows for the creation of custom rules to meet specific business requirements and data governance policies.

 

 


Hello @Yasharth Misra, I’m closing this thread. Please feel free to follow up here with any questions or create a new post 🙋‍♀️

