
I need to create a complex duplicate-check rule. I am familiar with the aggregation rule; however, in this case it won't work for me. Here is the logic for determining duplicates:

If the name, unit of measure, location, and type are all the SAME but the code (not a PK) is DIFFERENT, then the rows are considered duplicates.


These would be considered duplicates:

Code  Name  UOM  Location  Type
123   Trek  LBS  Houston   Bike
234   Trek  LBS  Houston   Bike

These rows are not considered duplicates, because we can extend a product to different locations instead of creating a new row:

Code  Name   UOM  Location     Type
678   Razor  LBS  Houston      Scooter
678   Razor  LBS  San Antonio  Scooter

When I write the rule, I use the aggregation rule to check that the combination of name, UOM, location, and type is not unique; however, that can give me false positives for the scenario above (Code = 678). I don't see a way to say these fields are the same while this field is different. I am doing this in the web client; however, I'm open to writing something in the desktop client if there is a way to do this. Any help would be appreciated.
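
In SQL terms (just to make the logic concrete; your_table is a stand-in for the real table), what I'm after is something like this self-join sketch:

SELECT *
FROM your_table a
WHERE EXISTS (
    SELECT 1
    FROM your_table b
    WHERE b.Name = a.Name
      AND b.UOM = a.UOM
      AND b.Location = a.Location
      AND b.Type = a.Type
      AND b.Code <> a.Code  -- same four fields, but a different code
);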

I have been thinking through this, and there could be an easier way to catch these records than creating a complex data quality rule. You could create a custom SQL catalog item that only brings in records where duplicates exist, and then apply a rule that checks the record count. That way, whenever such records appear, the rule will immediately fail and you will be able to view those records.

Example query you could use in your catalog item:

SELECT t.*
FROM your_table t
JOIN (
    -- Groups that share Name, UOM, Location, and Type but contain
    -- more than one distinct Code are duplicates under the rule above;
    -- COUNT(DISTINCT Code) enforces the "code is different" condition.
    SELECT Name, UOM, Location, Type
    FROM your_table
    GROUP BY Name, UOM, Location, Type
    HAVING COUNT(DISTINCT Code) > 1
) dup
  ON t.Name = dup.Name
 AND t.UOM = dup.UOM
 AND t.Location = dup.Location
 AND t.Type = dup.Type;
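
To sanity-check the logic, here is the same pattern run against the sample rows from your post. The products CTE is just stand-in data, and the VALUES-list syntax here is PostgreSQL-style, so adjust it for your database:

WITH products (Code, Name, UOM, Location, Type) AS (
    VALUES
        (123, 'Trek',  'LBS', 'Houston',     'Bike'),
        (234, 'Trek',  'LBS', 'Houston',     'Bike'),
        (678, 'Razor', 'LBS', 'Houston',     'Scooter'),
        (678, 'Razor', 'LBS', 'San Antonio', 'Scooter')
)
SELECT t.*
FROM products t
JOIN (
    SELECT Name, UOM, Location, Type
    FROM products
    GROUP BY Name, UOM, Location, Type
    HAVING COUNT(DISTINCT Code) > 1
) dup
  ON t.Name = dup.Name
 AND t.UOM = dup.UOM
 AND t.Location = dup.Location
 AND t.Type = dup.Type;
-- Returns only the two Trek rows (codes 123 and 234). The Razor rows
-- fall into separate groups because Location differs, so they are not flagged.

A record-count rule on the catalog item (fail whenever it returns more than zero rows) will then surface these duplicates as soon as they appear.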