How to check for duplicate records in ONE Desktop using fuzzy logic?

Question

Hello. I have a CSV file of general material data, and I need to check it for duplicates based on the material data. I'm currently trying the Matching step, creating partitions and key rules, but I'm still not sure about it. Does anyone know how to solve this? Thanks!

AKislyakov · Accepted Answer

Hi Suryakanth,

You can find a tutorial on how to configure the Matching step in the Tutorials project > 09 Match and merge > 09.01 Match and merge.plan.

If you're new to this, I suggest trying the Canopy Clustering step first. It requires less configuration and is more user-friendly. You can find an example of its usage in the Tutorials project > 07 Analyze > 07.10 Clustering.plan.

The Canopy Clustering step lets you set up the components used to search for duplicates, their weights, and the thresholds that determine when two records are considered to belong to the same group.
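To make the idea of components, weights, and a threshold more concrete, here is a minimal Python sketch. It is not Ataccama's implementation and does not use any ONE Desktop API; it is a simplified greedy grouping that only illustrates how a weighted similarity score and a threshold interact. The column names (`name`, `supplier`), the sample records, and the helper functions are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(r1, r2, weights):
    """Weighted average of per-column similarities.

    `weights` maps a column name to its weight (hypothetical "components").
    """
    total = sum(weights.values())
    return sum(w * similarity(r1[c], r2[c]) for c, w in weights.items()) / total


def cluster(records, weights, threshold):
    """Greedy grouping: a record joins the first cluster whose representative
    (the cluster's first record) scores at or above `threshold`; otherwise it
    starts a new cluster. Returns one cluster ID per record.
    """
    representatives = []
    cluster_ids = []
    for rec in records:
        for cid, rep in enumerate(representatives):
            if record_score(rec, rep, weights) >= threshold:
                cluster_ids.append(cid)
                break
        else:
            representatives.append(rec)
            cluster_ids.append(len(representatives) - 1)
    return cluster_ids


# Hypothetical sample data: two spellings of one material and one unrelated material.
records = [
    {"name": "Steel Bolt M8", "supplier": "Acme"},
    {"name": "Steel Bolt M-8", "supplier": "Acme"},
    {"name": "Copper Wire 2mm", "supplier": "Acme"},
]

# The name carries most of the weight, so the unrelated material stays separate.
print(cluster(records, {"name": 0.8, "supplier": 0.2}, threshold=0.8))

# The shared supplier column dominates, so all three records are grouped together.
print(cluster(records, {"name": 0.2, "supplier": 0.8}, threshold=0.8))
```

In this sketch, giving too much weight to a column whose values are the same across many records (or setting the threshold too low) pulls unrelated records into one group. Whether the same tuning applies in the Canopy Clustering step depends on how its components are configured, so treat this only as an illustration of the concept.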

Radziah · Answer

Hi @AKislyakov, I am following this tutorial; however, it's quite hard for me to understand each piece of functionality, even though I used the Help button to find out what each one does. What I am looking for is to detect duplicates across several columns such as Name, Address, etc. After following the guide, I found my result a bit confusing: different Name values end up in the same cluster. For example, Yayasan A and Kementerian share one cluster ID, whereas I expect names like Yayasan A, Yysn, and Yayasn A to belong to the same cluster and Kementerian to belong to a different cluster ID. What should I change in the Canopy Clustering configuration to get the correct result?
