Difference between Duplicate and Non-Unique


#1

Hi Everyone,

I am new to Ataccama and have created my first plan and profile.
Under the Basic section for any column I select, I see a table with the following information:
Null
Non-null 22,075 100.00%
Duplicate 22,061 99.94%
Distinct 14 0.06%
Non-unique 14 0.06%
Unique 0 0.00%

I am confused about the difference between Duplicate and Non-unique.
How are they different?

Thanks,


#2

Hello Tapasya,

The requested information can be found in the help guide of DQA.

Navigate to Help -> Help Contents -> Getting Started with Ataccama DQ Analyzer -> Reading a Profile -> Basic Analyses


#3

Thank you, it helped. :slight_smile:


(Kalicharan Khetwal) #4

Duplicate: the number of values that repeat a value already present in the list; a value that occurs several times is counted once for each repeat. Non-unique: the number of distinct values that have at least one duplicate in the list. :slight_smile:


(Ben Korobkin) #5

Distinct: value appears AT LEAST once.
Unique: value appears ONLY once.
Non-unique: value appears MULTIPLE times.
Duplicate: repeated value (one that already existed in the column).

Non-unique values are the subset of DISTINCT values that appear multiple times in the column.

Every (non-null) value is considered distinct when it first appears in the column.
Once a distinct value appears again, that second appearance is called a duplicate. The same applies to the 3rd appearance, the 4th, the 5th, and so on: each of them is labelled a duplicate.
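
To make the definitions above concrete, here is a minimal Python sketch that counts each category for a small column. It only follows the definitions given in this thread and is not Ataccama's actual implementation; the function name `basic_profile` and the sample data are made up for illustration.

```python
from collections import Counter

def basic_profile(values):
    """Count the basic profile categories as defined in this thread.

    Hypothetical helper: it mirrors the forum definitions, not the tool's code.
    None is treated as a null value.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)  # value -> number of appearances

    distinct = len(counts)                               # values appearing at least once
    unique = sum(1 for c in counts.values() if c == 1)   # values appearing exactly once
    non_unique = distinct - unique                       # distinct values appearing 2+ times
    duplicate = len(non_null) - distinct                 # every appearance after the first

    return {
        "null": len(values) - len(non_null),
        "non_null": len(non_null),
        "duplicate": duplicate,
        "distinct": distinct,
        "non_unique": non_unique,
        "unique": unique,
    }

# Sample column: "a" three times, "b" twice, "c" once, plus one null.
sample = ["a", "a", "a", "b", "b", "c", None]
profile = basic_profile(sample)
print(profile)
```

For the sample column, "a" and "b" are non-unique (2 values), "c" is unique (1 value), and the duplicates are the two extra "a" rows plus the one extra "b" row (3 rows). The same arithmetic matches the table in the first post: 22,075 non-null rows minus 14 distinct values gives 22,061 duplicates, and since all 14 distinct values repeat, Non-unique is 14 and Unique is 0.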