"Diff" step in DQC

(Ra Li) #1

Hi, can someone from Ataccama explain to us what the “Diff” step does and how to use it? We’re not able to find any documentation on it, either from Ataccama or online.

Thanks in advance,

(Danny Ryan) #2

Hi Ra,

As far as I can tell, the ‘diff’ component works by comparing the primary keys of two separate feeds using join logic:

Left Join - PK is in Feed 1 but not in Feed 2
Right Join - PK is not in Feed 1 but in Feed 2
Full Join - PK is in both Feed 1 & 2.

I’ve knocked up the following example to demonstrate.
diff.plan (6.5 KB)
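
For anyone who can't open the plan file, here is a rough Python sketch of the three outcomes listed above. It is only an illustration of the key comparison as I understand it, not how the step is implemented; the function name and the output labels are made up for the example.

```python
def classify_keys(feed1_keys, feed2_keys):
    """Split primary keys into the three groups described above.

    Hypothetical helper for illustration only; the names are not
    taken from the product.
    """
    keys1, keys2 = set(feed1_keys), set(feed2_keys)
    return {
        "feed1_only": sorted(keys1 - keys2),  # PK in Feed 1 but not in Feed 2
        "feed2_only": sorted(keys2 - keys1),  # PK in Feed 2 but not in Feed 1
        "both": sorted(keys1 & keys2),        # PK in both feeds
    }


result = classify_keys([1, 2, 3, 4], [3, 4, 5])
print(result)
# {'feed1_only': [1, 2], 'feed2_only': [5], 'both': [3, 4]}
```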

Hope this helps


Diff is meant to compare two huge, roughly ordered data sets on their primary key, for example all transactions in one system against all transactions in another.
Unlike the Join step, Diff looks for the matching pair only within a fairly small window (buffer), so it can return results faster while using a reasonable amount of memory. The trade-off is that if your data is not ordered, Diff will fail to find matching records.
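
To make the window idea concrete, here is a minimal Python sketch of a buffered comparison over two roughly ordered key streams. It is an assumption about the general technique, not the actual Diff implementation: the function name, the `window` parameter, and the status labels are all hypothetical, and it assumes keys are unique within each feed.

```python
from collections import OrderedDict
from itertools import zip_longest


def windowed_diff(left, right, window=3):
    """Yield (key, status) pairs while holding at most `window` unmatched
    keys per side. Illustrative sketch only, not the product's algorithm.
    """
    pending_left = OrderedDict()   # keys seen only on the left so far
    pending_right = OrderedDict()  # keys seen only on the right so far
    missing = object()             # marks the end of the shorter feed

    for l, r in zip_longest(left, right, fillvalue=missing):
        if l is not missing:
            if l in pending_right:
                del pending_right[l]
                yield (l, "both")
            else:
                pending_left[l] = None
        if r is not missing:
            if r in pending_left:
                del pending_left[r]
                yield (r, "both")
            else:
                pending_right[r] = None
        # Once a buffer overflows, give up on its oldest key: it is
        # reported as unmatched even if its partner shows up later.
        while len(pending_left) > window:
            key, _ = pending_left.popitem(last=False)
            yield (key, "left_only")
        while len(pending_right) > window:
            key, _ = pending_right.popitem(last=False)
            yield (key, "right_only")

    # Whatever is still buffered at the end never found a partner.
    for key in pending_left:
        yield (key, "left_only")
    for key in pending_right:
        yield (key, "right_only")


# Roughly ordered feeds: every shared key is paired up.
print(list(windowed_diff([1, 2, 3, 5, 6], [1, 3, 2, 4, 6], window=2)))

# Same keys, but one feed reversed and a tiny window: most pairs are missed.
print(list(windowed_diff([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1], window=1)))
```

In the first call, keys 1, 2, 3 and 6 come out as "both", 5 as "left_only" and 4 as "right_only". In the second call only keys 3 and 4 are paired, and every other key is reported as unmatched on both sides, which is the failure mode described above for unordered data. Raising `window` to 6 pairs everything again, at the cost of buffering more keys.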