Hi

I would like to ask whether we can run a fuzzy check in Ataccama across two different datasets, say Set 1 and Set 2, that come from different data sources. The two datasets have a different number of columns, but the fuzzy check involves only customer name and address. Neither dataset has a unique identifier. Is this achievable in Ataccama, and what steps would be needed to get the results of the fuzzy check? Thanks.

Hello @Radziah,

To build a DQ check on top of two datasets, you first have to join them into a single dataset, because at the moment DQ rules do not support cross-table DQ checks. To achieve this, you can use a Virtual Catalog Item (since you mentioned the data comes from different sources).

https://docs.ataccama.com/one-desktop/15.4.0/work-with-ataccama-one/virtual-catalog-items.html
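To illustrate what "joining into one dataset" can mean when there is no shared key, here is a minimal sketch in plain Python, outside Ataccama. It is not how a Virtual Catalog Item is configured; it only shows the idea of pairing every row of one set with every row of the other (a cross join) so that a single-input check can compare them. The sample rows, the column names and the `candidate_pairs` helper are all assumptions made up for this example.

```python
from itertools import product

# Hypothetical sample rows; the column names are assumptions for illustration.
set1 = [
    {"cust_name": "Jon Smith", "cust_addr": "12 Main Street", "segment": "retail"},
    {"cust_name": "Maria Gonzales", "cust_addr": "5 Park Ave", "segment": "corporate"},
]
set2 = [
    {"name": "John Smith", "address": "12 Main St."},
    {"name": "Peter Brown", "address": "77 Hill Road"},
    {"name": "Maria Gonzalez", "address": "5 Park Avenue"},
]


def candidate_pairs(left, right):
    """Pair every row of `left` with every row of `right` (a cross join),
    keeping only the name and address columns used by the fuzzy check."""
    return [
        {
            "name_1": l["cust_name"],
            "address_1": l["cust_addr"],
            "name_2": r["name"],
            "address_2": r["address"],
        }
        for l, r in product(left, right)
    ]


# One combined dataset: each row holds one record from each source.
pairs = candidate_pairs(set1, set2)
```

A cross join grows as rows in Set 1 times rows in Set 2, so on larger data some pre-filtering (for example, by city or postcode, if such a column exists) would usually be needed before comparing.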

Was this what you were asking, or are you also interested in the fuzzy DQ checks themselves? If so, can you provide more details on what this fuzzy DQ check should do?

Thank you.

Kind regards,

Anna


Hi Anna

When you say DQ rules do not support cross-table DQ checks, does that refer to ONE Desktop, ONE Web, or both? If I am not mistaken, fuzzy checks can only be done in ONE Desktop, correct? One dataset comes from a database, the other from a flat file.

In detail, we want to run fuzzy checks on name and address between the two datasets.


Hi @Radziah,

Neither supports it, actually: even when you build the DQ rules in ONE Desktop (through the validation component), it still expects only one input, that is, one table. You would have to do the validation completely independently, in ONE Desktop only: in a plan that is not connected to ONE Web, deployed on the orchestration server if needed. In that case, you cannot apply the rule in monitoring projects or see the results in the web application.

As for the fuzzy check, I am unsure what exactly you mean. If it is about fuzzy matching functions such as Levenshtein, that is also possible in the UI. If it involves more complicated logic, then yes, the validation component might be a better choice, but that will depend heavily on the logic of the rule.
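For readers unfamiliar with Levenshtein-style matching, here is a minimal sketch in plain Python of what such a check on name and address could look like. It is an illustration only, not Ataccama's expression syntax or its built-in function. The `similarity` normalisation, the `is_fuzzy_match` helper and the 0.8 threshold are assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after trimming and lowercasing (assumed normalisation)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_fuzzy_match(name_1, addr_1, name_2, addr_2, threshold=0.8):
    """Hypothetical rule: both name and address must reach the threshold."""
    return (similarity(name_1, name_2) >= threshold
            and similarity(addr_1, addr_2) >= threshold)
```

For example, `is_fuzzy_match("Jon Smith", "12 Main Street", "John Smith", "12 Main Street")` passes under these assumptions, because the names differ by a single character. The right threshold, and whether name and address should be combined with AND or scored separately, depends on the data.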

Kind regards,

Anna