Solved

Fuzzy Checks with No Unique Identifier

5 months ago
February 6, 2025
4 replies
48 views

Radziah
Data Voyager
16 replies

Hi

I would like to ask whether we can run a fuzzy check in Ataccama across two datasets, say Set 1 and Set 2, that come from different data sources. The two datasets have a different number of columns, but the fuzzy check involves only customer name and address. Neither dataset has a unique identifier. Is this achievable in Ataccama, and what would be the steps to get the results of the fuzzy check? Thanks.

anna.spakova
Ataccamer
165 replies
5 months ago
February 6, 2025

Hello @Radziah,

To build a DQ check on top of two datasets, you first have to join them into a single dataset, because DQ rules currently do not support cross-table checks. Since you mentioned that the data comes from different sources, you can use a Virtual Catalog Item to do this:

https://docs.ataccama.com/one-desktop/15.4.0/work-with-ataccama-one/virtual-catalog-items.html
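To illustrate the idea of joining two sources that share no key, here is a minimal sketch in plain Python. It is not Ataccama syntax and not how a Virtual Catalog Item is defined; the column names (`cust_name`, `cust_addr`, `name`, `address`) and the sample rows are assumptions made up for the example. It pairs every row of Set 1 with every row of Set 2, keeps only name and address, and adds a surrogate row number to each side because neither source has a unique identifier.

```python
from itertools import product

# Hypothetical sample rows; the column names are assumptions for illustration.
set1 = [
    {"cust_name": "John Smith", "cust_addr": "12 High Street, Leeds", "segment": "retail"},
    {"cust_name": "Maria Gomez", "cust_addr": "5 Park Lane, London", "segment": "sme"},
]
set2 = [
    {"name": "Jon Smith", "address": "12 High St, Leeds"},
    {"name": "Peter Novak", "address": "8 Mill Road, York"},
]


def pair_up(left, right):
    """Cross join both sets into one flat dataset of candidate pairs.

    Only the name and address columns are carried over. Each side gets a
    surrogate row number, since neither source has a unique identifier.
    """
    pairs = []
    for (i, l), (j, r) in product(enumerate(left, 1), enumerate(right, 1)):
        pairs.append({
            "s1_row": i, "s1_name": l["cust_name"], "s1_address": l["cust_addr"],
            "s2_row": j, "s2_name": r["name"], "s2_address": r["address"],
        })
    return pairs


candidate_pairs = pair_up(set1, set2)
```

A full cross join grows with the product of both row counts, so for larger data you would likely restrict the pairs first, for example to rows that share a postcode or the first letters of the name.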

Was this what you were asking, or are you also interested in the fuzzy DQ checks? If so, can you provide more details on what the fuzzy DQ check should do?

Thank you.

Kind regards,

Anna

Radziah
Author
Data Voyager
16 replies
5 months ago
February 7, 2025

Hi Anna

When you said DQ rules do not support cross-table DQ checks, does that refer to ONE Desktop, ONE Web, or both? If I am not mistaken, fuzzy checks can only be done in ONE Desktop, correct? One dataset comes from a database, the other from a flat file.

In detail, we want to run fuzzy checks on name and address between the two datasets.

anna.spakova
Ataccamer
165 replies
Answer
5 months ago
February 7, 2025

Hi @Radziah ,

Neither supports it, actually. Even when you build DQ rules in ONE Desktop (through the validation component), the component still expects only one input, that is, a single table. You would have to do the validation completely independently in ONE Desktop, in a plan that is not connected to ONE Web, and deploy it on the orchestration server if needed. In that case, you cannot apply the rule in monitoring projects or see the results in the web application.

As for the fuzzy check, I am not sure what exactly you mean by it. If it is about fuzzy matching functions such as Levenshtein, that is also possible in the UI. If it involves more complicated logic, then yes, the validation component might be a better choice, but that depends heavily on the logic of the rule.
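For readers unfamiliar with Levenshtein-style matching, here is a minimal sketch in plain Python of what such a check on name and address could look like. It is not Ataccama expression syntax; the helper names (`normalize`, `similarity`, `is_fuzzy_match`) and the 0.8 threshold are assumptions chosen for the example and would need tuning on real data.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Lowercase and collapse repeated whitespace before comparing."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Edit distance scaled to 0..1, where 1.0 means identical after normalizing."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_fuzzy_match(name1, addr1, name2, addr2, threshold=0.8):
    """Assumed rule: both name and address must reach the threshold."""
    return (similarity(name1, name2) >= threshold
            and similarity(addr1, addr2) >= threshold)
```

With this sketch, `is_fuzzy_match("John Smith", "12 High Street, Leeds", "Jon Smith", "12 High St, Leeds")` evaluates to `True`, while a pair with a clearly different name does not pass. Requiring both fields to pass is one possible design; a weighted score over name and address is another.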

Kind regards,

Anna

Cansu
Community Manager
703 replies
4 months ago
February 13, 2025

Hi @Radziah, I’m closing this thread for now. If you have any follow-up questions, please don’t hesitate to share them in the comments or create a new post 🙋🏻‍♀️
