Skip to main content

Hello,

I was wondering if someone has an idea to help my question about one of my DQ check. Basically let me explain what i have and what i want to achieve with them.

I have; one list consisting 2 columns. Column A has unique values having 2 or 3 digits with a mixed combination of letters and numbers. Column B has names according to the column A. Here is the example: 

Column A Column B
abc name1
abd name1
abe name1
abf name2
xy name3
1a1 name4
1a2 name4
1a3 name4
yz name5
yz1 name6

 

As you see, there is no corelation between column A and column B. I can’t assume if the value of column A is yz and value of column B is name5 then i have name5 for yz1 of column A.

And I have a field in my data lets call it as column C. Column C has values with a mixed combination of digits and letters and with different length. Such as abc1231422ad , 1a3577afdg , yz4564sdfsfs can be example. 

And I have another field which is Column D having some other values. 

I want to check values of column C and find their respective column B values and bring that values. 

At this point, I need a DQ check for doing following steps :

  1. to check first 2 or 3 digits of Column C values.
  2. find these 2 or 3 digits combination of C values in the list of Column A. 
  3. bring the respective values of Column B.
  4. compare that value with a value of Column D.

Normally I was going to create a lookup item to find those values but it is not so easy. Here are some of challanges :

  1. Since my values of column D could be shorter version of column B, i need to have an approach to find not exact match but the nearest match. (like vlookup in excel)
  2. Some values of column A look so similar but they have some other values of column B. If I select first 2 digits of the value of Column C, it might be wrong value as there is another value of column B for a first three digits of column C. 

Please let me know if you need more clarification, I hope someone has an idea to help me 🤓

Thanks !

Hello @sumeyra ,

usually, when we need to work with complex lookups (more than one column) and we need to stick to the UI, we create a concatenated lookup column, so basically e.g. A|B, so you would have one lookup column with values like abc|name1, abd|name1 etc. Then when searching for those combinations, you would do the same with your columns C and D.

This won’t work for your use case because of the exact match as you mentioned. Only idea I can come up with quickly is to use component rules https://docs.ataccama.com/one/latest/data-quality/validation-components.html. Here you have more options, you can use standard lookup step to bring the column B based on match for column A and then compare the column D with column B using e.g. functions like substring(), left() etc. Would this be an option?

I hope this helps a little.

Kind regards,

Anna

 


Hello @anna.spakova 

Thanks for your bright idea! I had no idea about that component rule, i will take a look. But I am not sure I understand this part ‘Here you have more options, you can use standard lookup step to bring the column B based on match for column A and then compare the column D with column B using e.g. functions like substring(), left() etc. Would this be an option?’ Do you suggest to have another new column based on the left(column A,2) -or maybe 3- and use this new column in the norma lookup or in the component rule? Also i don’t know how can say which one is true expression , taking first two left or first three left ? 


Hi @sumeyra,

The easiest approach will be to perform two lookup searches in your component. First using left(ColumnC,3) as a key lookup value and then using left(ColumnC,2). Then you can narrow down your results by using the Match Condition expression to perform comparison between Column D and Column B values. 

There is an example how to use Match Conditions in the Tutorials project. Tutorials > 04 Using lookup > 04.06 Multiplicative Lookup.plan


Hi @AKislyakov thanks for your support ! I will definitely try this approach and follow tutorial materials. Thanks !


Reply