Skip to main content
Question

Ranking numeric values in code query

  • February 14, 2025
  • 7 replies
  • 60 views

Lazov
Data Voyager

Dear members,

I am struggling with a step that rather reflects query/array logic than the logic of steps available in component editor.

Task is to find the maximum from a range of values. I tried to retrieve it through a dynamic query written in a Representative Creator step, see details below:

L1:= round((1-(levenshtein(W_CUA1,W_CUA1_MATCH)/(max(length(W_CUA1),length(W_CUA1_MATCH)))))*100,0);
L2:= round((1-(levenshtein(W_CUA1,W_CUA2_MATCH)/(max(length(W_CUA1),length(W_CUA2_MATCH)))))*100,0);
L3:= round((1-(levenshtein(W_CUA1,W_CUA3_MATCH)/(max(length(W_CUA1),length(W_CUA3_MATCH)))))*100,0);
L4:= round((1-(levenshtein(W_CUA1,W_CUA4_MATCH)/(max(length(W_CUA1),length(W_CUA4_MATCH)))))*100,0);
L5:= round((1-(levenshtein(W_CUA2,W_CUA1_MATCH)/(max(length(W_CUA2),length(W_CUA1_MATCH)))))*100,0);
L6:= round((1-(levenshtein(W_CUA2,W_CUA2_MATCH)/(max(length(W_CUA2),length(W_CUA2_MATCH)))))*100,0);
L7:= round((1-(levenshtein(W_CUA2,W_CUA3_MATCH)/(max(length(W_CUA2),length(W_CUA3_MATCH)))))*100,0);
L8:= round((1-(levenshtein(W_CUA2,W_CUA4_MATCH)/(max(length(W_CUA2),length(W_CUA4_MATCH)))))*100,0);
L9:= round((1-(levenshtein(W_CUA3,W_CUA1_MATCH)/(max(length(W_CUA3),length(W_CUA1_MATCH)))))*100,0);
L10:= round((1-(levenshtein(W_CUA3,W_CUA2_MATCH)/(max(length(W_CUA3),length(W_CUA2_MATCH)))))*100,0);
L11:= round((1-(levenshtein(W_CUA3,W_CUA3_MATCH)/(max(length(W_CUA3),length(W_CUA3_MATCH)))))*100,0);
L12:= round((1-(levenshtein(W_CUA3,W_CUA4_MATCH)/(max(length(W_CUA3),length(W_CUA4_MATCH)))))*100,0);
L13:= round((1-(levenshtein(W_CUA4,W_CUA1_MATCH)/(max(length(W_CUA4),length(W_CUA1_MATCH)))))*100,0);
L14:= round((1-(levenshtein(W_CUA4,W_CUA2_MATCH)/(max(length(W_CUA4),length(W_CUA2_MATCH)))))*100,0);
L15:= round((1-(levenshtein(W_CUA4,W_CUA3_MATCH)/(max(length(W_CUA4),length(W_CUA3_MATCH)))))*100,0);
L16:= round((1-(levenshtein(W_CUA4,W_CUA4_MATCH)/(max(length(W_CUA4),length(W_CUA4_MATCH)))))*100,0);

max(L1,L2,L3,L4,L5,L6,L7,L8,L9,L10,L11,L12) // and here comes the trouble: Ataccama has no formula to pick maximum from an N number range but from 2 sources at a time.

Independently from above logic, the case can be simplified as below:

L1:= levenshtein(W_CUA1,W_CUA1_MATCH)   //everything else is just custom calculation, we can restrict the case to basics

L2:= levenshtein(W_CUA1,W_CUA2_MATCH)

L3:= levenshtein(W_CUA1,W_CUA3_MATCH)

etc.. to L16.

It is already annoying that there is no loop function in code writer function, but having no chance to set a simple rank or choose a maximum NOT from a certain column but across different variables is something I cannot believe.

Could anyone help me out with a solution?

What I want finally is:

  1. to get the maximum match
  2. to get the name of two fields that provide the highest match, such as ‘W_CUA3,W_CUA1_MATCH’.

Thank you in advance!

Forum|alt.badge.img+1

Hi ​@Lazov.

Thanks for the question!

With Ataccama plans/components it’s important to understand that the majority of steps are multi-threaded. In this case there is no guarantee of the order that records will traverse through the plan/component.

There are some steps that force the flow to wait until records are available. This includes the Representative Creator step that you mentioned.

I’m not 100% clear on the requirement here but am hoping that the attached example plan might help (see community_dq_1627.zip).

Feel free to explain the requirement further or ask any follow up questions if this isn’t what you were trying to achieve.

Kind regards,
Adrian

 


Lazov
Data Voyager
  • Data Voyager
  • February 17, 2025

Hi Adrian,

thank you indeed for your kind answer!
You are on the right track, despite my explanation bleeding from many wounds. :-)
It is a little more complex, as I am playing hide-and-seek with a value that can be stored in four fields: W_CUA1, W_CUA2, W_CUA3, W_CUA4.
So each W_CUA* must be matched to the other W_CUA* in each pair of records.

The whole picture is as follows:
1. I join a dataset back  to itself, based on GROUPING_ID.
GROUPING_ID covers an n number of records, belonging together by identical value in a certain field - it is indifferent what field it is, my point is that I have already done the grouping successfully.
So now, in the same row, I have the in_left.W_CUA1... to in_left.W_CUA4 and in_right.W_CUA1... to in_right.W_CUA16. For simplification's sake I apply name in_right.W_CUA1 -> W_CUA1_MATCH etc. to W_CUA4_MATCH.

2. I remove the pairs where in_left.ID = in_right.ID where ID is the record ID within the group.

3. In a Representative Creator, I remove the combinations where the pairs are the same i.e. left ID appears on right join and vice versa, eg. 4 vs 5, and 5 vs 4.

4. In the Representative Creator, I set a dynamic query to check the similarity between the all 16 combinations - that is the code in my first post.
Then the desired result would be to find the highest match, indicating which fields are attached - data comes from W_CUA1 vs. W_CUA3_MATCH etc.

 

First screenshot demonstrates the component flow I built, while second one is the case of the field matching through a life-like example.

And I still have no clue how to present in what form I want to see the results i.e. how to mark which fields bear the highest rank combination. Another field should store this information eg. ‘W_CUA1|W_CUA3_MATCH’.

And yes, your question is relevant how to rank the results when we have eg. 3 pairs with the same level match - I don’t know yet, but learning the database, there is zero chance of such case, due to the nature of the data inspected here.

Is this clear I wrote or I just made it a bigger mess? I hope I did not.

 

Thank you, Laszlo


Lazov
Data Voyager
  • Data Voyager
  • February 18, 2025

Hi Adrian, there’s a mistake I made here, but the page does not save the text again when I edit and send it.

Instead of this:

‘So now, in the same row, I have the in_left.W_CUA1... to in_left.W_CUA4 and in_right.W_CUA1... to in_right.W_CUA16

I meant this:

‘So now, in the same row, I have the in_left.W_CUA1... to in_left.W_CUA4 and in_right.W_CUA1... to in_right.W_CUA4

 

Sorry for confusion,

Laszlo


Forum|alt.badge.img+1

Hi Laszlo.

Apologies for not responding sooner. There are a couple of issues with the Community site at the moment and I’m not able to see the images shared previously.

Could you perhaps try sharing them again and/or attaching a copy of any shareable plans/components?

Thanks,
Adrian


Lazov
Data Voyager
  • Data Voyager
  • February 19, 2025

Hi ​@Adrian Anderson,

no need to apologize, I am grateful that you are dealing with this challenge I have.

I rather embed the same here in the text box - sharing the whole thing would not meet my agreement with my customer on data handling. Building it in a demonstrative test component would take more time than I have at the moment.

But certainly I will do so if your feedback was that you could not see a thing again. :-)

So now, let’s try again.

First screenshot demonstrates the component flow I built, while second one is the case of the field matching through a life-like example.

 

 

 

Please let me know whether you can see the images this time.

Thank you, Laszlo


Lazov
Data Voyager
  • Data Voyager
  • March 26, 2025

Hi ​@Adrian Anderson,

I finally managed to build a hard coded line, similar to XML structure, in a Representative Creator that does all the calculations and gets the valid data for each case.

Too bad that it is still 16 hard coded lines because I have 16  combinations in total and Ataccama is not able to handle a single loop with iterator.

BR, Laszlo


Cansu
Community Manager
Forum|alt.badge.img+3
  • Community Manager
  • March 31, 2025

Hi ​@Lazov, thank you for sharing your implementation. If you’d like please share your use case and feedback in our ideas section where our product team has direct access to review 👈


Reply


Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie settings