Question

Ranking numeric values in code query

3 months ago
February 14, 2025
7 replies
72 views

Lazov
Data Voyager

Dear members,

I am struggling with a step that rather reflects query/array logic than the logic of steps available in component editor.

Task is to find the maximum from a range of values. I tried to retrieve it through a dynamic query written in a Representative Creator step, see details below:

L1:= round((1-(levenshtein(W_CUA1,W_CUA1_MATCH)/(max(length(W_CUA1),length(W_CUA1_MATCH)))))*100,0);
L2:= round((1-(levenshtein(W_CUA1,W_CUA2_MATCH)/(max(length(W_CUA1),length(W_CUA2_MATCH)))))*100,0);
L3:= round((1-(levenshtein(W_CUA1,W_CUA3_MATCH)/(max(length(W_CUA1),length(W_CUA3_MATCH)))))*100,0);
L4:= round((1-(levenshtein(W_CUA1,W_CUA4_MATCH)/(max(length(W_CUA1),length(W_CUA4_MATCH)))))*100,0);
L5:= round((1-(levenshtein(W_CUA2,W_CUA1_MATCH)/(max(length(W_CUA2),length(W_CUA1_MATCH)))))*100,0);
L6:= round((1-(levenshtein(W_CUA2,W_CUA2_MATCH)/(max(length(W_CUA2),length(W_CUA2_MATCH)))))*100,0);
L7:= round((1-(levenshtein(W_CUA2,W_CUA3_MATCH)/(max(length(W_CUA2),length(W_CUA3_MATCH)))))*100,0);
L8:= round((1-(levenshtein(W_CUA2,W_CUA4_MATCH)/(max(length(W_CUA2),length(W_CUA4_MATCH)))))*100,0);
L9:= round((1-(levenshtein(W_CUA3,W_CUA1_MATCH)/(max(length(W_CUA3),length(W_CUA1_MATCH)))))*100,0);
L10:= round((1-(levenshtein(W_CUA3,W_CUA2_MATCH)/(max(length(W_CUA3),length(W_CUA2_MATCH)))))*100,0);
L11:= round((1-(levenshtein(W_CUA3,W_CUA3_MATCH)/(max(length(W_CUA3),length(W_CUA3_MATCH)))))*100,0);
L12:= round((1-(levenshtein(W_CUA3,W_CUA4_MATCH)/(max(length(W_CUA3),length(W_CUA4_MATCH)))))*100,0);
L13:= round((1-(levenshtein(W_CUA4,W_CUA1_MATCH)/(max(length(W_CUA4),length(W_CUA1_MATCH)))))*100,0);
L14:= round((1-(levenshtein(W_CUA4,W_CUA2_MATCH)/(max(length(W_CUA4),length(W_CUA2_MATCH)))))*100,0);
L15:= round((1-(levenshtein(W_CUA4,W_CUA3_MATCH)/(max(length(W_CUA4),length(W_CUA3_MATCH)))))*100,0);
L16:= round((1-(levenshtein(W_CUA4,W_CUA4_MATCH)/(max(length(W_CUA4),length(W_CUA4_MATCH)))))*100,0);

max(L1,L2,L3,L4,L5,L6,L7,L8,L9,L10,L11,L12) // and here comes the trouble: Ataccama has no formula to pick maximum from an N number range but from 2 sources at a time.

Independently from above logic, the case can be simplified as below:

L1:= levenshtein(W_CUA1,W_CUA1_MATCH) //everything else is just custom calculation, we can restrict the case to basics

L2:= levenshtein(W_CUA1,W_CUA2_MATCH)

L3:= levenshtein(W_CUA1,W_CUA3_MATCH)

etc.. to L16.

It is already annoying that there is no loop function in code writer function, but having no chance to set a simple rank or choose a maximum NOT from a certain column but across different variables is something I cannot believe.

Could anyone help me out with a solution?

What I want finally is:

to get the maximum match
to get the name of two fields that provide the highest match, such as ‘W_CUA3,W_CUA1_MATCH’.

Thank you in advance!

Did this topic help you find an answer to your question?

Adrian Anderson
Ataccamer
2 months ago
February 17, 2025

Hi @Lazov.

Thanks for the question!

With Ataccama plans/components it’s important to understand that the majority of steps are multi-threaded. In this case there is no guarantee of the order that records will traverse through the plan/component.

There are some steps that force the flow to wait until records are available. This includes the Representative Creator step that you mentioned.

I’m not 100% clear on the requirement here but am hoping that the attached example plan might help (see community_dq_1627.zip).

Feel free to explain the requirement further or ask any follow up questions if this isn’t what you were trying to achieve.

Kind regards,
Adrian

1 Attachments

community_dq_1627.zip

Lazov
Data Voyager
2 months ago
February 17, 2025

Hi Adrian,

thank you indeed for your kind answer!
You are on the right track, despite my explanation bleeding from many wounds. :-)
It is a little more complex, as I am playing hide-and-seek with a value that can be stored in four fields: W_CUA1, W_CUA2, W_CUA3, W_CUA4.
So each W_CUA* must be matched to the other W_CUA* in each pair of records.

The whole picture is as follows:
1. I join a dataset back to itself, based on GROUPING_ID.
GROUPING_ID covers an n number of records, belonging together by identical value in a certain field - it is indifferent what field it is, my point is that I have already done the grouping successfully.
So now, in the same row, I have the in_left.W_CUA1... to in_left.W_CUA4 and in_right.W_CUA1... to in_right.W_CUA16. For simplification's sake I apply name in_right.W_CUA1 -> W_CUA1_MATCH etc. to W_CUA4_MATCH.

2. I remove the pairs where in_left.ID = in_right.ID where ID is the record ID within the group.

3. In a Representative Creator, I remove the combinations where the pairs are the same i.e. left ID appears on right join and vice versa, eg. 4 vs 5, and 5 vs 4.

4. In the Representative Creator, I set a dynamic query to check the similarity between the all 16 combinations - that is the code in my first post.
Then the desired result would be to find the highest match, indicating which fields are attached - data comes from W_CUA1 vs. W_CUA3_MATCH etc.

First screenshot demonstrates the component flow I built, while second one is the case of the field matching through a life-like example.

And I still have no clue how to present in what form I want to see the results i.e. how to mark which fields bear the highest rank combination. Another field should store this information eg. ‘W_CUA1|W_CUA3_MATCH’.

And yes, your question is relevant how to rank the results when we have eg. 3 pairs with the same level match - I don’t know yet, but learning the database, there is zero chance of such case, due to the nature of the data inspected here.

Is this clear I wrote or I just made it a bigger mess? I hope I did not.

Thank you, Laszlo

2 Attachments

Lazov
Data Voyager
2 months ago
February 18, 2025

Hi Adrian, there’s a mistake I made here, but the page does not save the text again when I edit and send it.

Instead of this:

‘So now, in the same row, I have the in_left.W_CUA1... to in_left.W_CUA4 and in_right.W_CUA1... to in_right.W_CUA16’

I meant this:

‘So now, in the same row, I have the in_left.W_CUA1... to in_left.W_CUA4 and in_right.W_CUA1... to in_right.W_CUA4’

Sorry for confusion,

Laszlo

Adrian Anderson
Ataccamer
2 months ago
February 18, 2025

Hi Laszlo.

Apologies for not responding sooner. There are a couple of issues with the Community site at the moment and I’m not able to see the images shared previously.

Could you perhaps try sharing them again and/or attaching a copy of any shareable plans/components?

Thanks,
Adrian

Lazov
Data Voyager
2 months ago
February 19, 2025

Hi @Adrian Anderson,

no need to apologize, I am grateful that you are dealing with this challenge I have.

I rather embed the same here in the text box - sharing the whole thing would not meet my agreement with my customer on data handling. Building it in a demonstrative test component would take more time than I have at the moment.

But certainly I will do so if your feedback was that you could not see a thing again. :-)

So now, let’s try again.

First screenshot demonstrates the component flow I built, while second one is the case of the field matching through a life-like example.

Please let me know whether you can see the images this time.

Thank you, Laszlo

Lazov
Data Voyager
1 month ago
March 26, 2025

Hi @Adrian Anderson,

I finally managed to build a hard coded line, similar to XML structure, in a Representative Creator that does all the calculations and gets the valid data for each case.

Too bad that it is still 16 hard coded lines because I have 16 combinations in total and Ataccama is not able to handle a single loop with iterator.

BR, Laszlo

Cansu
Community Manager
1 month ago
March 31, 2025

Hi @Lazov, thank you for sharing your implementation. If you’d like please share your use case and feedback in our ideas section where our product team has direct access to review 👈

Check out our Quick Start Guide to get started on the community 🙋‍♀️

Reply

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos + marketing

1 Attachments

2 Attachments

Reply

Related topics

FME Options - use Proxyicon

S3Connector Erroricon

Failing to connect to Esri ARCGIS Portalicon

Can't connect to API.icon

Couldn't connect to server with HTTPCallericon

Sign up

Login to the Ataccama Community

Scanning file for viruses.

This file cannot be downloaded

Cookie policy

Cookie settings