Skip to main content

I have data coming from a SQL database. For most queries, the structure of these queries remains the same, as shown in the picture, with only the attribute name changing. Now, I want to classify these queries as unique based on their structure rather than the attribute name.

Demo Data

Thank you.

Regards,
Dhaval

 

Hi Dhaval.

Is this data going into DQ&C?

If so, I would suggest making a single, reusable rule (e.g. column_completeness), and mapping each relevant column to the single rule. This rule is even premade in the platform by default

From that point, you could go to the rule usage tab, and see all the times it has been applied.

No SQL required!

 

Best,

Oli


Hi ​@OGordon100 ,

Thank you for your suggestion! However, I think there might have been a misunderstanding of the concept I was aiming for. I’ve managed to implement a solution according to my requirements, as I needed to mark queries as unique or matching. So here’s what I did:

 

Create a Sequence Column:

I used the following logic to create a sequence number for each query in an incremental order (and 0 for blank rows):
iif(
    Query is not null, sequence(0) div 3 + 1, 0
)
Example :

Query     Seq
CASE WHEN COLUMN_1 IS NULL OR TRIM(COLUMN_1) = '' THEN 0 1
WHEN COLUMN_1 IS NOT NULL AND TRIM(COLUMN_1) != '' THEN 1  1
ELSE 2 END as COLUMN_1_Completeness  1
  0

 

 

Combine Queries:

After creating the sequence column, I used a representative creator to concatenate these queries into a single line, taking Seq as the grouping strategy.

 

Mark Matching Queries:
Using a matching step on the Query column, I identified whether the queries were similar or not.

This approach allowed me to achieve the desired result.

 

Best regards,
Dhaval


Reply