Is there a way to quickly select only distinct rows in Ataccama One Desktop?
(I now used a record descriptor and filter with Regex find(“.:.:1”, rd_column)
Is there a way to quickly select only distinct rows in Ataccama One Desktop?
(I now used a record descriptor and filter with Regex find(“.:.:1”, rd_column)
Hi
Using the Record Descriptor Builder step is certainly the best practice approach to solve this use case.
An alternative to using the Filter step with Regular Expression is to use the built-in expression function word().
I have attached a quick example using the expression function word().
Thanks so much. Great how you documented it! Will certainly be added to our cookbook ;)
Hi
I downloaded your plan and ran it but unfortunately I could not see the results as it turned to be blank for both columns. What would be the the input and expected results look like on your end? As I have similar use case to replicate the select distinct values from Snowflake in Ataccama.
Dear @Radziah,
Thank you for reaching out and for trying out the plan. I understand you're experiencing an issue with blank results in both columns, and I'm happy to help you troubleshoot this.
To better understand the situation, would you mind sharing a screenshot of your plan's configuration and the execution results? This will give me valuable context.
The original plan is designed to generate 100 random records, and it should function correctly. Given the randomized nature of the input data, it's worth noting that running the plan multiple times can produce varied datasets, which may affect the distinct values reflected in the output.
It's also possible that there might have been modifications to the plan, potentially impacting the flow of the 100 randomly generated records into the RecordDescriptorBuilder step. Sharing your plan file would allow us to review it together and pinpoint any potential discrepancies.
Please feel free to attach the screenshot and your plan file to your reply. I look forward to assisting you in resolving this issue.
Hi
Thanks for prompt reply, I did not change anything in the plan and ran it as it is. Please find my attached plan and the screenshot of the output.
i tried to change the 100 random records generation to 50 also gave me blank output.
Hi
I've made some tweaks to the plan and attached a new version for you. This one should now show both all the records and the filtered ones, which I hope will be much clearer!
Here's a breakdown of what I've changed:
Let's talk about the expression change in the Filter/Extract Filter step:
word(rd_value,2) = '1'
word(rd_value,1,':') = '1'
Here's why:
I really hope this updated plan works well for you and helps you get a better grasp of:
Please don't hesitate to reach out if you have any more questions or need further clarification. I'm happy to help!
Best regards,
Danny
Thanks
I ran the latest plan couple of times only got the expected output which I would say only Community will be out in the output_distinct since the group size is 1? Based on the rd_value of 3:1:1.
My understanding of select distinct is let’s say we have generated 49 rows of ‘Data’, 50 rows of ‘People’ and 1 row of ‘Community’, isn’t it supposed to have three rows of output in total; 1 row ‘Data’, 1 row ‘People’ and 1 row ‘Community’? In your plan after I ran multiple times, the only output_distinct that I managed to get is like in the image below:
No account yet? Create an account
Enter your E-mail address. We'll send you an e-mail with instructions to reset your password.