Replicate LISTAGG(DISTINCT columnA, ' | ') OVER (PARTITION BY columnB) in ONE Desktop

Question

HiI have a sub-query in Snowflake that is using the function of LISTAGG and PARTITION BY in multiple SELECT DISTINCT statements for example LISTAGG(DISTINCT TRIM(SUBJECT_REF), ' | ') OVER (PARTITION BY IDENTIFIER) AS EN_NUM_AGG, kindly see attachment to see the full SQL statement. Example of one of the subqueries: …en AS(SELECT DISTINCT     IDENTIFIER,    ID,    LISTAGG(DISTINCT ID, ' | ') OVER (PARTITION BY IDENTIFIER) AS ID_AGG,    TRIM(SUBJECT_REF) AS EN_NUM,    LISTAGG(DISTINCT TRIM(SUBJECT_REF), ' | ') OVER (PARTITION BY IDENTIFIER) AS EN_NUM_AGG,    DQ_EN_CARD,    LISTAGG(DISTINCT DQ_EN_CARD, ' | ') OVER (PARTITION BY IDENTIFIER) AS DQ_EN_CARD_AGG,    DATA_SOURCE,    ORIGINAL_SOURCEFROM    db.schema.view1    WHERE     subj_type = 'English') ...I wanted to replicate the same in Ataccama ONE Desktop using the Group Aggregator step, whereI allocate each sub-query in one aggregation set with a condition of subj_type = '<value>' in the When condition.	For each column in the SELECT DISTINCT subquery if it is a non-aggregated value, I used first() function to get a distinct value,  for example first(ID). I tried using distinct() but it was not working with an error of 'unknown column' when the column is clearly available there. Why is that so?	For each column in the SELECT DISTINCT subquery if it is aggregated value, I used this expression to replicate the same as LISTAGG.. concatenate(distinct(DQ_EN_CARD),'|'), however some columns are not producing any results but some have results like 1|1|0 for column like DQ_EN_CARD_AGG when the actual result should only show distinct value of 1 and 0 so it should be 1|0. I believe because in Ataccama it behaves slightly different compared to the Snowflake as Snowflake read and group line by line, but Ataccama seems to read all aggregations sets first then only group them?	Based on this query for example,  LISTAGG(DISTINCT TRIM(SUBJECT_REF), ' | ') OVER (PARTITION BY IDENTIFIER) AS EN_NUM_AGG I define the Group By in the step to IDENTIFIER to replicate the PARTITION BY. I bet the behaviour is different here since I did not get the expected results.	Finally, I joined the right output with the left output of the Group Aggregator because to my understanding the right join only will produce the columns that we add in the aggregation set, other columns that are not used in the aggregation set shall be retrieved from the left output.So I am not sure whether this step is best to fit in this use case or not. Appreciate your assistance. Attached is the full SQL statement.

Phil Holbrook · Accepted Answer

Hi @RadziahThere are two steps in ONE Desktop that you could use to tackle this problem: the group aggregator and the representative creator. Group aggregator is easier to understand coming from SQL - so we'll stick with that. The way you are using the group aggregator generally looks OK.Your main problem is that concatenate and distinct are very different types of expression in ONE Desktop.Concatenate is an aggregating function - it is only meaningful in the context of a grouping operation (like the group aggregator) and it operates across each set of rows that are being grouped. If you use the statement,concatenate(DQ_EN_CARD, '|')you are saying,"Take the value of DQ_EN_CARD from every row and concatenate them separated by the '|' character"... exactly as you would expect from the SQL equivalent.The "distinct" function is not an aggregating function and doesn't work like SQL. It is not valid in an aggregating context, which is why you got the "Unknown column" error. Although the column exists, it is not single-valued.Distinct is actually a word-set operation - fully specified asset.distinct(,)but you can omit the "set." prefix. There are many word set operators which can be very useful. They operate on a single string value, from a single row, and treat the string as a set of values with the specified separator. The separator defaults to a space, hence "word set" operator - it was designed to handle sentences.So, if you use the statement:distinct('1|3|2|3|1', '|')you will get the result '1|3|2'.So - to achieve the concatenated set of distinct values you want in the group aggregator step, you need to specifydistinct(concatenate(DQ_EN_CARD, '|'), '|')It has to be this way round, because you need to convert your multiple rows into a single string with the concatenate function, then deduplicate it with distinct. You have to specify the separator in the context of both functions.Similarly, there is a "sort" word set operation, sosort(distinct('1|3|2|3|1', '|'), '|')is '1|2|3'(Watch out, though - it’s an alpha-sort!)It is a common use case to group values using an aggregating function and a separator, and then process the resulting string with word set operations.You can read up on both aggregating functions and word-set expressions in the ONE Desktop built-in help underHelp -> Help Contents -> Ataccama ONE Desktop -> ONE Basics -> Expressions

Replicate LISTAGG(DISTINCT columnA, ' | ') OVER (PARTITION BY columnB) in ONE Desktop

1 reply

Reply

Most Liked this week

Cookie policy

Cookie settings

Reply

Related topics

How do I get to the underlying error code?icon

Credentials issue on android after 5 daysicon

Android Flutter Credentials Issueicon

PlatformException: Invalid package nameicon

Issue with credentialsicon

Most Liked this week

Sign up

Login to the Ataccama Community

Scanning file for viruses.

This file cannot be downloaded

Cookie policy

Cookie settings