Match & Merge - Concatenate column values


(Danny Ryan) #1

Hi All,

Does anyone know how you merge a list of different string values from a column with a matching key into a single merged record?

example input file;
column_1, column_2
1, one
1, two
1, three
2, four
2, five
2, six

Match and then merge output file
1, one two three
2, four five six

thanks in advance.

Danny


(Goran Zhelyazkov) #2

Hi Danny,

There are several different steps you can use to acomplish that:

  1. You can use the Group Aggregator step. Inside, in the Group By -> Expression type the name of your Key and give it a Name. After that, in the Aggregations -> Expressions you have to set what the output for that column will be. Something like first(key) + “,” + concatinate(column,9999).

  2. The other step you can use is Representative Creator. It is a bit more resource intensive, but it allows more customization. In the Grouping Strategy use Key Grouping Strategy and then Group By: Key. After that set up your rule - in the Attribute Sets -> Representative Attributes set a name for your output and give it an Expression. The one above should work if you remove the first function(it is not needed here).

I would recommend the first approach unless you require several different Groupings and Rules.

Cheers,
Goran


(Danny Ryan) #3

Hi Goran,

Thank you for your options. I was able to successfully configure option 2) using the Representative Creator and the expression +concatenate(field, 9999).

Many thanks
Danny


(Srinivas Gade) #4

I’m in a same situation but what I’m trying to achieve is slight different.

Example:
column_1, column_2
1, one
1, one
1, two
1, three
2, four
2, five
2, six

Match and then merge output file
1, one two three
2, four five six

Here I don’t want “one” to be repeated.

Thanks,
Srinivas


(Danny Ryan) #5

Hi Srinivas,

I used the following expression to resolve the duplicity of “one” if it was present on greater than 1 record.

DISTINCT(+concatenate(column_2,9999))

It is worth noting that the DISTINCT function delimits on the space character, so if your value in column_2 contains a space- the function will distinct that portion of that attribute value.
I got around this, by either removing spaces from the attribute value or by pre-cleansing the space characters to another character I was confident was not used elsewhere - such as the underscore _.

distinct
string distinct(string srcStr, null)
Returns a string which contains concatenated first occurrence of parts of the original string srcStr. If the string srcStr is NULL or empty, the function returns NULL.

Hope this helps

Kind Regards
Danny