Splitter Transform on DQS for splitting string

Question

I am trying to use the Splitter to transform a string.

Example : Input: BeWhoYouAre

Splitter output - ‘e ho ou re’ (the first letter of each word is dropped because ‘Upper case’ is set as the separator)

Desired output - ‘Be Who You Are’

How do I achieve this? Also, if the string is all lowercase, how can I split it?

Example: Input: ‘bewhoyouare’

Desired output - ‘Be Who You Are’

Lisa Kovalskaia · Accepted Answer

Hi @sgilla!

The Tokenizer step works nicely for the first use case, where uppercase letters act as delimiters. Here I used ; as the separator, but you can leave it blank to have the words separated by spaces.

Strings that don't have any identifiable delimiters are going to require additional tools. One of the easier options is to use a dictionary / lookup file that provides an interpretation for each concatenation. A step like Apply Replacements would then help transform the source strings into the correct interpretation.

If you have a large number of distinct concatenations and no way to predict all possible variations in advance, a dictionary won't do a perfect job. You may need to work with a language model to parse such strings and only then pass the data to Ataccama.

I'll see if I can get any additional tips from the team on the latter, and please let us know if you can share more details on the use case. Thanks!
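To make the two approaches above concrete, here is a minimal Python sketch of the logic only. It is not Ataccama configuration and does not reproduce how the Tokenizer or Apply Replacements steps are implemented; the function names and the REPLACEMENTS table are hypothetical and exist just for this illustration.

```python
import re

def split_on_uppercase(text: str, separator: str = " ") -> str:
    """Insert a separator before every uppercase letter except the first character.

    The uppercase letter itself is kept, unlike a split that consumes the delimiter.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", separator, text)

# Hypothetical lookup table: one known concatenation mapped to its interpretation.
REPLACEMENTS = {
    "bewhoyouare": "Be Who You Are",
}

def apply_replacements(text: str, lookup: dict) -> str:
    """Return the interpretation for a known concatenation, else the input unchanged."""
    return lookup.get(text.lower(), text)

print(split_on_uppercase("BeWhoYouAre"))                # Be Who You Are
print(split_on_uppercase("BeWhoYouAre", ";"))           # Be;Who;You;Are
print(apply_replacements("bewhoyouare", REPLACEMENTS))  # Be Who You Are
```

The lookup only covers concatenations listed in advance, which is the limitation mentioned above: anything not in the table comes back unchanged.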

Lisa Kovalskaia · Answer

@sgilla awesome, glad it helped!

I don't think the Tokenizer can handle something like the USA Sports example. A lookup is a good solution. Alternatively, if you don't know the exact all-capitals elements that may come up but you know the pattern is always similar (e.g. the element is always at the beginning), you might also handle it with an expression after the Tokenizer.

Of course, the more variation there is in the source data, the more interesting it gets! :)
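As a rough illustration of the "expression after the Tokenizer" idea, the Python sketch below merges a run of single uppercase letters at the start of the token list back into one element. It assumes the input looks like 'USASports' and that the all-capitals element is always first; both the input and the helper name merge_leading_initials are assumptions made for this example, and the code is not an Ataccama expression.

```python
import re

def merge_leading_initials(tokens):
    """Join a leading run of two or more single uppercase letters into one token."""
    count = 0
    for token in tokens:
        if len(token) == 1 and token.isupper():
            count += 1
        else:
            break
    if count < 2:
        # Nothing to merge: no leading run of single capitals.
        return list(tokens)
    return ["".join(tokens[:count])] + list(tokens[count:])

# Splitting on uppercase letters breaks the acronym into single letters first.
tokens = re.sub(r"(?<!^)(?=[A-Z])", " ", "USASports").split()
print(tokens)                                    # ['U', 'S', 'A', 'Sports']
print(" ".join(merge_leading_initials(tokens)))  # USA Sports
```

If the all-capitals element can appear in the middle or at the end of the string, this rule is not enough and a lookup is likely the safer choice.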
