Multiplicative Pattern Parser - unable to parse /

Question

Hello,

I am trying to use the MPP step to derive some values from a free-text field. I am struggling to parse any regex pattern for a string like “A/B”. It seems to work, but whenever there is a / sign, it just doesn't parse it. I tried escaping it as \/ and several other options, but nothing seems to work. What am I doing wrong?

EDIT: I tried a similar regex in the Regex Matching step, and there it seems to work. Could it be that the MPP step is parsing incorrectly?

EDIT 2: I might have found the issue. I took the Tokenizer part from another of our components and it started working correctly. However, I still don't understand exactly how the Tokenizer works, and the docs don't provide good examples to help understand it.

Szymek

AKislyakov · Accepted Answer

Hi @Szymon Olejniczak,

Pattern parser description: The input text is first split into tokens using the defined tokenizer. Tokens are then matched against the defined patterns and their components.

Tokenizer config: This element is used for demarcating/splitting the input text string into particular components (tokens) depending on the defined rules. Every token type is specified using two sets of characters: tokenStartCharacters and tokenCharacters. During tokenization, the input string is analyzed one character at a time. When a character from tokenStartCharacters is found, it is considered the beginning of a new token of that type. Subsequent characters that belong to tokenCharacters are then included in this new token.

In other words, the Pattern parser first splits your string into tokens (you can think of them as "words") and then tries to match these tokens ("words") against the components of your patterns.

In your configuration, every pattern consists of a single token, so for the Pattern parser to match the string to a pattern correctly, the tokenizer config should include / as a token character. Another solution is to change the patterns so that they consist of multiple tokens; e.g. a pattern such as {LETTER}/{LETTER?} will match both A/ and A/B inputs.

You can find more detailed examples in:
Tutorials project > 06 Transform > 06.04 Pattern Parser.plan (this one also includes a non-default Tokenizer config)
Tutorials project > 06 Transform > 06.09 Multiplicative Pattern Parser.plan
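To make the two-step process (tokenize, then match tokens against pattern components) more concrete, here is a minimal Python sketch. It is not Ataccama's implementation: the token-type dictionaries, the `tokenize` and `matches` functions, and the "one regex per token" pattern format are all hypothetical. It assumes that a character which fits the currently open token extends it before any new token is started, and that characters matching no token type (such as spaces) are dropped as separators. With a tokenizer where / is a token of its own, "A/B" becomes three tokens, so a single-token pattern cannot match it; adding / to the tokenCharacters keeps "A/B" together as one token, while a three-component pattern with an optional last letter matches both "A/" and "A/B".

```python
import re

# Hypothetical token types: name -> (tokenStartCharacters, tokenCharacters).
# Illustrative only; these are not Ataccama's default settings.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
DIGITS = "0123456789"

# "/" forms a one-character token of its own.
DEFAULT_TYPES = {
    "word": (LETTERS + DIGITS, LETTERS + DIGITS),
    "slash": ("/", ""),
}

# "/" is an extra tokenCharacter of "word", so "A/B" stays one token.
SLASH_IN_WORD_TYPES = {
    "word": (LETTERS + DIGITS, LETTERS + DIGITS + "/"),
}


def tokenize(text, token_types):
    """Split text into tokens, scanning one character at a time."""
    tokens = []
    current = None       # characters of the token being built
    current_type = None  # name of its token type
    for ch in text:
        # Assumption: a tokenCharacter of the open token extends it.
        if current is not None and ch in token_types[current_type][1]:
            current.append(ch)
            continue
        # Otherwise the open token (if any) is finished.
        if current is not None:
            tokens.append("".join(current))
            current, current_type = None, None
        # A tokenStartCharacter opens a new token of that type.
        for name, (start_chars, _) in token_types.items():
            if ch in start_chars:
                current, current_type = [ch], name
                break
        # Characters matching no type (e.g. spaces) are dropped.
    if current is not None:
        tokens.append("".join(current))
    return tokens


def matches(tokens, components):
    """True if each token fully matches its (regex, optional) component."""
    if not components:
        return not tokens
    regex, optional = components[0]
    if tokens and re.fullmatch(regex, tokens[0]) and matches(tokens[1:], components[1:]):
        return True
    # An optional component may also be skipped.
    return optional and matches(tokens, components[1:])


# Hypothetical patterns: one component vs. letter, "/", optional letter.
SINGLE_TOKEN = [(r"[A-Za-z]/[A-Za-z]", False)]
MULTI_TOKEN = [(r"[A-Za-z]", False), (r"/", False), (r"[A-Za-z]", True)]

if __name__ == "__main__":
    print(tokenize("A/B", DEFAULT_TYPES))                              # ['A', '/', 'B']
    print(matches(tokenize("A/B", DEFAULT_TYPES), SINGLE_TOKEN))        # False
    print(matches(tokenize("A/B", SLASH_IN_WORD_TYPES), SINGLE_TOKEN))  # True
    print(matches(tokenize("A/", DEFAULT_TYPES), MULTI_TOKEN))          # True
    print(matches(tokenize("A/B", DEFAULT_TYPES), MULTI_TOKEN))         # True
```

The point of the sketch is that the regex is never applied to the raw string, only to individual tokens, which would explain why the same expression works in a step that matches the whole value but fails once the tokenizer has already split on /.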
