How to match strings across different alphabets


(Daniel Teixeira) #1

Hello,

How can I match strings across different alphabets?
If I have the same string in Latin, Arabic, and Chinese, is there a way to match them?

Thanks in advance


(Katrin Popova) #2

Hello Daniel,

Would it be possible to give us more information regarding the strings? Are they some kind of names in different languages that should be matched? Also, are they composed of one or more words?

Could you please give us more details about your use case and what you are trying to achieve?

Best regards,
Katrin Popova


(Daniel Teixeira) #3

Hi Katrin,

The use case is not entirely defined, but it would be something like this:
Customer data: name, surname, email, and address info, and we try to create candidate/client groups based on these attributes.

Assume we have multiple data sources, where one uses the Latin alphabet while another uses the Cyrillic alphabet (or any other, such as Chinese or Hebrew).

How can we ensure that the candidate/client group rules work? If I define the candidate/client rule as an exact match on Name, how can this work across different alphabets?

I hope I was clear.
Thanks in advance
Regards


(Katrin Popova) #4

Hello Daniel,

We suggest transliterating each non-Latin source into the Latin alphabet and then performing the matching on the Latin text. For instance, there is already a component, the Transliterate Cyrillic step, which can serve as an example of how transliteration from Cyrillic to Latin is done. Please note that transliteration is a complicated process and can introduce errors of its own. We recommend purchasing this from Ataccama as a solution/component development.
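To illustrate the general idea outside the product, here is a minimal Python sketch of "transliterate to Latin, then match". It is not the Ataccama component: the mapping table `CYRILLIC_TO_LATIN` and the helpers `match_key` and `same_name` are hypothetical names made up for this example, the table covers only Russian letters, and the romanization scheme is an assumption.

```python
import unicodedata

# Hypothetical, simplified Cyrillic-to-Latin table (Russian letters only).
# A real scheme needs per-language rules; this one is an assumption.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "iu", "я": "ia",
}


def match_key(name: str) -> str:
    """Build a Latin-only comparison key for a name."""
    # Compose characters first so precomposed letters hit the table.
    lowered = unicodedata.normalize("NFC", name).casefold()
    # Transliterate letter by letter; unknown characters pass through.
    latin = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in lowered)
    # Decompose and drop combining marks so "é" compares equal to "e".
    decomposed = unicodedata.normalize("NFKD", latin)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse runs of whitespace.
    return " ".join(stripped.split())


def same_name(a: str, b: str) -> bool:
    """Exact match on the transliterated keys."""
    return match_key(a) == match_key(b)


print(same_name("Иван Петров", "Ivan  Petrov"))  # True
print(match_key("Юлия"))                         # iuliia
print(same_name("Юлия", "Yulia"))                # False
```

The last line shows the kind of transliteration issue mentioned above: the same name can be romanized in more than one way ("Iuliia" versus "Yulia"), so an exact match on the transliterated value can still miss, and an approximate comparison or an agreed romanization scheme may be needed.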

Best regards,
Katrin Popova


(Daniel Teixeira) #5

I have dqs version 10.6, which only has the Transliterate step.

Could you elaborate a bit more on this?

Thanks


(Katrin Popova) #6

Hello Daniel,

I have sent you the component via email. If you need additional component development, please contact Afshin Lotfi.

Best regards,
Katrin Popova