
Hello, I am trying to create a validity rule for names. The rule should match names that are:

  • Title case: first letter uppercase, followed by all lowercase
  • Accented
  • Allowed to contain special characters
  • Mandatory
  • Optionally hyphenated (-)

I have this regex ^[a-zA-Z '-]+$

^[A-Z][a-z]*(?:[ '-][A-Z][a-z]*)*$

(?<=^|\s)[a-z]*[A-Z][a-z]*[A-Z]*[a-z]*

 

The regexes above capture and validate names like

John Doe

Mary Anne

which is okay, but they also validate names in all lowercase, for example john doe, which is not okay. I want to know what I need to change so that the regex does not validate names in all lowercase.
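The difference between the first two patterns can be checked outside the tool. Here is a minimal Python sketch, assuming the rule has to match the whole name; the helper `is_valid` is a hypothetical name, and the platform's regex engine may behave differently from Python's `re` module.

```python
import re

# First pattern from the post: any mix of letters, spaces, apostrophes, hyphens.
LOOSE = re.compile(r"^[a-zA-Z '-]+$")

# Second pattern from the post: every word must start with an uppercase letter.
TITLE_CASE = re.compile(r"^[A-Z][a-z]*(?:[ '-][A-Z][a-z]*)*$")


def is_valid(pattern: re.Pattern, name: str) -> bool:
    """Return True when the whole name matches the pattern (hypothetical helper)."""
    return pattern.fullmatch(name) is not None


for name in ["John Doe", "Mary Anne", "john doe"]:
    print(f"{name!r}: loose={is_valid(LOOSE, name)}, title_case={is_valid(TITLE_CASE, name)}")
```

The loose pattern accepts john doe because a single character class holds both a-z and A-Z, so case is never tied to a position in the word. The second pattern fixes an uppercase letter at the start of every word, which is what rejects the all-lowercase input.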

 

Hi @olayinkadaramola,

 

Please try this regular expression to get your results: \b(?:[A-Z][a-z' -]*|[A-Z][a-z'-]*(?:[ -][A-Z][a-z' -]*)*)\b

 

Hope this works.

Regards,

Srija Piratla


Hello Srija, thanks for the response. The regex above did not work. Is there something I'm not doing right? See the attached picture. Names like Ola, Mary, and O’brien should have been valid, but they all show as invalid. I could see from the screenshot you sent earlier that they were valid on your side. Please advise on what I did wrong.

 


Hi @olayinkadaramola,

I see that you are using a ‘-’ at the start of the expression. Can you remove that and use only this:

\b(?:[A-Z][a-z' -]*|[A-Z][a-z'-]*(?:[ -][A-Z][a-z' -]*)*)\b

Please try this and let me know if you are still facing any issues.

 

Hope this helps!

 

Regards,

Srija Piratla


Yes, the regex works. However, I need a small modification: is it possible for both O’brien and O’Brien to show as valid? If it is not possible to accept both while maintaining the other conditions, the preferred spelling is O’Brien. I also noticed that it does not validate accented names, as seen in the picture below, for example

Stéphane-Henri  and 

Jean-Pierre O'Brien

 

 

Stéphane-Henri

 


Hi @olayinkadaramola ,

Please try this expression:

\b(?:[A-Z][a-z'à-öø-ÿ' -]*)(?:[- ][A-Z][a-z'à-öø-ÿ' -]*)*(?:'[A-Z][a-z'à-öø-ÿ' -]*)?\b

I checked all the use cases you mentioned. Let me know if something is still missing.

 

Regards,

Srija Piratla
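For anyone who wants to reproduce the check, here is a small Python sketch that runs the expression above against the names from this thread. It assumes whole-string matching and Python's `re` semantics, which may not be identical to the engine used by the platform; `check` is a hypothetical helper name.

```python
import re

# Expression from the reply above. The ranges à-ö and ø-ÿ add the lowercase
# accented letters of the Latin-1 block to the lowercase class.
ACCENTED = re.compile(
    r"\b(?:[A-Z][a-z'à-öø-ÿ' -]*)"
    r"(?:[- ][A-Z][a-z'à-öø-ÿ' -]*)*"
    r"(?:'[A-Z][a-z'à-öø-ÿ' -]*)?\b"
)


def check(name: str) -> bool:
    """Return True when the whole name matches (hypothetical helper)."""
    return ACCENTED.fullmatch(name) is not None


names = [
    "Ola",
    "Mary",
    "O'brien",
    "O'Brien",
    "Stéphane-Henri",
    "Jean-Pierre O'Brien",
    "john doe",
]
results = {name: check(name) for name in names}
for name, ok in results.items():
    print(f"{name!r}: {'valid' if ok else 'invalid'}")
```

Two details to keep in mind when testing: the class only contains the straight apostrophe ('), so a name typed with a typographic apostrophe (’) does not match, and `[A-Z]` does not include accented capitals such as É.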


Thanks Srija,

The regex above works for all the use cases using a single condition.

I would like to know if this regex (\b(?:[A-Z][a-z'à-öø-ÿ' -]*)(?:[- ][A-Z][a-z'à-öø-ÿ' -]*)*(?:'[A-Z][a-z'à-öø-ÿ' -]*)?\b) can be broken down for easier comprehension, for instance by creating

  • one condition that only matches a word in title case (one uppercase letter followed by lowercase letters)
  • another condition for special characters like the apostrophe, hyphen, and space
  • and another condition for accented characters

so that a single rule has multiple conditions and still gives the same result as above for all the names used in the test.

 

I was able to split the regex, but the split rules did not validate the names when I tested them. I may have made a mistake when splitting the regex. Below is what I have. Please clarify whether it is equivalent to the single regex, or what I am missing:

  • (?:[A-Z][a-z'à-öø-ÿ' -]*)
  • (?:[- ][A-Z][a-z'à-öø-ÿ' -]*)*
  • (?:'[A-Z][a-z'à-öø-ÿ' -]*)?

Result of the test: all use cases should have been valid.
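One way to see why the three fragments behave differently from the single regex is to test them side by side. The Python sketch below assumes whole-string matching; the names `PARTS` and `COMBINED` are made up for illustration. The fragments only describe a full name when they are concatenated in order, so each one on its own covers just a piece of the input.

```python
import re

# The three fragments from the list above, in their original order.
PARTS = [
    r"(?:[A-Z][a-z'à-öø-ÿ' -]*)",       # first word
    r"(?:[- ][A-Z][a-z'à-öø-ÿ' -]*)*",  # further words after a hyphen or space
    r"(?:'[A-Z][a-z'à-öø-ÿ' -]*)?",     # optional apostrophe part such as 'Brien
]

# Concatenating the fragments inside \b...\b rebuilds the single expression.
COMBINED = re.compile(r"\b" + "".join(PARTS) + r"\b")

name = "Jean-Pierre O'Brien"
combined_ok = COMBINED.fullmatch(name) is not None
parts_ok = [re.fullmatch(part, name) is not None for part in PARTS]

print("combined:", combined_ok)
print("each fragment alone:", parts_ok)
```

A short single-word name such as Ola still passes the first fragment alone, which can make the split look partly correct during testing.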

 


In addition to the above, do you have references on rule creation that I can learn from? Please share any resources that explain how to create rules, from basic to complex ones.


Hi @olayinkadaramola 

  • Mandatory case
  • A word in title case (one uppercase letter followed by lowercase letters): \b(?:[A-Z][a-z]*)(?: [A-Z][a-z]*)*\b
  • Another condition for special characters like the apostrophe, hyphen, and space: \b[A-Z][a-z]*(?:[-' ][A-Z][a-z]*)*\b
  • And another rule for accented characters: \b(?:[A-Z][a-z'à-öø-ÿ' -]*)(?:[- ][A-Z][a-z'à-öø-ÿ' -]*)*(?:'[A-Z][a-z'à-öø-ÿ' -]*)?\b

Above is the expression breakdown for each condition you asked about. I see that you are trying to put all the conditions into one rule. This won’t work because the conditions are checked in sequence. For example, a name such as Ola-Ola belongs to the 3rd test rule, but the check already fails at the 2nd test rule and throws an error (title case: uppercase followed by lowercase). It therefore never reaches the 3rd test rule, which results in the error you provided. As a workaround, you can create 3 separate rules for the 3 conditions and assign all of them to the attribute you are validating.

Here is the documentation for creating DQ rules - https://docs.ataccama.com/one/latest/data-quality/create-dq-rule.html

Hope this helps.

Regards,

Srija Piratla
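The sequential behaviour described in the reply can be modelled with a short Python sketch. This is only an illustration under the assumption that conditions are evaluated in order with whole-string matching; it is not the platform's actual rule engine, and `RULES`, `evaluate`, `first_failure`, and `passes_all` are hypothetical names.

```python
import re
from typing import Dict, Optional

# The three expressions from the breakdown above, in the order they were listed.
RULES: Dict[str, str] = {
    "title_case": r"\b(?:[A-Z][a-z]*)(?: [A-Z][a-z]*)*\b",
    "special_chars": r"\b[A-Z][a-z]*(?:[-' ][A-Z][a-z]*)*\b",
    "accented": (
        r"\b(?:[A-Z][a-z'à-öø-ÿ' -]*)(?:[- ][A-Z][a-z'à-öø-ÿ' -]*)*"
        r"(?:'[A-Z][a-z'à-öø-ÿ' -]*)?\b"
    ),
}


def evaluate(name: str) -> Dict[str, bool]:
    """Check the name against every expression independently."""
    return {label: re.fullmatch(rx, name) is not None for label, rx in RULES.items()}


def first_failure(name: str) -> Optional[str]:
    """Return the label of the first expression that rejects the name, if any."""
    for label, ok in evaluate(name).items():
        if not ok:
            return label
    return None


def passes_all(name: str) -> bool:
    """Model of a single rule that requires every condition to pass."""
    return first_failure(name) is None


for name in ["Ola Ola", "Ola-Ola", "Stéphane-Henri"]:
    print(name, evaluate(name), "first failure:", first_failure(name))
```

In this model Ola-Ola is rejected by the title-case expression, because that expression only allows a space between words, even though the special-character expression would accept it.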


Hi @olayinkadaramola, I’m closing this thread for now, if you have any follow-up questions please feel free to share them here or create a new post 🙋‍♀️


@srija piratla ,

Can these rules be made to reject names like Ola ola? I want the first letter of every word in uppercase. I have tried \b[A-Z][a-z'à-öø-ÿ']*(?: [A-Z][a-z'à-öø-ÿ']*)*\b, but it is not passing validation.


Hi @olayinkadaramola ,

I see you want every word in title case (one uppercase letter followed by lowercase letters): \b[A-Z][a-z'à-öø-ÿ']*(?: [A-Z][a-z'à-öø-ÿ']*)*\b

I used this and confirmed the results: Ola ola is invalid, Ola Ola is valid, and Ola OLA is invalid.

Let me know where it is failing for you. Hope this helps to solve your issue :)

Regards,

Srija Piratla
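The three results quoted above can be reproduced with a short Python sketch, assuming whole-string matching and Python's `re` semantics; `STRICT` and `strict_ok` are hypothetical names.

```python
import re

# Expression from the reply above: words separated by a single space, each
# starting with an uppercase letter.
STRICT = re.compile(r"\b[A-Z][a-z'à-öø-ÿ']*(?: [A-Z][a-z'à-öø-ÿ']*)*\b")


def strict_ok(name: str) -> bool:
    """Return True when the whole name matches (hypothetical helper)."""
    return STRICT.fullmatch(name) is not None


for name in ["Ola ola", "Ola Ola", "Ola OLA", "Ola-Ola", "O'Brien"]:
    print(f"{name!r}: {'valid' if strict_ok(name) else 'invalid'}")
```

Because a space is the only separator in this expression and the apostrophe sits only in the lowercase class, hyphenated names and an uppercase letter after an apostrophe are rejected as well.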


When I use the regex \b[A-Z][a-z'à-öø-ÿ']*(?: [A-Z][a-z'à-öø-ÿ']*)*\b and test with Ola-Ola, it shows as invalid when it should be valid. O’Brien also shows as invalid, along with the others I marked in the picture above.

 

If I use the regex \b(?:[A-Z][a-z'à-öø-ÿ' -]*)(?:[- ][A-Z][a-z'à-öø-ÿ' -]*)*(?:'[A-Z][a-z'à-öø-ÿ' -]*)?\b, all the exceptions above are validated as valid, which is okay. The problem is that it also validates names like Ola ola as valid, which is wrong.

 


Hi @olayinkadaramola ,

 

The one I gave only satisfies the first case, the uppercase first letter. For the examples in your screenshot, please use the expressions below.

For the first condition, it should cover special characters like the apostrophe, hyphen, and space: \b[A-Z][a-z]*(?:[-' ][A-Z][a-z]*)*\b

For the second regex you gave, try this: \b(?:[A-Z][a-z'à-öø-ÿ'-]*)(?:[\s-][A-Z][a-z'à-öø-ÿ'-]*)*(?:'[A-Z][a-z'à-öø-ÿ'-]*)?\b

 

 

Hope this helps !

Regards,

Srija Piratla
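To compare the last expression against the cases raised earlier in the thread, here is a minimal Python sketch. It assumes whole-string matching and Python's `re` semantics, which may differ from the platform's engine; `FINAL` and `final_ok` are hypothetical names.

```python
import re

# Last expression from the reply above. The space is no longer part of the
# lowercase class, so a space must be followed by an uppercase letter.
FINAL = re.compile(
    r"\b(?:[A-Z][a-z'à-öø-ÿ'-]*)"
    r"(?:[\s-][A-Z][a-z'à-öø-ÿ'-]*)*"
    r"(?:'[A-Z][a-z'à-öø-ÿ'-]*)?\b"
)


def final_ok(name: str) -> bool:
    """Return True when the whole name matches (hypothetical helper)."""
    return FINAL.fullmatch(name) is not None


cases = [
    "Ola Ola",
    "Ola ola",
    "Ola OLA",
    "Ola-Ola",
    "O'Brien",
    "O'brien",
    "Stéphane-Henri",
    "Jean-Pierre O'Brien",
    "Ola-ola",
]
for name in cases:
    print(f"{name!r}: {'valid' if final_ok(name) else 'invalid'}")
```

In this sketch Ola ola is rejected while Ola-Ola, O'Brien, O'brien, and the accented names pass. The hyphen is still part of the lowercase class, so a name like Ola-ola also passes; whether that is acceptable depends on the requirement.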

