Regular Expressions to match one or multiple words separated by space


(Jayant Singh) #1

Dear colleagues,
I am trying to write a regex that matches either a single word in uppercase letters or multiple such words separated by whitespace. Here is my attempt:

matches('[[A-Z]+(\s)?]+',string)

My intent is to make the space optional in case there is only one word. However, the expression fails for multiple words, which is undesirable. Any help is greatly appreciated. Thanks.


(Maksim Zhelyazkov) #2

Hello Jayant,

If I understand you correctly, you want to match one word with uppercase letters or multiple words with uppercase letters, separated by whitespace. In that case you can use the following regex:

matches('[[A-Z]+[\\s]?]+',string)

Please let me know if this was helpful.

Regards,
Maksim
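
For comparison, here is a minimal sketch in plain Java (java.util.regex), not in the product's own matches() function. In Java, square brackets always build a character class, and nested brackets are merged into it, so the pattern above accepts any run of uppercase letters, whitespace, "+" and "?". If the goal is strictly "uppercase words separated by whitespace", a group with parentheses is one way to express it. The class name UppercaseWords and the method isUppercaseWords are hypothetical, and the choice of exactly one whitespace character between words is an assumption.

```java
import java.util.regex.Pattern;

public class UppercaseWords {
    // One uppercase word, optionally followed by more uppercase words,
    // each preceded by exactly one whitespace character (an assumption).
    // The regex \s has to be written as "\\s" in Java source code.
    private static final Pattern WORDS = Pattern.compile("[A-Z]+(\\s[A-Z]+)*");

    // Returns true only if the whole input consists of uppercase words.
    static boolean isUppercaseWords(String input) {
        return WORDS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUppercaseWords("HELLO"));        // true
        System.out.println(isUppercaseWords("HELLO WORLD"));  // true
        System.out.println(isUppercaseWords("Hello World"));  // false
        System.out.println(isUppercaseWords("HELLO+WORLD"));  // false
        // The bracket-only pattern treats '+' as a literal member of the class:
        System.out.println(Pattern.matches("[[A-Z]+[\\s]?]+", "HELLO+WORLD")); // true
    }
}
```

The grouped form also rejects leading or trailing whitespace, which may or may not be what is wanted for a given input field.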


(Jayant Singh) #3

That helps. May I know why we need to escape the whitespace character, \s?
As far as I know, when we put \s inside square brackets, it is automatically treated as a literal character.

Kindly correct me if I am wrong.


(Maksim Zhelyazkov) #4

Dear Jayant,
In many programming languages, including Java, “\” is the escape character inside string literals, so you have to write two backslashes in a row to get a single backslash into the string. If you write a single “\”, your regex would be invalid. Likewise, if you want characters like “\”, “-”, or “^” to be taken literally, you have to escape them.

Kind Regards,
Maksim


(Jayant Singh) #5

Thanks for your feedback. Sorry if my question was not clear.
My question is actually about the expression [\\s]?. It looks as if we are escaping the whitespace here: as far as I know, \s already denotes whitespace, and yet there is another backslash \ preceding it.

Can you kindly clarify why we are escaping the whitespace character here? Would writing just \s be wrong?

Thanks in advance.


(Maksim Zhelyazkov) #6

Dear Jayant,

Our regular expression syntax follows the rules used in Java. In a Java string, two backslashes in a row are an escape sequence that stands for a single backslash: “\\” means “\”. So “\\s” is passed to the regex engine as “\s”, the whitespace class; the extra backslash does not escape the whitespace. If you write only one backslash, it escapes the character that follows it.

Regards,

Maksim
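
The two levels of escaping described above can be seen in a small sketch in plain Java. It assumes the standard java.util.regex package rather than the product's matches() function, and the class name BackslashDemo and the method isSingleWhitespace are hypothetical.

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    // The Java literal "\\s" holds two characters: a backslash and the letter s.
    // The regex engine then reads those two characters as the whitespace class.
    static final String WHITESPACE_REGEX = "\\s";

    // Returns true if the input is exactly one whitespace character.
    static boolean isSingleWhitespace(String input) {
        return Pattern.matches(WHITESPACE_REGEX, input);
    }

    public static void main(String[] args) {
        System.out.println(WHITESPACE_REGEX.length());      // 2
        System.out.println(WHITESPACE_REGEX);               // \s
        System.out.println(isSingleWhitespace(" "));        // true
        System.out.println(isSingleWhitespace("\t"));       // true
        System.out.println(isSingleWhitespace("s"));        // false
        // Wrapping \s in square brackets does not change what it matches.
        System.out.println(Pattern.matches("[\\s]", " "));  // true
    }
}
```

The first backslash is consumed when the string literal is read, and only the remaining \s reaches the regex engine, so the whitespace itself is never escaped.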