Make Your Regex Clearer

Understanding regular expressions in an existing code base is often a daunting task. Regular expressions give the regex engine a concise description of a search pattern, which lets it search and match text efficiently, but their cryptic syntax makes them very hard for humans to understand.

In this article, we will look at several techniques that improve the readability of regular expressions and make them easier for humans to understand.

The Cryptic Nature of Regular Expressions

Regular expressions are a powerful and concise way to describe a search pattern. It is therefore not surprising that most text editors, IDEs, and programming languages support searching and replacing text with regular expressions. Programming languages that support regular expressions out of the box include JavaScript, Java, C#, PHP, and Python. However, the syntax of regular expressions is hard for humans to read and understand.

For example, suppose you come across the following JavaScript code:

 if (!value.match(/(0[1-9]|[12][0-9]|3[01])(?:-|\.)(0[1-9]|1[012])(?:-|\.)(19\d\d|20\d\d)/)) {
   console.log('invalid date');
   ...
 }

From the context, you understand that this code checks that the value is a valid date. However, even a regular expression guru will need to invest some time to read and understand the above regular expression.

Because the original author of the above code left no comments describing the acceptable date format, you are left on your own to parse the regex and work out what it means.
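One way to start working out what an undocumented pattern accepts is to probe it with a few inputs. The snippet below is a minimal sketch of that idea, not part of the original code: the names datePattern, samples, and probe are hypothetical, and the sample strings were chosen only to exercise the pattern.

```javascript
// The pattern from the snippet above, copied unchanged.
const datePattern = /(0[1-9]|[12][0-9]|3[01])(?:-|\.)(0[1-9]|1[012])(?:-|\.)(19\d\d|20\d\d)/;

// Hypothetical sample inputs, chosen only to probe the pattern.
const samples = [
  '25-12-2019',   // day-month-year with dashes
  '25.12.2019',   // same date with dots
  '25-12.2019',   // mixed separators
  '25/12/2019',   // slashes are not an allowed separator
  '5-1-2019',     // day and month must have two digits
  '31-02-2019',   // not a real calendar date
  'x25-12-20199', // a date surrounded by extra characters
];

// Returns the captured day, month, and year, or null when nothing matches.
function probe(value) {
  const m = value.match(datePattern);
  return m === null ? null : { day: m[1], month: m[2], year: m[3] };
}

for (const sample of samples) {
  console.log(sample, probe(sample));
}
```

Running the probe suggests that the pattern expects a two-digit day, a two-digit month, and a four-digit year between 1900 and 2099, separated by dashes or dots. It also reveals behavior that is easy to miss when reading the regex: the two separators may differ, 31-02-2019 is accepted even though it is not a real date, and because the pattern is not anchored with ^ and $, a date embedded in a longer string still matches.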

In the next part, we will see how RegExper can help us understand an undocumented regex.