Understanding regular expressions in an existing code base is often a daunting task. Regular expressions give the regex processor a concise description of a search pattern, which it can use to search and match text efficiently. However, their cryptic syntax makes them very hard for humans to understand. You should therefore try to make your regular expressions more readable. In this article, we will see several techniques to improve the readability of regular expressions and make them easier for humans to understand.
The Cryptic Nature of Regular Expressions
Regular expressions are a powerful and concise way to describe a search pattern. It is therefore not surprising that most text editors, IDEs, and programming languages support searching and replacing text with regular expressions. Programming languages that support regular expressions out of the box include JavaScript, Java, C#, PHP, and Python. However, the syntax of regular expressions is hard for humans to read and understand.
Suppose you came across the following regular expression in a JavaScript source file:
Regexes Are Cryptic
if (!value.match(/(0[1-9]|[12][0-9]|3[01])(?:-|\.)(0[1-9]|1[012])(?:-|\.)(19\d\d|20\d\d)/)) {
  console.log('invalid date');
  // ...
}
From the context, you can tell that this code checks whether the value is a valid date. However, even a regular expression guru will need to invest some time to read and understand the regular expression above.
Because the original author did not leave any comment describing the accepted date format, you are left on your own to parse the regex and work out what it means.
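Besides reading the pattern, you can probe it with a few inputs to see what it really accepts. The sketch below copies the regex verbatim; the sample strings are made up for illustration. It suggests that the check is looser than "valid date": the pattern is not anchored, it allows mixed separators, and it does not check calendar validity.

```javascript
// The date pattern from the snippet above, copied verbatim (no ^ or $ anchors).
const dateRe = /(0[1-9]|[12][0-9]|3[01])(?:-|\.)(0[1-9]|1[012])(?:-|\.)(19\d\d|20\d\d)/;

// Illustrative inputs chosen to probe the pattern's behavior.
const samples = [
  '31-12-1999',     // dd-mm-yyyy with '-' separators
  '01.02.2020',     // dd.mm.yyyy with '.' separators
  '31.02-2020',     // mixed separators and an impossible day (31 February)
  'on 05-06-2007!', // a date embedded in other text: the pattern is not anchored
  '2020-01-01',     // ISO yyyy-mm-dd order
  '15-13-2000',     // month 13
  '15-06-2100',     // year outside 1900-2099
];

for (const s of samples) {
  // dateRe has no 'g' flag, so test() keeps no state between calls.
  console.log(JSON.stringify(s), dateRe.test(s));
}
```

Run with Node.js, this prints true for the first four samples and false for the last three.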
How To Understand Regular Expressions?
When you need to understand a cryptic regular expression, you can use RegExper. RegExper is a handy online tool that transforms a regular expression into a diagram, and this graphical representation is much easier to understand. When you submit a JavaScript regular expression, RegExper generates and displays its diagram.
Using this tool, you can export the diagram (the graphical representation of the regular expression) as an image in SVG or PNG format and embed it in your code documentation. You can also generate a permanent link and share it with others (in your code, in an email, …).
Let’s use this tool to see what the regex in the code snippet above means:
The Regular Expression
/(0[1-9]|[12][0-9]|3[01])(?:-|\.)(0[1-9]|1[012])(?:-|\.)(19\d\d|20\d\d)/
Improve Regular Expressions – Adding Comments Inside Regex
Several regex flavors, for example those of Perl, PCRE, .NET, Java, and Python, support a free-spacing mode, usually enabled with an x flag or the equivalent (?x) modifier. This mode lets the programmer write regular expressions that are much easier for people to read and understand.
Whitespace In Free-Spacing Mode
The main property of free-spacing mode is that the regex processor ignores whitespace (spaces, tabs, and line breaks, i.e. line feeds and carriage returns) inside the regular expression. To match a literal space in free-spacing mode, you can escape it (a backslash followed by a space), wrap it in a character class ([ ], in most flavors), or use the more readable hex escape \x20.
In free-spacing mode, the following two regular expressions are equivalent:
The Regular Expression
a b* ( c | d )
Is Equivalent To:
ab*(c|d)
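JavaScript’s built-in regex literals have no free-spacing flag, so in plain JavaScript the two patterns above are not equivalent: every space is a literal character. A quick check, with sample strings made up for illustration:

```javascript
// Without free-spacing mode, the spaces in `spaced` must appear in the input.
const compact = /ab*(c|d)/;
const spaced = /a b* ( c | d )/; // means: "a", space, b*, space, then " c " or " d "

console.log(compact.test('abbd'));    // true
console.log(spaced.test('abbd'));     // false: the input contains no spaces
console.log(spaced.test('a b  d '));  // true: every space is matched literally

// Ways to write "one space" that stay literal in free-spacing mode too
// (\ followed by a space is accepted by JavaScript only without the 'u' flag).
console.log(/\x20/.test(' '), /[ ]/.test(' '), /\ /.test(' ')); // true true true
```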
Comments In Free-Spacing Mode
Free-spacing mode also lets you embed comments in the regular expression: the # character starts a comment that runs until the end of the line. In other words, the regex processor ignores everything between # and the next newline character. To match a literal # character, escape it as \#.
In free-spacing mode, the following two regular expressions are equivalent:
The Regular Expression
a # This is a comment which runs until the end of the line
b # This is another comment
Is Equivalent To:
ab
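To make the two rules concrete, here is a minimal sketch of what a free-spacing preprocessor does: it drops unescaped whitespace and # comments outside character classes, then hands the compact pattern to the standard RegExp constructor. stripFreeSpacing is a hypothetical helper written for illustration; it is not XRegExp’s implementation and ignores flavor-specific details (Java, for instance, also ignores whitespace inside character classes).

```javascript
// Hypothetical helper: converts a free-spacing pattern into compact form.
// Rules implemented: unescaped whitespace is dropped, "#" starts a comment
// that runs to the end of the line, and both stay literal inside [...].
function stripFreeSpacing(source) {
  let out = '';
  let inClass = false; // true while inside a character class [...]
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') {
      // Copy an escape sequence unchanged, so "\ " and "\#" stay literal.
      out += ch + (source[i + 1] ?? '');
      i++;
      continue;
    }
    if (inClass) {
      if (ch === ']') inClass = false;
      out += ch; // whitespace and "#" are literal inside a class
      continue;
    }
    if (ch === '[') {
      inClass = true;
      out += ch;
      continue;
    }
    if (/\s/.test(ch)) continue; // whitespace is insignificant
    if (ch === '#') {
      // Skip the comment text; the newline itself is dropped as whitespace.
      while (i + 1 < source.length && source[i + 1] !== '\n') i++;
      continue;
    }
    out += ch;
  }
  return out;
}

console.log(stripFreeSpacing('a b* ( c | d )')); // ab*(c|d)
console.log(stripFreeSpacing(`
a # This is a comment which runs until the end of the line
b # This is another comment
`)); // ab

// Escaped "#" and \x20 survive; RegExp then reads them as "#" and a space.
console.log(new RegExp(stripFreeSpacing(String.raw`\# \x20`)).test('# ')); // true
```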
Improve Regular Expressions With XRegExp
Although JavaScript does not support free-spacing mode out of the box, we can use the XRegExp library, which supports it through its x flag. With the help of XRegExp, we will use these properties of free-spacing mode to improve our regular expressions and make our code more readable.
Putting it all together, we can refactor the code and make the regular expression clearer:
The Original Expression
if (!value.match(/(0[1-9]|[12][0-9]|3[01])(?:-|\.)(0[1-9]|1[012])(?:-|\.)(19\d\d|20\d\d)/)) {
  console.log('invalid date');
  // ...
}
The Clarified Expression With XRegExp
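// String.raw keeps \d and \. intact (a plain template literal would turn them into d and .);
// the 'x' flag turns on XRegExp's free-spacing mode, so whitespace and # comments are ignored.
const reValidDate = XRegExp(String.raw`
  # match a date.
  # The year must be between 1900 and 2099.
  # format dd-mm-yyyy; the separator may be - or .
  (0[1-9]|[12][0-9]|3[01])   # group 1 - day
  (?:-|\.)                   # separator
  (0[1-9]|1[012])            # group 2 - month
  (?:-|\.)                   # separator
  (19\d\d|20\d\d)            # group 3 - year
`, 'x');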
if (!reValidDate.test(value)) {
  console.log('invalid date');
  // ...
}
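The group comments in reValidDate also document how to read the captured values. Here is a minimal sketch of that, assuming you want the parts as numbers. parseDate is a hypothetical helper, and the pattern is written as a plain regex literal so the sketch runs without XRegExp; the ^ and $ anchors are an added assumption that rejects dates embedded in longer text.

```javascript
// Same alternatives as reValidDate, plus ^ and $ anchors (an assumption).
const anchoredDateRe = /^(0[1-9]|[12][0-9]|3[01])(?:-|\.)(0[1-9]|1[012])(?:-|\.)(19\d\d|20\d\d)$/;

// Hypothetical helper: returns { day, month, year } as numbers, or null.
function parseDate(value) {
  const m = anchoredDateRe.exec(value);
  if (m === null) return null;
  // Group 1 = day, group 2 = month, group 3 = year, as the comments say.
  const [, day, month, year] = m;
  return { day: Number(day), month: Number(month), year: Number(year) };
}

console.log(parseDate('24.12.1999')); // { day: 24, month: 12, year: 1999 }
console.log(parseDate('24/12/1999')); // null: '/' is not an allowed separator
```

Like the original pattern, this does not check calendar validity: parseDate('31-02-2020') still returns a result.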
Regular Expressions Do Not Need To Be Cryptic
Regular expressions do not need to be cryptic. In this tutorial, we saw how to improve the readability of regular expressions and make them clearer to readers of the code.
First, with the help of RegExper, we generated a diagram that helps us understand what the regular expression means.
We also saw how XRegExp, a JavaScript library that adds free-spacing mode to JavaScript regular expressions, can improve regular expressions by embedding comments inside them.
Using these techniques and XRegExp, we added comments inside the expression and made the regular expression clearer.