An Introduction to Regex in RL3
Regular expressions are sequences of characters that define how a computer should match or search for a particular piece of information (a pattern) inside a piece of data (e.g. a text). You may also come across the terms regex or regexp, which mean exactly the same thing.
It is often said that regular expressions are extremely handy and powerful. Why so?
- regex can handle almost any text or number pattern;
- they are a concise and straightforward notation for describing data;
- they are not that hard to learn: you just need to know a few tricks;
- regex are time-savers: a single pattern finds every matching item, so you don't need to search again and again for similar data.
RL3 is heavily based on regular expressions, so it benefits from all of the advantages listed above.
Usage Example
Suppose you need to find every 'str' and 'street' in the following text:
Please pay attention that the word 'street' may have the 'str' equivalent. Here's the text: I once lived on Baker street. His postal address is: Mr. John Smith 4 Main str Toronto, Ontario M9B5R2 Canada 10 Downing str
Try Quickstart with this example text and the following regex:
(str|street)
In this search pattern we're looking for all mentions of either 'str' or 'street' in the text. The two search words are wrapped in ( and ) round brackets and separated by a | pipe, which means: match either this or that word.
In fact, we could define as many variants as needed here: e.g. if we had three search words, we would have an (a|b|c) notation, and so on.
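As a rough illustration outside RL3, the same alternation can be tried with Python's standard re module. This is only a sketch: the sample string is a shortened copy of the text above, and it follows Python's matching rules, which RL3 may or may not share. In Python, alternatives are tried from left to right, so listing 'str' first stops at the first three letters of 'street'.

```python
import re

# Shortened copy of the sample text from this page.
text = ("I once lived on Baker street. His postal address is: "
        "Mr. John Smith 4 Main str Toronto, Ontario M9B5R2 Canada 10 Downing str")

# With a single capturing group, findall returns what that group captured.
short_first = re.findall(r"(str|street)", text)
long_first = re.findall(r"(street|str)", text)

print(short_first)  # ['str', 'str', 'str']
print(long_first)   # ['street', 'str', 'str']
```

If the full word matters, putting the longer alternative first is the safer order in engines that behave like Python's.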
One more way to search is with the following pattern:
str(eet)?
You might have noticed that both 'str' and 'street' start with the same sequence of literal characters. However, while 'str' is a shortened form, 'street' continues with the 'eet' part.
Hence, we can make a pattern where the 'str' part is obligatory and 'eet' is optional (which is expressed by wrapping 'eet' in ( and ) round brackets and following them with a ? question mark).
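The optional group can be sketched the same way with Python's re module (an assumption for illustration; the short sample string is made up from the text above and RL3 itself is not involved). Because the pattern contains a group, the example reads the whole match with group(0) instead of using findall.

```python
import re

# Hypothetical short sample based on the text above.
text = "I once lived on Baker street. 4 Main str, 10 Downing str"

# group(0) is the whole match; the (eet) group alone would be 'eet' or None.
matches = [m.group(0) for m in re.finditer(r"str(eet)?", text)]

print(matches)  # ['street', 'str', 'str']
```

Here the ? is greedy, so 'eet' is included whenever it is present, and the full word 'street' is returned without reordering any alternatives.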
But what if we want to make sure that 'str' and 'street' are not just words from the text but are preceded by a street name?
Here's the solution:
[A-Z][a-z]+\sstr(eet)?
We already know what str(eet)? means. Let's explain the rest:
- [A-Z] matches any uppercase letter;
- [a-z] matches any lowercase letter;
- + ensures that the preceding [a-z] appears one or more times;
- \s matches a whitespace character, such as a space.
Thus, we require that str(eet)? is preceded by a word that:
- starts with an uppercase letter;
- continues with one or more lowercase letters;
- is followed by a space and the street indicator.
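The three conditions above can be checked with a small Python sketch. Python's re module and the shortened sample string are assumptions made for illustration; RL3 may treat some details differently.

```python
import re

# Shortened copy of the sample text from this page.
text = ("I once lived on Baker street. His postal address is: "
        "Mr. John Smith 4 Main str Toronto, Ontario M9B5R2 Canada 10 Downing str")

# Capitalized word, one whitespace character, then 'str' or 'street'.
pattern = re.compile(r"[A-Z][a-z]+\sstr(eet)?")

named = [m.group(0) for m in pattern.finditer(text)]
print(named)  # ['Baker street', 'Main str', 'Downing str']

# A lowercase word before 'street' does not satisfy [A-Z].
print(pattern.search("the street is quiet"))  # None
```

Words such as 'Smith' or 'Toronto' are skipped because they are not followed by the street indicator.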
Now, let's make the task a bit more complex and find only street names that are preceded by a number:
\d{1,3}\s[A-Z][a-z]+\sstr(eet)?
The \d{1,3}\s part added to the pattern consists of \d, which matches a single digit, repeated {1,3} (i.e. from 1 to 3) times and followed by \s, a space.
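A minimal Python sketch of the numbered pattern follows, again assuming Python's re module and a shortened copy of the sample text rather than RL3 itself.

```python
import re

# Shortened copy of the sample text from this page.
text = ("I once lived on Baker street. His postal address is: "
        "Mr. John Smith 4 Main str Toronto, Ontario M9B5R2 Canada 10 Downing str")

# One to three digits, whitespace, capitalized word, whitespace, 'str' or 'street'.
numbered_pattern = re.compile(r"\d{1,3}\s[A-Z][a-z]+\sstr(eet)?")

numbered = [m.group(0) for m in numbered_pattern.finditer(text)]
print(numbered)  # ['4 Main str', '10 Downing str']
```

'Baker street' is no longer reported because no number precedes it. Note that, in Python at least, {1,3} does not reject longer numbers on its own: for '1234 Main str' the match simply starts later and yields '234 Main str'.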
For more details, refer to the RL3 Patterns documentation.