UNIX comes to Oracle in the form of regular expressions to
increase the power of searching.
In the last article, I wrote on case insensitivity I made
the point that the only way to extend an application to be completely case
insensitive was to get away from the LIKE comparison and begin to use regular
expression matching using the REGEXP_LIKE function. If you missed this article,
you can get to it at the following URL. https://www.databasejournal.com/features/oracle/article.php/3494646.
I thought it would be beneficial to dedicate an article on regular expression
matching so we, myself included, get acquainted or re-acquainted with the power
these expression can give us.
I was introduced to regular expressions a long time ago when
programming in Perl or UNIX shell scripting with sed and awk commands. In
addition, if you use the vi editor you may be familiar with regular expressions
and pattern matching. Regular expressions are in effect a small programming
language that is specific to matching character patterns in a text string. To
be frankly honest when I started using pattern matching in my Perl days it was
quite confusing. Mostly because there seemed to be so many alterations and options,
I never knew where to start. After a while, though I did get the hang of it.
And you will too. Take some time to go through this article and learn how
powerful this pattern matching can be.
As stated, regular expressions are designed to match
character strings. Stated another way, and possibly a bit clearer, is that
regular expressions are templates that character strings are compared against
to see if the character string matches or fits the template. About the closest
example, or comparison, that I can give is an example that uses the LIKE comparison,
which does some forms of pattern or template matching. For example, if we
wanted to search a character string for the existence of ‘Deborah’ we could
issue the following SQL.
SELECT text FROM my_text WHERE text LIKE ‘%Deborah%’;
This in effect will return any row where ‘Deborah’ is within
the character string text. To get a quick comparison on how to do this with
regular expression matching you would issue the following SQL that contains the
new function REGEXP_LIKE.
SELECT text FROM my_text WHERE REGEXP_LIKE (text, ‘Deborah’);
So why would we want to use regular expressions instead of
just using the LIKE comparison that we are all too familiar with? Mostly
because as soon as you start asking higher level questions about the “likeness”
of a string to a pattern you will most likely end up with a WHERE condition
that will have multiple ‘OR’ and ‘LIKE’ matches to weed through to find what
you are truly looking for. With regular expressions, you can typically just
provide one pattern to the comparison. Take for example the previous example.
Suppose now that we need to search a novel and the main character Deborah is
also referred to as ‘Debbie’. Now we could modify our condition to look for
text where it is LIKE ‘%Deb%’ and we might be able to solve the problem. However,
what happens if this particular novel also discusses finances of our high-class
society girl. We might end up finding certain sentences that contained words
such as ‘Debt’, or ‘Debutante’. We now have a very complex issue. Therefore, to
validate that we actually find those text that actually refer to Deborah in the
script we should use a regular expression. Here is the SQL you should issue. It
now finds all of the sentences where the novel actually talks about Deborah.
SELECT text FROM my_text WHERE REGEXP_LIKE (text, ‘Deb(bie|orah)’);
You can see that within the pattern matching we basically
programmed the option for the text to contain ‘Debbie’ or the alternative ‘Deborah’
by giving the option (bie|orah) in the expression. This is a basic form of
providing a template to match a string against.
Another complex example where you can only get by with
expressions is finding email addresses. You could quickly do this by providing
the following expression. Basically, you are building the portions of the email
before the ‘@’ sign and after. Do take a close look at this pattern as it
requires the three distinct pieces of an email address and allows for periods ‘.’
as well as underscores ‘_’ in the name. All of which are quite valid.
SELECT text FROM my_text
WHERE REGEXP_LIKE (text, ‘[A-Z0-9._%-]+@[A-Z0-9._%-]+.[A-Z]{2,4}’);
Therefore, if you now only wanted to search for ‘.com’ email
addresses you could change the SQL to the following.
SELECT text FROM my_text WHERE REGEXP_LIKE (text, ‘[A-Z0-9._%-]+@[A-Z0-9._%-]+.com’);
Another often-difficult item is to find if there is a number
in a string of text or to validate if a character string is numeric. To find if
a string contains a number that follows the pattern of containing a decimal
place, you could use this method. Keep in mind that numbers that have a decimal
point should have a valid number after the decimal point, and I have specified
a ‘+’ to indicate one or more numbers.
SELECT text FROM my_text WHERE REGEXP_LIKE (text, ‘[0-9]?.[0-9]+’);
Regular expressions are quite
powerful when looking for texts patterns or validating a specific pattern such
as a phone number, email address, IP numbers, or just about any pattern that
needs to be checked. On the flip side of this is checking for invalid formats
and patterns of characters. In our previous example, we could easily have searched
for text where ‘Debbie’ is misspelled as ‘Debbbie’ or ‘Debie’.
SELECT text FROM my_text WHERE REGEXP_LIKE (text, ‘(Debbbie)’);
On the other hand, to get those
with really heavy fingers where the ‘b’ may have been keyed in more than 3
times we could use the following
SELECT text FROM my_text WHERE REGEXP_LIKE (text, ‘(Deb)(b){2,}’);
Regular expressions may seem very convoluted at first. It
even took me a bit of effort to relearn these patterns, as I have not used them
for a bit. However, after taking the time to learn how to construct them you
will be very thankful you did when confronted with difficult search criteria.
So where do you learn how to construct regular expressions?
This really is not as difficult as you might think. Regular expressions have
been around for a long time in the UNIX world and thus there is a lot of
information on the net. The Oracle manuals have a few pages, about 5, that are
dedicated to regular expressions so your best bet is to start there and
supplement heavily from the net. You can easily search on Yahoo, Google, or
your favorite search engine for tutorials and web sites strictly dedicated to
regular expressions.
Here are a few web sites I found useful
Perl Regular
Expression Tutorial