Oracle and Regular Expressions

May 5, 2005

UNIX comes to Oracle in the form of regular expressions to increase the power of searching.

In the last article, I wrote on case insensitivity I made the point that the only way to extend an application to be completely case insensitive was to get away from the LIKE comparison and begin to use regular expression matching using the REGEXP_LIKE function. If you missed this article, you can get to it at the following URL. http://www.databasejournal.com/features/oracle/article.php/3494646. I thought it would be beneficial to dedicate an article on regular expression matching so we, myself included, get acquainted or re-acquainted with the power these expression can give us.

I was introduced to regular expressions a long time ago when programming in Perl or UNIX shell scripting with sed and awk commands. In addition, if you use the vi editor you may be familiar with regular expressions and pattern matching. Regular expressions are in effect a small programming language that is specific to matching character patterns in a text string. To be frankly honest when I started using pattern matching in my Perl days it was quite confusing. Mostly because there seemed to be so many alterations and options, I never knew where to start. After a while, though I did get the hang of it. And you will too. Take some time to go through this article and learn how powerful this pattern matching can be.

As stated, regular expressions are designed to match character strings. Stated another way, and possibly a bit clearer, is that regular expressions are templates that character strings are compared against to see if the character string matches or fits the template. About the closest example, or comparison, that I can give is an example that uses the LIKE comparison, which does some forms of pattern or template matching. For example, if we wanted to search a character string for the existence of 'Deborah' we could issue the following SQL.

SELECT text FROM my_text WHERE text LIKE '%Deborah%';

This in effect will return any row where 'Deborah' is within the character string text. To get a quick comparison on how to do this with regular expression matching you would issue the following SQL that contains the new function REGEXP_LIKE.

SELECT text FROM my_text WHERE REGEXP_LIKE (text, '\Deborah\');

So why would we want to use regular expressions instead of just using the LIKE comparison that we are all too familiar with? Mostly because as soon as you start asking higher level questions about the "likeness" of a string to a pattern you will most likely end up with a WHERE condition that will have multiple 'OR' and 'LIKE' matches to weed through to find what you are truly looking for. With regular expressions, you can typically just provide one pattern to the comparison. Take for example the previous example. Suppose now that we need to search a novel and the main character Deborah is also referred to as 'Debbie'. Now we could modify our condition to look for text where it is LIKE '%Deb%' and we might be able to solve the problem. However, what happens if this particular novel also discusses finances of our high-class society girl. We might end up finding certain sentences that contained words such as 'Debt', or 'Debutante'. We now have a very complex issue. Therefore, to validate that we actually find those text that actually refer to Deborah in the script we should use a regular expression. Here is the SQL you should issue. It now finds all of the sentences where the novel actually talks about Deborah.

SELECT text FROM my_text WHERE REGEXP_LIKE (text, '\Deb(bie|orah)\');

You can see that within the pattern matching we basically programmed the option for the text to contain 'Debbie' or the alternative 'Deborah' by giving the option (bie|orah) in the expression. This is a basic form of providing a template to match a string against.

Another complex example where you can only get by with expressions is finding email addresses. You could quickly do this by providing the following expression. Basically, you are building the portions of the email before the '@' sign and after. Do take a close look at this pattern as it requires the three distinct pieces of an email address and allows for periods '.' as well as underscores '_' in the name. All of which are quite valid.

SELECT text FROM my_text
WHERE REGEXP_LIKE (text, '[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}');

Therefore, if you now only wanted to search for '.com' email addresses you could change the SQL to the following.

SELECT text FROM my_text WHERE REGEXP_LIKE (text, '[A-Z0-9._%-]+@[A-Z0-9._%-]+\.com');

Another often-difficult item is to find if there is a number in a string of text or to validate if a character string is numeric. To find if a string contains a number that follows the pattern of containing a decimal place, you could use this method. Keep in mind that numbers that have a decimal point should have a valid number after the decimal point, and I have specified a '+' to indicate one or more numbers.

SELECT text FROM my_text WHERE REGEXP_LIKE (text, '[0-9]?\.[0-9]+');

Regular expressions are quite powerful when looking for texts patterns or validating a specific pattern such as a phone number, email address, IP numbers, or just about any pattern that needs to be checked. On the flip side of this is checking for invalid formats and patterns of characters. In our previous example, we could easily have searched for text where 'Debbie' is misspelled as 'Debbbie' or 'Debie'.

SELECT text FROM my_text WHERE REGEXP_LIKE (text, '(Debbbie)'); 

On the other hand, to get those with really heavy fingers where the 'b' may have been keyed in more than 3 times we could use the following

SELECT text FROM my_text WHERE REGEXP_LIKE (text, '(Deb)(b){2,}');

Regular expressions may seem very convoluted at first. It even took me a bit of effort to relearn these patterns, as I have not used them for a bit. However, after taking the time to learn how to construct them you will be very thankful you did when confronted with difficult search criteria.

So where do you learn how to construct regular expressions? This really is not as difficult as you might think. Regular expressions have been around for a long time in the UNIX world and thus there is a lot of information on the net. The Oracle manuals have a few pages, about 5, that are dedicated to regular expressions so your best bet is to start there and supplement heavily from the net. You can easily search on Yahoo, Google, or your favorite search engine for tutorials and web sites strictly dedicated to regular expressions.

Here are a few web sites I found useful

Regular Expressions HOWTO

Perl Regular Expression Tutorial

Regular-Expressions.info

Regular Expressions From Wikipedia

» See All Articles by Columnist James Koopmann








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers