Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS Access
Database Tools
SQL Scripts & Samples
» Database Forum
» Slideshows
» Sitemap
Free Newsletters:
News Via RSS Feed

follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2

Featured Database Articles


Posted November 9, 2015

MySQL Pattern Matching using Regular Expressions

By Rob Gravelle

The Beyond the LIKE Operator: Advanced Pattern Matching with MySQL article introduced MySQL's implementation of regular expressions using the REGEXP and RLIKE alias operators.  It demonstrated how to enhance the LIKE operator’s capabilities for more complex pattern matching.  In today’s follow-up, we’re going to take things up a notch and use REGEXP and RLIKE to find some very specific strings, containing alphanumeric and a combination of upper and lower case characters.  We’ll also explore the role of meta-characters while comparing some differences between standard regular expression syntax and that of MySQL.

The Role of Meta-characters

The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are represented in the pattern by meta-characters, which are a special type of character or character combinations that are not interpreted literally, but as having a particular meaning.

Meta-characters outside of square brackets differ from those within square brackets.  Here are meta-characters that may be used anywhere in a pattern outside of square brackets.  Note that the characters may still be used inside the square brackets, but they will be interpreted literally – i.e. a period (.) represents a period.

  •   . match any character
  •   ? match zero or one
  •   * match zero or more
  •   + match one or more
  •   {n} match exactly n times
  •   {m,n} match m through n times
  •   {n,} match n or more times
  •   ^ beginning of line
  •   $ end of line | separates alternatives such as “cat|dog|mouse|fish”.  
    Note that, with the lowest precedence of all regex operators, the alternation operator tells the regex engine to match either everything to the left of the vertical bar, or everything to the right of the vertical bar. If you want to limit the reach of the alternation, you need to use parentheses for grouping. For instance, to improve the first example to match “dog” or “cat” only, as opposed to “bearcat” or “dogma”, we would need to write “\b(cat|dog)\b”. 

Inside square brackets, [abc] matches any one of enclosed chars, unless the ^ meta-character is placed immediately after the opening square bracket.  (Yes, the same one that matched the beginning of the line outside of the square brackets!)  Here, it tells the REGEX engine to match any character other than the ones contained within the square brackets so that [^xyz] would match any character other than x, y, or z.  Hence, it’s a negating meta-character. 

Matching Word Variations

Since each instance of square bracket pairs represents one character match, much like a|b|c, word variations may be matched using successive square brackets.  For example, to match the word “Withdrawal” as well as a number of possible misspellings, we can employ a pattern such as:


That will match “Withdrawal”, “widrawal”, “widrowal”, and “withdrawals”, but not something as far off as "withd tytyt rawal".

When examining the above pattern, bear in mind that:

  • REGEXP is not case sensitive, except when used with binary strings, so there was no need to check the case of the last three letters. 
  • The question mark makes the trailing “s” optional.

POSIX Character Classes

Several common character combinations have been assigned names for easier use.  Known as POSIX character classes, these may be used inside a bracket expression like [x-z[:digit:]] to represent a class of characters.

The POSIX character class names must be written all lowercase, as shown in the table below:



Character Sets


Alphanumeric characters



Alphabetic characters



ASCII characters



Space and tab

[ \t]


Control characters






Visible characters (i.e. anything except spaces, control characters, etc.)



Lowercase letters



Visible characters and spaces (i.e. anything except control characters, etc.)



Punctuation and symbols.



All whitespace characters, including line breaks

[ \t\r\n\v\f]


Uppercase letters



Word characters (letters, numbers and underscores)



Hexadecimal digits


POSIX Word Boundaries

The POSIX standard also defines [[:<:]] as a start-of-word boundary, and [[:>:]] as an end-of-word boundary. A word is a sequence of alphanumeric characters and/or underscore(s) (_) that is not preceded by or followed by non-word characters. The word boundary marker is usually a whitespace character, but any word character that is not an alphanumeric character or an underscore (_) will work. Word boundary tokens cannot be used inside square brackets.

POSIX Word Boundaries in Action

The following stored procedure code returns the ID field of every row where the post_title field contains a complete word that matches the @search_value input parameter.  Hence a @search_value of “Diner” would match “Ed’s Easy Diner”, “Wellington Diner”, but not “Diners’ Choice” (due to the “s”).


FROM wp_posts 
WHERE post_title REGEXP CONCAT('[[:<:]]', @search_value, '[[:>:]]') 
AND post_type   = 'restaurant' 
AND post_status = 'publish';

Escaping Meta-characters

A common source of confusion when escaping meta-characters is that the same character that is used to escape control characters within strings - the backslash (“\”) – is also used within regexes.  For that reason, you actually need to use four backslashes to represent a real backslash character because the string parser will remove two of them when "de-escaping" it for the string.  That will leave two backslashes to represent the escaped regex backslash.



is interpreted as...

regex("\\")  - escaped backslash followed by a second escaped backslash

Meanwhile, passing a double backslash (“\\”) to the regex() function is interpreted as…

Regex("\") - a single backslash

To bring the point home, I added a backslash character to the beginning of the word “Teacher” in the description field of one row of the public sakila database’s film table:

An Epic Drama of a Feminist And a Mad Scientist who must Battle a \Teacher in The Canadian Rockies

To match the backslash as part of the “Teacher” word, we can’t use the [[:<:]] POSIX class because the backslash character does not count as part of a word.  For that reason, we have to use the [[:space:]] class instead, followed by four backslashes to represent the literal backslash character:

WHERE description REGEXP '[[:space:]]\\\\Teacher[[:>:]]'


Keep an eye out for performance when using MySQL REGEX operator.  Whenever possible, try to narrow down the range of suitable rows using more generic criteria.  For instance, the following query employs an inner SELECT with the LIKE function:

   SELECT * FROM people
   WHERE dob_year LIKE '197%'
) WHERE dob_year regexp '^197\d$'

An index can be used to narrow down likely rows because LIKE '179%' is sargable, meaning that the DBMS engine can take advantage of an index to speed up the execution of the query.   The term sargable is derived from a contraction of “Search ARGument ABLE”!

See all articles by Rob Gravelle

MySQL Archives

Comment and Contribute


(Maximum characters: 1200). You have characters left.



Latest Forum Threads
MySQL Forum
Topic By Replies Updated
MySQL in high availability and transction secure banking application klamor 1 August 28th, 10:24 AM
MySQL rollback UAL225 1 August 28th, 10:15 AM
Browsing a DB file that uses MySql finleytech 1 July 26th, 10:51 AM
php cookie won't pass variable to next page ITdevGirl 0 June 22nd, 12:13 PM