Using Fulltext Indexes in MySQL – Part 2, Boolean searches


Part 1
of this article looked at the fulltext index, and how to search on it
using an ordinary MATCH() AGAINST(). Even more powerful, (although only available on the newer
MySQL version 4), is the ability to do a boolean search. Part 2 of this article
examines the possibilities.

You’ll use the same table you used in
Part 1
. The full list of records is:

mysql> SELECT * FROM fulltext_sample;
+---------------------------+
| copy                      |
+---------------------------+
| It appears good from here |
| The here and the past     |
| Why are we here           |
| An all-out alert          |
| All you need is love      |
| A good alert              |
+---------------------------+

To perform a boolean search, the IN BOOLEAN MODE modifier is used.
The following query demonstrates the new syntax.

mysql> SELECT * FROM fulltext_sample 
WHERE MATCH (copy) AGAINST ('love' IN BOOLEAN MODE);
+----------------------+
| copy                 |
+----------------------+
| All you need is love |
+----------------------+

An important difference between ordinary MATCH() AGAINST() queries
and IN BOOLEAN MODE searches is that the latter does not apply the 50% limit (so the word
can appear in more than half the rows).

There are a number of special operators you’ll need to know in order to make
the most of the boolean search. If no symbol is specified, the appearance of this
word causes the relevance of the row to be higher, similar to an ordinary MATCH() AGAINST().

Boolean Search Operators

+ The word is mandatory in all rows returned.
The word cannot appear in any row returned.
< The word that follows has a lower relevance than other words,
although rows containing it will still match
> The word that follows has a higher relevance than other words.
() Used to group words into subexpressions.
~ The word following contributes negatively to the relevance of
the row (which is different to the ‘-‘ operator, which specifically excludes the
word, or the ‘<‘ operator, which still causes the word to contribute
positively to the relevance of the row.
* The wildcard, indicating zero or more characters. It can only appear
at the end of a word.
Anything enclosed in the double quotes is taken as a whole (so you
can match phrases, for example).

Examples

Let’s see some of this action. The first example returns all rows containing the
word ‘here’, but not the word ‘past’.

mysql> SELECT copy FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('+here -past' IN BOOLEAN MODE);
+---------------------------+
| copy                      |
+---------------------------+
| It appears good from here |
| Why are we here           |
+---------------------------+

Compare the difference between the next two examples:

mysql> SELECT copy FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('here past' IN BOOLEAN MODE);
+---------------------------+
| copy                      |
+---------------------------+
| It appears good from here |
| The here and the past     |
| Why are we here           |
+---------------------------+

mysql> SELECT copy FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('here +past'IN BOOLEAN MODE);
+-----------------------+
| copy                  |
+-----------------------+
| The here and the past |
+-----------------------+

The first works in a similar way to an ordinary MATCH() AGAINST(), albeit without the
50% threshold. The word ‘past’ is mandatory in the second search (by default a word is optional), so the other two records are not returned.

The next example demonstrates a common mistake.

mysql> SELECT copy FROM fulltext_sample 
WHERE MATCH(copy) AGAINST ('+are here' IN BOOLEAN MODE);
+---------------------------+
| copy                      |
+---------------------------+
| It appears good from here |
| The here and the past     |
| Why are we here           |
+---------------------------+

The results may appear surprising compared the previous example, but since ‘are’ contains three or less letters, it is excluded for purposes of the search and is not mandatory.

The next two examples demonstrate a powerful enhancement for searching purposes:

mysql> SELECT copy FROM fulltext_sample 
WHERE MATCH(copy) AGAINST ('aler' IN BOOLEAN MODE);
Empty set (0.01 sec)

mysql> SELECT copy FROM fulltext_sample 
WHERE MATCH(copy) AGAINST ('aler*' IN BOOLEAN MODE);
+------------------+
| copy             |
+------------------+
| An all-out alert |
| A good alert     |
+------------------+
Ian Gilfillan
Ian Gilfillan
Ian Gilfillan lives in Cape Town, South Africa. He is the author of the book 'Mastering MySQL 4', published by Sybex, and has been working with MySQL since 1997. These days he develops mainly in PHP and MySQL, although confesses to starting out with BASIC and COBOL, way back when, and still has a soft spot for Perl. He developed South Africa's first online grocery store, and has developed and taught internet development and other technical courses for various institutions. He has majors in Programming and Information Systems, as well as English and Philosophy. For 5 years he was Lead Developer and IT Manager for Independent Online, South Africa's premier news portal. However, he has now 'retired' from fulltime work, and is hoping that his next book will be more in the style of William Blake and Allen Ginsberg.

Latest Articles