Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MySQL

Posted Feb 19, 2003

Using Fulltext Indexes in MySQL - Part 2, Boolean searches - Page 2

By Ian Gilfillan

By default only whole words are matched, unless the '*' operator is used.

The '<' and '>' symbols are less commonly used, but they allow a great degree of control for relevance. In the following examples, we return the relevance indicator to demonstrate the difference between the queries.

mysql> SELECT copy, MATCH(copy) 
AGAINST ('appears good alert' IN BOOLEAN MODE) 
AS m FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('appears good alert' IN BOOLEAN MODE);
+---------------------------+------+
| copy                      | m    |
+---------------------------+------+
| It appears good from here |    2 |
| An all-out alert          |    1 |
| A good alert              |    2 |
+---------------------------+------+

mysql> SELECT copy, MATCH(copy) 
AGAINST ('appears <good alert' IN BOOLEAN MODE) 
AS m FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('appears <good alert' IN BOOLEAN MODE);
+---------------------------+------------------+
| copy                      | m                |
+---------------------------+------------------+
| It appears good from here |  1.6666667461395 |
| An all-out alert          |                1 |
| A good alert              |  1.6666667461395 |
+---------------------------+------------------+

The '<' operator decreases the relevance of the word 'good', in this case by approximately 0.33.

mysql> SELECT copy, MATCH(copy) 
AGAINST ('appears good >alert' IN BOOLEAN MODE) 
AS m FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('appears good >alert' IN BOOLEAN MODE);
+---------------------------+------+
| copy                      | m    |
+---------------------------+------+
| It appears good from here |    2 |
| An all-out alert          |  1.5 |
| A good alert              |  2.5 |
+---------------------------+------+

The '>' operator increases the relevance of the word 'alert', in this case by 0.5.

The parentheses group words into a subexpression. In the following example, the '+' symbol applies to the group of words, so that at least one of 'appears' or 'past' must appear.

mysql> SELECT copy, MATCH(copy) 
AGAINST ('+(appears past)' IN BOOLEAN MODE) 
AS m FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('(appears past)' IN BOOLEAN MODE);
+---------------------------+------+
| copy                      | m    |
+---------------------------+------+
| It appears good from here |    1 |
| The here and the past     |    1 |
+---------------------------+------+

You can also apply the operators to words in the subexpression, as follows:

mysql> SELECT copy, MATCH(copy) 
AGAINST ('+(>appears <past)' IN BOOLEAN MODE) 
AS m FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('+(>appears <past)' IN BOOLEAN MODE);
+---------------------------+------------------+
| copy                      | m                |
+---------------------------+------------------+
| It appears good from here |              1.5 |
| The here and the past     | 0.66666668653488 |
+---------------------------+------------------+
The '~' operator contributes negatively to the relevance, but does not bar the word from appearing, as the '-' operator does. It also does not do the same as the '<' operator, which is to make a reduced, yet still positive, difference to the relevance. These subtle difference allows for powerfully tuned searches, but can cause confusion. Look at the difference between the following. First, the word 'here' is compulsory, and the word 'past' optional:
mysql> SELECT copy, MATCH(copy) 
AGAINST ('+here past' IN BOOLEAN MODE) 
AS m FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('+here past' IN BOOLEAN MODE);
+---------------------------+-----------------+
| copy                      | m               |
+---------------------------+-----------------+
| It appears good from here |               1 |
| The here and the past     | 1.3333333730698 |
| Why are we here           |               1 |
+---------------------------+-----------------+
Next, a reduced relevance for the word 'past'. The same results are returned, but the record with both words ('The here and the past') has less weighting, though still more than the other records.
mysql> SELECT copy, MATCH(copy) 
AGAINST ('+here <past' IN BOOLEAN MODE) 
AS m FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('+here <past' IN BOOLEAN MODE);
+---------------------------+-----------------+
| copy                      | m               |
+---------------------------+-----------------+
| It appears good from here |               1 |
| The here and the past     | 1.2222222089767 |
| Why are we here           |               1 |
+---------------------------+-----------------+
Take note of the relevance in the next example:
mysql> SELECT copy, MATCH(copy) 
AGAINST ('+here ~past' IN BOOLEAN MODE) 
AS m FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('+here ~past' IN BOOLEAN MODE);
+---------------------------+------------------+
| copy                      | m                |
+---------------------------+------------------+
| It appears good from here |                1 |
| The here and the past     | 0.83333331346512 |
| Why are we here           |                1 |
+---------------------------+------------------+
The important difference is that the relevance is negatively affected, and is now lower than the other two records. This means if you asked for only '~past', you'd get nothing back, as the relevance would be lower than 0 (and therefore not returned)
mysql> SELECT copy, MATCH(copy) 
AGAINST ('~past' IN BOOLEAN MODE) 
AS m FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('~past' IN BOOLEAN MODE);
Empty set (0.00 sec)
Finally, '-' simply removes the second record from the result set, as we've seen before.
mysql> SELECT copy, MATCH(copy) 
AGAINST ('+here -past' IN BOOLEAN MODE) 
AS m FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('+here -past' IN BOOLEAN MODE);
+---------------------------+------+
| copy                      | m    |
+---------------------------+------+
| It appears good from here |    1 |
| Why are we here           |    1 |
+---------------------------+------+
The final operator, the double quotes ("), is useful to group phrases together. So, if you were interested in returning 'good from here' but none of those words if they did not appear in the phrase, you'd use:
mysql> SELECT copy, MATCH(copy) 
AGAINST ('"good from here"' IN BOOLEAN MODE) 
AS m FROM fulltext_sample WHERE MATCH(copy) 
AGAINST ('"good from here"' IN BOOLEAN MODE);
+---------------------------+------+
| copy                      | m    |
+---------------------------+------+
| It appears good from here |    1 |
+---------------------------+------+

For many applications, boolean searches are one of MySQL 4's most useful features. With careful use, you can save yourself substantial of development time, and add many useful enhancements. Good luck!

» See All Articles by Columnist Ian Gilfillan



MySQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MySQL Forum
Topic By Replies Updated
MySQL in high availability and transction secure banking application klamor 1 August 28th, 10:24 AM
MySQL rollback UAL225 1 August 28th, 10:15 AM
Browsing a DB file that uses MySql finleytech 1 July 26th, 10:51 AM
php cookie won't pass variable to next page ITdevGirl 0 June 22nd, 12:13 PM


















Thanks for your registration, follow us on our social networks to keep up-to-date