Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MySQL

Posted February 7, 2013

Performing Full-text Searches in MySQL 5.6

By Rob Gravelle

A few years ago, I had the opportunity to work on a system that performed full-text searches on stored documents in combination with some database key fields.  The full-text searches were accomplished by a third-party product that shall remain nameless. There were many problems with it, from crashing, to input errors. Ultimately, it proved to be a hindrance as much as a benefit.   

Luckily for you, there are many more options available now. For MySQL users, version 5.6 includes native full-text support.  If you’re not using MySQL, maybe this will be just the impetus you need to make the switch.  In today’s article, we’re going to learn how to take advantage of MySQL 5.6’s Full-text search capabilities.

What Full-text Search is…and What It Isn’t

Full-text searching is commonly performed by specialized search engines whose aim is to extract interesting and nontrivial information from freeform text. Full-text searches tend to produce better search results by the use of specialized techniques such as pattern recognition, natural language processing, and text analysis in order to provide quick retrieval of information that otherwise would remain concealed.

In all full-text engines, there is a trade-off between "precision" and "recall". High precision means that fewer irrelevant results are presented (no false positives); while high recall means that fewer relevant results are missing (no false negatives).

In contrast to full-text search, queries using the LIKE keyword work on character patterns only. The SQL LIKE operator can also be extremely inefficient. If you apply it to an un-indexed column, a full scan will be used to find matches (just like any query on an un-indexed field). If the column is indexed, matching can be performed against index keys, but with far less efficiency than most index lookups. The slowest LIKE patterns tend to be those with leading wildcards that require every index key to be examined.  A LIKE query against millions of rows of text data can take minutes to return compared to a few seconds or less a for a full-text query, depending on the number of rows that are returned.

The MySQL InnoDB Full-Text Search (FTS) Engine

The MySQL InnoDB FTS engine lets users build FULLTEXT indexes on InnoDB tables to represent text-based content and speed up searches for words and phrases. Combining full-text search with InnoDB tables expands text capability in transactional operations, especially where the textual data is frequently inserted, updated, and deleted.

Creating an FTS Index on an InnoDB Table

FULLTEXT indexes can be created on any CHAR, VARCHAR, or TEXT column.  The MySQL documentation recommends loading the table with data first, then creating the FULLTEXT index. This is a lot faster and therefore preferable to creating the table with an empty FULLTEXT index, then inserting documents into it. For large data sets, it’s quickest to load your data into a table that has no FULLTEXT index and then create the index after that, than to load data into a table that has an existing FULLTEXT index.

The FTS_DOC_ID Column

Each InnoDB table with a FULLTEXT index includes a column called FTS_DOC_ID. Specifying this column in the original table definition saves a rebuild of the entire table when the FULLTEXT index is created. To add it, include the column FTS_DOC_ID (all uppercase) to the table with the FTS index. The column must be of BIGINT UNSIGNED NOT NULL data type. Although it doesn’t have to be, making it an auto_increment column may make the loading easier:

use test; 
 drop table if exists quotes;
create table quotes  
 (   id int unsigned auto_increment primary key    
   , author varchar(64)        
   , quote varchar(400)    
   , source varchar(64)  
 --  You can also create fulltext index at create table time
 --  , fulltext (quote)
 ) engine=innodb;


insert into quotes (author, quote, source) values  
 ('Stephen Colbert', 'Is that a pizza or did an angel just give birth in my mouth?', null),
 ('Sir Paul McCartney', 'Animation is not just for children. It\'s also for adults who take drugs.', null),
 ('Michael Caine', 'I have never seen [Jaws: The Revenge], but by all accounts it was terrible. However, I have seen the house that it built, and it is terrific.', null),
 ('Professor Rohl', 'What we feel isn\'t important...The only question is what we do.', '2008 movie "The Reader"'),
 ('Steve McConnell', 'When someone says, "This is really tricky code," I hear them say, "This is really bad code."', null),
 ('Charlie Sheen', 'I have a disease? Bulls**t! I cured it ... with my mind.', null),
 ('Anonymous', 'Amateurs practice something until they get it right.  Professionals practice something until they almost never get it wrong.', null);
 

create fulltext index idx on quotes(quote);
 

Searching Against the Index

Once the data is loaded, you can run queries against it using the MATCH (columns) AGAINST (search expression) statement. Adding the WHERE clause and other filtering clauses can further refine your SELECT statements:

select quote 
 from   quotes  
 where  match(quote) against ('have' in natural language mode);
 
quote
 ------------------------------------------------------------------------------------
 I have never seen [Jaws: The Revenge], but by all accounts it was terrible. However, I have seen the house that it built, and it is terrific.
 ------------------------------------------------------------------------------------
 I have a disease? Bulls**t! I cured it ... with my mind.

The MATCH() method takes a comma-separated list that names the columns to be searched. AGAINST() takes a string to search for, and an optional modifier that indicates what type of search to perform. Note that the search string must be a string value that is constant during query evaluation. This would preclude table columns because they can differ for each row.

There are three types of full-text searches:

search_modifier:
{
    IN NATURAL LANGUAGE MODE
  | IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
  | IN BOOLEAN MODE
  | WITH QUERY EXPANSION
}

Natural Language Mode

Full-text searches default to Natural Language searches if no modifier is given.  Having said that, there is nothing preventing you from including the IN NATURAL LANGUAGE MODE modifier.

In Natural Language mode, MATCH() looks for rows or documents, which are relevant to the free-text natural human language query, such as “I have seen the house that it built”. For each row in the table, MATCH() returns a positive floating-point relevance value, whereby a relevance of zero would mean no similarity and one a perfect match. MySQL computes the relevance number based on the number of words in the row, the number of unique words in that row, the total number of words in the collection, and the number of documents (rows) that contain a particular word.

Boolean Mode

In Boolean mode, certain characters have special meaning at the beginning or end of words in the search string. In the following query, the + and - operators indicate that a word must be present or absent, respectively, for a match to occur:

select quote 
 from   quotes 
 WHERE  MATCH (quote) AGAINST ('+have - disease' IN BOOLEAN MODE);
 
 quote
 
 ------------------------------------------------------------------------------------
 I have a disease? Bulls**t! I cured it ... with my mind.

Blind Query Expansion

Full-text search supports a variant of query expansion called “blind query expansion” (or “automatic relevance feedback”). It’s useful when the search expression is under four characters long - the minimum word length - or relies on implied knowledge that people would possess but that the full-text search engine would not. For example, someone searching for “car” probably wants to see matches for “Porsche”, “Bentley”, “Jaguar”, and “Pinto” as well. All of these automobile brands are “implied” by the search term.

Blind Query Expansion works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few most highly relevant documents from the first search. Thus, if one of these documents contains the word “car” and the word “Corvette”, the second search finds the documents that contain the word “Corvette” even if they do not contain the word “car”.

Include the WITH QUERY EXPANSION or IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION modifiers to turn on Blind Query Expansion:

select quote 
 from   quotes 
 WHERE  MATCH (quote) AGAINST ('have' WITH QUERY EXPANSION);
 
 quote
 ------------------------------------------------------------------------------------
 I have never seen [Jaws: The Revenge], but by all accounts it was terrible. However, I have seen the house that it built, and it is terrific.
 ------------------------------------------------------------------------------------
 I have a disease? Bulls**t! I cured it ... with my mind.

Conclusion

It’s been a long time coming, but true MySQL full-text searching is finally here.  I only wish it had been around when I was writing my full-text system. L

See all articles by Rob Gravelle



MySQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MySQL Forum
Topic By Replies Updated
MySQL in high availability and transction secure banking application klamor 1 August 28th, 10:24 AM
MySQL rollback UAL225 1 August 28th, 10:15 AM
Browsing a DB file that uses MySql finleytech 1 July 26th, 10:51 AM
php cookie won't pass variable to next page ITdevGirl 0 June 22nd, 12:13 PM