Full Text Search on SQL 2000 Part 2

Introduction

In last month’s
article
, Full Text Searching was introduced as a way to query strings with
more refinement than the usual TSQL "like" or equal operator
statements. With Full Text Searching, a new file system structure is created,
storing key words from selected fields into Catalogs. In addition to storing
typical character fields from databases, Microsoft Office documents that have
been saved as binaries can also be entered into the Catalogs. Once created,
these catalogs can be interrogated for key words that are near each other’s
proximity, the singular and plural versions of a word, or the noun and verb
variations of a word can all be searched for. In addition, search result
rankings, or weights, that usually accompany internet search engine returns can
also be requested.

In Part 1, a simple Catalog of the pubs database was
created. The Catalog, called demo, contained the title and note fields from
the titles table. The Catalog was populated, and a query returned
successfully. This article continues with the sample data from Part 1.

CONTAINS

At its simplest level, the key word CONTAINS behaves similar
to LIKE. To begin, we will use CONTAINS for precise string matching. The differences
between CONTAINS and LIKE will not be very evident here. Using the pubs
Catalog created in Part 1, issue the query:

SELECT title_id, title, notes
FROM titles
WHERE CONTAINS (notes, ' "computer" ')

The statement returns three rows, each containing the word "computer"
in the notes field.

The Catalog created on the pubs database contained two
fields: notes, and title. These are the only two fields allowed for CONTAINS
searching. If a query is issued on a field not included in the Catalog, SQL
will return an error. Evidenced by searching on the "type" field
below:

SELECT title_id, title, notes
FROM titles
WHERE CONTAINS (type, ' "computer" ')

This statement returns:

Server: Msg 7601, Level 16, State 3, Line 1
Cannot use a CONTAINS or FREETEXT predicate on column 'type' because it is not full-text indexed.

The syntax of the CONTAINS is the Cataloged field name,
followed by the word or phrase to search for. Although only fields Cataloged
can be searched on, any field in the table can be returned. Our first
statement returned the title_id field. The same is true of joining a CONTAINS
statement to another table. It follows the usual TSQL JOIN statement rules.
This next query joins the CONTAINS statement to the titleauthor table, useful
if author names were needed.


SELECT titles.title_id, titles.title, titles.notes, titleauthor.au_id
FROM titles INNER JOIN titleauthor ON
titles.title_id = titleauthor.title_ID
WHERE CONTAINS (titles.notes, ‘ “computer” ‘)

The statement functions like a standard join returning the
following.

The first query seemed very similar to a regular statement
that could have been rewritten using the LIKE key word. However, if we run a
LIKE statement, some differences will appear:


SELECT title_id, title, notes
FROM titles
WHERE notes LIKE ‘%computer%’

This LIKE statement returns five records rather than the three
returned by the CONTAINS:

The difference is the plural "computers," returned
by LIKE. The CONTAINS statement was only looking for an exact text match of "computer."
Plurals were ignored. As a side note, CONTAINS is case insensitive. It sees
no difference between "COmputERS" and "computers." To
include the plurals, the CONTAINS is changed from what is known as a simple
term search, to a prefix term search. This is accomplished by using the *
sign. Change the original search statement to:


SELECT title_id, title, notes
FROM titles
WHERE CONTAINS (notes, ‘”computer*” ‘)

The query will now return the same five records as the LIKE
statement. This leads to a discussion about the use of single and double
quotes. The double quotes indicate the asterisk should look for zero, one, or
more characters after the root word. Without the double quotes, such as:


SELECT title_id, title, notes
FROM titles
WHERE CONTAINS (notes, ‘computer*’)

The asterisk will now be treated as literal character to be
found, rather than a wild card. Only the three records with the singular use of
"computer" will be returned with this statement.

FORMSOF

Another way to find variants of a word is change our prefix
term search into a generation term search by the use of a new key word:
FORMSOF. FORMSOF is used in conjunction with CONTAINS to look for variants of
the search word. To use FORMSOF, the original statement is changed to:


SELECT title, notes
FROM titles
WHERE CONTAINS(notes, ‘FORMSOF (INFLECTIONAL, computer)’)

This will return all five records, those with the word "computer,"
and the plural "computers." Again, we could obtain the same results
by using a LIKE statement, but the real power of FORMSOF becomes evident when
the word "good" is searched for. This next LIKE statement will not
return any matching records:


SELECT title, notes
FROM titles
WHERE notes LIKE ‘%good%’

The word "good" does not appear in the notes
field. However, this next FORMSOF statement will return one record:


SELECT title, notes
FROM titles
WHERE CONTAINS(notes, ‘FORMSOF (INFLECTIONAL, good)’)

The difference is that in addition to returning singular and
plural forms of a word, FORMSOF will also include gender and neutral forms of
nouns, verbs, and adjectives. In the above example, "best" is
returned as a variant of "good."

NOISE WORDS

If we change the above statement around, and look for
variants of the word "and" by using the following statement:


SELECT title, notes
FROM titles
WHERE CONTAINS(notes, ‘FORMSOF (INFLECTIONAL, and)’)

An error will be returned rather than the previously
returned record:

Server: Msg 7619, Level 16, State 1, Line 1
Execution of a full-text operation failed. A clause of the query contained only ignored words.

The word "and" is considered a NOISE WORD. These
are common words that MS Search feels should be ignored, and not be included in
any Catalog. The full list of NOISE WORDS is located at \Program
Files\Microsoft SQL Server\MSSQL\FTDATA\SQLServer\Config . The file
representing your language can be opened and modified with a text editor. Any
changes made will be reflected on the next Catalog population.

Conclusion

Full Text Searching goes beyond simple LIKE string searching
to provide new query possibilities. Next month, we will continue with advanced
searching, weights, searching Microsoft Office documents, and maintenance
issues.

»


See All Articles by Columnist
Don Schlichting

Don Schlichting
Don Schlichting
Don Schlichting is a Microsoft MSCE, MCDBA and an Oracle Certified Associate. Working at Professional Control Corporation as the IT Director,.

Latest Articles