Full Text Search on SQL 2000 Part 1
December 2, 2004
Full Text Searching is a free, optional component of MS SQL 2000. When installed, it offers a vast array of additional string querying abilities. Full Text Searching allows for string comparisons similar to internet search engines, returning both results and a matching score or weight. With regular TSQL, string matching is usually limited to an exact match, or a wildcard match with the keyword "LIKE." Full Text Searching exceeds this by searching for phrases, groups of words, words near one another, or different tenses of words, such as run, running, and ran. In addition, if Microsoft Office Word or Excel documents are saved in the database, their contents can be searched like a typical varchar field. Full Text Searching is accomplished by installing a new service (Microsoft Search), and using key words in TSQL designed specifically for text searching. This article will demonstrate installing, configuring and using the Full Text Search engine.
Unlike other SQL operations, Full Text Searching is not managed by the MSSQL Server service. Instead, a new service, Microsoft Search, must be installed and started. A primary reason for this is that Full Text Searching is done against a special index contained in a Catalog. Catalogs that are used to support searching are not part of the regular SQL server installation; they are kept outside the MDF, and are stored in separate physical files. The Microsoft Search service has two primary jobs, indexing support and querying support. Indexing support includes the tasks of defining the Catalogs, and indexes they contain, creating them and keeping the data up to date as changes take place. When search queries are issued, the second job of the Microsoft Search service begins, determining which Catalog index meets the query requirements. Once determined, Microsoft Search returns the identity of selected rows, along with a ranking value if requested, back to the SQL Server service for query completion. This brings up two requirements that the Microsoft Search service imposes. First, there must be a unique index on the base table to which the Catalog refers. This is required because the Search service only reports the identity of the selected rows back to the SQL service, not the record set itself. Second, Full Text Searching is only supported on Windows Server. Either NT 4 or 2000 Server can be used, but not Workstation, Windows 98, or XP. These other clients can issue quires, but not host the Search service.
Check in the services list for "Microsoft Search." If not listed, insert the SQL Server media, and select "Upgrade, Add or Remove Components." Click next for "Add Components." On the next screen, select "Server Components," Full-Text Search, and then continue with the rest of installation.
Creating a Catalog
In this first example, we will be creating a Full Text Search on the pubs database. To begin, open Enterprise Manager and highlight the local server.
1. From the top menu, select Tools, Full Text Indexing.
2. Click next on the opening screen.
3. On the "Select a Database" screen, choose pubs.
4. For the next screen, select "titles" as the table and click next.
5. Keep the default unique index; there is only one on the pubs table.
6. In the next screen, select Title and Notes as the fields we would like indexed.
7. On the "Select a Catalog" screen, because this is the first Catalog on the server, we are prompted for a name. In this example, "demo" is used as the Catalog name.
8. This next screen "Populate Catalogs," leave blank and click next.
9. Click Finish, and the following success screen should appear:
10. From Enterprise Manager, expand Full-Text Catalogs, the "demo" catalog should appear in the right window.
11. Right click and select "Start Full Population."
12. The catalog is now complete.
13. To verify in creation, open Query Analyzer and select the pubs database.
14. Execute sp_help_fulltext_catalogs, the new Catalog should appear.
15. Issue a test query of:
SELECT title, notes FROM titles WHERE CONTAINS (notes, ' "quick easy" ')
This should return one record with a title of "Silicon Valley Gastronomic Treats."
One of the first items to notice about Full Text Searching is that it is not automatic. Even after Microsoft Search is installed, it still must be configured for each database, table, and field we want to search. If a field will need to be searched, it must first be included in a Catalog. Steps 1 through 4 and 6 accomplished this.
Step 5 demonstrates one the Full Text Search requirements, that each table made available for searching contains a unique index. When a search query is issued, Microsoft Search will determine which records meet the query, and will report their unique index key back to the MSSQL Server service for completion of the query.
The result of Step 7, "Select a Catalog," is the creation of an entire file structure to support the Catalog. To find the name of structure from Enterprise Manager, expand Full-Text Search, then right click the "demo" Catalog and select properties. By default, Catalogs will be saved under the default server directory, MSSQL, FTDATA.
For the "demo" Catalog, a new directory structure named "SQL00000500005" as shown in the "Physical catalog" field, has been created. This structure will store the indexes and data files needed by the Search service. They are not text readable. The "Item count" field shows the sum of the tables and rows indexed.
Full Text Searching brings advanced string querying support to SQL. In this first article, a new Catalog containing an index to search on two fields of the pubs database has been created. Future articles will expand this Catalog to include searching for words near each other, returning the rank or weight of the search, and indexing Microsoft Office documents. A number of housekeeping tasks will also be introduced, such as keeping the Catalog up to date when data on the base table changes, and determining whether a word should be indexed or not (Noise words). Although Full Text Searching is far from automatic, the results it provides make it worth the effort.