Full Text Index Implementation in SQL Server 2005 Express Edition

November 16, 2009

In the previous installment of our series dedicated to the most important features of SQL Server 2005 Express Edition, we have presented the basic concepts and characteristics of Full Text Indexing. We have also described the initial steps leading to creation of a file system-based catalog, intended for hosting index-related data along with additional components that affect indexing and searching behavior (such as noise word and thesaurus files). In this article, we will cover the remaining tasks necessary to implement a full-text index.

Let's start by verifying the existence and configuration of the full-text catalog. ftcAdventureWorks, of the AdventureWorks database, referenced in our earlier example. This can be accomplished by querying sys.fulltext_catalogs system view, whose rows represent individual catalogs in the current database. Based on its content, you should be able to determine their name, path, default status (is_default column), as well as accent sensitivity (is_accent_sensitivity_on column). Assuming that the verification is successful, you should be able to create a full-text index, providing that all other prerequisites are satisfied. In particular (as mentioned before), in order to proceed, the target column or columns need to be of char, nvarchar, varchar, image, or varbinary(max) data type and the table they are part of must have an existing unique (preferably clustered), single column, and non-nullable index (known as the key index). For the purpose of our demonstration, we will use Production.Document table in the AdventureWorks database, targeting its Document and DocumentSummary columns for indexing with PK_Document_DocumentID serving as its key. At this point, we can employ either of the following methods to implement a full text index:

  • Execute the CREATE FULLTEXT INDEX statement while connected to the target database. You will need to specify the name of the table where the indexed column or columns reside (along with their names), as well as the name of an existing, unique, non-null index that will serve as the basis for indexing. You also have to include the TYPE COLUMN clause when indexing columns of varbinary(max) or image datatypes, since parsing them requires the use of specialized filters, whose selection depends on the format of documents stored in them. (That format should be identifiable by the content of a column of char, nchar, varchar, or nvarchar datatype, whose name is designated by the TYPE COLUMN parameter).

    In addition, you have the option to specify a catalog that will host an index (via the ON fulltext_catalog_name clause). If you do not want to rely on the default (which you can assign when the catalog is created), choose the mechanism used to maintain index updates using the WITH CHANGE_TRACKING clause, and identify the language of text stored in the indexed columns. This should be set to 1033 for US English and to 0x0 for language neutral content (you can find a comprehensive listing of localization identifiers in a chart published on the msdn Web site although it might be easier to simply extract it by querying the content of the sys.fulltext_language system view). When the change tracking option is set to either AUTO or MANUAL mode, individual modifications (affecting either columns that are part of the full text index or the one on which this index is based) are recorded and used to update content of a corresponding full text index. Specifics of this update mechanism depend on the selected mode. In particular, with AUTO, changes to indexed columns are queued and propagated to the index automatically (note, however, that this is a background process, operating non-deterministically, so it is not possible to predict when an update will actually be reflected by the index). MANUAL option requires an independent, external mechanism for triggering index population (since scheduling via SQL Server Agent jobs is not possible in SQL Server 2005 Express Edition). This can be accomplished by invoking (interactively or via a Scheduled Task) either sp_fulltext_catalog stored procedure (which, incidentally, has been deprecated and will be phased out in future versions of SQL Server), with 'start_full' or 'start_incremental' parameter (the latter requires the presence of the timestamp data type column in the full text index) or an equivalent ALTER FULLTEXT INDEX statement with START (FULL, INCREMENTAL, or UPDATE) POPULATION parameters (which we will discuss in more detail shortly).

    In our example, we will create a full text index on DocumentSummary (of nvarchar(max) data type) and Document column (of varbinary(max) type, whose content format is identified by the corresponding entry in the FileExtension column). We will also designate the language (US English), use the primary key based on the Document ID column (PK_Document_DocumentID), and store the index in our default catalog (ftcAdventureWorks). We will also enable automatic change tracking (by assigning the AUTO parameter to CHANGE_TRACKING parameter):

    USE AdventureWorks
    GO
    CREATE FULLTEXT INDEX ON Production.Document
    	(DocumentSummary, Document TYPE COLUMN FileExtension
    		LANGUAGE 1033)
    	KEY INDEX PK_Document_DocumentID
    		ON ftcAdventureWorks
    	WITH CHANGE_TRACKING AUTO
    GO
    
  • Use SQL Server Management Studio Express graphical interface. Start by locating the target database (AdventureWorks) and drilling down to the list of its tables to locate Production.Document. Choose the Design entry from its context sensitive menu to display its details. Right-click on the empty area of the Designer pane and select the Fulltext Index... option. In the resulting Full-text Index dialog box, click on the Add command button. This action will trigger the creation of a new index named FullText_for_[Document]. Configure it by assigning appropriate values to each of relevant properties starting with Columns. Clicking on its ellipsis button will display the Full-text Index Columns dialog box. Use the drop-down lists in each of its sections to set Document as Column, FileExtension as Typed by Column, and English (United States) as Language. Repeat the same steps for Document Summary (although note that in this case it is not necessary to specify value of Typed by Column entry). Confirm your selections by clicking on the OK command button. Once you are back to the Full-text Index dialog box, ensure that ctAdventureWorks appears in the Full-text Catalog Name entry. Finally, set the Change Tracking Setting to Automatic and Unique Index Name to PK_Document_DocumentID. After you click on the Close command button, the full-text index with settings identical to the one described above (configured with CREATE FULLTEXT INDEX statement) will be created. As expected, at this point, the Add command button in the Full-text Index dialog box becomes grayed out, since you are limited to a single full-text index per table. (Similarly, trying to execute the CREATE FULLTEXT INDEX statement targeting the same table will trigger an error).

To verify the status of the newly created index, examine the output of the SELECT * FROM sys.fulltext_indexes T-SQL statement. The queried system view provides such information as whether the index is_enabled (set to either 1 or 0 designating, respectively, enabled or disabled), its change_tracking_state (set to either M for manual, A for auto, or O for off), as well as the crawl-related statistics, such as crawl_type (F for full, I for incremental, timestamp-based, U for updates based on notifications, or P for paused full), as well as crawl_start_date and crawl_end_date (identifying, respectively, start and end of the current or most recent crawl).

If you want to modify the properties of an existing full-text index, you can accomplish this by employing the ALTER FULLTEXT INDEX T-SQL statement, which is considerably more flexible than its CREATE counterpart. In particular, in addition to options presented earlier, it allows you to manipulate such properties as index status (via ENABLE and DISABLE parameters, which dictate whether crawling and index population activities should be taking place), to designate columns to be added to or removed from the index (via ADD and DROP parameters, where the former automatically includes new columns in the index population mechanism, unless WITH NO POPULATION clause is specified), as well as to start or stop full or incremental population (with START and STOP parameters followed by FULL, INCREMENTAL, or UPDATE keywords). Concerning that last set of choices, keep in mind that incremental population is applicable only if an indexed table has a column of timestamp datatype.

In the upcoming articles of our series, we will discuss other methods of managing full-text indexes in SQL Server 2005 Express Edition and present a sample scenario demonstrating benefits offered by this technology.

» See All Articles by Columnist Marcin Policht








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers