Full Text Index Implementation in SQL Server 2005 Express Edition

In the previous
installment
of our series dedicated to the most important features of SQL
Server 2005 Express Edition, we have presented the basic concepts and
characteristics of Full Text Indexing. We have also described the initial steps
leading to creation of a file system-based catalog, intended for hosting
index-related data along with additional components that affect indexing and
searching behavior (such as noise word and thesaurus files). In this article,
we will cover the remaining tasks necessary to implement a full-text index.

Let’s start by verifying the existence and configuration of the full-text
catalog. ftcAdventureWorks,
of the AdventureWorks
database, referenced in our earlier example. This can be accomplished by
querying sys.fulltext_catalogs
system view, whose rows represent individual catalogs in the current database.
Based on its content, you should be able to determine their name, path,
default status (is_default
column), as well as accent sensitivity (is_accent_sensitivity_on
column). Assuming that the verification is successful, you should be able to
create a full-text index, providing that all other prerequisites are satisfied.
In particular (as mentioned before), in order to proceed, the target column or
columns need to be of char, nvarchar, varchar, image,
or varbinary(max) data type
and the table they are part of must have an existing unique (preferably
clustered), single column, and non-nullable index (known as the key index). For
the purpose of our demonstration, we will use Production.Document table in the AdventureWorks database, targeting its Document and DocumentSummary columns for indexing
with PK_Document_DocumentID
serving as its key. At this point, we can employ either of the following
methods to implement a full text index:

  • Execute the CREATE FULLTEXT
    INDEX
    statement while connected to the target database. You will
    need to specify the name of the table where the indexed column or columns
    reside (along with their names), as well as the name of an existing, unique,
    non-null index that will serve as the basis for indexing. You also have to
    include the TYPE COLUMN
    clause when indexing columns of varbinary(max)
    or image datatypes, since
    parsing them requires the use of specialized filters, whose selection depends
    on the format of documents stored in them. (That format should be identifiable
    by the content of a column of char,
    nchar, varchar, or nvarchar datatype, whose name is
    designated by the TYPE COLUMN
    parameter).

    In addition, you have the option to specify a
    catalog that will host an index (via the ON
    fulltext_catalog_name
    clause). If you do not want to rely on the
    default (which you can assign when the catalog is created), choose the
    mechanism used to maintain index updates using the WITH CHANGE_TRACKING clause, and
    identify the language of text stored in the indexed columns. This should be set
    to 1033 for US English and to 0x0 for language neutral content (you
    can find a comprehensive listing of localization identifiers in a chart
    published on the msdn
    Web site although it might be easier to simply extract it by querying the content
    of the sys.fulltext_language
    system view). When the change tracking option is set to either AUTO or MANUAL mode, individual modifications (affecting either
    columns that are part of the full text index or the one on which this index is
    based) are recorded and used to update content of a corresponding full text
    index. Specifics of this update mechanism depend on the selected mode. In
    particular, with AUTO,
    changes to indexed columns are queued and propagated to the index automatically
    (note, however, that this is a background process, operating
    non-deterministically, so it is not possible to predict when an update will
    actually be reflected by the index). MANUAL
    option requires an independent, external mechanism for triggering index
    population (since scheduling via SQL Server Agent jobs is not possible in SQL
    Server 2005 Express Edition). This can be accomplished by invoking
    (interactively or via a Scheduled Task) either sp_fulltext_catalog stored procedure (which,
    incidentally, has been deprecated and will be phased out in future versions of
    SQL Server), with 'start_full'
    or 'start_incremental'
    parameter (the latter requires the presence of the timestamp data type column in the full
    text index) or an equivalent ALTER
    FULLTEXT INDEX
    statement with START
    (FULL, INCREMENTAL, or UPDATE) POPULATION parameters (which we will discuss in more
    detail shortly).

    In our example, we will create a full text index on
    DocumentSummary (of nvarchar(max) data type) and Document column (of varbinary(max) type, whose content
    format is identified by the corresponding entry in the FileExtension column). We will also
    designate the language (US English),
    use the primary key based on the Document
    ID
    column (PK_Document_DocumentID),
    and store the index in our default catalog (ftcAdventureWorks).
    We will also enable automatic change tracking (by assigning the AUTO parameter to CHANGE_TRACKING parameter):


    USE AdventureWorks
    GO
    CREATE FULLTEXT INDEX ON Production.Document
    (DocumentSummary, Document TYPE COLUMN FileExtension
    LANGUAGE 1033)
    KEY INDEX PK_Document_DocumentID
    ON ftcAdventureWorks
    WITH CHANGE_TRACKING AUTO
    GO
  • Use SQL Server Management Studio Express graphical interface.
    Start by locating the target database (AdventureWorks)
    and drilling down to the list of its tables to locate Production.Document. Choose the Design entry from its context sensitive
    menu to display its details. Right-click on the empty area of the Designer pane
    and select the Fulltext Index...
    option. In the resulting Full-text Index
    dialog box, click on the Add
    command button. This action will trigger the creation of a new index named FullText_for_[Document]. Configure it by
    assigning appropriate values to each of relevant properties starting with Columns. Clicking on its ellipsis button
    will display the Full-text Index Columns
    dialog box. Use the drop-down lists in each of its sections to set Document as Column, FileExtension as Typed
    by Column
    , and English (United
    States)
    as Language.
    Repeat the same steps for Document Summary
    (although note that in this case it is not necessary to specify value of Typed by Column entry). Confirm your
    selections by clicking on the OK
    command button. Once you are back to the Full-text
    Index
    dialog box, ensure that ctAdventureWorks
    appears in the Full-text Catalog Name
    entry. Finally, set the Change Tracking
    Setting
    to Automatic
    and Unique Index Name to PK_Document_DocumentID. After you click
    on the Close command button,
    the full-text index with settings identical to the one described above
    (configured with CREATE FULLTEXT INDEX
    statement) will be created. As expected, at this point, the Add command button in the Full-text Index dialog box becomes
    grayed out, since you are limited to a single full-text index per table. (Similarly,
    trying to execute the CREATE FULLTEXT
    INDEX
    statement targeting the same table will trigger an error).

To verify the status of the newly created index, examine the output of the SELECT * FROM sys.fulltext_indexes T-SQL
statement. The queried system view provides such information as whether the
index is_enabled (set to
either 1 or 0 designating, respectively, enabled or
disabled), its change_tracking_state
(set to either M for manual,
A for auto, or O for off), as well as the crawl-related
statistics, such as crawl_type
(F for full, I for incremental, timestamp-based, U for updates based on notifications, or
P for paused full), as well
as crawl_start_date and crawl_end_date (identifying, respectively,
start and end of the current or most recent crawl).

If you want to modify the properties of an existing full-text index, you can
accomplish this by employing the ALTER
FULLTEXT INDEX
T-SQL statement, which is considerably more
flexible than its CREATE
counterpart. In particular, in addition to options presented earlier, it allows
you to manipulate such properties as index status (via ENABLE and DISABLE parameters, which dictate whether crawling and
index population activities should be taking place), to designate columns to be
added to or removed from the index (via ADD
and DROP parameters, where
the former automatically includes new columns in the index population
mechanism, unless WITH NO POPULATION
clause is specified), as well as to start or stop full or incremental
population (with START and STOP parameters followed by FULL, INCREMENTAL,
or UPDATE keywords). Concerning
that last set of choices, keep in mind that incremental population is
applicable only if an indexed table has a column of timestamp datatype.

In the upcoming articles of our series, we will discuss other methods of
managing full-text indexes in SQL Server 2005 Express Edition and present a
sample scenario demonstrating benefits offered by this technology.

»


See All Articles by Columnist
Marcin Policht

Marcin Policht
Marcin Policht
Being a long time reader, I'm proud to join the crowd of technology gurus gathered here. I have a fair share of Microsoft exams behind me, a couple of acronyms after my name - MCSE, MCSD, and MCT, decent familiarity with majority of MS BackOffice products (SMS, SQL, Exchange, IIS), programming and scripting languages (VB, C++, VBScript with wsh) and several years of practical experience with Windows environment administration and engineering. My focus these days is on SQL Server and Windows 2000, and I'll attempt to share the most interesting experiences with these products.

Latest Articles