In the recent installment of our series dedicated to the most important
features of SQL Server 2005 Express Edition, we have been focusing on its Full
Text Indexing capabilities. So far, we have covered its basic
characteristics as well as described the steps involved in implementing file-based
catalog and the indexes hosted by it (the latter was accomplished by leveraging
CREATE FULLTEXT INDEX
and ALTER FULLTEXT INDEX
statements). In
this article, we will cover other full text index management methods, leaving
discussion about a unique syntax of index-based searches until our next
encounter.
SQL Server 2005 Express Edition provides an alternative approach to creating
or modifying full text indexes from the one we have presented earlier, which
involves the use of sp_fulltext_table
stored procedure. (You should, however, keep in mind that this stored procedure
is being deprecated in favor of CREATE
,
ALTER
, and DROP
FULLTEXT
constructs), whose behavior is controlled by its
INDEX@action
parameter (the other three identify,
respectively, names of the indexed table, full-text catalog hosting it, and a
unique, non-null column on which the full-text index is based). Depending on
its value, you can use it to:
-
create metadata for a new full-text index (with
@action='Create'
). Note that in this
case, all parameters are mandatory. It is important to note that no index-based
searches can be conducted until you add a column or columns to be indexed and
populate the index. The first of these tasks is typically accomplished by
invokingsp_fulltext_column
stored procedure, which takes as its arguments the names of a target table (@tabname
), a column to be indexed (@colname
), the desired action to be
carried out (@action
set to
'add'
), language code (@language
), and a column identifying the
document type when referencingimage
orvarbinary(max)
data types.
(For example, in our case the resulting syntax would take the formsp_fulltext_column @tabname='Production.Document', @colname='Document',
).
@action='add', @language='1033', @type_colname='FileExtension'
In order to determine the list of data formats (and their version) supported by
your SQL Server installation, examine the output ofSELECT * FROM sys.fulltext_document_types
query. Carrying out the second task can be done by taking advantage of other
action types of thesp_fulltext_table
stored procedure. -
drop an existing full-text index (with
@action='Drop'
). The only required
parameter is the table name (since there is a limit of a single full-text index
per table). -
activate or deactivate an existing full text index (respectively,
with@action='Activate'
and@action='Deactivate'
). The current state
can be viewed by examining theis_enabled
column in thesys.fulltext_indexes
system view. Activation is typically not necessary, since it takes place
automatically when the first column is added to an index, however, it might be
needed if that index has been previously deactivated. Deactivation allows you
to prevent an index from being used without the need for dropping it. -
start or stop change tracking (respectively, with
@action='start_change_tracking'
and@action='stop_change_tracking'
). The
former sets the change tracking state toMANUAL
and keeps tab of all future changes to both indexed and key columns. The
latter, deletes the record of these changes and resets the change tracking
state toOFF
. Change
tracking is beneficial when indexed data is moderately dynamic. -
update an index (with
@action='update_index'
).
The update is triggered immediately following invocation of the stored
procedure and takes advantage of the change tracking mechanism (which must be
enabled to successfully execute this statement). -
start or stop background update (respectively, with
@action='start_background_updateindex'
and@action='stop_background_updateindex'
).
This initiates or terminates change tracking and background index population.
The first of these options sets the change tracking toAUTO
, while the latter, reverts it toMANUAL
. -
start or stop full or incremental population of an index
(respectively with@action='start_full'
,
@action='start_incremental'
,
and@action='stopl'
),
without regard for change tracking mechanism. Note that the ability to initiate
incremental population is dependent on the presence of atimestamp
column in the indexed table.
Full population (with change tracking disabled) is typically a preferred
mechanism if changes to indexed columns are frequent and widespread. On the
other hand, an incremental approach (without change tracking) is recommended
when the volume of modifications is low.
As we have mentioned earlier, the algorithm that is used to populate
full-text indexes takes into consideration additional unique factors, not
included in the standard SQL Server indexing methodology. Among them, the one
that most profoundly impacts the efficiency of searches involves the use of
noise words. This term designates common, primarily supportive words (such as pronouns,
prepositions, conjunctions, or interjections), which tend to occur so
frequently and without particular distinguishing significance that their
indexing would be essentially meaningless, contributing only to bloated catalog
size and decreased performance. Noise words are language specific and are stored
in a file named noiseXXX.txt
,
where XXX
designates
a three letter language code (for example, noiseENU.txt
is intended for US English entries), residing in Program FilesMicrosoft SQL ServerMSSQL.xMSSQLFTData
folder, where x
designates an integer assigned to a local SQL Server 2005 Express Edition
instance. (This location can be identified by checking the value of the NoiseFile
registry entry of the REG_SZ
datatype under HKLMSOFTWAREMicrosoftMicrosoft SQL ServerMSSQL.xMSSearchLanguageeng
key).
Sometimes it might make sense to alter the default content of the noise
words file (for example, in order to avoid indexing terms customary to your
business, which tend to appear more frequently across your data). In order to
accomplish this, simply add new (or, if appropriate, remove any present) words
by editing the text file. Keep in mind, however, that such changes have to be
followed by full repopulation of existing indexes. This can be accomplished on a
per index-basis (by leveraging the methods described above) or en masse for all
indexes contained within a given catalog by invoking the ALTER FULLTEXT CATALOG
statement with REBUILD
parameter. In addition, note
that the same set of noise words is shared across all catalogs associated with
the same instance of SQL Server 2005 Express Edition (this restriction has been
eliminated in SQL Server 2008 through introduction of stop lists).
Another factor that affects full-text search behavior is the use of
thesaurus. This feature, relying on an XML formatted document named tsXXX.xml
(where XXX
designates a three letter
language code) along with its tsGLOBAL.xml
counterpart (both residing in the same Program
folder as
FilesMicrosoft SQL ServerMSSQL.xMSSQLFTData
the noise word files) gives you the ability to define synonyms, which in turn
facilitate more intelligent queries, giving you the ability to locate phrases
with an equivalent meaning but different wording. In order to customize its
content, you need to first familiarize yourself with its syntax. By default,
the majority of it is commented out. Before removing the comments, decide
whether thesaurus-based searches should be accent sensitive or not (this
involves setting the value of <diacritics_sensitive>
element to either 1
or 0
). Next, consider that synonyms are
specified by employing either <expansion>
or <replacement>
sets.
The first of them associates similar terms (enclosed within <sub>
elements) with each other,
such that searches yield content that includes any of them. For example, by
implementing the sample tsENU.xml
file, which contains <sub>
elements within <expansion>
for Internet Explorer
, IE
, and IE5
, a search for any one of them would return every row
that contains at least one of them (as long as the query syntax indicates that
thesaurus should be used). <replacement>
on the other hand, uses a value within the <sub>
element to substitute every pattern specified within the <pat>
elements (effectively, only
rows containing the former are included in the result set). For example, if you
refer to the same sample file tsENU.xml
,
you will find there a single <sub>
element with Windows 2000
assigned to it, along with two <pat>
elements, set to NT5
and W2K
. This implies that any query for
either of these two values would actually return rows containing their
substitute. Note that, as with the noise word modifications, any changes to
thesaurus files do not take effect until you fully rebuild all existing
indexes. In addition, keep in mind that such changes should be saved in Unicode
when using a text editor.
In our next article of this series, we will conclude our discussion of
Full-Text Index technology implemented in SQL Server 2005 Express Edition by
presenting syntax of search queries and providing a few examples demonstrating
their power and flexibility.