In the previous
installment of our series dedicated to the most important features of SQL
Server 2005 Express Edition, we have presented the basic concepts and
characteristics of Full Text Indexing. We have also described the initial steps
leading to creation of a file system-based catalog, intended for hosting
index-related data along with additional components that affect indexing and
searching behavior (such as noise word and thesaurus files). In this article,
we will cover the remaining tasks necessary to implement a full-text index.
Let’s start by verifying the existence and configuration of the full-text
catalog. ftcAdventureWorks,
of the AdventureWorks
database, referenced in our earlier example. This can be accomplished by
querying sys.fulltext_catalogs
system view, whose rows represent individual catalogs in the current database.
Based on its content, you should be able to determine their name
, path
,
default status (is_default
column), as well as accent sensitivity (is_accent_sensitivity_on
column). Assuming that the verification is successful, you should be able to
create a full-text index, providing that all other prerequisites are satisfied.
In particular (as mentioned before), in order to proceed, the target column or
columns need to be of char
, nvarchar
, varchar
, image
,
or varbinary(max)
data type
and the table they are part of must have an existing unique (preferably
clustered), single column, and non-nullable index (known as the key index). For
the purpose of our demonstration, we will use Production.Document
table in the AdventureWorks
database, targeting its Document
and DocumentSummary
columns for indexing
with PK_Document_DocumentID
serving as its key. At this point, we can employ either of the following
methods to implement a full text index:
-
Execute the
CREATE FULLTEXT
statement while connected to the target database. You will
INDEX
need to specify the name of the table where the indexed column or columns
reside (along with their names), as well as the name of an existing, unique,
non-null index that will serve as the basis for indexing. You also have to
include theTYPE COLUMN
clause when indexing columns ofvarbinary(max)
orimage
datatypes, since
parsing them requires the use of specialized filters, whose selection depends
on the format of documents stored in them. (That format should be identifiable
by the content of a column ofchar
,
nchar
,varchar
, ornvarchar
datatype, whose name is
designated by theTYPE COLUMN
parameter).In addition, you have the option to specify a
catalog that will host an index (via theON
clause). If you do not want to rely on the
fulltext_catalog_name
default (which you can assign when the catalog is created), choose the
mechanism used to maintain index updates using theWITH CHANGE_TRACKING
clause, and
identify the language of text stored in the indexed columns. This should be set
to1033
forUS English
and to0x0
for language neutral content (you
can find a comprehensive listing of localization identifiers in a chart
published on the msdn
Web site although it might be easier to simply extract it by querying the content
of thesys.fulltext_language
system view). When the change tracking option is set to eitherAUTO
orMANUAL
mode, individual modifications (affecting either
columns that are part of the full text index or the one on which this index is
based) are recorded and used to update content of a corresponding full text
index. Specifics of this update mechanism depend on the selected mode. In
particular, withAUTO
,
changes to indexed columns are queued and propagated to the index automatically
(note, however, that this is a background process, operating
non-deterministically, so it is not possible to predict when an update will
actually be reflected by the index).MANUAL
option requires an independent, external mechanism for triggering index
population (since scheduling via SQL Server Agent jobs is not possible in SQL
Server 2005 Express Edition). This can be accomplished by invoking
(interactively or via a Scheduled Task) eithersp_fulltext_catalog
stored procedure (which,
incidentally, has been deprecated and will be phased out in future versions of
SQL Server), with'start_full'
or'start_incremental'
parameter (the latter requires the presence of thetimestamp
data type column in the full
text index) or an equivalentALTER
statement with
FULLTEXT INDEXSTART
(FULL
,INCREMENTAL
, orUPDATE
)POPULATION
parameters (which we will discuss in more
detail shortly).In our example, we will create a full text index on
DocumentSummary
(ofnvarchar(max)
data type) andDocument
column (ofvarbinary(max)
type, whose content
format is identified by the corresponding entry in theFileExtension
column). We will also
designate the language (US English
),
use the primary key based on theDocument
column (
IDPK_Document_DocumentID
),
and store the index in our default catalog (ftcAdventureWorks
).
We will also enable automatic change tracking (by assigning theAUTO
parameter toCHANGE_TRACKING
parameter):
USE AdventureWorks
GO
CREATE FULLTEXT INDEX ON Production.Document
(DocumentSummary, Document TYPE COLUMN FileExtension
LANGUAGE 1033)
KEY INDEX PK_Document_DocumentID
ON ftcAdventureWorks
WITH CHANGE_TRACKING AUTO
GO
-
Use SQL Server Management Studio Express graphical interface.
Start by locating the target database (AdventureWorks
)
and drilling down to the list of its tables to locateProduction.Document
. Choose theDesign
entry from its context sensitive
menu to display its details. Right-click on the empty area of the Designer pane
and select theFulltext Index...
option. In the resultingFull-text Index
dialog box, click on theAdd
command button. This action will trigger the creation of a new index namedFullText_for_[Document]
. Configure it by
assigning appropriate values to each of relevant properties starting withColumns
. Clicking on its ellipsis button
will display theFull-text Index Columns
dialog box. Use the drop-down lists in each of its sections to setDocument
asColumn
,FileExtension
asTyped
, and
by ColumnEnglish (United
as
States)Language
.
Repeat the same steps forDocument Summary
(although note that in this case it is not necessary to specify value ofTyped by Column
entry). Confirm your
selections by clicking on theOK
command button. Once you are back to theFull-text
dialog box, ensure that
IndexctAdventureWorks
appears in theFull-text Catalog Name
entry. Finally, set theChange Tracking
to
SettingAutomatic
andUnique Index Name
toPK_Document_DocumentID
. After you click
on theClose
command button,
the full-text index with settings identical to the one described above
(configured withCREATE FULLTEXT INDEX
statement) will be created. As expected, at this point, theAdd
command button in theFull-text Index
dialog box becomes
grayed out, since you are limited to a single full-text index per table. (Similarly,
trying to execute theCREATE FULLTEXT
statement targeting the same table will trigger an error).
INDEX
To verify the status of the newly created index, examine the output of the SELECT * FROM sys.fulltext_indexes
T-SQL
statement. The queried system view provides such information as whether the
index is_enabled
(set to
either 1
or 0
designating, respectively, enabled or
disabled), its change_tracking_state
(set to either M
for manual,
A
for auto, or O
for off), as well as the crawl-related
statistics, such as crawl_type
(F
for full, I
for incremental, timestamp-based, U
for updates based on notifications, or
P
for paused full), as well
as crawl_start_date
and crawl_end_date
(identifying, respectively,
start and end of the current or most recent crawl).
If you want to modify the properties of an existing full-text index, you can
accomplish this by employing the ALTER
T-SQL statement, which is considerably more
FULLTEXT INDEX
flexible than its CREATE
counterpart. In particular, in addition to options presented earlier, it allows
you to manipulate such properties as index status (via ENABLE
and DISABLE
parameters, which dictate whether crawling and
index population activities should be taking place), to designate columns to be
added to or removed from the index (via ADD
and DROP
parameters, where
the former automatically includes new columns in the index population
mechanism, unless WITH NO POPULATION
clause is specified), as well as to start or stop full or incremental
population (with START
and STOP
parameters followed by FULL
, INCREMENTAL
,
or UPDATE
keywords). Concerning
that last set of choices, keep in mind that incremental population is
applicable only if an indexed table has a column of timestamp
datatype.
In the upcoming articles of our series, we will discuss other methods of
managing full-text indexes in SQL Server 2005 Express Edition and present a
sample scenario demonstrating benefits offered by this technology.