In the recent installments of our series dedicated to the most important
features of SQL Server 2005 Express Edition, we have been covering its
implementation of Full Text Indexing. After we described its basic
characteristics and presented examples illustrating the implementation of file
system-resident catalogs and indexes, which they provide storage for, we turned
our attention to queries that facilitate full-text searches. In the previous
article, we focused on the FREETEXT
predicate and its functional equivalent FREETEXTTABLE
.
Now it is time for the remaining two constructs CONTAINS
and CONTAINSTABLE
.
As we pointed out earlier, FREETEXT
and FREETEXTTABLE
are
relatively easy to use, but they also lack flexibility, which might be needed
when performing searches that are more granular. This need is addressed by the CONTAINS
predicate and the corresponding
CONTAINSTABLE
function.
Unlike their previously discussed counterparts, the CONTAINS
predicate and CONTAINSTABLE
function locate exact,
rather than approximate matches, by default. At the same rate, they accept a
wide range of parameters that can be used to alter and customize this behavior.
In particular, CONTAINS
(in
addition to accepting the name of a column or columns on which the match will
be evaluated, as well as a language identifier that will provide context for
inflections and thesaurus rules) is capable of handing any of the following
search conditions:
-
simple terms – a sequence of characters (typically a word or
group of words) enclosed in single quotes (and an additional set of double
quotes when using multiple word terms) that is to be used to locate exact
matches in target columns (punctuation marks are ignored). A target can be
further refined by employing logical operators, such asAND
,OR
,
andAND NOT
. For example, to
locate all rows that containbicycle
or
componentmountain
,
you would includeCONTAINS (DocumentSummary,
in the
'"bicycle component" OR mountain')WHERE
clause. -
prefix terms – a sequence of characters (one or more words)
followed by a wildcard (enclosed in a pair of single and double quotes) that
are matched against target columns. (Note that if double quotes were missing,
this would become a simple terms condition, resulting in a search for the exact
match, which is bound to fail since asterisks are excluded from indexing). The
search is conducted without taking into account inflections or thesaurus (which
makes it equivalent to a typicalLIKE
query). For example, to identify all rows with words starting withappl
, you would specifyCONTAINS (DocumentSummary, '"appl*"')'
in theWHERE
clause. -
generation terms – a sequence of characters forming the base for
a search that implements inflectional rules or thesaurus entries (indicated by
the presence ofFORMSOF
,INFLECTIONAL
, andTHESAURUS
entries). This facilitates
approximate – rather than exact – matches, equivalent to those performed when
leveragingFREETEXT
andFREETEXTTABLE
constructs. For example,
to locate all rows that contain any inflection of the wordinclude
, you would appendCONTAINS (DocumentSummary, N'FORMSOF (INFLECTIONAL,
in the
include)')WHERE
clause. Similarly, with our already customizedtsENU.xml
thesaurus file, you could useCONTAINS (DocumentSummary, N'FORMSOF (THESAURUS,
in the
bike)')WHERE
clause to find every row containing eitherbike
orbicycle
. -
proximity terms – two or more sequences of characters designating
words or phrases (joined by theNEAR
operator) that are supposed to appear close to each other within the same
column entry. This approach allows you to evaluate degree of proximity, rather
than mere presence (search will not return a match if targeted words are too
far apart). For example, to find rows whererepair
andinstructions
are located
near each other, you would includeCONTAINS
in the
(DocumentSummary, 'repair NEAR instructions')WHERE
clause. -
weighed terms – assigns an arbitrary preference value (a weighed
number between 0 and 1) that affects ranking calculated when executingCONTAINSTABLE
function. While this
search condition is allowed inCONTAINS
predicate, it does not impact its outcome. Its syntax utilizes theISABOUT
operator along withWEIGHT
keywords associating individual
terms with their respective weighted values. For example, when searching for
information on different bicycle parts, you would be able to indicate your
preferences by specifying'ISABOUT (reflectors
as part of
WEIGHT (.6), pedals WEIGHT (.5), seats WEIGHT (.4))'
theCONTAINS
predicate orCONTAINSTABLE
function.
CONTAINSTABLE
is a function, which utilizes the same set of search conditions implemented by the
CONTAINS
predicate. At the
same rate, it shares its basic syntax and format of its result set (i.e. a
table consisting of KEY
and RANK
columns for each row where a match
to a search string has been found) with FREETEXTTABLE
function. For example, a query equivalent to the one we posted in our previous
article that returns all rows in the DocumentSummary
column that contain the word bike
(including its inflections and synonyms defined in the thesaurus file), along
with their respective rank and key values would take the form below. (Note that
in this case, you also have the ability to limit the result set to a specific
number of top matches only, by specifying that number as the last parameter of CONTAINSTABLE
function).
SELECT f.*, d.DocumentSummary
FROM CONTAINSTABLE(Production.Document, DocumentSummary, N’FORMSOF (THESAURUS, bike)’, LANGUAGE 1033) f
INNER JOIN Production.Document d
ON f.[KEY] = d.DocumentID
To further explore the capabilities of CONTAINSTABLE
,
let’s take a look at an example demonstrating the use of weighed terms. Our
first query lists DocumentSummary
entries, which contain reflectors
,
pedals
, or seats
words by leveraging the OR
logical operator. The outcome
includes the content of the DocumentSummary
column, along with corresponding KEY
and RANK
values.
SELECT f.*, d.DocumentSummary
FROM CONTAINSTABLE(Production.Document, DocumentSummary, N’reflectors OR pedals OR seats’) f
INNER JOIN Production.Document d
ON f.[KEY] = d.DocumentID
ORDER BY f.RANK
The ranking values reflect the fact that the first of them appears most
frequently and the last one is least common. (Note that even though the order
of ranking is the same as the one we obtained by running an equivalent query
using FREETEXTTABLE
function
in the previous article of our series, their respective values are different,
since ranking algorithms in both cases are different). This time, we have the ability
to assign arbitrary weight values that reflect our preferences. This is
accomplished by employing the ISABOUT
and WEIGHT
constructs we
described earlier:
SELECT f.*, d.DocumentSummary
FROM CONTAINSTABLE(Production.Document, DocumentSummary,
N’ISABOUT (seats WEIGHT(.8), pedals WEIGHT(.1), reflectors WEIGHT(.3))’) f
INNER JOIN Production.Document d
ON f.[KEY] = d.DocumentID
ORDER BY f.RANK
This concludes our discussion regarding Full Text indexing functionality
available in SQL Server 2005 Express Edition.