Download the .sql files for this article.
This is part 21 of the “SqlCredit – Developing a Complete SQL Server OLTP Database Project” series. If you have not read part 20,
you can find it here,
but this article does not require any knowledge of previous work in this
series.
Please … I Can Read Books-Online
Many articles have been written about index INCLUDEs, so I don’t
want to cover the same things. My specific interests here are to understand how
index INCLUDEs affect performance and provide a set of recommendations for
when to use them.
First, the Basics – Standard Index Restrictions
Standard indexes have two restrictions that affect their
usefulness: the 900-byte limit and the sixteen-column limit. Here is a relatively simple script that illustrates
these limitations.
Interesting Results from the Script
Creating an index on a 901-character (ASCII) or
451-character (UNICODE) column gives a warning but is allowed. If you actually
attempt to write a 901/451-character string into that column, the INSERT will fail.
Creating an index on a 900-character (ASCII) or
450-character (UNICODE) column gives no warning as expected. If you write a
901/451-character string into that column, the INSERT succeeds with a warning
(“String or binary data would be truncated.”).
Attempting to add an index to a varchar(MAX) or nvarchar(MAX)
column will bring this error: “Column ‘ColAsciiMax’ in table ‘dbo.LimitTest1’
is of a type that is invalid for use as a key column in an index.”
The Basics of Index INCLUDEs
Creating an index with an included column is quite simple:
CREATE NONCLUSTERED INDEX <indexName>
on <tableName> (<KeyColumns>)
INCLUDE (<IncludedColumns>)
Throughout this article and the supporting scripts, I will
use “KeyColumn(s)” to indicate the “the column or
columns on which the index is based” and “IncludedColumn(s)” to indicate “non-key columns to
be added to the leaf level of the nonclustered index.”
In the LimitTest script, I built an index with 23 IncludedColumns.
Two of those columns are of a length that would have caused warnings, and two
of the columns are of a type ([n]varchar(MAX)) that would have caused a normal
index to fail. I inserted 1,000,000 characters into each of the MAX columns to
prove they can hold more than 900 bytes as part of an INCLUDE column. The
execution plan for this simple query on a one-row table shows that the engine
chooses the INCLUDE index over the PK to return the two MAX columns:
SELECT
LEN(ColAsciiMax) AS LenColAsciiMax,
LEN(ColUnicodeMax) AS LenColUnicodeMax
FROM dbo.LimitTest2
Duplicating the Clustered Index or a Part of It
You can duplicate the clustered index key or part or all of
a multi-column clustered key in a non-clustered index with IncludedColumns to
cover frequently-used queries. At first, this may seem useless, but it is a
very interesting way to use INCLUDEs.
These tests are based on this Person table in which I
purposely created wide rows:
CREATE TABLE dbo.Person (
PersonID int NOT NULL IDENTITY,
PersonGUID uniqueidentifier NOT NULL DEFAULT newid(),
FirstName nvarchar(40) NOT NULL,
MiddleName nvarchar(40) NULL,
LastName nvarchar(40) NOT NULL,
Address1 nvarchar(40) NOT NULL,
Address2 nvarchar(40) NULL,
Address3 nvarchar(40) NULL,
Address4 nvarchar(40) NULL,
City nvarchar(40) NOT NULL,
StateProvince nvarchar(40) NOT NULL,
CountryCode char(2) NOT NULL,
PostalCode nvarchar(20) NOT NULL,
LoginName nvarchar(40) NOT NULL,
PasswordHash char(43) NOT NULL,
BirthDate datetime NOT NULL,
Age int NOT NULL,
Filler1 nvarchar(4000) NOT NULL,
Filler2 nvarchar(4000) NOT NULL,
Filler3 nvarchar(4000) NOT NULL,
Filler4 nvarchar(4000) NOT NULL,
CONSTRAINT PK_Person
PRIMARY KEY CLUSTERED (
PersonID
)
)
CREATE NONCLUSTERED INDEX
IX_Person_PersonID_FirstName_LastName_Age_WithIncl
ON dbo.Person (
PersonID
)
INCLUDE (
FirstName,
LastName,
Age
)
For the tests, the table is populated with one million rows.
Say you have an often-used query that selects FirstName, LastName,
and Age based on the PersonID (the clustered index key). Adding an index on PersonID
with IncludedColumns of FirstName, LastName, and Age will allow the optimizer
to use a highly-selective, narrow, covering index to fulfill the
query.
When Will This INCLUDE Index Be Used?
Range Queries (WHERE … BETWEEN …)
For single-row queries, the optimizer consistently uses the
clustered primary key. If I force it to use the INCLUDE index, the plan shows
the same cost for both queries, and STATISTICS IO shows exactly the same
results most of the time. Occasionally it shows one fewer logical read for the
INCLUDE index (depending on the number of rows in the table).
I changed the WHERE clause from
WHERE PersonID = @personID
to
WHERE PersonID BETWEEN @personID AND (@personID + 0)
Using index hints, I forced one version of the query to use
the clustered PK and one to use the INCLUDE index, regardless of what the
optimizer might choose. With the switch detailed above, the performance changed
dramatically. In this “plus zero” query, the “Estimated Number of Rows” jumps
from 1 to 90,000 for the query using the clustered PK. The batch showed the
clustered PK query taking 81% and the INCLUDE query taking just 19%. The
STATISTICS IO results, however, matched exactly.
If the range is changed from zero to 999 (returning 1,000
records), the logical read count from STATISTICS IO is 95 (PK) versus 24
(INCLUDE).
The lesson? When querying for a range of rows, the optimizer
will pick the INCLUDE index regardless of the size of the range.
List Queries (WHERE … IN …)
If the query was changed from
WHERE PersonID = @personID
to
WHERE PersonID IN (@personID)
the optimizer continued to use the clustered PK when there was
just one record in the list. If I forced the INCLUDE index, the statistics and
plans matched exactly (except the index used).
As soon as I added a second entry to the list, the optimizer
chose the INCLUDE index instead of the clustered index. This remained true as
the list grew. When I grew the list to 270 entries (1, 101, 201, …), the
statistics looked like this:
Clustered Index:
Scan count 0, logical reads 1532, physical reads 1, read-ahead reads 279,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
INCLUDE Index:
Scan count 0, logical reads 1372, physical reads 1, read-ahead reads 270,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
This is not a huge win, but it is a definite improvement.
Called enough times, this could be significant.
Create Multiple Indexes With the Same Keys and Different INCLUDEs?
What if you had a static table of reference data with a
cluster on ColID and three different frequently-called queries that return four
columns each, none of which overlaps?
TableReference1
Clustered PK: ColID
Query 1: Returns Col1, Col2, Col3, Col4 (each varchar40)
Query 2: Returns Col4, Col5, Col6, Col4 (each varchar40)
Query 3: Returns Col7, Col8, Col9, Col4 (each varchar40)
Since this is reference data, you should consider covering
each of these queries. The question is, do you cover all three queries with one
index on ColID that INCLUDEs all nine columns, or do you create three separate
columns on ColID that INCLUDE only three of the columns (one to cover each
query)?
Remember that one of the benefits of using a non-clustered
index with INCLUDEs is that more index entries will fit on a single page. You
will have to consider the cost of multiple indexes against the benefits of
having lighter-weight indexes. Either choice is reasonable, but both should be
considered.
Recommendations
- Indexes with INCLUDEs are a big win for frequently-called range
queries. - If access is always per-record on the clustered index, don’t
add indexes with INCUDEs. - For frequently called relatively-static data, consider using
multiple indexes with INCLUDEs to cover different queries even if the KeyColumns
are the same. - Where you have created covering indexes where all the
columns are in the KeyColumns, consider changing these indexes to have the
minimal set of KeyColumns and the other columns in the IncludedColumns.
This will lower the space requirements and speed the retrieval of
records.
References
Download the .sql files for this article.
If you have questions or comments about INCLUDE indexes or the article please use the forum.