Exploring SQL Server's Index INCLUDEs

November 28, 2008

Download the .sql files for this article.

This is part 21 of the "SqlCredit – Developing a Complete SQL Server OLTP Database Project" series. If you have not read part 20, you can find it here, but this article does not require any knowledge of previous work in this series.

Please ... I Can Read Books-Online

Many articles have been written about index INCLUDEs, so I don’t want to cover the same things. My specific interests here are to understand how index INCLUDEs affect performance and provide a set of recommendations for when to use them.

First, the Basics – Standard Index Restrictions

Standard indexes have two restrictions that affect their usefulness: the 900-byte limit and the sixteen-column limit. Here is a relatively simple script that illustrates these limitations.

Interesting Results from the Script

Creating an index on a 901-character (ASCII) or 451-character (UNICODE) column gives a warning but is allowed. If you actually attempt to write a 901/451-character string into that column, the INSERT will fail.

Creating an index on a 900-character (ASCII) or 450-character (UNICODE) column gives no warning as expected. If you write a 901/451-character string into that column, the INSERT succeeds with a warning (“String or binary data would be truncated.”).

Attempting to add an index to a varchar(MAX) or nvarchar(MAX) column will bring this error: “Column 'ColAsciiMax' in table 'dbo.LimitTest1' is of a type that is invalid for use as a key column in an index.”

The Basics of Index INCLUDEs

Creating an index with an included column is quite simple:

    CREATE NONCLUSTERED INDEX <indexName>
    on <tableName> (<KeyColumns>)
    INCLUDE (<IncludedColumns>)

Throughout this article and the supporting scripts, I will use “KeyColumn(s)” to indicate the “the column or columns on which the index is based” and “IncludedColumn(s)” to indicate “non-key columns to be added to the leaf level of the nonclustered index.

In the LimitTest script, I built an index with 23 IncludedColumns. Two of those columns are of a length that would have caused warnings, and two of the columns are of a type ([n]varchar(MAX)) that would have caused a normal index to fail. I inserted 1,000,000 characters into each of the MAX columns to prove they can hold more than 900 bytes as part of an INCLUDE column. The execution plan for this simple query on a one-row table shows that the engine chooses the INCLUDE index over the PK to return the two MAX columns:

    SELECT
        LEN(ColAsciiMax)   AS LenColAsciiMax,
        LEN(ColUnicodeMax) AS LenColUnicodeMax
    FROM dbo.LimitTest2

Duplicating the Clustered Index or a Part of It

You can duplicate the clustered index key or part or all of a multi-column clustered key in a non-clustered index with IncludedColumns to cover frequently-used queries. At first, this may seem useless, but it is a very interesting way to use INCLUDEs.

These tests are based on this Person table in which I purposely created wide rows:

    CREATE TABLE dbo.Person (
        PersonID      int              NOT NULL  IDENTITY,
        PersonGUID    uniqueidentifier NOT NULL  DEFAULT newid(),
        FirstName     nvarchar(40)     NOT NULL,
        MiddleName    nvarchar(40)     NULL,
        LastName      nvarchar(40)     NOT NULL,
        Address1      nvarchar(40)     NOT NULL,
        Address2      nvarchar(40)     NULL,
        Address3      nvarchar(40)     NULL,
        Address4      nvarchar(40)     NULL,
        City          nvarchar(40)     NOT NULL,
        StateProvince nvarchar(40)     NOT NULL,
        CountryCode   char(2)          NOT NULL,
        PostalCode    nvarchar(20)     NOT NULL,
        LoginName     nvarchar(40)     NOT NULL,
        PasswordHash  char(43)         NOT NULL,
        BirthDate     datetime         NOT NULL,
        Age           int              NOT NULL,
        Filler1       nvarchar(4000)   NOT NULL,
        Filler2       nvarchar(4000)   NOT NULL,
        Filler3       nvarchar(4000)   NOT NULL,
        Filler4       nvarchar(4000)   NOT NULL,
        CONSTRAINT PK_Person
        PRIMARY KEY CLUSTERED (
            PersonID
        )
    )
    CREATE NONCLUSTERED INDEX
    IX_Person_PersonID_FirstName_LastName_Age_WithIncl
    ON dbo.Person (
        PersonID
    )
    INCLUDE (
        FirstName,
        LastName,
        Age
    )

For the tests, the table is populated with one million rows.

Say you have an often-used query that selects FirstName, LastName, and Age based on the PersonID (the clustered index key). Adding an index on PersonID with IncludedColumns of FirstName, LastName, and Age will allow the optimizer to use a highly-selective, narrow, covering index to fulfill the query.

When Will This INCLUDE Index Be Used?

Range Queries (WHERE ... BETWEEN ...)

For single-row queries, the optimizer consistently uses the clustered primary key. If I force it to use the INCLUDE index, the plan shows the same cost for both queries, and STATISTICS IO shows exactly the same results most of the time. Occasionally it shows one fewer logical read for the INCLUDE index (depending on the number of rows in the table).

I changed the WHERE clause from

    WHERE PersonID = @personID

to

    WHERE PersonID BETWEEN @personID AND (@personID + 0)

Using index hints, I forced one version of the query to use the clustered PK and one to use the INCLUDE index, regardless of what the optimizer might choose. With the switch detailed above, the performance changed dramatically. In this “plus zero” query, the “Estimated Number of Rows” jumps from 1 to 90,000 for the query using the clustered PK. The batch showed the clustered PK query taking 81% and the INCLUDE query taking just 19%. The STATISTICS IO results, however, matched exactly.

If the range is changed from zero to 999 (returning 1,000 records), the logical read count from STATISTICS IO is 95 (PK) versus 24 (INCLUDE).

The lesson? When querying for a range of rows, the optimizer will pick the INCLUDE index regardless of the size of the range.

List Queries (WHERE ... IN ...)

If the query was changed from

    WHERE PersonID = @personID

to

    WHERE PersonID IN (@personID)

the optimizer continued to use the clustered PK when there was just one record in the list. If I forced the INCLUDE index, the statistics and plans matched exactly (except the index used).

As soon as I added a second entry to the list, the optimizer chose the INCLUDE index instead of the clustered index. This remained true as the list grew. When I grew the list to 270 entries (1, 101, 201, ...), the statistics looked like this:

    Clustered Index:
    Scan count 0, logical reads 1532, physical reads 1, read-ahead reads 279, 
        lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    INCLUDE Index:
    Scan count 0, logical reads 1372, physical reads 1, read-ahead reads 270, 
        lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

This is not a huge win, but it is a definite improvement. Called enough times, this could be significant.

Create Multiple Indexes With the Same Keys and Different INCLUDEs?

What if you had a static table of reference data with a cluster on ColID and three different frequently-called queries that return four columns each, none of which overlaps?

TableReference1
Clustered PK: ColID
Query 1: Returns Col1, Col2, Col3, Col4 (each varchar40)
Query 2: Returns Col4, Col5, Col6, Col4 (each varchar40)
Query 3: Returns Col7, Col8, Col9, Col4 (each varchar40)

Since this is reference data, you should consider covering each of these queries. The question is, do you cover all three queries with one index on ColID that INCLUDEs all nine columns, or do you create three separate columns on ColID that INCLUDE only three of the columns (one to cover each query)?

Remember that one of the benefits of using a non-clustered index with INCLUDEs is that more index entries will fit on a single page. You will have to consider the cost of multiple indexes against the benefits of having lighter-weight indexes. Either choice is reasonable, but both should be considered.

Recommendations

  • Indexes with INCLUDEs are a big win for frequently-called range queries.
  • If access is always per-record on the clustered index, don’t add indexes with INCUDEs.
  • For frequently called relatively-static data, consider using multiple indexes with INCLUDEs to cover different queries even if the KeyColumns are the same.
  • Where you have created covering indexes where all the columns are in the KeyColumns, consider changing these indexes to have the minimal set of KeyColumns and the other columns in the IncludedColumns. This will lower the space requirements and speed the retrieval of records.

References

  • Books-Online
  • CREATE INDEX (2005) (2008)
  • Index with Included Columns (2005) (2008)

Download the .sql files for this article.

If you have questions or comments about INCLUDE indexes or the article please use the forum.

» See All Articles by Columnist Rob Garrison

SqlCredit - Developing a Complete SQL Server OLTP Database Project








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers