Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Jan 25, 2008

SqlCredit - Part 12: Exploring Indexed Persisted Computed Columns

By Rob Garrison

Download files for part 12 of this series.

This is part 12 of a series. If you have not read part 11, you can find it here.

SQL Server 2005 adds the option of “persisting” a computed column. Prior to SQL Server 2005, you could create a computed column, but it would not be stored on disk and could not be indexed. If a computed column was not persisted, it would be recalculated each time it is accessed, so you have the choice between paying the cost once at INSERT, UPDATE, or DELETE time (persisted) or possibly many times at SELECT time (not persisted). Either is a reasonable choice. It depends on your unique situation.

However, if you would like to index a computed column, you have to use SQL Server 2005 (or higher) and you have to persist the computed column.

As is often the case, there is much information available about the basics of persisted computed columns in 2005 but very little that goes deeper. So, I will quickly hit the basics of persisted computed columns and then move on to performance comparisons and other interesting details.

The Basics

The idea here is that we would like to categorize purchases and be able to quickly look up purchases that are in a certain category.

Category

Amount Range

S

$0 to $100

M

>$100 to $1,000

L

>$1,000 to $10,000

XL

>$10,000 to $100,000

2X

>$100,000 up

Here is the structure of the standard Purchase table for reference:

 CREATE TABLE dbo.Purchase (
    PurchaseID   int       NOT NULL  IDENTITY,
    CardID       int       NOT NULL,
    VendorID     int       NOT NULL,
    Amount       money     NOT NULL,
    DateCreated  datetime  NOT NULL
       CONSTRAINT Purchase_DateCreated_DF  DEFAULT GETUTCDATE(),
 
    CONSTRAINT Purchase_PK PRIMARY KEY CLUSTERED (PurchaseID),
    CONSTRAINT Purchase_Card_FK FOREIGN KEY (CardID) 
       REFERENCES dbo.Card (CardID),
    CONSTRAINT Purchase_Vendor_FK FOREIGN KEY (VendorID) 
       REFERENCES dbo.Vendor (VendorID)
 );

There will be two new versions plus the base version. For the basic persisted computed columns table (PurchasePCC), we add this below the DateCreated column:

 PurchaseCategory AS (
  CASE
   WHEN Amount > 100000 THEN '2X'
   WHEN Amount >  10000 THEN 'XL'
   WHEN Amount >   1000 THEN 'L'
   WHEN Amount >    100 THEN 'M'
   ELSE 'S'
  END
 ) PERSISTED NOT NULL,

Since the column is persisted, it can now be indexed. The second table where the column is indexed (PurchasePccIdx) adds this code:

 CREATE NONCLUSTERED INDEX PurchasePccIdx_PurchaseCategory_IDX
 ON dbo.PurchasePccIdx (
  PurchaseCategory
 ) WITH FILLFACTOR = 80
 ON [SqlCredit_FG4];

As you can see, there is no syntax difference between indexing a standard column and a persisted computed column.

Script: CreateAndLoadPurchasePccAndPccIdx.sql (runs after LoadTestData.sql)

Performance Indications

Okay, with that out of the way, let’s load the data and see how it looks. I loaded ten million rows of data into each of the three tables (Purchase, PurchasePCC, and PurchasePccIdx). The data is exactly the same in each table.

Script: LoadTestData.sql

The script loads 1,000 Vendor records, 100 Account records, 200 Card records, and the ten million purchases. The amounts were random within certain ranges. See the “Create Purchase records” section of the script for details.

Category

Amount Range

Count

%

S

$0 to $100

2,830,742

28.31

M

>$100 to $1,000

7,070,939

70.71

L

>$1,000 to $10,000

14,769

0.15

XL

>$10,000 to $100,000

67,633

0.68

2X

>$100,000 up

15,917

0.16

The focus of the queries will be on the “2X” purchases.

I called this section “Performance Indications” because there is no clear winner where everything points to the same query as the best option (though we do see a clear loser). Read through the results, draw your own conclusions, and build your own tests if you want to try something different. Also, if you have any feedback or suggestions, please drop a comment in the forum.

Script: PerformanceIndications.sql

Time Test 1 – SELECT * ... WHERE PurchaseCategory = ‘2X’

For this test, I called both FREEPROCCACHE and DROPCLEANBUFFERS and then ran each of these SELECT statements twice, recording the time required for only the second run. The numbers below indicate the elapsed time in milliseconds for each of four runs.

  1. SELECT * FROM dbo.Purchase WHERE Amount > 100000 ORDER BY Amount;
  2. SELECT * FROM dbo.PurchasePCC WHERE PurchaseCategory = '2X' ORDER BY Amount;
  3. SELECT * FROM dbo.PurchasePccIdx WHERE PurchaseCategory = '2X' ORDER BY Amount;
  4. SELECT * FROM dbo.PurchasePccIdx WITH (INDEX (PurchasePccIdx_PurchaseCategory_IDX)) WHERE PurchaseCategory = '2X' ORDER BY Amount;

Test

Run 1

Run 2

Run 3

Run 4

1. Base

686

670

670

686

2. Persisted computed column

986

1,513

1,513

996

3. Indexed persisted computed column

360

360

360

360

4. Indexed persisted computed column with index hint

390

360

360

343

Time Test 2 – SELECT COUNT(*) ... WHERE PurchaseCategory = ‘2X’

This test was very similar to Time Test 1 except that I am only counting the number of records that are “2X”.

  1. SELECT COUNT(*) AS [Count] FROM dbo.Purchase WHERE Amount > 100000;  
  2. SELECT COUNT(*) AS [Count] FROM dbo.PurchasePCC WHERE PurchaseCategory = '2X';
  3. SELECT COUNT(*) AS [Count] FROM dbo.PurchasePccIdx WHERE PurchaseCategory = '2X';
  4. SELECT COUNT(*) AS [Count] FROM dbo.PurchasePccIdx WITH (INDEX (PurchasePccIdx_PurchaseCategory_IDX)) 
    WHERE PurchaseCategory = '2X';

Test

Run 1

Run 2

Run 3

Run 4

1. Base

390

390

390

390

2. Persisted computed column

1,140

1,233

690

1,266

3. Indexed persisted computed column

16

0

0

0

4. Indexed persisted computed column with index hint

0

0

0

0

Execution Plan 1 – SELECT * ... WHERE PurchaseCategory = ‘2X’

This test uses the same queries as Time Test 1 but runs each query only once. The execution plans show the following percentages:

Test

% of Total

Comments

1. Base

25

Clustered Index Scan

2. Persisted computed column

27

Same as 1

3. Indexed persisted computed column

24

Index Seek using PurchaseCaterogy index plus Key Lookup (the Key Lookup is 98% of overall cost)

4. Indexed persisted computed column with index hint

24

Same as 3

Execution Plan 2 – SELECT COUNT(*) ... WHERE PurchaseCategory = ‘2X’

This test uses the same queries as Time Test 2 but runs each only once. The execution plans show the following percentages:

Test

% of Total

Comments

1. Base

47

Clustered Index Scan

2. Persisted computed column

53

Same as 1

3. Indexed persisted computed column

0

Index Seek using PurchaseCategory index

4. Indexed persisted computed column with index hint

0

Same as 3

Recommendations

Using the simple computation here, it is clear that the persisted computed column without an index is not a performance win.

Also, because the query with the index hint is not a clear winner over the query without the hint, I would recommend not using the hint.

Where would this be more useful? I expect that it would be much more useful where the computation is more complex and there are many more reads (of the computed columns) than there are modifications (INSERTs/UPDATEs/DELETEs) of the data used in the computation.

This topic really deserves more investigation. Watch this space for more on the performance implications of indexed persisted computed columns.

References

Download files for part 12 of this series.

» See All Articles by Columnist Rob Garrison

SqlCredit - Developing a Complete SQL Server OLTP Database Project



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM


















Thanks for your registration, follow us on our social networks to keep up-to-date