Download files for part 12 of this series.
This is part 12 of a series. If you have not read part 11,
you can find it here.
SQL Server 2005 adds the option of “persisting” a computed
column. Prior to SQL Server 2005, you could create a computed column, but it
would not be stored on disk and could not be indexed. If a computed column was
not persisted, it would be recalculated each time it is accessed, so you have
the choice between paying the cost once at INSERT, UPDATE, or DELETE time
(persisted) or possibly many times at SELECT time (not persisted). Either is a
reasonable choice. It depends on your unique situation.
However, if you would like to index a computed
column, you have to use SQL Server 2005 (or higher) and you have to persist the
computed column.
As is often the case, there is much information available
about the basics of persisted computed columns in 2005 but very little that
goes deeper. So, I will quickly hit the basics of persisted computed columns
and then move on to performance comparisons and other interesting details.
The Basics
The idea here is that we would like to categorize purchases
and be able to quickly look up purchases that are in a certain category.
Category |
Amount Range |
S |
$0 to $100 |
M |
>$100 to $1,000 |
L |
>$1,000 to $10,000 |
XL |
>$10,000 to $100,000 |
2X |
>$100,000 up |
Here is the structure of the standard Purchase table for
reference:
CREATE TABLE dbo.Purchase (
PurchaseID int NOT NULL IDENTITY,
CardID int NOT NULL,
VendorID int NOT NULL,
Amount money NOT NULL,
DateCreated datetime NOT NULL
CONSTRAINT Purchase_DateCreated_DF DEFAULT GETUTCDATE(),CONSTRAINT Purchase_PK PRIMARY KEY CLUSTERED (PurchaseID),
CONSTRAINT Purchase_Card_FK FOREIGN KEY (CardID)
REFERENCES dbo.Card (CardID),
CONSTRAINT Purchase_Vendor_FK FOREIGN KEY (VendorID)
REFERENCES dbo.Vendor (VendorID)
);
There will be two new versions plus the base version. For
the basic persisted computed columns table (PurchasePCC), we add this
below the DateCreated column:
PurchaseCategory AS (
CASE
WHEN Amount > 100000 THEN ‘2X’
WHEN Amount > 10000 THEN ‘XL’
WHEN Amount > 1000 THEN ‘L’
WHEN Amount > 100 THEN ‘M’
ELSE ‘S’
END
) PERSISTED NOT NULL,
Since the column is persisted, it can now be indexed. The
second table where the column is indexed (PurchasePccIdx) adds this
code:
CREATE NONCLUSTERED INDEX PurchasePccIdx_PurchaseCategory_IDX
ON dbo.PurchasePccIdx (
PurchaseCategory
) WITH FILLFACTOR = 80
ON [SqlCredit_FG4];
As you can see, there is no syntax difference between
indexing a standard column and a persisted computed column.
Script: CreateAndLoadPurchasePccAndPccIdx.sql (runs after
LoadTestData.sql)
Performance Indications
Okay, with that out of the way, let’s load the data and see
how it looks. I loaded ten million rows of data into each of the three tables
(Purchase, PurchasePCC, and PurchasePccIdx). The data is exactly the same in
each table.
Script: LoadTestData.sql
The script loads 1,000 Vendor records, 100 Account records,
200 Card records, and the ten million purchases. The amounts were random within
certain ranges. See the “Create Purchase records” section of the script for
details.
Category |
Amount Range |
Count |
% |
S |
$0 to $100 |
2,830,742 |
28.31 |
M |
>$100 to $1,000 |
7,070,939 |
70.71 |
L |
>$1,000 to $10,000 |
14,769 |
0.15 |
XL |
>$10,000 to $100,000 |
67,633 |
0.68 |
2X |
>$100,000 up |
15,917 |
0.16 |
The focus of the queries will be on the “2X” purchases.
I called this section “Performance Indications”
because there is no clear winner where everything points to the same query as
the best option (though we do see a clear loser). Read through the results, draw
your own conclusions, and build your own tests if you want to try something
different. Also, if you have any feedback or suggestions, please drop a comment in the forum.
Script: PerformanceIndications.sql
Time Test 1 – SELECT * … WHERE PurchaseCategory = ‘2X’
For this test, I called both FREEPROCCACHE and
DROPCLEANBUFFERS and then ran each of these SELECT statements twice, recording
the time required for only the second run. The numbers below indicate the
elapsed time in milliseconds for each of four runs.
1. SELECT * FROM dbo.Purchase WHERE Amount > 100000 ORDER BY Amount;
2. SELECT * FROM dbo.PurchasePCC WHERE PurchaseCategory = ‘2X’ ORDER BY Amount;
3. SELECT * FROM dbo.PurchasePccIdx WHERE PurchaseCategory = ‘2X’ ORDER BY Amount;
4. SELECT * FROM dbo.PurchasePccIdx WITH (INDEX (PurchasePccIdx_PurchaseCategory_IDX)) WHERE PurchaseCategory = ‘2X’ ORDER BY Amount;
Test |
Run 1 |
Run 2 |
Run 3 |
Run 4 |
1. Base |
686 |
670 |
670 |
686 |
2. Persisted computed column |
986 |
1,513 |
1,513 |
996 |
3. Indexed persisted computed column |
360 |
360 |
360 |
360 |
4. Indexed persisted computed column with index hint |
390 |
360 |
360 |
343 |
Time Test 2 – SELECT COUNT(*) … WHERE PurchaseCategory = ‘2X’
This test was very similar to Time Test 1 except that I am
only counting the number of records that are “2X”.
1. SELECT COUNT(*) AS [Count] FROM dbo.Purchase WHERE Amount > 100000;
2. SELECT COUNT(*) AS [Count] FROM dbo.PurchasePCC WHERE PurchaseCategory = ‘2X’;
3. SELECT COUNT(*) AS [Count] FROM dbo.PurchasePccIdx WHERE PurchaseCategory = ‘2X’;
4. SELECT COUNT(*) AS [Count] FROM dbo.PurchasePccIdx WITH (INDEX (PurchasePccIdx_PurchaseCategory_IDX))
WHERE PurchaseCategory = ‘2X’;
Test |
Run 1 |
Run 2 |
Run 3 |
Run 4 |
1. Base |
390 |
390 |
390 |
390 |
2. Persisted computed column |
1,140 |
1,233 |
690 |
1,266 |
3. Indexed persisted computed column |
16 |
0 |
0 |
0 |
4. Indexed persisted computed column with index hint |
0 |
0 |
0 |
0 |
Execution Plan 1 – SELECT * … WHERE PurchaseCategory = ‘2X’
This test uses the same queries as Time Test 1 but runs each
query only once. The execution plans show the following percentages:
Test |
% of Total |
Comments |
1. Base |
25 |
Clustered Index Scan |
2. Persisted computed column |
27 |
Same as 1 |
3. Indexed persisted computed column |
24 |
Index Seek using PurchaseCaterogy index plus Key Lookup |
4. Indexed persisted computed column with index hint |
24 |
Same as 3 |
Execution Plan 2 – SELECT COUNT(*) … WHERE PurchaseCategory = ‘2X’
This test uses the same queries as Time Test 2 but runs each
only once. The execution plans show the following percentages:
Test |
% of Total |
Comments |
1. Base |
47 |
Clustered Index Scan |
2. Persisted computed column |
53 |
Same as 1 |
3. Indexed persisted computed column |
0 |
Index Seek using PurchaseCategory index |
4. Indexed persisted computed column with index hint |
0 |
Same as 3 |
Recommendations
Using the simple computation here, it is clear that the
persisted computed column without an index is not a performance win.
Also, because the query with the index hint is not a clear
winner over the query without the hint, I would recommend not using the hint.
Where would this be more useful? I expect that it would be
much more useful where the computation is more complex and there are many more
reads (of the computed columns) than there are modifications (INSERTs/UPDATEs/DELETEs)
of the data used in the computation.
This topic really deserves more investigation. Watch this
space for more on the performance implications of indexed persisted computed
columns.
References
-
Creating
Indexes on Computed Columns from SQL Server 2005 Books Online -
Computed
Columns from SQL Server 2005 Books Online -
Top
10 Hidden Gems in SQL Server 2005 from the SQL
CAT blog, by Cihan Biyikoglu
Download files for part 12 of this series.