SqlCredit - Part 13: More on Indexed Persisted Computed Columns
February 22, 2008
Download files for part 13 of this series.
This is part 13 of a series. If you have not read part 12, you can find it here. This article is a follow-up to part 12 and assumes that you have read that.
Let’s use “PCC” as an acronym for persisted computed columns and “IPCC” for indexed persisted computed columns. Bits are cheap, but it saves me typing.
Further Probing Performance
In part 12, I built three tables and loaded each with ten million records. The Purchase table was straight: no computed columns at all. The PurchasePCC table had a PCC column, but it was not indexed. The PurchasePccIdx table had an IPCC column. The computed columns were all one- or two-character values (‘S’, ‘M’, ‘L’, ‘XL’, and ‘2X’).
For this round, I wanted to add PCCs that used numbers instead of short strings. So I added two more PCCs and called them PurchaseCategoryInt and PurchaseCategoryTinyInt, mapping ‘S’ through ‘2X’ to 1 through 5. For the TinyInt column, I used an explicit CAST statement to force SQL Server to use tinyint for the PCC (see the “PCC Column DataType” section below for more on datatypes and CASTing).
Two changes from last time are that I left out the “ORDER BY Amount” because it was just adding overhead that had nothing to do with PCC, and I left out the index hints because the results were not different from the same query without the hints.
Integers are always faster than strings, right? Let’s see how the results came out.
Time Test 1 – SELECT * ... WHERE PurchaseCategory[Int|TinyInt] = 5
For this test, I called both FREEPROCCACHE and DROPCLEANBUFFERS and then ran each of these SELECT statements twice, recording the time required for only the second run. The numbers below indicate the elapsed time in milliseconds for each of three runs.
1. SELECT * FROM Purchase WHERE Amount > 100000; 2. SELECT * FROM PurchasePCC WHERE PurchaseCategoryString = '2X'; 3. SELECT * FROM PurchasePCC WHERE PurchaseCategoryInt = 5; 4. SELECT * FROM PurchasePCC WHERE PurchaseCategoryTinyInt = 5; 5. SELECT * FROM PurchasePccIdx WHERE PurchaseCategoryString = '2X'; 6. SELECT * FROM PurchasePccIdx WHERE PurchaseCategoryInt = 5; 7. SELECT * FROM PurchasePccIdx WHERE PurchaseCategoryTinyInt = 5;
Time Test 2 – SELECT COUNT(*) ... WHERE PurchaseCategory[Int|TinyInt] = 5
This test was very similar to Time Test 1 except that I am only counting the number of records that are 5 (or ‘2X’).
1. SELECT COUNT(*) FROM dbo.Purchase WHERE Amount > 100000; 2. SELECT COUNT(*) FROM dbo.PurchasePCC WHERE PurchaseCategoryString = '2X'; 3. SELECT COUNT(*) FROM dbo.PurchasePCC WHERE PurchaseCategoryInt = 5; 4. SELECT COUNT(*) FROM dbo.PurchasePCC WHERE PurchaseCategoryTinyInt = 5; 5. SELECT COUNT(*) FROM dbo.PurchasePccIdx WHERE PurchaseCategory = '2X'; 6. SELECT COUNT(*) FROM dbo.PurchasePccIdx WHERE PurchaseCategoryInt = 5; 7. SELECT COUNT(*) FROM dbo.PurchasePccIdx WHERE PurchaseCategoryTinyInt = 5;
Execution Plan 1 – SELECT * ... WHERE PurchaseCategory[Int|TinyInt] = 5
This test uses the same queries as Time Test 1 but runs each query only once. The execution plans show the following percentages:
Execution Plan 2 – SELECT COUNT(*) ... WHERE PurchaseCategory[Int|TinyInt] = 5
This test uses the same queries as Time Test 2 but runs each only once. The execution plans show the following percentages:
Results of [Int|TinyInt] versus String
Using the simple computation here, it is clear that the persisted computed column without an index is not a performance win. Looking at the results from “Time Test 1” and “Time Test 2”, it is interesting to note that using an integer instead of a string makes the PCC result about the same as no PCC.
Using the explicit CAST to tinyint appears to have no positive or negative affect on performance. The only win here is a savings of three bytes per row. The only potential drawback is paying the cost of the CAST operation, but I expect that cost is minimal. If your results fit in a smaller datatype (e.g. tinyint versus the default int), then I would recommend building your CASE statement with the CAST to save the space.
PCC Column DataType
I originally chose to call the added column PurchaseCategoryTinyInt because the range of numbers is between 1 and 5, but I did not have any code that specified the datatype of the PCC. When I checked, I found that it was an int and not a tinyint.
SELECT SUBSTRING(c.[name], 1, 24) AS ColumnName, SUBSTRING(t.[name], 1, 9) AS DataType, c.max_length FROM sys.columns AS c JOIN sys.types AS t ON c.system_type_id = t.system_type_id WHERE c.[object_id] = OBJECT_ID('dbo.PurchasePccIdx') AND c.is_computed = 1 ColumnName DataType max_length ------------------------ --------- ---------- PurchaseCategoryString varchar 2 PurchaseCategoryTinyInt int 4
That sent me into heavy-duty investigation. If I create a table with multiple PCCs of different types, what will SQL Server assign as datatypes? And how can I use CAST to force (more correctly, “suggest”) a specific datatype?
Some of these results were as expected. Some (highlighted) were surprises. Check the results and comments below. If you have thoughts about why some of these came out the way they did or any other comments about this article, please add a comment in the forum.
The main thing I would recommend based on the findings? Unless SQL Server automatically creates the datatype you want, control the datatype of the PCC using CAST.
Download files for part 13 of this series.SqlCredit - Developing a Complete SQL Server OLTP Database Project
Performance Testing SQL 2008's Transparent Data Encryption
SQL Server 2008's Change Data Capture - Tracking the Moving Parts
Performance Testing - SQL Server 2008 versus SQL Server 2005
Exploring SQL Server's Index INCLUDEs
Exploring SQL 2005's Ranking Functions - NTILE() and ROW_NUMBER()
SqlCredit - Part 19: Exploring SQL 2005's Ranking Functions - RANK() and DENSE_RANK()
SqlCredit, Part 18: Exploring the Performance of SQL 2005's OUTPUT Clause
SqlCredit - Part 17: Exploring SQL 2005's OUTPUT Clause
SqlCredit - Part 16: The Cost of Bloat
SqlCredit - Part 15: The Cost of Distribution
SqlCredit - Part 14: The Cost of Translation
SqlCredit - Part 13: More on Indexed Persisted Computed Columns
SqlCredit - Part 12: Exploring Indexed Persisted Computed Columns
SqlCredit - Part 11: Change Tracking Using History Records
SqlCredit - Part 10: MAC Performance and Updating SqlCredit
SqlCredit - Part 9: Message Authentication Codes
SqlCredit Part 8: Comparing Encrypt/DecryptByCert and Encrypt/DecryptByKey
SqlCredit Part 7: Performance Impact of EncryptByCert and DecryptByCert
SqlCredit Part 6: Exploring EncryptByCert and DecryptByCert
SqlCredit - Part 5: Adding Card, Vendor, and Purchase, Plus Much Refactoring
SqlCredit - Part 4: Schema and Procedure Security
SqlCredit - Part 3: Schema/Proc Updates and Automated Unit Testing
SqlCredit - Part 2: Creating the Database, Tables, CRUD Procedures
SqlCredit - Developing a Complete SQL Server OLTP Database Project