Download files for part 13 of this series.
This is part 13 of a series. If you have not read part 12,
you can find it here.
This article is a follow-up to part 12 and assumes that you have read that.
Lets use PCC as an acronym for persisted computed columns
and IPCC for indexed persisted computed columns. Bits are cheap, but it saves
me typing.
Further Probing Performance
In part 12, I built three tables and loaded each with ten
million records. The Purchase table was straight: no computed columns at all.
The PurchasePCC table had a PCC column, but it was not indexed. The PurchasePccIdx
table had an IPCC column. The computed columns were all one- or two-character
values (S, M, L, XL, and 2X).
For this round, I wanted to add PCCs that used numbers
instead of short strings. So I added two more PCCs and called them PurchaseCategoryInt
and PurchaseCategoryTinyInt, mapping S through 2X to 1 through 5. For the TinyInt
column, I used an explicit CAST statement to force SQL Server to use tinyint
for the PCC (see the PCC Column DataType section below for more on datatypes
and CASTing).
Two changes from last time are that I left out the ORDER BY
Amount because it was just adding overhead that had nothing to do with PCC,
and I left out the index hints because the results were not different from the
same query without the hints.
Integers are always faster than strings, right? Lets see
how the results came out.
Time Test 1 SELECT * ... WHERE PurchaseCategory[Int|TinyInt] = 5
For this test, I called both FREEPROCCACHE and
DROPCLEANBUFFERS and then ran each of these SELECT statements twice, recording
the time required for only the second run. The numbers below indicate the
elapsed time in milliseconds for each of three runs.
1. SELECT * FROM Purchase WHERE Amount > 100000;
2. SELECT * FROM PurchasePCC WHERE PurchaseCategoryString = '2X';
3. SELECT * FROM PurchasePCC WHERE PurchaseCategoryInt = 5;
4. SELECT * FROM PurchasePCC WHERE PurchaseCategoryTinyInt = 5;
5. SELECT * FROM PurchasePccIdx WHERE PurchaseCategoryString = '2X';
6. SELECT * FROM PurchasePccIdx WHERE PurchaseCategoryInt = 5;
7. SELECT * FROM PurchasePccIdx WHERE PurchaseCategoryTinyInt = 5;
|
Test
|
Run 1
|
Run 2
|
Run 3
|
Avg
|
% of Total
|
|
1. Base
|
576
|
576
|
563
|
572
|
15.0%
|
|
2. PCC (string)
|
906
|
890
|
893
|
896
|
23.6%
|
|
3. PCC (int)
|
623
|
610
|
596
|
610
|
16.0%
|
|
4. PCC (tinyint)
|
626
|
656
|
626
|
636
|
16.7%
|
|
5. IPCC (string)
|
343
|
326
|
470
|
380
|
10.0%
|
|
6. IPCC (int)
|
363
|
320
|
370
|
351
|
9.2%
|
|
7. IPCC (tinyint)
|
360
|
373
|
330
|
354
|
9.3%
|
Time Test 2 SELECT COUNT(*) ... WHERE PurchaseCategory[Int|TinyInt] = 5
This test was very similar to Time Test 1 except that I am
only counting the number of records that are 5 (or 2X).
1. SELECT COUNT(*) FROM dbo.Purchase WHERE Amount > 100000;
2. SELECT COUNT(*) FROM dbo.PurchasePCC WHERE PurchaseCategoryString = '2X';
3. SELECT COUNT(*) FROM dbo.PurchasePCC WHERE PurchaseCategoryInt = 5;
4. SELECT COUNT(*) FROM dbo.PurchasePCC WHERE PurchaseCategoryTinyInt = 5;
5. SELECT COUNT(*) FROM dbo.PurchasePccIdx WHERE PurchaseCategory = '2X';
6. SELECT COUNT(*) FROM dbo.PurchasePccIdx WHERE PurchaseCategoryInt = 5;
7. SELECT COUNT(*) FROM dbo.PurchasePccIdx WHERE PurchaseCategoryTinyInt = 5;
|
Test
|
Run 1
|
Run 2
|
Run 3
|
Avg
|
% of Total
|
|
1. Base
|
376
|
390
|
376
|
381
|
21.2%
|
|
2. PCC (string)
|
670
|
670
|
673
|
671
|
37.3%
|
|
3. PCC (int)
|
373
|
376
|
376
|
375
|
20.8%
|
|
4. PCC (tinyint)
|
370
|
360
|
370
|
367
|
20.4%
|
|
5. IPCC (string)
|
0
|
0
|
16
|
5
|
0.3%
|
|
6. IPCC (int)
|
0
|
0
|
0
|
0
|
0%
|
|
7. IPCC (tinyint)
|
0
|
0
|
0
|
0
|
0%
|
Execution Plan 1 SELECT * ... WHERE PurchaseCategory[Int|TinyInt] = 5
This test uses the same queries as Time Test 1 but runs each
query only once. The execution plans show the following percentages:
|
Test
|
% of Total
|
Comments
|
|
1. Base
|
13%
|
Clustered Index Scan
|
|
2. PCC (string)
|
16%
|
Same as 1
|
|
3. PCC (int)
|
16%
|
Same as 1
|
|
4. PCC (tinyint)
|
16%
|
Same as 1
|
|
5. IPCC (string)
|
13%
|
Index Seek using PurchaseCategoryString index plus Key
Lookup (the Key Lookup is 100% of overall cost)
|
|
6. IPCC (int)
|
13%
|
Index Seek using PurchaseCategoryInt index plus Key Lookup
(the Key Lookup is 100% of overall cost)
|
|
7. IPCC (tinyint)
|
13%
|
Index Seek using PurchaseCategoryTinyInt index plus Key
Lookup (the Key Lookup is 100% of overall cost)
|
Execution Plan 2 SELECT COUNT(*) ... WHERE PurchaseCategory[Int|TinyInt]
= 5
This test uses the same queries as Time Test 2 but runs each
only once. The execution plans show the following percentages:
|
Test
|
% of Total
|
Comments
|
|
1. Base
|
21%
|
Clustered Index Scan
|
|
2. PCC (string)
|
26%
|
Same as 1
|
|
3. PCC (int)
|
26%
|
Same as 1
|
|
4. PCC (tinyint)
|
26%
|
Same as 1
|
|
5. IPCC (string)
|
0%
|
Index Seek using PurchaseCategoryString index
|
|
6. IPCC (int)
|
0%
|
Index Seek using PurchaseCategoryInt index
|
|
7. IPCC (tinyint)
|
0%
|
Index Seek using PurchaseCategoryTinyInt index
|
Results of [Int|TinyInt] versus String
Using the simple computation here, it is clear that the
persisted computed column without an index is not a performance win. Looking at
the results from Time Test 1 and Time Test 2, it is interesting to note
that using an integer instead of a string makes the PCC result about the same
as no PCC.
Using the explicit CAST to tinyint appears to have no
positive or negative affect on performance. The only win here is a
savings of three bytes per row. The only potential drawback is paying the cost
of the CAST operation, but I expect that cost is minimal. If your results fit
in a smaller datatype (e.g. tinyint versus the default int), then I would
recommend building your CASE statement with the CAST to save the space.
PCC Column DataType
I originally chose to call the added column PurchaseCategoryTinyInt
because the range of numbers is between 1 and 5, but I did not have any code
that specified the datatype of the PCC. When I checked, I found that it was an int
and not a tinyint.
SELECT
SUBSTRING(c.[name], 1, 24) AS ColumnName,
SUBSTRING(t.[name], 1, 9) AS DataType,
c.max_length
FROM sys.columns AS c
JOIN sys.types AS t
ON c.system_type_id = t.system_type_id
WHERE c.[object_id] = OBJECT_ID('dbo.PurchasePccIdx')
AND c.is_computed = 1
ColumnName DataType max_length
------------------------ --------- ----------
PurchaseCategoryString varchar 2
PurchaseCategoryTinyInt int 4
That sent me into heavy-duty investigation. If I create a
table with multiple PCCs of different types, what will SQL Server assign as datatypes?
And how can I use CAST to force (more correctly, suggest) a specific datatype?
Some of these results were as expected. Some (highlighted) were surprises. Check the results
and comments below. If you have thoughts about why some of these came out the
way they did or any other comments about this article, please add a comment
in the forum.
|
CASE Formula
|
Expected
Datatype
|
Actual
Datatype
|
Comments
|
|
1 ELSE 0
|
bit or tinyint
|
int
|
Not completely surprising since it would be difficult to
parse the CASE statement to see that the only possibilities are 0 and 1.
|
|
1
|
bit or tinyint
|
int
|
|
|
0
|
bit or tinyint
|
int
|
|
|
CAST (1 AS bit)
|
bit
|
bit
|
|
|
CAST (1 AS bit)
ELSE 0
|
bit or tinyint
|
int
|
Since 0 is not cast, it creates an int. This makes sense
once you think it through.
|
|
CAST (1 AS bit)
ELSE CAST (0 AS bit)
|
bit
|
bit
|
|
|
CAST(100 AS tinyint)
ELSE CAST(0 AS tinyint)
|
tinyint
|
tinyint
|
|
|
CAST(1 AS smallint)
ELSE CAST(0 AS smallint)
|
smallint
|
smallint
|
|
|
CAST(-2147483648
AS int) ELSE CAST(0 AS int)
|
int
|
int
|
Max negative integer
|
|
CAST(9223372036854775807
AS bigint) ELSE CAST(0 AS bigint)
|
bigint
|
bigint
|
Max bigint
|
|
CAST(9223372036854775808
AS bigint) ELSE CAST(0 AS bigint)
|
Fails
|
Fails
|
(Max bigint + 1)
Fails with "Arithmetic overflow" as expected.
|
|
2147483647
|
int
|
int
|
Max positive integer
|
|
2147483648
|
bigint
|
numeric(9,0)
|
Max positive integer + 1. Interesting that it uses numeric
for large integers instead of bigint.
|
|
-2147483647
|
int
|
int
|
Max negative integer + 1
|
|
-2147483648
|
int
|
numeric(9,0)
|
Max negative integer. Why the inconsistency?
|
|
9223372036854775807
|
bigint
|
numeric(9,0)
|
Max positive bigint
|
|
-9223372036854775808
|
bigint
|
numeric(9,0)
|
Max negative bigint
|
|
NULL ELSE 1234
|
int
|
int
|
|
|
CAST(100.23 AS
money)
|
money
|
money
|
|
|
100.23
|
numeric(5,2)
|
numeric(5,2)
|
|
|
$100.23
|
numeric(5,2)
|
money
|
Really?
|
|
CAST(1.2 AS float)
ELSE CAST(0 AS float)
|
float
|
float
|
|
|
CAST(3.1415926535897932384626433832795028841
AS float)
|
float
|
float
|
|
|
3.1415926535897932384626433832795028841
|
numeric(17,37)
|
numeric(17,37)
|
|
|
3.14159265358979323846264338327950288412
|
Fails
|
Fails
|
This is the previous number with one more digit added. It
fails with The number '3.141 ...' is out of the range for numeric
representation (maximum precision 38) as expected.
|
|
12345678901234567890.123
|
numeric(17,3)
|
numeric(17,3)
|
|
|
3.1415926535897932384626433832795028841
ELSE 12345678901234567890.123
|
numeric(17,18)
|
numeric(17,18)
|
Note the widely-varied precision.
|
|
0.0000000000000000000000000000000000001
|
numeric(17,37)
|
numeric(17,37)
|
|
|
99999999999999999999999999999999999999
|
numeric(17,0)
|
numeric(17,0)
|
|
|
NULL ELSE 'NOT
NULL'
|
varchar(8)
|
varchar(8)
|
|
|
'ASCII String'
|
varchar(12)
|
varchar(12)
|
|
|
N'UNICODE String'
|
nvarchar(14)
|
nvarchar(14)
|
|
|
CAST('ASCII
String' AS varchar(40))
|
varchar(40)
|
varchar(40)
|
|
|
CAST('ASCII
String' AS varchar(MAX))
|
varchar(MAX)
|
varchar(MAX)
|
|
|
CAST('ASCII
String' AS text)
|
text
|
text
|
|
|
CHAR(100)
|
char(1)
|
char(1)
|
|
|
NCHAR(100)
|
nchar(1)
|
nchar(1)
|
|
|
CAST(1 AS varbinary(20))
|
varbinary(20)
|
varbinary(20)
|
|
|
NEWID()
|
uniqueidentifier
|
uniqueidentifier
|
Surprised that this works.
|
|
NEWSEQUENTIALID()
|
Fails
|
Fails
|
Fails with The newsequentialid() built-in function can
only be used in a DEFAULT expression for a column of type 'uniqueidentifier'
in a CREATE TABLE or ALTER TABLE statement. It cannot be combined with other
operators to form a complex scalar expression.
|
|
GETDATE()
|
datetime
|
datetime
|
|
|
CAST(GETDATE() AS int)
|
int
|
int
|
|
|
CAST(GETDATE() AS smalldatetime)
|
smalldatetime
|
smalldatetime
|
|
Recommendations
The main thing I would recommend based on the findings? Unless
SQL Server automatically creates the datatype you want, control the datatype of
the PCC using CAST.
Download files for part 13 of this series.
»
See All Articles by Columnist Rob Garrison