SqlCredit - Part 13: More on Indexed Persisted Computed Columns

February 22, 2008

Download files for part 13 of this series.

This is part 13 of a series. If you have not read part 12, you can find it here. This article is a follow-up to part 12 and assumes that you have read that.

Let’s use “PCC” as an acronym for persisted computed columns and “IPCC” for indexed persisted computed columns. Bits are cheap, but it saves me typing.

Further Probing Performance

In part 12, I built three tables and loaded each with ten million records. The Purchase table was straight: no computed columns at all. The PurchasePCC table had a PCC column, but it was not indexed. The PurchasePccIdx table had an IPCC column. The computed columns were all one- or two-character values (‘S’, ‘M’, ‘L’, ‘XL’, and ‘2X’).

For this round, I wanted to add PCCs that used numbers instead of short strings. So I added two more PCCs and called them PurchaseCategoryInt and PurchaseCategoryTinyInt, mapping ‘S’ through ‘2X’ to 1 through 5. For the TinyInt column, I used an explicit CAST statement to force SQL Server to use tinyint for the PCC (see the “PCC Column DataType” section below for more on datatypes and CASTing).

Two changes from last time are that I left out the “ORDER BY Amount” because it was just adding overhead that had nothing to do with PCC, and I left out the index hints because the results were not different from the same query without the hints.

Integers are always faster than strings, right? Let’s see how the results came out.

Time Test 1 – SELECT * ... WHERE PurchaseCategory[Int|TinyInt] = 5

For this test, I called both FREEPROCCACHE and DROPCLEANBUFFERS and then ran each of these SELECT statements twice, recording the time required for only the second run. The numbers below indicate the elapsed time in milliseconds for each of three runs.

  
  1.  SELECT * FROM Purchase WHERE Amount > 100000;
  2.  SELECT * FROM PurchasePCC WHERE PurchaseCategoryString = '2X';
  3.  SELECT * FROM PurchasePCC WHERE PurchaseCategoryInt = 5;
  4.  SELECT * FROM PurchasePCC WHERE PurchaseCategoryTinyInt = 5;
  5.  SELECT * FROM PurchasePccIdx WHERE PurchaseCategoryString = '2X';
  6.  SELECT * FROM PurchasePccIdx WHERE PurchaseCategoryInt = 5;
  7.  SELECT * FROM PurchasePccIdx WHERE PurchaseCategoryTinyInt = 5;

Test

Run 1

Run 2

Run 3

Avg

% of Total

1. Base

576

576

563

572

15.0%

2. PCC (string)

906

890

893

896

23.6%

3. PCC (int)

623

610

596

610

16.0%

4. PCC (tinyint)

626

656

626

636

16.7%

5. IPCC (string)

343

326

470

380

10.0%

6. IPCC (int)

363

320

370

351

9.2%

7. IPCC (tinyint)

360

373

330

354

9.3%

Time Test 2 – SELECT COUNT(*) ... WHERE PurchaseCategory[Int|TinyInt] = 5

This test was very similar to Time Test 1 except that I am only counting the number of records that are 5 (or ‘2X’).

  1.  SELECT COUNT(*) FROM dbo.Purchase WHERE Amount > 100000;
  2.  SELECT COUNT(*) FROM dbo.PurchasePCC WHERE PurchaseCategoryString = '2X';
  3.  SELECT COUNT(*) FROM dbo.PurchasePCC WHERE PurchaseCategoryInt = 5;
  4.  SELECT COUNT(*) FROM dbo.PurchasePCC WHERE PurchaseCategoryTinyInt = 5;
  5.  SELECT COUNT(*) FROM dbo.PurchasePccIdx WHERE PurchaseCategory = '2X';
  6.  SELECT COUNT(*) FROM dbo.PurchasePccIdx WHERE PurchaseCategoryInt = 5;
  7.  SELECT COUNT(*) FROM dbo.PurchasePccIdx WHERE PurchaseCategoryTinyInt = 5;

Test

Run 1

Run 2

Run 3

Avg

% of Total

1. Base

376

390

376

381

21.2%

2. PCC (string)

670

670

673

671

37.3%

3. PCC (int)

373

376

376

375

20.8%

4. PCC (tinyint)

370

360

370

367

20.4%

5. IPCC (string)

0

0

16

5

0.3%

6. IPCC (int)

0

0

0

0

0%

7. IPCC (tinyint)

0

0

0

0

0%

Execution Plan 1 – SELECT * ... WHERE PurchaseCategory[Int|TinyInt] = 5

This test uses the same queries as Time Test 1 but runs each query only once. The execution plans show the following percentages:

Test

% of Total

Comments

1. Base

13%

Clustered Index Scan

2. PCC (string)

16%

Same as 1

3. PCC (int)

16%

Same as 1

4. PCC (tinyint)

16%

Same as 1

5. IPCC (string)

13%

Index Seek using PurchaseCategoryString index plus Key Lookup (the Key Lookup is 100% of overall cost)

6. IPCC (int)

13%

Index Seek using PurchaseCategoryInt index plus Key Lookup (the Key Lookup is 100% of overall cost)

7. IPCC (tinyint)

13%

Index Seek using PurchaseCategoryTinyInt index plus Key Lookup (the Key Lookup is 100% of overall cost)

Execution Plan 2 – SELECT COUNT(*) ... WHERE PurchaseCategory[Int|TinyInt] = 5

This test uses the same queries as Time Test 2 but runs each only once. The execution plans show the following percentages:

Test

% of Total

Comments

1. Base

21%

Clustered Index Scan

2. PCC (string)

26%

Same as 1

3. PCC (int)

26%

Same as 1

4. PCC (tinyint)

26%

Same as 1

5. IPCC (string)

0%

Index Seek using PurchaseCategoryString index

6. IPCC (int)

0%

Index Seek using PurchaseCategoryInt index

7. IPCC (tinyint)

0%

Index Seek using PurchaseCategoryTinyInt index

Results of [Int|TinyInt] versus String

Using the simple computation here, it is clear that the persisted computed column without an index is not a performance win. Looking at the results from “Time Test 1” and “Time Test 2”, it is interesting to note that using an integer instead of a string makes the PCC result about the same as no PCC.

Using the explicit CAST to tinyint appears to have no positive or negative affect on performance. The only win here is a savings of three bytes per row. The only potential drawback is paying the cost of the CAST operation, but I expect that cost is minimal. If your results fit in a smaller datatype (e.g. tinyint versus the default int), then I would recommend building your CASE statement with the CAST to save the space.

PCC Column DataType

I originally chose to call the added column PurchaseCategoryTinyInt because the range of numbers is between 1 and 5, but I did not have any code that specified the datatype of the PCC. When I checked, I found that it was an int and not a tinyint.

  SELECT
      SUBSTRING(c.[name], 1, 24) AS ColumnName,
      SUBSTRING(t.[name], 1, 9)  AS DataType,
      c.max_length
  FROM sys.columns AS c
  JOIN sys.types   AS t
  ON c.system_type_id = t.system_type_id
  WHERE c.[object_id] = OBJECT_ID('dbo.PurchasePccIdx')
    AND c.is_computed = 1
  
  ColumnName               DataType  max_length
  ------------------------ --------- ----------
  PurchaseCategoryString   varchar   2
  PurchaseCategoryTinyInt  int       4
 

That sent me into heavy-duty investigation. If I create a table with multiple PCCs of different types, what will SQL Server assign as datatypes? And how can I use CAST to force (more correctly, “suggest”) a specific datatype?

Some of these results were as expected. Some (highlighted) were surprises. Check the results and comments below. If you have thoughts about why some of these came out the way they did or any other comments about this article, please add a comment in the forum.

CASE Formula

Expected
Datatype

Actual
Datatype

Comments

1 ELSE 0

bit or tinyint

int

Not completely surprising since it would be difficult to parse the CASE statement to see that the only possibilities are 0 and 1.

1

bit or tinyint

int

 

0

bit or tinyint

int

 

CAST (1 AS bit)

bit

bit

 

CAST (1 AS bit) ELSE 0

bit or tinyint

int

Since 0 is not cast, it creates an int. This makes sense once you think it through.

CAST (1 AS bit) ELSE CAST (0 AS bit)

bit

bit

 

CAST(100 AS tinyint) ELSE CAST(0 AS tinyint)

tinyint

tinyint

 

CAST(1 AS smallint) ELSE CAST(0 AS smallint)

smallint

smallint

 

CAST(-2147483648 AS int) ELSE CAST(0 AS int)

int

int

Max negative integer

CAST(9223372036854775807 AS bigint) ELSE CAST(0 AS bigint)

bigint

bigint

Max bigint

CAST(9223372036854775808 AS bigint) ELSE CAST(0 AS bigint)

Fails

Fails

(Max bigint + 1)

Fails with "Arithmetic overflow" as expected.

2147483647

int

int

Max positive integer

2147483648

bigint

numeric(9,0)

Max positive integer + 1. Interesting that it uses numeric for large integers instead of bigint.

-2147483647

int

int

Max negative integer + 1

-2147483648

int

numeric(9,0)

Max negative integer. Why the inconsistency?

9223372036854775807

bigint

numeric(9,0)

Max positive bigint

-9223372036854775808

bigint

numeric(9,0)

Max negative bigint

NULL ELSE 1234

int

int

 

CAST(100.23 AS money)

money

money

 

100.23

numeric(5,2)

numeric(5,2)

 

$100.23

numeric(5,2)

money

Really?

CAST(1.2 AS float) ELSE CAST(0 AS float)

float

float

 

CAST(3.1415926535897932384626433832795028841 AS float)

float

float

 

3.1415926535897932384626433832795028841

numeric(17,37)

numeric(17,37)

 

3.14159265358979323846264338327950288412

Fails

Fails

This is the previous number with one more digit added. It fails with “The number '3.141 ...' is out of the range for numeric representation (maximum precision 38)” as expected.

12345678901234567890.123

numeric(17,3)

numeric(17,3)

 

3.1415926535897932384626433832795028841 ELSE 12345678901234567890.123

numeric(17,18)

numeric(17,18)

Note the widely-varied precision.

0.0000000000000000000000000000000000001

numeric(17,37)

numeric(17,37)

 

99999999999999999999999999999999999999

numeric(17,0)

numeric(17,0)

 

NULL ELSE 'NOT NULL'

varchar(8)

varchar(8)

 

'ASCII String'

varchar(12)

varchar(12)

 

N'UNICODE String'

nvarchar(14)

nvarchar(14)

 

CAST('ASCII String' AS varchar(40))

varchar(40)

varchar(40)

 

CAST('ASCII String' AS varchar(MAX))

varchar(MAX)

varchar(MAX)

 

CAST('ASCII String' AS text)

text

text

 

CHAR(100)

char(1)

char(1)

 

NCHAR(100)

nchar(1)

nchar(1)

 

CAST(1 AS varbinary(20))

varbinary(20)

varbinary(20)

 

NEWID()

uniqueidentifier

uniqueidentifier

Surprised that this works.

NEWSEQUENTIALID()

Fails

Fails

Fails with “The newsequentialid() built-in function can only be used in a DEFAULT expression for a column of type 'uniqueidentifier' in a CREATE TABLE or ALTER TABLE statement. It cannot be combined with other operators to form a complex scalar expression.”

GETDATE()

datetime

datetime

 

CAST(GETDATE() AS int)

int

int

 

CAST(GETDATE() AS smalldatetime)

smalldatetime

smalldatetime

 

Recommendations

The main thing I would recommend based on the findings? Unless SQL Server automatically creates the datatype you want, control the datatype of the PCC using CAST.

Download files for part 13 of this series.

» See All Articles by Columnist Rob Garrison

SqlCredit - Developing a Complete SQL Server OLTP Database Project








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers