This is part 7 of a series. If you have not read part 6, you can find it here.
Let’s Get This Out Right Away
The task here is to determine the performance impact of EncryptByCert and DecryptByCert. In short, the cost is high. Microsoft acknowledges this in the Remarks section of Books Online’s coverage of EncryptByCert, but it is not a well-publicized limitation:
This function encrypts data with the public key of a certificate. … Such asymmetric transformations are very costly compared to encryption and decryption using a symmetric key. Asymmetric encryption is therefore not recommended when working with large datasets such as user data in tables.
I searched for articles that dealt with EncryptByCert and DecryptByCert and found a number of places where people discuss the functionality with no mention of the “large datasets” limitation.
The net is that EncryptByCert and DecryptByCert work for small bits of sensitive data but are inappropriate for use in high-volume tables like Card.
Never Fear, We Do Have Options
These performance results are valuable, even if they make it clear that we need another solution. Next month, I will change all the EncryptByCert and DecryptByCert calls to EncryptByKey and DecryptByKey (symmetric encryption) and use the tests created here to allow direct comparisons.
Performance Testing
What I did was build a test that gives an indication of the performance hit when using EncyptByCert and DecryptByCert. All of the code is included, so please do download the code and try it out yourself. I do not have access to any heavy machinery that I can play with, so if you run these tests on some high-powered system with a SAN, please share your results in the forum.
The test results here are from two different desktop-class systems. The first is my primary development system; the second is a “sandbox” I set up to try SQL Server 2005 before it was released. (As soon as I have time, I will be loading the SQL Server 2008 CTP on that system so I can check out the Resource Governor).
The Tests
The tests are fairly simple. They create two tables, one with encryption (CardEncrypt) and one without (CardNoEncrypt).
CREATE TABLE dbo.CardNoEncrypt ( CardID int NOT NULL IDENTITY, CardNumber1 bigint NOT NULL, CardNumber2 bigint NOT NULL, CardNumber3 bigint NOT NULL, CardNumber4 bigint NOT NULL, SecurityCode1 smallint NOT NULL, SecurityCode2 smallint NOT NULL, SecurityCode3 smallint NOT NULL, SecurityCode4 smallint NOT NULL, SecureString1 varchar(36) NOT NULL, SecureString2 varchar(36) NOT NULL, SecureString3 varchar(36) NOT NULL, SecureString4 varchar(36) NOT NULL, CONSTRAINT CardNoEncrypt_PK PRIMARY KEY CLUSTERED (CardID) ) ON [SqlCredit_FG1]; CREATE TABLE dbo.CardEncrypt ( CardID int NOT NULL IDENTITY, CardNumber1 varbinary(128) NOT NULL, CardNumber2 bigint NOT NULL, CardNumber3 bigint NOT NULL, CardNumber4 bigint NOT NULL, SecurityCode1 varbinary(128) NOT NULL, SecurityCode2 smallint NOT NULL, SecurityCode3 smallint NOT NULL, SecurityCode4 smallint NOT NULL, SecureString1 varbinary(128) NOT NULL, SecureString2 varchar(36) NOT NULL, SecureString3 varchar(36) NOT NULL, SecureString4 varchar(36) NOT NULL, CONSTRAINT CardEncrypt_PK PRIMARY KEY CLUSTERED (CardID) ) ON [SqlCredit_FG3];
There is an index on CardNoEncrypt (CardNumber1, SecurityCode1) but no corresponding index on CardEncrypt.
In a real application, you would not want to encrypt every column in a table, so CardEncrypt includes three encrypted columns (a bigint, a smallint, and a varchar(36)) while the rest are unencrypted.
Test Steps:
- Insert n records into each table.
- Read n random records by ID.
Each test is separated by a five-second delay to allow the system to settle between tests.
To run the tests, run BuildSqlCredit.bat, then EncryptPerfTest_Setup.sql, and finally EncryptPerfTest_Test.sql. The Test script has one important parameter that maps to the n, above: @createMax. The tests were run with it set to 10,000, but when you run it on your own system, set it low to start with.
The Systems
Common
SQL Server 2005 SP2
Windows XP Professional
System 1
CPU: Single P4, 2.66 GHz
Memory: 2 GB
Hard Drives: One physical drive
System 2
CPU: Single P4, 3.0 GHz HT
Memory: 1.5 GB
Hard Drives: Two physical drives (SQL, OS, and logs on C:, data on D:)
The Results
System 1 CPU Graph:
System 1 Statistics:
Create Unencrypted |
10,750 ms |
Create Encrypted |
13,736 ms |
Cost Ratio |
1.28 : 1 |
Read Unencrypted |
436 ms |
Read Encrypted |
364,710 ms |
Cost Ratio |
836.5 : 1 |
Average Cost Ratio |
33.8 : 1 |
System 2 CPU Graph:
Note that the cycling during the “Read Encrypted” phase was consistent throughout the test. It did not show up when all filegroups were on the same physical drive.
Here is the same test on the same computer when all filegroups were on C:
System 2 Statistics (single hard-drive):
Create Unencrypted |
3,170 ms |
Create Encrypted |
11,220 ms |
Cost Ratio |
3.54 : 1 |
Read Unencrypted |
406 ms |
Read Encrypted |
342,433 ms |
Cost Ratio |
843.4 : 1 |
Average Cost Ratio |
98.9 : 1 |
System 2 Statistics (two physical hard-drives):
Create Unencrypted |
2,970 ms |
Create Encrypted |
11,033 ms |
Cost Ratio |
3.71 : 1 |
Read Unencrypted |
406 ms |
Read Encrypted |
338,196 ms |
Cost Ratio |
833.0 : 1 |
Average Cost Ratio |
103.4 : 1 |
As you can see, the write performance is not bad, but the read performance is significantly slower. The CPU hit is unacceptable.
One Running Change
- Now setting “READ_COMMITTED_SNAPSHOT ON” based on this posting by Lubor Kollar. (What do you think about NOLOCK and the arguments against it?)
For Next Time
Change all the EncryptByCert and DecryptByCert calls to EncryptByKey and DecryptByKey (symmetric encryption) and use the tests created here to allow direct comparisons. Encryption and decryption using symmetric keys is the recommended approach when dealing with large amounts of data. Here is the quote from the Books Online’s coverage of EncryptByKey:
Symmetric encryption and decryption is relatively fast, and is suitable for working with large amounts of data.
One very serious limitation here is that EncryptByCert, EncryptByKey, EncryptByAsymKey, and EncryptByPassPhrase all create different results (ciphertext) each time they are called. We will discuss message authentication codes as a way to allow indexed look-ups of encrypted data.
Not covered yet: Last month, I said “I really don’t like passing actual numbers around for StatusCd. I will change that to use UDFs to translate between numbers and their corresponding status strings.” We’ll get to that eventually, but for now, the focus is on encryption.
References
NOLOCK
- Microsoft SQL Server Development Customer Advisory Team: Previously committed rows might be missed if NOLOCK hint is used
- Craig Freedman’s WebLog: Query Failure with Read Uncommitted
Download the code.