Download the files for this article.
This is part 9 of a series. If you have not read part 8, you
can find it here.
The Problem: Searching Encrypted Data
Once we have encrypted the data in our table, we cannot
efficiently search that data without making further changes. Remember from part 7
that “EncryptByCert, EncryptByKey, EncryptByAsymKey, and EncryptByPassPhrase
all create different results (ciphertext) each time they are called.”
The solution is to use Message Authentication Codes. Note
that this is a special use of the general term used for messaging.
Hashing is different from encryption in that it always
returns the same ciphertext. If I run this code:
SELECT HashBytes('MD5', 'secret')
I get this ciphertext:
Running it again returns the same result. Running it on a
different machine also returns the same result.
Different algorithms return different ciphertext. For
example, if I run this code:
SELECT HashBytes('SHA1', 'secret')
I get this ciphertext:
HashBytes always returns varbinary results with the
|MD2, MD4, MD5||16|
I created two UDTs (HashResultMD and HashResultSHA) based on
the result lengths of the different algorithms. It would be very reasonable to use
the maximum length and have only one type. Another option is to use just some
of the bytes of the result. I have chosen here to use the complete result. It
is important to understand that, even using the full result, there is not a guarantee
that you will not have collisions.
What About Dictionary Attacks?
Because hashing creates consistent ciphertext, it would be
possible to run a dictionary attack against a table and find values that match
a given string. From the example above, we can see that any entry with
ciphertext that matches “0x5EBE2294ECD0E0F08EAB7690D2A6EE69” was created by
hashing the string “secret” using the MD5 algorithm.
The way to fix this is by using a “salt” value. If we
can unencrypt a salt value using DecryptByKey (which we have seen is fast) and
then pre-pend (or append) the string we’re trying to store, then a hacker can’t
just hash a set of values and compare the ciphertext against the result.
The salt should be a reasonable length, but I don’t think
you should expose the exact length in your code. If you create a UDT that
allows for the storage of a hashed value prepended with a secret, the UDT
should allow for some variance in the secret length.
Look-Up by Hashed Value
Reference the attached file EncryptByKeyPerfTest_Setup.sql.
Given a value (for example, CardNumber or SecureString), we
can now perform efficient searches. Since hashing consistently returns the same
ciphertext, we can first hash the look-up value and then find a record or
records that match that value.
Note that SecurityCode1 does not have a corresponding MAC
column in CardEncryptByKey. This is because there is no use case for looking up
a record by its security code without its card number. If we use a MAC column
to allow efficient look-ups of records, the security code column is used only
for validation that the proper record has been found.
Create MAC columns only for columns that will be searched,
and remember that the MAC supports efficient look-up and not exact
A Simple(ish) Example
Reference the attached file SimplestEncrWithMAC.sql.
It is easier to show code than explain everything behind it.
Here, we build a table to store the salt and another to store data records:
CREATE TABLE dbo.SecretSalt (
SaltID tinyint NOT NULL,
Salt varbinary(100) NOT NULL
CONSTRAINT SecretSalt_PK PRIMARY KEY CLUSTERED (SaltID)
CREATE TABLE dbo.Hash1 (
RecordID int NOT NULL,
EncryptResult varbinary(116) NOT NULL, — Encrypted
HashResult varbinary(20) NOT NULL — Hashed
CONSTRAINT Hash1_PK PRIMARY KEY CLUSTERED (RecordID)
CREATE NONCLUSTERED INDEX Hash1_HashResult_IDX ON dbo.Hash1 (
First, let’s add the “salt” record.
INSERT INTO dbo.SecretSalt (
Next, add records to Hash1. We run this code within a loop
to create n records.
SELECT @string1 = ‘Simple’ + CONVERT(varchar(11), @ct);
INSERT INTO dbo.Hash1 (
EncryptByKey(Key_GUID (N’TestKey1′), @string1),
The code will add records that look like this:
RecordID EncryptResult HashResult
——– ————————– ——————-
1 0x00EC5555DE58…FFEDCA8C7 0xF71C033…4C307B5
2 0x00EC5555DE58…D73349886 0x7A66A82…A6B4DBC
Now, we can look up a single record that matches an expected
string. We pick the string “5” which should match the fifth record and return a
RecordID of 5.
DECLARE @lookUp int;
SELECT @lookUp = 5; — Pick some record to search for
CONVERT(int, CONVERT(varchar(11), DecryptByKey(h.EncryptResult)))
AS ‘Decrypted Value’
FROM dbo.Hash1 AS h
CROSS JOIN dbo.SecretSalt AS s
WHERE s.SaltID = 1
AND h.HashResult = HashBytes (
CONVERT(varchar(63), DecryptByKey(s.Salt)) + CONVERT(varchar(11), @lookUp)
) – This uses the index to find the record
AND CONVERT(int, CONVERT(varchar(11), DecryptByKey(h.EncryptResult))) = @lookUp;
The WHERE clause has three parts:
though there is only one record in the SecretSalt table, tell it which one to
the optimizer to use the index on the varbinary hash column.
that the correct record or records are returned.
All these parts are required for an efficient and reliable look-up.
This returns the exact record we expected.
What do you think? If you agree, fine. If not, provide
feedback in the forum. I have added a new thread called MAC Code.
This code is complex. If you find a more straightforward way of doing this,
Limitations and Options
Message authentication codes work only for finding an exact match.
It does not support range queries or wildcards.
Using a message authentication code column allows a hacker
to determine values that are the same. As an example, if we salted and then
hashed two credit card numbers that were the same, the ciphertext would be
exactly the same.
You may choose to not use the full ciphertext returned. For
instance, the varbinary result (16 or 20 bytes) can be converted to an integer (4
bytes) or bigint (8 bytes).
Whether you use the full ciphertext or a portion of it, you
still have to validate that the encrypted value matches the value you are
searching for because of the possibility of hash collisions. An option is to
SELECT records that match the hashed value and only validate the encrypted data
if the SELECT returns more than one record.
For Next Time
Next month, I will discuss the performance results of the
combination of encryption and hashing. I will also update the SqlCredit code to
include all these changes.
2005: searching encrypted data from Laurentiu Cristofor’s blog
Hashing to Obscure Sensitive Data from the Microsoft SQL Server Development
Customer Advisory Team blog
Download the files for this article.
Discuss this article in the SQLCredit Forum.