SqlCredit - Part 9: Message Authentication Codes

September 27, 2007

Download the files for this article.

This is part 9 of a series. If you have not read part 8, you can find it here.

The Problem: Searching Encrypted Data

Once we have encrypted the data in our table, we cannot efficiently search that data without making further changes. Remember from part 7 that “EncryptByCert, EncryptByKey, EncryptByAsymKey, and EncryptByPassPhrase all create different results (ciphertext) each time they are called.”

The solution is to use Message Authentication Codes. Note that this is a special use of the general term used for messaging.

Hashing is different from encryption in that it always returns the same ciphertext. If I run this code:

	SELECT HashBytes('MD5', 'secret')

I get this ciphertext:

	0x5EBE2294ECD0E0F08EAB7690D2A6EE69

Running it again returns the same result. Running it on a different machine also returns the same result.

Different algorithms return different ciphertext. For example, if I run this code:

	SELECT HashBytes('SHA1', 'secret')

I get this ciphertext:

	0xE5E9FA1BA31ECD1AE84F75CAAA474F3A663F05F4

HashBytes always returns varbinary results with the following length:

Algorithm      Length
MD2, MD4, MD5      16
SHA, SHA1      20

I created two UDTs (HashResultMD and HashResultSHA) based on the result lengths of the different algorithms. It would be very reasonable to use the maximum length and have only one type. Another option is to use just some of the bytes of the result. I have chosen here to use the complete result. It is important to understand that, even using the full result, there is not a guarantee that you will not have collisions.

What About Dictionary Attacks?

Because hashing creates consistent ciphertext, it would be possible to run a dictionary attack against a table and find values that match a given string. From the example above, we can see that any entry with ciphertext that matches “0x5EBE2294ECD0E0F08EAB7690D2A6EE69” was created by hashing the string “secret” using the MD5 algorithm.

The way to fix this is by using a “salt” value. If we can unencrypt a salt value using DecryptByKey (which we have seen is fast) and then pre-pend (or append) the string we’re trying to store, then a hacker can’t just hash a set of values and compare the ciphertext against the result.

The salt should be a reasonable length, but I don’t think you should expose the exact length in your code. If you create a UDT that allows for the storage of a hashed value prepended with a secret, the UDT should allow for some variance in the secret length.

Look-Up by Hashed Value

Reference the attached file EncryptByKeyPerfTest_Setup.sql.

Given a value (for example, CardNumber or SecureString), we can now perform efficient searches. Since hashing consistently returns the same ciphertext, we can first hash the look-up value and then find a record or records that match that value.

Note that SecurityCode1 does not have a corresponding MAC column in CardEncryptByKey. This is because there is no use case for looking up a record by its security code without its card number. If we use a MAC column to allow efficient look-ups of records, the security code column is used only for validation that the proper record has been found.

Create MAC columns only for columns that will be searched, and remember that the MAC supports efficient look-up and not exact filtering.

A Simple(ish) Example

Reference the attached file SimplestEncrWithMAC.sql.

It is easier to show code than explain everything behind it. Here, we build a table to store the salt and another to store data records:

	CREATE TABLE dbo.SecretSalt (
	    SaltID        tinyint        NOT NULL,
	    Salt          varbinary(100) NOT NULL
	
	    CONSTRAINT SecretSalt_PK PRIMARY KEY CLUSTERED (SaltID)
	);
	
	CREATE TABLE dbo.Hash1 (
	    RecordID      int            NOT NULL,
	    EncryptResult varbinary(116) NOT NULL,  -- Encrypted
	    HashResult    varbinary(20)  NOT NULL   -- Hashed
	
	    CONSTRAINT Hash1_PK PRIMARY KEY CLUSTERED (RecordID)
	);
	
	CREATE NONCLUSTERED INDEX Hash1_HashResult_IDX ON dbo.Hash1 (
	    HashResult
	);

First, let’s add the “salt” record.

	INSERT INTO dbo.SecretSalt (
	    SaltID,
	    Salt
	)
	SELECT
	    1,
	    EncryptByKey(
	        Key_GUID('TestKey1'),
	        'd=s_4ZeG3me4E#U4ru6ag&@ru@A2Afr7'
	    );

Next, add records to Hash1. We run this code within a loop to create n records.

	SELECT @string1 = 'Simple' + CONVERT(varchar(11), @ct);
	
	INSERT INTO dbo.Hash1 (
	    RecordID,
	    EncryptResult,
	    HashResult
	)
	VALUES (
	    @ct,
	    EncryptByKey(Key_GUID (N'TestKey1'), @string1),
	    HashBytes('SHA1', @string1)
	);

The code will add records that look like this:

	RecordID EncryptResult              HashResult
	-------- -------------------------- -------------------
	       1 0x00EC5555DE58...FFEDCA8C7 0xF71C033...4C307B5
	       2 0x00EC5555DE58...D73349886 0x7A66A82...A6B4DBC

Now, we can look up a single record that matches an expected string. We pick the string “5” which should match the fifth record and return a RecordID of 5.

	DECLARE @lookUp int;
	SELECT @lookUp = 5;     -- Pick some record to search for
	
	SELECT
	    h.RecordID,
	    CONVERT(int, CONVERT(varchar(11), DecryptByKey(h.EncryptResult)))
	        AS 'Decrypted Value'
	FROM dbo.Hash1            AS h
	CROSS JOIN dbo.SecretSalt AS s
	WHERE s.SaltID = 1
	  AND h.HashResult = HashBytes (
	    'SHA1',
	    CONVERT(varchar(63), DecryptByKey(s.Salt)) + CONVERT(varchar(11), @lookUp)
	  )	-- This uses the index to find the record
	  AND CONVERT(int, CONVERT(varchar(11), DecryptByKey(h.EncryptResult))) = @lookUp;

The WHERE clause has three parts:

1.      Even though there is only one record in the SecretSalt table, tell it which one to use.

2.      Allow the optimizer to use the index on the varbinary hash column.

3.      Validate that the correct record or records are returned.

All these parts are required for an efficient and reliable look-up. This returns the exact record we expected.

What do you think? If you agree, fine. If not, provide feedback in the forum. I have added a new thread called MAC Code. This code is complex. If you find a more straightforward way of doing this, please share.

Limitations and Options

Message authentication codes work only for finding an exact match. It does not support range queries or wildcards.

Using a message authentication code column allows a hacker to determine values that are the same. As an example, if we salted and then hashed two credit card numbers that were the same, the ciphertext would be exactly the same.

You may choose to not use the full ciphertext returned. For instance, the varbinary result (16 or 20 bytes) can be converted to an integer (4 bytes) or bigint (8 bytes).

Whether you use the full ciphertext or a portion of it, you still have to validate that the encrypted value matches the value you are searching for because of the possibility of hash collisions. An option is to SELECT records that match the hashed value and only validate the encrypted data if the SELECT returns more than one record.

For Next Time

Next month, I will discuss the performance results of the combination of encryption and hashing. I will also update the SqlCredit code to include all these changes.

References

Download the files for this article.

Discuss this article in the SQLCredit Forum.

» See All Articles by Columnist Rob Garrison

SqlCredit - Developing a Complete SQL Server OLTP Database Project








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers