Exploring SQL Server 2016 Dynamic Data Masking – Part Two – Masking and Exporting Data in Existing Tables

In my last article I introduced you to Dynamic Data Masking: http://www.databasejournal.com/features/mssql/exploring-sql-server-2016-dynamic-data-masking-part-one-creating-a-table-that-uses-dynamic-data-masking.htm.  In that article I show you how to create and populate a table where you wanted to mask one or more columns.  In this article I will continue on and explore some of the other aspects of using the Dynamic Data Masking feature that is being introduced with SQL Server 2016.

Adding Masking Rules for an Existing Table

There might be a time when you start a data masking project where the columns you want to mask are already in an existing table.  If this is the case then you will need to ALTER your table to add a masking rule to those columns you want to mask.  In order to show you how to do this I will first need to create and populate a table with some data, where this new table doesn’t contain any masking rules.  To create that table I will use the following TSQL code:

USE tempdb;
GO
CREATE TABLE MyData (
   TransactionID int identity, 
   AccountNum int, 
   CreditCardNum varchar(19)  
         );
INSERT INTO MyData (AccountNum, CreditCardNum) 
VALUES (123456, '1234-1234-5678-9012'),
       (999999, '9999-9999-9999-9999');

By reviewing this code you can see that I created a table named MyData and then I populated it with two rows.  Note that my table was created where no columns had masking rules applied to them.  To prove that a user can see all these columns that have unmasked values let me run the following code:

USE tempdb;
	GO
CREATE USER MyDataReader WITHOUT LOGIN;
GRANT SELECT ON MyData TO MyDataReader;
EXECUTE AS USER = 'MyDataReader';
SELECT AccountNum, CreditCardNum FROM MyData;
REVERT;
 

In this code, I first create a database user named MyDataReader and provide this user SELECT access to the table MyData.  Once this user is created, I then execute a SELECT statement from MyData table using this new database user.  When the SELECT statement runs the following output is produced:

AccountNum  CreditCardNum
----------- -------------------
123456      1234-1234-5678-9012
999999      9999-9999-9999-9999
 

By reviewing this output you can see that the MyDataReader can see the CreditCardNum column values in clear text.  This is because currently this column doesn’t have any masking rules apply to it.  To add a masking rule to the CreditCardNum column in MyData table I run the following code:

USE tempdb;
GO
ALTER TABLE MyData ALTER COLUMN CreditCardNum
   ADD MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)');
 

This code altered the CreditCardNum column so only the last 4 characters of the card number are in clear text and the rest or the credit card number is masked with the value “xxxx-xxxx-xxxx-“. 

In order to verify that my CreditCardNum column is now masked using the masking rule I applied above I run the following TSQL Code.

USE tempdb;
GO
EXECUTE AS USER = 'MyDataReader';
SELECT AccountNum, CreditCardNum FROM MyData;
REVERT;

In this code I execute a SELECT statement to read data from MyData table using the security context of my new database user named MyDataReader.  When the SELECT statement runs the following output is produced:

 
AccountNum  CreditCardNum
----------- -------------------
123456      XXXX-XXXX-XXXX-4690
999999      XXXX-XXXX-XXXX-9998

Now you can see that the CreditCardNum is masked based on the masking rule defined in the ALTER TABLE statement above.   

As this section showed it is very easy to add a masking rule to a column in an existing table. 

How do Masking Rules Affect UPDATE Statements

This article and my prior article about Dynamic Data Masking has only shown you how the output of a SELECT statement will be affected when a column is defined with a Dynamic Data Masking rule.   In this section I will explore how having a masking rule on an existing column might cause you to lose the actual unmasked value when you update a table.

For my first example in this section I’m going to explore updating data records in a new table, with data from the table MyData that I created above.  To set up for this example I’m going to create, populate, and provide permission to my new table using the following code:

USE tempdb;
GO
CREATE TABLE MyUpdatableData (
   TransactionID int identity, 
   AccountNum int, 
   CreditCardNum varchar(19)
      MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)'),
   InsertDate datetime
         );
GO
INSERT INTO MyUpdatableData (AccountNum, CreditCardNum, InsertDate) 
VALUES (123456, '4444-4444-4444-4444', getdate()),
       (999999, '5555-5555-5555-5555', getdate());
GRANT SELECT, UPDATE ON MyUpdatableData TO MyDataReader

By reviewing this code you can see I created a new table named MyUpdatableData, where the CreditCardNum column has been defined with a masking rule.  I also populated this table with 2 rows of data, and granted SELECT and UDPATE rights to the user MyDataReader, the users I created in my prior example. 

To test what happens when the user MyDataReader updates the table MyUpdatableData with the data from the MyData table I run the following code:

USE tempdb;
GO
EXECUTE AS USER = 'MyDataReader';
UPDATE MyUpdatableData  
   SET CreditCardNum = O.CreditCardNum, InsertDate = getdate()
FROM MyUpdatableData AS N   
INNER JOIN 
MyData AS O
ON O.AccountNum = N.AccountNum;
SELECT AccountNum, CreditCardNum, InsertDate FROM MyUpdatableData;
REVERT;
SELECT AccountNum, CreditCardNum, InsertDate FROM MyUpdatableData;

When I run this code I get the following two output results:

AccountNum  CreditCardNum       InsertDate
----------- ------------------- -----------------------
123456      XXXX-XXXX-XXXX-9012 2016-02-25 06:24:57.837
999999      XXXX-XXXX-XXXX-9999 2016-02-25 06:24:57.837
 
(2 row(s) affected)
 
AccountNum  CreditCardNum       InsertDate
----------- ------------------- -----------------------
123456      1234-1234-5678-9012 2016-02-25 06:24:57.837
999999      9999-9999-9999-9999 2016-02-25 06:24:57.837
 
(2 row(s) affected)

Reviewing this output from the first SELECT statement, which is right after the update, confirms that the data in MyUpdatableData table was successfully updated with the CreditCardNum column values from MyData table. You can tell this because the two different CreditCardNum values don’t end in either 4444, or 5555, which are the originally values for this column prior to the update.   Also you can see that this data is masked appropriately for this user.  If you look at the output from the second SELECT statement, the one after the REVERT statement, you can see the unmasked value of the updated rows in the MyUpdatableData.   What this example demonstrates is a user that can only see masked data can still perform UPDATE statements as if the data was not masked.   

It is not true that all TSQL code will allow users that do not have UNMASK permission to perform UPDATE statements without losing the masked data they read from a source table.  To demonstrate this let me show you a different example.   First I will run this code to setup for my example:

USE tempdb;
GO
DROP TABLE MyUpdatableData;
CREATE TABLE MyUpdatableData (
   TransactionID int identity, 
   AccountNum int, 
   CreditCardNum varchar(19)
      MASKED WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)'),
   InsertDate datetime
        );
GO
INSERT INTO MyUpdatableData (AccountNum, CreditCardNum, InsertDate) 
VALUES (123456, '4444-4444-4444-4444', getdate()),
       (999999, '5555-5555-5555-5555', getdate());
GRANT SELECT, UPDATE ON MyUpdatableData TO MyDataReader;

The code above drops the table MyUpdatableData table, and then recreates it.  I ran this code so I could start with fresh data in this table.

To demonstrate that some TSQL code can lose the unmasked data values I will run the code below.  If I run this code I will actually lose the underlying unmask data when updating AccountNum 999999 with the CreditCardNum column using the CreditCardNum value that was read from the MyData table for the AccountNum column:

EXECUTE AS USER = 'MyDataReader';
DECLARE @CreditCardNum varchar(19);
SELECT @CreditCardNum = CreditCardNum FROM MyData;
UPDATE MyUpdatableData  
   SET CreditCardNum = @CreditCardNum, InsertDate = getdate()
FROM MyUpdatableData AS N   
WHERE AccountNum = 999999;
REVERT;
SELECT AccountNum, CreditCardNum, InsertDate FROM MyUpdatableData;

When I run this code I get the following output:

AccountNum  CreditCardNum       InsertDate
----------- ------------------- -----------------------
123456      4444-4444-4444-4444 2016-02-25 17:30:29.210
999999      XXXX-XXXX-XXXX-9999 2016-02-25 17:30:39.257

If you review this output, which was created by the SELECT statement after the REVERT statement, you can see that the CreditCardNum column now contains a masked value. Why did this happen on this UPDATE example and not the prior UPDATE example?  This occurred because the prior UPDATE example really didn’t bring any data back to the user.  The first UPDATE example consisted of just a single UPDATE statement.  Whereas the second example contained multiple statements.  The variable @CreditCardNum when populated under the security context of MyDataReader was populated with masked values from the MyData table.   That masked value in variable @CreditCardNum was then used to update the MyUpdatableData table.  Sure this is a contrived example, but it does demonstrate you need to be careful when you update a table from the context of a user that doesn’t have UNMASK rights, so you don’t lose the unmasked data complete.   

How Masked Data can Affect INSERT INTO Statement

If you are a user that doesn’t have UNMASK rights then you need to be aware of how only being able to see masked data may affect your INSERT INTO statements.  In order to understand how masked data can affect the outcome of an INSERT INTO statement let me run the following code:

SET NOCOUNT ON;
USE tempdb;
GO
CREATE TABLE MyData2 (
   TransactionID int identity, 
   AccountNum int, 
   CreditCardNum varchar(19)  
         );
GRANT SELECT, INSERT ON MyData2 TO MyDataReader;
EXECUTE AS USER = 'MyDataReader';
INSERT INTO MyData2 SELECT AccountNum, CreditCardNum FROM MyData;
REVERT;
SELECT * FROM MyData2;
 

When I run this code I get the following output:

TransactionID AccountNum  CreditCardNum
------------- ----------- -------------------
1             123456      XXXX-XXXX-XXXX-9012
2             999999      XXXX-XXXX-XXXX-9999

If you review the code above you will see that I created a table named MyData2 and then I granted SELECT and INSERT rights to the user MyDataReader.  I then executed an INSERT INTO command to copy data into the table MyData2 from the MyData table under the security rights of the MyDataReader.  Since the user MyDataReader can only see masked data in table MyData, the masked values for the CreditCardNum column where inserted into table MyData.  You can see this be reviewing the output above. 

This same problem can caused if I were to perform a SELECT INTO statement. I’ll leave it up to you to verify that masked data can be lost when a user without UNMASK permissions issues a SELECT INTO statement.  

Due to the fact you might lose masked data with an INSERT INTO or SELECT INTO statement, you need to be careful when using these two commands when processing data that is masked.

Summary

Dynamic Data Masking allows you to obscure your confidential data column values at the database engine level for both new and existing SQL Server data.  Being able to alter the definition of an existing column to add a masking rule makes it very simple to obscure your existing column values without even changing your application code.  Keep in mind when a database user doesn’t have UNMASK rights they could perform an INSERT or UPDATE statement that could lose the actual clear text values of masked data.   If you are looking for ways to obscure your confidential data from some user, but not all users than consider looking at Dynamic Data Masking as a way to meet that requirement.

See all articles by Greg Larsen

Gregory Larsen
Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles