Introduction
Data security is a major requirement for any application. Every time we create an application, we must follow all security policies before going live with that application.
In SQL Server, various methods are available to implement data security: data encryption at the database level using TDE, cell level data encryption, always encryption, and many more. All these methods have their own pros and cons. There will always be a tradeoff between data security and performance in data read/write operations. These methods ensure data security, and comply with all expected security policies, however they impact overall application performance.
Microsoft introduced a new feature, dynamic data masking, which is available in SQL Server 2016 onwards, on both on-premise and Azure SQL database versions.
Dynamic Data masking (DDM) is one of the methods to hide sensitive information from non-privileged users. This feature helps to mask the data in a specific format without exposing the complete data. The best part of this feature is that it does not have any impact on the application; the data will be partially or fully masked and exposed to users to fulfil their needs.
As an example, a table contains an SSN number and an application support engineer may identify a user by the last four digits of the social security number, without exposing the full SSN to the support engineer. A masking rule can be defined that masks all the digits not to be exposed to support engineer but the last four digits of the social security number return in the result set of queries. Similarly, as per application needs, other data such as email addresses or credit card numbers, etc. can also be masked.
SQL Server provides four functions to mask the data and there are multiple ways to implement data masking on any column data. The generic method is T-SQL commands to implement dynamic data masking on both On-premise and Azure database. Also, the Azure table configuration blade can be leveraged to implement DDM on a column in Azure SQL Database.
SQL Server engine uses a different character or value for masking the original value of a designated column to be masked. The value or character used varies, based on the data type of the designated data type of the column to be masked. Refer to the table below to understand the character or value of data mask.
Data Type |
SQL Server Data Type |
Mask Value |
String |
char, nchar, varchar, nvarchar, text, ntext |
X |
Number |
bigint, bit, decimal, int, money, numeric, smallint, smallmoney, tinyint, float, real |
0 |
Date Time |
date, datetime2, datetime, datetimeoffset, smalldatetime, time |
01.01.1900 00:00:00.0000000 |
Binary |
binary, varbinary, image |
single byte of ASCII value 0 |
Data Masking functions
default() : The default() function masks the full value of a column. The masked character will appear as per data type of the designated field to be masked. If the data type of the column is string then the mask function replaces the original value with X. If the data type of the column is a number then the mask function will replace the original value with 0 (zero). Refer to the above table to understand the relation between masking character and SQL server data types.
Example
Original Value in Column |
Masked Value for Non-privileged Users |
Definition Syntax with Function |
123.456.1234 |
XXXX |
Phone# varchar(12) MASKED WITH (FUNCTION = ‘default()’) NULL |
email() : The email() function helps in masking the email address stored in a column. The function will mask the whole string of the email address except the first letter of the email, the special character of address (@), and constant suffix of the email address (.com).
Example
Original Value in Column |
Masked Value for Non-privileged Users |
Definition Syntax with Function |
abcd@outlook.com |
Email varchar(80) MASKED WITH (FUNCTION = ’email()’) NULL |
random() : The random() function is used on any numeric type to mask the original value with a random value within a specified range.
Example
Original Value in Column |
Masked Value for Non-privileged Users |
Definition Syntax with Function |
99000 |
321 |
employee_salary int MASKED WITH (FUNCTION = ‘random([start range], [end range])’) employee_salary int MASKED WITH (FUNCTION = ‘random([101], [999)’) |
partial() : The partial () function gives the option to customize the masking as per business needs. The function exposes the first and last letters and adds a custom padding string in the middle. If the original value in the column is too short to complete the defined mask rule, part of the prefix or suffix will not be exposed. The partial() function is useful to mask SSN or credit card numbers. Using this mask function, we can define the rule to expose only the last 4 digits of the whole number.
Example
Original Value in Column |
Masked Value for Non-privileged Users |
Definition Syntax with Function |
123-45-6789 |
XXX-XX-6789 |
SSN varchar(11) MASKED WITH (FUNCTION = ‘partial(prefix, [padding],suffix)’) NULL SSN varchar(11) MASKED WITH (FUNCTION = ‘partial(0,”XXX-XX-“,4)’) |
Overall, DDM is the method to mask your data, while not exposing sensitive data to non-privileged users. This method is a great way to limit data exposure without impacting the application. The most important benefit of DDM is that it doesn’t change data at a physical level in the table or column and data masking can be achieved without introducing any change in the application. In future articles, we will discuss how T-SQL commands and the Azure table configuration blade helps in implementing DDM.