In my article last month we started looking at using the Data Masking Pack in order to address the balance between needing to protect the data and the requirement to provide realistic test data. Oracle has introduced their Data Masking Pack in Oracle Database 11g that provides a permanent, one way mask of sensitive data. The Data Masking Pack is a separately licensed Oracle Enterprise Manager pack that has been included with both Database Control and Grid Control in Oracle Database 11g R2.
Because this is a non-reversible activity – ALWAYS make sure that you are running masking jobs against a clone of the production database.
Creating a Custom Masking Format
Oracle provides many built in Masking Formats as part of their library, however there will be times that creating a custom mask will be necessary. A custom mask can be created as part of a single Masking Definition job. This approach works fine if it’s a mask that will be used once and never have to be re-used.
A better approach is to take advantage of the Masking Library – where we can create a custom masking format, and save it to be shared by multiple jobs. With EM Grid Control a customized masking format can even be shared between databases. The advantages include creating and maintaining the mask once and the ability to have a consistent mask applied across the IT infrastructure.
Let’s look at how to create a custom mask in the library.
From the database home page in Enterprise Manager, select the “Schema” tag. From here you will see the two links for working with the Data Masking features.
- Definitions – used for generating the data masking scripts and submitting the job
- Format Library – used to create and store custom format masks
The database home page in Enterprise Manager
Select the Format Library Link in the Data Masking section
Select the Format Library Link
You will see here the list of some of the Oracle supplied format masks. Let’s say we need to create a format mask for Driver’s License numbers. And for argument’s sake, we have the following guidelines:
The format is Ann-annnnn – where the first alpha must be one of the following letters – R, M, C, S. That is followed by two digits and a hyphen. The 2nd half is any alpha followed by four digits.
First, we’ll click create and give the new format a name (here I called it MyDLNumber) and put in a simple description.
Database Control: Create Format
The next step(s) require that we break down the DL into individual steps. The order that we create each of the format entries is important as it will dictate the way the final mask is generated.
Our first character is a letter, and one of a very specific list. A simple way to build this step is to use the Array List option. Select Array List from the Add drop down menu and click Go.
In the Array List screen, list the values you want to restrict the entry to, separated by commas. Then click on OK.
The Array List option
For the next step, select Random Digits and enter 2 for the starting and ending length.
Then, we use a Fixed String to enter the hyphen, Random Strings for the next alpha, and Random Digits with a starting and ending length of 4 for the last section.
Format Entries
At the bottom of the screen you will see some samples of the masked data so that you can verify that the format mask is being properly defined. Once you have built and verified the mask click on OK.
Format Library
You should now see your custom format mask in the library. Now that we have the format mask created, we will take a look at creating and running a Data Masking job.
Running a Data Masking Job
I have a small table that I created and populated (with fake data) to show the before and after results from Data Masking. Remember, Data Masking is an irreversible action, so be sure you are working on a clone of any real production data before actually submitting the job.
As part of the job creation, you will actually see the option to include cloning the database as one of the steps. Once you have cloned the database, and masked all of the sensitive data, it’s this clone that can be used to create development, test and training databases.
In this table called “MyPeople” you will see the columns id (to be masked), last name and first name (which I won’t mask for this demo), a phone number (to be masked) and a Driver’s License number (to be masked using the new entry previously created).
MyPeople Table
To start a Data Masking job, navigate to the Schema page and select Definition from the Data Masking area and click on Create.
Start a Data Masking job
Data Masking is applied at a column level and columns from different tables can all be incorporated into one single Data Masking job if desired. After assigning a name for the definition we begin by clicking on the Add button under the Columns Section.
Click the add button
On this page select the schema, table and columns from that table that you wish to mask. Once the proper columns are selected, click on the Add button.
Once the columns have been added to the list, we will be able to click on the configure icon under the heading Format. A red wrench indicates that the format mask has yet to be configured. A blue pencil indicates that it has been set up (but can still be edited if necessary).
Configure Format Mask
To assign the format mask for the Phone column – select the configure icon – you will be directed to the Definition page.
Definition Page
Because we want to use one of the formats from the library, click on the Import Format button and select the US Phone Number option (I used the formatted US phone). Likewise assign our custom Driver’s License mask to the Drivers column. For the ID column I simply added a Random Number entry with a start and end length of five. For the names I selected the option not to mask them. I could have also simply left them off of the list as an alternative.
Select Masking Definition
Once everything has been designed, the next step is to go ahead and generate the script that will be executed to actually perform the data masking. Note that there is also a button here that would allow you to incorporate cloning the database as one of the steps of doing the masking.
Script Generation Results
Once the script is generated, you can go ahead and schedule the job (or schedule a job to clone and mask). Additionally the script can be saved to be run later.
After submitting the job you should see the new masked data when you run a query against the tables.
The new masked data
As you will notice, the 5 digit ID’s have been assigned, the phone numbers were modified as were the Driver’s License columns.
In conclusion
The Oracle Data Masking feature is a powerful tool that DBA’s can use (with input from the IT Security and/or Audit departments) that allows us to provide fabricated but realistic looking data to programmers, testers, training developers or others that legitimately require access to realistic volumes of data and realistic looking data without compromising the security of any sensitive data.
Easily a feature worth exploring.