Oracle Database 11g: Working with the Data Masking Pack

In order to be effective, developers and the team performing system and QA testing need to have a test system that is realistic in terms of both volumes and look and feel.  For example, it’s hard to do effective performance testing on a system that has 10 to 25 percent of a true production system.  It’s also challenging to do testing on systems where names are obviously made up (although I kind of liked my cartoon based names from the past – Marvin the Martian was always a particular favorite LOL).

A frequent request made of the DBAs is to provide a full clone of a production system that will then be used for a development and/or testing system.  While Oracle provides great tools (primarily RMAN) to do cloning…just giving a straightforward full clone of production, as is, is one of those actions that makes (and should make) IT Security folks shudder with fear.

At the same time, almost every production database has at least some sensitive data in it and the more copies of this data that exist in our organizations, the higher the risks of some sort of accidental exposure.  Corporate rules often mandate that sensitive data is protected and secured at all times.

Needless to say, these two requirements pretty well directly conflict with one another.  In order to address the balance between needing to protect the data and the requirement to provide realistic test data, some form of data masking (modifying the data in the development/test system) is required.  Manual procedures, such as writing our own code, have often been used to modify the original data.  The drawback to most manual approaches is that the possibility does exist that the masking process could potentially be “reverse” engineered.

Oracle has introduced their Data Masking Pack as a new feature in Oracle Database 11g that provides a permanent, one way mask of sensitive data.  The Data Masking Pack is a separately licensed Oracle Enterprise Manager pack that has been included with both Database Control and Grid Control in Oracle Database 11g R2.

If you are using Data Masking through EM Grid Control you need to make sure the DM_FMTLIB package has been installed in the target database.  They can be created by running the following two supplied scripts (copy them to the target system if necessary).

$ORACLE_HOME/sysman/admin/emdrep/sql/db/latest/masking/dm_fmtlib_pkgdef.sql

and

$ORACLE_HOME/sysman/admin/emdrep/sql/db/latest/masking/dm_fmtlib_pkgbody.plb

and ensure that the DM_FMTLIB package is owned by the DBSNMP schema.

The most important feature of the Data Masking Pack is that it provides an irreversible method for replacing the original data with realistic looking “scrubbed” data.  A second great feature is that a sharable library of masking formats can be used to ensure that any masking rules and processes are applied consistently to all production data in an organization. It also means that custom masking formats only need to be developed once.

Because this is a non-reversible activity – ALWAYS make sure that you are running masking jobs against a clone of the production database.

Built In Data Masking Formats

Oracle provides many built in Masking Formats as part of their library.  These include more complex masks such as

  • American Express CC Numbers
  • MasterCard CC Numbers
  • Visa CC Numbers
  • Discover CC Numbers
  • Generic CC Numbers
  • National Insurance Numbers
  • Social Security Numbers
  • Social Insurance Numbers
  • ISBN Numbers
  • UPC Numbers
  • US Phone Numbers

Oracle also has “primitive” masks such as:

  • Array List (a list of values that should be selected randomly)
  • Fixed Number
  • Fixed String
  • Random Dates (a range can be specified)
  • Random Digits (a range can be specified)
  • Random Numbers ( a range can be specified)
  • Random Strings (literals in a range)

There are also some built in routines such as:

  • Shuffle (the original data)
  • Substitute (deterministic masking that ensures the same value in yields the same result)
  • Substring
  • Table Column (select a random entry from another table)

Beyond the built in masks, we have the ability to create our own custom formats (which will be covered in my next article).

Steps Performed by A Data Masking Job

When a data masking job is executed against the database the following series of steps is performed.

1)         A mapping table is built for each column being masked, which contains the following two columns – original_value, masked_value

2)         Drop the constraints and grants on the table

3)         Rename the table

4)         Create a new table using the original table and the mapping table(s)

5)         Create indexes on the new table

6)         Gather new statistics on the table

7)         Replace the constraints and grants on the table

8)         Drop the indexes on the original table and the original table itself with the purge option

9)         Drop the mapping tables created in step one

In order to help with the performance, the data masking job takes advantage of bulk operations, disables logging and runs in parallel.  Because the original table with the original data is dropped with the purge option the sensitive data is removed completely.  

For testing purposes only a DBA could override the behavior of disabling logging (both redo and flashback logging are disabled by default while a masking job runs).  Additionally the automatic gathering of new statistics could be turned off, the automatic drop of the mapping tables could be turned off and the degree of parallel processing can be controlled.

Data Masking Reports for Auditors in Grid Control

While there are no predefined reports for Data Masking in Enterprise Manager Grid Control, it is very easy to create Data Masking reports if they are required.  The steps to create a report are:

1)         Click on the reports tab

2)         Click Create

3)         Specify what you want for your report title

4)         Select “Security” from the Category drop down list (or create a custom category if preferred)

5)         Select “Security Policy Overview” for the Sub-Category (or create a custom sub-category if preferred)

6)         On the Elements page, click Add

7)         Add a Table From SQL and click OK

8)         Enter a header and the SQL, continue and OK

Here are some query examples.  (Select on the tables must be given to the MGMT_VIEW user account)

This first query would show the information about the masking job(s) themselves including which database they ran in, the name of the job, which account ran it, when it ran and how long it took.

SELECT        ds.source_name as "Database", 
                        ds.ss_name as "Mask Defn", 
                        ds.ss_owner as "Owner", 
                        je.submission_ts as "Run Date", 
                        j.job_owner as "Job Submitted By", 
                        (js.end-time - js.start-time) as "Job Duration" 
FROM                        sysman.mgmt_dm_scopespecs ds,
                        sysman.mgmt_dm_job_executions je,
                        sysman.mgmt_job j,
                        sysman.mgmt_job_exec_summary js
WHERE         ds.ss_guid = je.ss_guid
AND               j.job_id = js.job_id
AND               je.execution_id = js.execution_id
AND               ds.source_id = ??EMID_BIND_TARGET_GUID??

This query could be used to provide information on what was actually masked and which mask formats were used

SELECT        ds.ss_name as "Mask Defn",
                        sc.table_schema as "Schema",
                        sc.table_name as "Table",
                        sc.column_name as "Column",
                        re.entry_order as "Step",
                        decode (re.rule_type,          'RN','Random Number',
                                                                        'CC','Table Column',
                                                                        'DT','Random Date',
                                                                        'AL', Array List',
                                                                        'FN','Fixed Number',
                                                                        'RD','Random Digits',
                                                                        'RS','Random String',
                                                                        'SH','Shuffling',
                                                                        'UF','User Function',
                                                                        'UT','Post Process'
                        as "Format Type",
                        decode (re.rule_type,          'RN', 'Start Length: '||rule_low||' End Length: '||rule_high,
                                                                        'CC','Schema Owner: '||re.table_schema||' Source table and column: '||re.table_name||'.'||e.column_name,
                                                                        'AL',da.array_list.item,
                                                                        'FS','Fixed String',
                                                                        'RD','Start Digits:,||rule_low||' End Digits: '||rule_high,
                                                                        null
                        as "Format Mask Parameters"
FROM                        sysman.mgmt_dm_ruleentry re,
                        sysman.mgmt_dm_scopespecs ds,
                        sysman.mgmt_dm_ss_columns sc,
                        sysman.mgmt_dm_alitems da,
                        sysman.mgmt_dm_job_executions je
WHERE         re.rule_guid = sc.rule_guid
AND               re.rule_guid = a.rule_guid (+)
AND               re.entry_order = a.entry_order (+)
AND               ds.ss_guid = je.ss_guid
AND               ds.ss_guid = sc.ss_guid
AND               ds.source_id = ??EMIP_BIND_TARGET_GUID??
ORDER BY   ds.ss_name,
                        sc.table_schema,
                        sc.table_name,
                        sc.column_ name,
                        re.entry_order;

Next month we’ll take a detailed look at creating custom format masks, sharing the format mask libraries and creating and running a data masking job.  Until then…

See all articles by Karen Reliford

Karen Reliford
Karen Reliford
Karen Reliford is an IT professional who has been in the industry for over 25 years. Karen's experience ranges from programming, to database administration, to Information Systems Auditing, to consulting and now primarily to sharing her knowledge as an Oracle Certified Instructor in the Oracle University Partner Network. Karen currently works for TransAmerica Training Management, one of the foremost Oracle Authorized Education Centers (OAEC) in the Oracle University North America region. TransAmerica Training Management offers official Oracle and Peoplesoft Training in Coral Gables FL, Fayetteville AR, Albuquerque NM, Providence RI and San Juan PR. Karen has now been teaching Oracle for Oracle University for more than 15 years. Karen has attained her Certified Technical Trainer designation along with several Oracle certifications including OCP-DBA, OCP-Internet Developer, Oracle Expert - Oracle 10g RAC and Oracle Expert - Oracle Application Express (3.2). Additionally, Karen achieved her Oracle 10g Oracle Certified Master (OCM) in 2008. Karen was raised in Canada, and in November 2009 became a US Citizen. Karen resides in Columbus OH with her husband, Ron along with their 20 pets, affectionately referred to as the "Reliford Zoo".

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles