Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

Oracle

Posted March 25, 2013

Oracle Database 11g: Working with the Data Masking Pack

By Karen Reliford

In order to be effective, developers and the team performing system and QA testing need to have a test system that is realistic in terms of both volumes and look and feel.  For example, it's hard to do effective performance testing on a system that has 10 to 25 percent of a true production system.  It's also challenging to do testing on systems where names are obviously made up (although I kind of liked my cartoon based names from the past - Marvin the Martian was always a particular favorite LOL).

A frequent request made of the DBAs is to provide a full clone of a production system that will then be used for a development and/or testing system.  While Oracle provides great tools (primarily RMAN) to do cloning...just giving a straightforward full clone of production, as is, is one of those actions that makes (and should make) IT Security folks shudder with fear.

At the same time, almost every production database has at least some sensitive data in it and the more copies of this data that exist in our organizations, the higher the risks of some sort of accidental exposure.  Corporate rules often mandate that sensitive data is protected and secured at all times.

Needless to say, these two requirements pretty well directly conflict with one another.  In order to address the balance between needing to protect the data and the requirement to provide realistic test data, some form of data masking (modifying the data in the development/test system) is required.  Manual procedures, such as writing our own code, have often been used to modify the original data.  The drawback to most manual approaches is that the possibility does exist that the masking process could potentially be "reverse" engineered.

Oracle has introduced their Data Masking Pack as a new feature in Oracle Database 11g that provides a permanent, one way mask of sensitive data.  The Data Masking Pack is a separately licensed Oracle Enterprise Manager pack that has been included with both Database Control and Grid Control in Oracle Database 11g R2.

If you are using Data Masking through EM Grid Control you need to make sure the DM_FMTLIB package has been installed in the target database.  They can be created by running the following two supplied scripts (copy them to the target system if necessary).

$ORACLE_HOME/sysman/admin/emdrep/sql/db/latest/masking/dm_fmtlib_pkgdef.sql

and

$ORACLE_HOME/sysman/admin/emdrep/sql/db/latest/masking/dm_fmtlib_pkgbody.plb

and ensure that the DM_FMTLIB package is owned by the DBSNMP schema.

The most important feature of the Data Masking Pack is that it provides an irreversible method for replacing the original data with realistic looking "scrubbed" data.  A second great feature is that a sharable library of masking formats can be used to ensure that any masking rules and processes are applied consistently to all production data in an organization. It also means that custom masking formats only need to be developed once.

Because this is a non-reversible activity - ALWAYS make sure that you are running masking jobs against a clone of the production database.

Built In Data Masking Formats

Oracle provides many built in Masking Formats as part of their library.  These include more complex masks such as

  • American Express CC Numbers
  • MasterCard CC Numbers
  • Visa CC Numbers
  • Discover CC Numbers
  • Generic CC Numbers
  • National Insurance Numbers
  • Social Security Numbers
  • Social Insurance Numbers
  • ISBN Numbers
  • UPC Numbers
  • US Phone Numbers

Oracle also has "primitive" masks such as:

  • Array List (a list of values that should be selected randomly)
  • Fixed Number
  • Fixed String
  • Random Dates (a range can be specified)
  • Random Digits (a range can be specified)
  • Random Numbers ( a range can be specified)
  • Random Strings (literals in a range)

There are also some built in routines such as:

  • Shuffle (the original data)
  • Substitute (deterministic masking that ensures the same value in yields the same result)
  • Substring
  • Table Column (select a random entry from another table)

Beyond the built in masks, we have the ability to create our own custom formats (which will be covered in my next article).

Steps Performed by A Data Masking Job

When a data masking job is executed against the database the following series of steps is performed.

1)         A mapping table is built for each column being masked, which contains the following two columns - original_value, masked_value

2)         Drop the constraints and grants on the table

3)         Rename the table

4)         Create a new table using the original table and the mapping table(s)

5)         Create indexes on the new table

6)         Gather new statistics on the table

7)         Replace the constraints and grants on the table

8)         Drop the indexes on the original table and the original table itself with the purge option

9)         Drop the mapping tables created in step one

In order to help with the performance, the data masking job takes advantage of bulk operations, disables logging and runs in parallel.  Because the original table with the original data is dropped with the purge option the sensitive data is removed completely.  

For testing purposes only a DBA could override the behavior of disabling logging (both redo and flashback logging are disabled by default while a masking job runs).  Additionally the automatic gathering of new statistics could be turned off, the automatic drop of the mapping tables could be turned off and the degree of parallel processing can be controlled.

Data Masking Reports for Auditors in Grid Control

While there are no predefined reports for Data Masking in Enterprise Manager Grid Control, it is very easy to create Data Masking reports if they are required.  The steps to create a report are:

1)         Click on the reports tab

2)         Click Create

3)         Specify what you want for your report title

4)         Select "Security" from the Category drop down list (or create a custom category if preferred)

5)         Select "Security Policy Overview" for the Sub-Category (or create a custom sub-category if preferred)

6)         On the Elements page, click Add

7)         Add a Table From SQL and click OK

8)         Enter a header and the SQL, continue and OK

Here are some query examples.  (Select on the tables must be given to the MGMT_VIEW user account)

This first query would show the information about the masking job(s) themselves including which database they ran in, the name of the job, which account ran it, when it ran and how long it took.

SELECT        ds.source_name as "Database", 
                        ds.ss_name as "Mask Defn", 
                        ds.ss_owner as "Owner", 
                        je.submission_ts as "Run Date", 
                        j.job_owner as "Job Submitted By", 
                        (js.end-time - js.start-time) as "Job Duration" 
FROM                        sysman.mgmt_dm_scopespecs ds,
                        sysman.mgmt_dm_job_executions je,
                        sysman.mgmt_job j,
                        sysman.mgmt_job_exec_summary js
WHERE         ds.ss_guid = je.ss_guid
AND               j.job_id = js.job_id
AND               je.execution_id = js.execution_id
AND               ds.source_id = ??EMID_BIND_TARGET_GUID??

This query could be used to provide information on what was actually masked and which mask formats were used

SELECT        ds.ss_name as "Mask Defn",
                        sc.table_schema as "Schema",
                        sc.table_name as "Table",
                        sc.column_name as "Column",
                        re.entry_order as "Step",
                        decode (re.rule_type,          'RN','Random Number',
                                                                        'CC','Table Column',
                                                                        'DT','Random Date',
                                                                        'AL', Array List',
                                                                        'FN','Fixed Number',
                                                                        'RD','Random Digits',
                                                                        'RS','Random String',
                                                                        'SH','Shuffling',
                                                                        'UF','User Function',
                                                                        'UT','Post Process'
                        as "Format Type",
                        decode (re.rule_type,          'RN', 'Start Length: '||rule_low||' End Length: '||rule_high,
                                                                        'CC','Schema Owner: '||re.table_schema||' Source table and column: '||re.table_name||'.'||e.column_name,
                                                                        'AL',da.array_list.item,
                                                                        'FS','Fixed String',
                                                                        'RD','Start Digits:,||rule_low||' End Digits: '||rule_high,
                                                                        null
                        as "Format Mask Parameters"
FROM                        sysman.mgmt_dm_ruleentry re,
                        sysman.mgmt_dm_scopespecs ds,
                        sysman.mgmt_dm_ss_columns sc,
                        sysman.mgmt_dm_alitems da,
                        sysman.mgmt_dm_job_executions je
WHERE         re.rule_guid = sc.rule_guid
AND               re.rule_guid = a.rule_guid (+)
AND               re.entry_order = a.entry_order (+)
AND               ds.ss_guid = je.ss_guid
AND               ds.ss_guid = sc.ss_guid
AND               ds.source_id = ??EMIP_BIND_TARGET_GUID??
ORDER BY   ds.ss_name,
                        sc.table_schema,
                        sc.table_name,
                        sc.column_ name,
                        re.entry_order;

Next month we'll take a detailed look at creating custom format masks, sharing the format mask libraries and creating and running a data masking job.  Until then...

See all articles by Karen Reliford



Oracle Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
Oracle Forum
Topic By Replies Updated
Oracle Data Mining: Classification jan.hasller 0 July 5th, 07:19 AM
Find duplicates - Unique IDs Lava 5 July 2nd, 08:30 AM
no matching unique or primary key rcanter 1 April 25th, 12:32 PM
Update values of one table based on condition of values in other table using Trigger Gladiator 3 February 29th, 06:01 PM