Pseudonymization Of Personal Data With Oracle

The European Union’s General Data Protection Regulation, known as the GDPR, applies primarily within the European Union but can also affect enterprises outside it if they serve EU customers. It is a law, and compliance is mandatory. One provision of the GDPR involves the privacy of personal data, and it offers two techniques to ensure that personal data cannot be obtained and used in ways that are not intended. One often-discussed technique is pseudonymization, which basically calls for protecting personal data through encryption or tokenization. Oracle offers Transparent Data Encryption (TDE) as a way to implement pseudonymization. Let’s look at how that can protect personal data.

Personal data includes names, Social Security Numbers (in the U.S.), driver’s license numbers, addresses, birthdates, employee identification numbers, insurance account numbers or any other data that can uniquely identify an individual. Such data is what the GDPR is intended to protect.

One way to obscure personal data is through a one-way hashing algorithm; this, of course, obfuscates the data but doesn’t provide a way to reverse the process when the original value is legitimately needed. Hashing, such as that used in many password schemes, also relies on accepted and published algorithms, and personal data such as Social Security Numbers is drawn from a limited set of possible values. Anyone with access to the internet can find the algorithm and, with time and patience, hash candidate values until one matches, leaving the personal data unprotected. Thus, hashing algorithms on their own are not ideal choices for protecting personal data and are unlikely to meet the requirements set forth in the GDPR.
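
To illustrate why a published hashing algorithm offers limited protection for data with few possible values, here is a minimal Python sketch; it is not a real attack tool. It assumes an unsalted SHA-256 hash, a made-up identifier in Social Security Number format, and an attacker who already knows everything except the last four digits. All function names are hypothetical.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the unsalted SHA-256 digest of a string as hex."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def recover_by_guessing(target_digest: str, candidates):
    """Hash each candidate in turn and return the first one that matches."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_digest:
            return candidate
    return None


def four_digit_suffixes(prefix: str):
    """Assumed candidate list: a known prefix plus every 4-digit serial."""
    return (f"{prefix}{serial:04d}" for serial in range(10_000))


# A made-up identifier, stored only as its hash.
stored_digest = sha256_hex("000-12-3456")

# The "attacker" knows the prefix and simply tries every remaining serial.
recovered = recover_by_guessing(stored_digest, four_digit_suffixes("000-12-"))
print(recovered)
```

The loop needs at most 10,000 hash computations, which is why the size of the input space matters more than the strength of the hash function itself.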

Pseudonymization requires that the obfuscation of personal data can’t easily be undone: the ‘key’ used to encrypt and decrypt personal data must not be available to outsiders or easily reverse-engineered. Compression, where repeating patterns in data are replaced with tokens, can fairly easily be undone as the tokens can be discerned by simply looking at the compressed results. Because those tokens are directly identified by their data patterns, this kind of pattern-based token substitution is unsuitable for protecting personal data. Pseudonymization must be performed in such a manner as to make the keys undecipherable outside of the context of the encryption. TDE provides this by keeping the master encryption key in an encryption wallet, protected by a password that is not decipherable from the encrypted results.
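
The tokens described above are derived from the data patterns themselves. By contrast, a token generated at random carries no information about the value it replaces, so the mapping cannot be worked out from the tokenized results alone. The following is a minimal Python sketch of that idea, assuming a hypothetical in-memory `TokenVault` class; it is not an Oracle feature, and a real vault would need secured, persistent storage.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping random tokens to original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return the existing token for a value, or issue a new random one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = secrets.token_hex(16)  # 128 random bits, unrelated to the value
        while token in self._token_to_value:  # guard against a repeat token
            token = secrets.token_hex(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; raises KeyError for an unknown token."""
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("Jane Example")  # made-up name
print(token)                    # random hex string, different on every run
print(vault.detokenize(token))  # original value, available only via the vault
```

Only someone with access to the vault can map a token back to the original value, which mirrors the role the wallet plays for TDE.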

TDE tablespace encryption operates at the datafile level, encrypting every datafile in the tablespace regardless of the data stored inside. Access is granted through the database by opening the encryption wallet; the same password ‘unlocks’ all datafiles using TDE in a given database. The wallet password can’t be reverse-engineered, so if it’s not known the datafiles encrypted with it are useless. Of course, every datafile in a database can use TDE, but that might be considered “overkill” since some data, such as expiration dates, office locations and publicly available reports, need not be encrypted as it contains no personal information.

How can TDE satisfy the pseudonymization requirement of the GDPR? As an example, an Oracle database is configured with TDE and the PDI tablespace is encrypted with it. One night the datafiles for that tablespace are copied by a hacker in an attempt to steal all of the personal data stored in those files. Since a wallet is required to allow access to the data, and the password used for the encryption is not available to the hacker, simply creating a new wallet with a new password will fail to open those stolen datafiles, making those datafiles useless.

Other utilities can also be used to encrypt personal data in such a way as to meet the GDPR requirement that the obfuscation not be easily undone; since information on other encryption tools is readily available on the internet they won’t be discussed here.

Breaches of personal data have made the news recently, some of which have affected millions of people worldwide. The GDPR is designed to make such breaches far less likely, if its provisions are implemented correctly. Again, the GDPR will affect more enterprises than those in the EU, and U.S. businesses will need to be ready to comply. Getting a head start on this by hardening the database and its data will only make the job easier when the time comes to secure enterprise data.

David Fitzjarrell
David Fitzjarrell has more than 20 years of administration experience with various releases of the Oracle DBMS. He has installed the Oracle software on many platforms, including UNIX, Windows and Linux, and monitored and tuned performance in those environments. He is knowledgeable in the traditional tools for performance tuning – the Oracle Wait Interface, Statspack, event 10046 and 10053 traces, tkprof, explain plan and autotrace – and has used these to great advantage at the U.S. Postal Service, American Airlines/SABRE, ConocoPhillips and SiriusXM Radio, among others, to increase throughput and improve the quality of the production system. He has also set up scripts to regularly monitor available space and set thresholds to notify DBAs of impending space shortages before they affect the production environment. These scripts generate data which can also be used to trend database growth over time, aiding in capacity planning. He has used RMAN, Streams, RAC and Data Guard in Oracle installations to ensure full recoverability and failover capabilities as well as high availability, and has configured a 'cascading' set of DR databases using the primary DR databases as the source, managing the archivelog transfers manually and monitoring, through scripts, the health of these secondary DR databases. He has also used ASM, ASMM and ASSM to improve performance and manage storage and shared memory.
