Large Objects (LOBs): A Brief History Lesson
The
first time I encountered an application that required a binary large object
(BLOB) was literally a millennium ago. In 1999, our IT organization at the
American Medical Association (AMA) in Chicago, Illinois was tasked with
upgrading that organization’s seventy-plus mainframe applications into four
client server applications. I led one of several development teams that used
PowerBuilder to develop the next generation of client-server front-end applications.
Our
flagship membership processing application had a requirement to store a
physician’s personal photograph so that when she accessed the AMA’s web site,
her picture would appear next to her personal information. We later expanded
this requirement so that we could store images of a physician practice location
or an organization’s logo. We also developed a mailing list application that
utilized Microsoft Word documents as templates for mailing labels, form
letters, and fax cover sheets so that we could easily generate documents for
distributing membership communication in standardized formats. During the initial
project development cycle we utilized Informix as our RDBMS, and it had the
capability to store JPEGs and Microsoft Word documents as BLOBs.
At
that time our long-term vision for these images and documents was that they’d
be loaded one time, updated only infrequently, and would be relatively small in
size. And as it turns out, we weren’t the only IT team with this relatively
narrow vision; in fact, these assumptions are reflected in the initial roll-out
of Large Object capabilities in Oracle Database 8i as well:
-
LOBs would
never grow beyond a size of a few
hundred kilobytes or in extreme cases a megabyte or two. Based on this
assumption, the largest “chunk size” – the maximum amount of space that Oracle
would have to convert from its source file to a LOB at one time – could be kept
relatively small as well, at just 32KB. -
LOBs would be loaded
relatively infrequently – usually, only once – and then would be updated very rarely. After all, how often
does a standard mailing label format or form letter change? -
Likewise, we anticipated that only one application user session would be
modifying a physician’s picture or a mailing label master document at one time.
Therefore, concurrency and read consistency was not going to be a major
concern, so we didn’t need to worry about the impact of maintaining additional
rollback segments just for LOBs. -
Finally, remember that Oracle 8i didn’t yet offer the complete power of
the clustered, multi-instance database to support highly-available, scalable
applications; that only happened when Oracle 9i
introduced Cache Fusion as part of its deployment of Real Application Clusters.
So … What’s Changed?
At
the risk of overstating the obvious, the footprint of Large Objects has increased
exponentially within our databases over the last decade. It’s being driven by
diverse business needs that span every industry: medical and pharmaceutical
applications that capture and serve up patient images, manufacturing
applications that store components’ designs, and semi-structured data that
business-to-business (B2B) applications use to order products to replenish a
company’s warehouse. However, these newer application requirements have several
things in common:
-
LOB sources are
much more diverse. These sources include Extended Markup Language (XML) documents, geospatial data,
biometric information, video files, and other semi-structured or non-structured
data. -
Simultaneous
multiple-user updates are common. As semi-structured data like XML
has proliferated, the need to provide the ability for more than one user to
simultaneously update the same LOB has also exploded. -
Large objects
are gigantic … and getting larger. It’s not uncommon for LOBs to
extend into the gigabyte and even terabyte range, especially for audiovisual
and medical imaging data. -
Transmission
bandwidths have increased dramatically. Likewise, the applications
that consume these LOBs are now able to take advantage of the fiber-optics
broadband lines that telecommunications providers laid down in the late 1990s.
(I have several friends who’ve terminated their cable TV subscriptions because
most of the television shows they want to watch are available as streaming
video from Internet content providers for no additional cost beyond that of their
broadband internet service.) -
Self-published
user content has exploded. The YouTube phenomenon and other web
sites that store self-produced audiovisual files for wide consumption has also
contributed to the explosion of LOB sources. -
Security
requirements have increased dramatically. Finally, many of these
LOBs require special security – especially personal medical data that is stored
within Digital Imaging
for Communication of Medical Information (DICOM)
sources – because of federal, state, and international regulations like the Health Insurance Portability and Accountability Act
(HIPAA).
LOBs, the Next Generation: SecureFiles
As a
result of these new business requirements, the LOB data type has been
redesigned completely in Oracle 11g
Release 1. This new LOB datatype is called a SecureFile,
but unfortunately that’s a bit of a misnomer because security enhancements are
a relatively small subset of SecureFile’s features, as we’ll see throughout this
article series.
Performance and Manageability Improvements.
Since the sources of LOBs have increased in size by several orders of magnitude
in the past decade, it was obvious that the methods that load, store, and
return data from LOBs needed complete refurbishment. Oracle 11g doesn’t disappoint in this regard:
Bigger CHUNK sizes. In prior releases, the maximum size of a
“chunk” of LOB data that could be read or written at one time was limited to 32KB.
Internal and external LOB sizes have grown exponentially since Oracle 8i, so this was a serious limitation to
effective LOB I/O. Oracle 11g remedies this by automatically choosing an
appropriate CHUNK
size between the database’s standard block size and a maximum of 64MB. Further,
Oracle 11g helps limit the internal fragmentation of SecureFile LOBs
by attempting to store extremely large SecureFile LOB segments on disk so that
they’re physically adjacent. Finally, the ability to utilize variable CHUNK
sizes means that the corresponding UNDO segments generated during updates of
SecureFile LOBs only need to be as large as the CHUNKs and thus contain less
wasted space.
Improved REDO and UNDO Generation.
Before Oracle 11g, the Oracle DBA
had relatively free rein when configuring CHUNK size, as well as the retention
of undo segments for LOBs, and that didn’t always yield optimal results. But
since Oracle 11g will choose automatically
the best method for generating redo and undo for SecureFile LOBs, only two storage-related
parameters are required: RETENTION and MAXSIZE. The RETENTION
directive tells Oracle 11g how to
handle the consistent read data
(stored in UNDO segments) for a SecureFile LOB. Here’s how the setting affects
read consistency:
Table 1-1. RETENTION |
|
Setting |
Explanation |
MAX |
is |
MIN |
keeps old versions |
AUTO |
This |
NONE |
Instructs |
The
new MAXSIZE storage setting works in concert with the
setting for RETENTION,
and it determines the maximum size of a SecureFile LOB. When RETENTION
is set to MAX,
then the LOB is allowed to grow up to a size of MAXSIZE, after which space can be
reclaimed from UNDO segments; otherwise, the MAXSIZE setting is treated as a
hard limit on the size of the SecureFile LOB’s segment.
Note
that the storage attributes CHUNK, PCTVERSION, FREEPOOLS,
FREELISTS,
and FREELIST
GROUPS are simply ignored for SecureFile LOBs
“Intelligent Compression.” This new
feature allows Oracle 11g to compress and uncompress SecureFile LOB data
automatically and transparently, and this means that disk space for these data
is reduced dramatically. Whenever a query or DML statement retrieves or applies
changes to a SecureFile LOB’s data, Oracle 11g
uncompresses only those data blocks needed for the operation, and it
automatically maintains the required mapping between the uncompressed data
blocks in the Database Buffer Cache and the database blocks on disk. There are two
compression levels: MEDIUM (the default) and HIGH;
specifying HIGH
will tend to increase latency during SecureFile LOB access, but will also
result in the highest storage compression ratios.
Automatic Deduplication. One of the more
intriguing SecureFile LOB features is the ability for Oracle 11g to automatically locate any LOBs that
store exactly the same data within the same table and column. Once detected,
Oracle 11g will store only one copy
of the LOB data. The advantage here, of course, is dramatically reduced amounts
of disk space are needed to store the same amount of data; further, redo
logging and UNDO is reduced significantly because an update to SecureFile LOB
data doesn’t require retaining duplicate versions of redo log entries and UNDO
segments. Deduplication of SecureFile LOB data is enforceable at either the
table or partition level. Note that deduplication does require the Oracle Advanced Compression option, a
separately-licensed option that’s also required for OLTP table compression and
several other advanced compression features in Oracle 11g.
Transparent Encryption. Every organization
has ethical and legal obligations to protect customer-sensitive
information stored within LOBs. For example, a patient’s medical
records and radiology images must be protected under HIPAA regulations.
However, there’s also enormous amounts of intellectual
property information that must be stored in semi-structured object
forms – for example, molecular formulas and manufacturing blueprints – that is
crucial to any company’s competitive advantage.
SecureFile
LOBs fill this hitherto serious gap in database security because it’s now
possible to encrypt LOB data within an Oracle 11g database. SecureFile LOBs are encrypted using the same Transparent Data Encryption (TDE)
algorithms that were introduced in Oracle 10gR2,
and this means it’s possible to encrypt either any or all SecureFile LOBs
within the same table using the four standard TDE encryption algorithms
(AES128, AES192, AES256 and 3DES168). Finally, note that TDE does require the
Oracle Advanced Security pack, an extra-cost option, to enable encryption of
SecureFile LOBs.
Wait! What About My “Old” LOBs?
The
DBA must set the value of the COMPATIBLE initialization
parameter to 11.0.0.0.0
or higher to enable SecureFile LOB features. However, the original LOB datatype
and its capabilities are still preserved for backward compatibility: A
pre-11gR1 LOB is now called a BasicFile.
BasicFile and SecureFile LOBs can even co-exist in the same table. In fact, if
a LOB is not specified as a SecureFile during its creation, then Oracle 11g
simply assumes it to be a BasicFile LOB.
Controlling Selection of SecureFiles for LOB Storage.
The new DB_SECUREFILE
initialization parameter offers another layer of control over how Oracle 11g chooses the type of LOB to create when
neither the BasicFile or SecureFile option is specified:
Table 1-2. |
|
|
Value |
Meaning |
|
ALWAYS |
Attempts |
|
FORCE |
Forces |
|
PERMITTED |
(Default) Allows SecureFiles to be created |
|
NEVER |
Disallows |
|
IGNORE |
Disallows |
|
Note
that if a SecureFile LOB storage option is specified during the creation or
modification of a BasicFile LOB – for example, TDE encryption, deduplication,
or compression – Oracle 11g will raise an exception during the operation.
Using SecureFiles: A Practical Example
To
illustrate how simple and practical it is to create and use SecureFile LOBs, I’ll
construct a sample application that captures trouble ticket information. Along
with each trouble ticket identifier and description, the application needs to
retain a Microsoft Word document that each customer submits when a trouble
ticket is created; optionally, the customer can also provide an appropriate
screen print or other image in JPEG format to help clarify the problem
encountered.
I’ve
provided the code to create the initial infrastructure for this application in Listing
1.1:
-
First, I’ll create two new tablespaces, BASICFILES
and SECUREFILES,
in which all BasicFile and SecureFile LOBs will be stored, respectively. -
Next, I’ll create a new schema named TRBTKT
that will own all database objects for this application. -
To provide a location from which the Microsoft Word
documents and JPEG images will be stored as external LOBs until they’re loaded
into the database, I’ll create two new directory objects, TRBTKT_DOCS
and TRBTKT_IMGS.
The
sample Microsoft Word documents and JPEG files can be downloaded as a single ZIP file. To duplicate this application
infrastructure, you’ll need to create directory folders that correspond to the
ones created above and then copy the documents and images into their respective
directories.
Now
that my infrastructure is in place, I’ll construct two tables to demonstrate
how to store the trouble ticket information and related LOBs. As shown in Listing
1.2, both tables are virtually identical; however, table TRBTKT.TICKETS
will utilize BasicFiles to store its LOBs, while, table TRBTKT.SECURE_TICKETS
will use SecureFiles. I’ll be using both tables in this article series to
compare and contrast the different features of BasicFiles and SecureFiles.
I’ll also
construct a new package, TRBTKT.PKG_SECUREFILES, that’s
shown in Listing 1.3.
I’ll use this package to implement all methods for loading LOBs, determining
LOB sizing and space utilization, and displaying rudimentary information from
the character-based LOBs.
Next,
I’ll load sample data into table TRBTKT.TICKETS via calls to
procedure AddTroubleTickets
in package TRBTKT.PKG_SECUREFILES.
I’ve shown the invocation of this procedure in Listing
1.4. The sample Microsoft Word documents and JPEG files can be
downloaded as a single ZIP file here;
to duplicate my loading technique, you’ll need to create directory folders that
correspond to the ones defined in for directory objects TRBTKT_DOCS
and TRBTKT_IMGS.
Finally,
I’ll transfer the data from TRBTKT.TICKETS to TRBTKT.SECURE_TICKETS
using a simple INSERT
INTO … SELECT statement as shown in Listing
1.5.
Next Steps
In
the next article in this series, I’ll delve a lot more deeply into the advanced
capabilities of Oracle Database 11g’s
new SecureFile features, including:
-
Migrating
efficiently from BasicFile LOBs to SecureFile LOBs - Determining the relative storage efficiency of SecureFile vs. BasicFile LOBs
-
Exploring the different compression and deduplication
options for SecureFile LOBs -
Encrypting
SecureFile LOBs via Transparent Data
Encryption
References and Additional Reading
While
I’m hopeful that I’ve given you a thorough grounding in the technical aspects
of the features I’ve discussed in this article, I’m also sure that there may be
better documentation available since it’s been published. I therefore strongly
suggest that you take a close look at the corresponding Oracle documentation on
these features to obtain crystal-clear understanding before attempting to
implement them in a production environment. Please note that I’ve drawn upon
the following Oracle Database 11g
documentation for the deeper technical details of this article:
B28279-02 Oracle Database 11g New Features Guide
B28320-01 Oracle Database 11g Reference Guide
B28393-03 Oracle Database 11g SecureFiles and Large Objects
Developer’s Guide
B28419-02 Oracle Database 11g PL/SQL Packages and Types
Reference
B28424-03 Oracle Database 11g Advanced Application Developer’s
Guide