Oracle 10gR2 Transportable Tablespace Enhancements: Tablespace Versioning and Data Jukeboxes
May 24, 2006
Synopsis. Oracle 10g Release 2 (10gR2) augments the capabilities of Transportable Tablespaces to provide the ultimate in data availability: the capability to store and retrieve multiple versions of any tablespace's data. The final article in this series discusses how to utilize these new features to create a data "jukebox," as well as how to transport tablespaces directly from Recovery Manager backup files without incurring any database unavailability.
In the previous article, I demonstrated how Oracle 10gR1's new transportable tablespace features make it simple to "plug in" a single tablespace into another database, even when that tablespace is on a different platform and operating system. I also showed that with a few simple commands, Oracle 10gR1 can easily transport an entire database between platforms.
Although these enhancements expand greatly the flexibility and range of transportable tablespaces, they do have one unfortunate drawback: the database on the source platform is either slightly or completely unavailable until transport operations are completed. For example, to transport a tablespace between platforms, that tablespace must be put into READ ONLY mode on the source platform, and to transport a database between platforms, the entire database must be opened in READ ONLY mode on the source platform.
While these interruptions in availability are certainly temporary, it does mean that some operations are prohibited while Oracle prepares the necessary transport scripts and the appropriate datafile(s) are copied between the source and target platforms. Many IT shops have increasingly adopted extremely tight time frames under which scheduled maintenance may occur; in fact, the trend toward 100% uptime has continued to accelerate over the past several years. Therefore, I may simply be unable to limit my database's availability for any length of time while I prepare it for transportation to another platform.
Transporting a Tablespace from RMAN Backups
Thankfully, Oracle 10gR2 offers a new transportable tablespace feature that addresses this concern: the ability to create a transportable tablespace set directly from existing RMAN backup sets. Oracle leverages all available RMAN backups to restore a copy of the tablespaces' datafile(s), and then it simply applies all available archived redo logs and online redo logs to roll forward all available transactions to bring the tablespace up to its most current status. Even better, I can override this default behavior and create the transportable tablespaces as of a particular log sequence number, timestamp, or SCN.
To illustrate, in Listing 2.1 I have created one additional tablespace, LMT_SFH, that contains one new table, SH.SALES_FORCE_HIERARCHY, that is then populated with the contents of the current sales force hierarchy. Once the tablespace and its objects were created, I then created an RMAN backup of the new tablespace for its eventual transport.
Listing 2.2 shows the RMAN command script that creates a transportable tablespace from the available RMAN backups using the new TRANSPORT TABLESPACE command. Note that RMAN automatically determines which backups, archived redo logs, and online redo logs to use to recover the tablespace to the appropriate point in time. RMAN then automatically recovers the LMT_SFH tablespace by creating an auxiliary instance in the directory specified by the AUXILIARY DESTINATION keyword, and the transportable tablespace itself is stored in the directory specified by the TABLESPACE DESTINATION keyword. Output 2.1 shows the results of executing this script in the Windows NT environment, and Output 2.2 shows the similar results when run in a Linux environment.
Tablespace Versioning: One Tablespace, Many Versions
One of the biggest advantages that IBM mainframe file storage systems have over UNIX-based or NT-based file storage systems and one that I truly miss! is the capability to store
multiple versions of a file in a generation data group (GDG). When I wanted to keep, say, up to five different versions of a daily payroll transaction file, I would simply create a base GDG entry, and then tell the mainframe's operating system that I wanted to add a new version of the file into that GDG. As each new version of the file arrived and was placed into the GDG, the OS kept track of the total number of versions available, and when the sixth version of the file arrived, the original version simply "rolled off" the GDG.
The good news is that Oracle 10gR2 has implemented a similar structure to the GDG known as a file group repository that lets me keep multiple versions of a file within that repository. Even better, I can now store several related files together -- collectively known as a version within a file group repository so that at a later time, I can retrieve and restore those files for use. This feature is at the heart of a new set of tools designed to facilitate a concept called information provisioning: the ability to create, retain, recall, and manage multiple versions of the same data structures over time. (Section 8 of the Oracle 10gR2 Streams Concepts and Administration guide provides an excellent high-level discussion of information provisioning and how these new features fulfill those requirements, by the way.)
Finally, Oracle 10gR2 gives me one more powerful tool for information provisioning: the ability to create a file group repository that stores different versions of tablespace sets. In this context, a tablespace version includes all the tablespaces' datafiles, all DataPump metadata that was exported for those tablespaces, and the DataPump export log file generated when I created the tablespace version. These tablespace versions can be stored within the same Oracle 10gR2 database that typically needs to "plug in" the selected version of the tablespace; however, they can also be stored in a remote Oracle 10gR2 database to provide a central repository for retrieval of tablespace versions for direct plug-in" to any other Oracle 10gR2 database.
With this feature, I can now retain a plethora of information over time, but manage it easily should it need to be retrieved for use in the near (or far!) future. Here are a few business scenarios that benefit from information provisioning through tablespace versioning:
Retaining historical data to meet long-term legal requirements. The onset of Sarbanes-Oxley's Section 404 requirements have shown in recent years that it is important to retain data for long periods of time to handle requests from our auditors. Since Section 404 requirements tend to be vague, it is not unlikely that we could be asked to provide data from several years (or even decades) hence. Since it is impractical to keep that data inside one database, some sort of repository structure simply makes good business sense. Some industries (financial, banking, and insurance, to name just a few) also require long-term data retention to handle transaction queries from dark in a client's past, to enable tracking of fraudulent claims, or to comply with US federal or state government guidelines and requirements.
Creating a "fallback position" during large-scale software deployments. I have participated in many deployments of complex applications during my career, and that means I have seen my share of occasionally horrendous failures during these deployments, too. During those failures, I had wished that I had a reasonably quick method to roll back all the changes made to the database during the deployment of the application release, perhaps to a point just prior to the related changes. With tablespace versioning, I can now simply create versions of the affected tablespace sets just before the application changes are deployed, and that gives me a fallback position in case my application developers and quality assurance analysts have completely dropped the ball during their development and testing efforts.
Building "data jukeboxes" for complex data warehousing requirements. As data warehouses continue to grow ever larger into the petabyte and exabyte range, many DBAs have turned to partitioning tables as a solution for purging old data that's no longer deemed useful. Once the old data has been rolled off from a table partition into just another table, it can be exported and then deleted from the database. However, I have been in situations when my application users suddenly need to get access to the data that has just been purged with this method perhaps to fulfill an emergency audit request, or some type of specialized data warehousing report that needs all available data.
These special requests can be time-consuming to complete because the data needs either to be restored to the same database from exports or backups, or restored to an auxiliary instance. With tablespace versioning, however, I can create what I like to call a data jukebox that has all available data stored within separate file groups. I can then easily "plug in" just the needed versions of the tablespace(s) to an auxiliary instance, give my users access privileges to that auxiliary instance, and once their reporting requests are complete, I can just as easily "unplug" the tablespace versions from the auxiliary instance.
Building a Data "Jukebox" with DBMS_FILE_GROUP
To illustrate how tablespace versioning can work to my advantage in this third scenario, I will construct a data jukebox to supply information provisioning capabilities for a selected set of tablespaces. Before I start, though, let's review the four components of any tablespace version and file group, from the highest to lowest level of granularity:
To initialize the repository, I will create some new DIRECTORY objects:
I will then call DBMS_FILE_GROUP.CREATE_FILE_GROUP to create a new file group repository called SH.GRP_RPTREPOS. Listing 2.3 contains the code that illustrates these initial housekeeping tasks. Output 2.3 and Output 2.4 show the resulting entries that are written into my database's alert log and export log, respectively, when this code has been executed.
Creating Tablespace Versions With Tablespace Cloning
Now that my tablespace repository has been set up, I will use the tablespaces I created for demonstration purposes for both articles in this series as my sources for versioned tablespaces. I will invoke the DBMS_STREAMS_TABLESPACE_ADM.CLONE_TABLESPACES procedure to clone a copy of the LMT_REF, LMT_SFH, and LMT_XACT tablespaces into the tablespace repository as the first version of these tablespaces.
Listing 2.4 shows the code I executed to achieve this. I have also included a snippet from my database's alert log and the export log file that Oracle generated when it created this version of the tablespaces. Note that Oracle automatically determined that the target tablespaces were not in READ ONLY mode when the process began, so it automatically made the tablespaces READ ONLY, then performed the export of the tablespace's data files and metadata, and finally brought the tablespaces back into READ WRITE mode when the tablespace version was completed.
Once the first version is completed, I reloaded the SH.AGGR_SALES table in the LMT_XACT tablespace with new data reflecting the next period of sales, and then updated a few rows in the SH.SALES_FORCE_HIERARCHY table in the LMT_SFH tablespace. I then created the second version of these tablespaces with a similar invocation of the CLONE_TABLESPACES procedure. The code I used to update the tables and then create the next version of the tablespace set is shown in Listing 2.5.
"Plugging In" a Tablespace Version
Now that I have multiple tablespace versions to choose from, I will utilize a sample Windows NT database as my target for plugging in the tablespace versions. To accomplish this, I need to first create a database link on the target server that references the database on the source server. Then I will create directories on the target server with the same names as those on the source server. (Of course, I can use different directories on the target server for storage of the datafiles and metadata export files.)
Once these setup steps are complete, I will invoke DBMS_FILE_TRANSFER.GET_FILE to pull the corresponding files from the source server's directories and copy them to my target database server's directories. To attach the first version of these tablespaces to the target database, I will run the DBMS_STREAMS_TABLESPACE_ADM.ATTACH_TABLESPACES procedure. This procedure will plug these tablespaces into my target database and make them immediately available for access. The code in Listing 2.6 shows how to perform all of these steps, and Output 2.5 shows the results of querying the first version of the tablespace.
To plug in the second version of the tablespace set, however, I first need to drop the tablespaces, including their objects and datafiles, on the target database. I then simply run the code shown in Listing 2.7 to plug in the second version of this tablespace set; Output 2.6 shows the results of querying the second version of these tablespaces.
"Unplugging" a Tablespace Version
I can also "unplug" an existing tablespace set from a database; however, be aware that this also has the possible undesirable side-effect of physically removing the detached tablespaces from the database. I have illustrated how to perform this at the end of Listing 2.7.
I will first remove the second version of the tablespace set from the repository via the DBMS_FILE_GROUP.DROP_VERSION procedure, and then I will recreate the version with the DBMS_STREAMS_TABLESPACE_ADM.DETACH_TABLESPACES procedure. When it has completed executing, the tablespaces will only exist in the tablespace repository and will have been physically removed from the database.
Managing the Data Jukebox
DBMS_FILE_GROUP offers several procedures for maintenance of the tablespace repository. Here is a brief summary of some of the more pertinent procedures:
Listing 2.8 shows sample invocations of these procedures.
Querying the Data Jukebox's Contents
Finally, I have included some useful queries in Listing 2.9 for gathering information from the database's data dictionary on repository file groups, tablespace versions, and the files and directories that correspond to these objects. Heres a list of the data dictionary views that will be helpful when working with tablespace versioning:
Output 2.7 shows the results of running these queries against a sample of my tablespace repository metadata.
Oracle 10gR2 provides powerful data retention capabilities by allowing the storage of multiple versions of a tablespace a flexible solution for creating data jukeboxes for complex data warehousing requirements, providing a "fallback position" during large-scale software deployments, and retaining data historically to meet long term legal requirements. The ability to perform cross-platform transfers of individual tablespaces while incurring absolutely no downtime by creating transportable tablespaces from RMAN backups offers extremely high availability for the source database as well.
References and Additional Reading
Even though I have hopefully provided enough technical information in this article to encourage you to explore with these features, I also strongly suggest that you first review the corresponding detailed Oracle documentation before proceeding with any experiments. Actual implementation of these features should commence only after a crystal-clear clear understanding exists. Please note that I have drawn upon the following Oracle 10gR2 documentation for the deeper technical details of this article:
B14191-02 Oracle Database 10gR2 Backup and Recovery Advanced User's Guide
B14194-03 Oracle Database 10gR2 Backup and Recovery Reference
B14214-01 Oracle Database 10gR2 New Features Guide
B14229-01 Oracle 10gR2 Streams Concepts and Administration
B14231-01 Oracle Database 10gR2 Administrator's Guide
B14257-01 Oracle 10gR2 Streams Advanced Queuing User's Guide and Reference
B14258-01 PL/SQL Packages and Types Reference (10gR2)