Database Migration - A Planned Approach
February 22, 2006
A fairly common event in a database's lifecycle is that of the migration from version "older" to version "newer." Migrating from one version to another may be as simple as exporting the old and importing into the new, but chances are there is a lot more involved than first meets the eye. It is not uncommon to also incorporate other significant changes such as an operating system change, a schema modification, and changes to related applications. Each change has its own inherent risk, but lumping them together in one operation flies in the face of common sense, even more so without having tested the migration from start to end. Amazingly, this situation occurs all too often.
From a software engineering standpoint, is it safe, or a best practice, to heap so many significant changes together in one step? Further, wouldn't it seem obvious that you would want to, one, practice the migration and, two, test the changes before actually applying them to your live/production environment?
Here is something else to consider: break a dependency chain before it breaks you and the migration process. Given the scenario of migrating from Oracle8i to 10g, changing the underlying operating system to Linux from Solaris, modifying major tables within a schema, and running newer/modified versions of related applications, where are the places you can break the dependency chain? Put another way, what are the safer/well-known/"charted by many others before you" steps, and which are the uncharted/"applies only to you" steps?
Separate the known from the unknown (where versus how)
Unless you are a leading-edge early adopter of a new version of Oracle ("sure, we're more than happy to provide our production environment as a beta testing ground for the rest of the world"), by the time you (and your company) are ready to migrate from an older version of the RDBMS software to a newer one, many others will have gone before you. Likewise, many others have already crossed over to the dark side by having adopted Linux as their underlying OS.
Considering the combined RDBMS/OS version change as the known, this combination is also the "where" part of "where versus how." Where your production database lives in terms of version and OS is a logical place to break the dependency chain. In an all-or-nothing do-or-die migration scenario, failure means losing the time spent on what is perhaps the simplest part of the scenario, namely, the hours spent on exporting and importing. If you can separate the overall migration into at least two distinct stages, you will have broken the dependency chain into smaller chains. The guiding principle/lesson to be learned here is to move from point A to D via safe, incremental steps.
Unfortunately, no one can authoritatively tell you what the best approach is for "how." How your database operates with respect to schema and application interaction is up to you to determine. Until you have thoroughly test-driven the schema and application changes, this part of the overall migration process stays in the realm of the unknown. Going live and finding out for the first time that the new application/database code results in cascading triggers (thereby bringing an instance to its knees, so to speak) is obviously a poor time to become aware of this situation. Having developers and testers use 100 records as a test size when the production environment contains tens of millions of records is hardly a thorough test.
Export and Import via a proactive approach
With respect to the export and import utilities, you do not have to accept the default parameters. In fact, you owe it to yourself to use quite a few non-default settings; doing so makes the process easier to perform and saves time when it is time to do it for real. Let's look at the indexfile parameter as a start. There are (at least) four excellent reasons to use indexfile=filename on an import.
The first is that the output documents the storage of tables and indexes (all or some, depending on what was included in the export dump file). Where is your source code for schema creation? If you do not have source code, this parameter (along with a fairly simple query that returns everything else) goes a very long way towards providing that information. The query part is spooling out the contents of all_source or user_source. Code for packages, package bodies, procedures, functions, and triggers will be included in the output. With very little editing, such as adding "create or replace" and cleaning up SQL*Plus artifacts (feedback, headings, and page breaks, if these weren't suppressed to begin with), you are left with the current source for a significant portion of a schema.
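The "very little editing" step can be sketched in a few lines. This is a minimal illustration, not a finished tool: it assumes the spooled rows have already been read into (name, type, line, text) tuples mirroring the NAME, TYPE, LINE, and TEXT columns of user_source, and the function name build_source_script is hypothetical.

```python
from collections import defaultdict


def build_source_script(rows):
    """Turn (name, type, line, text) rows into a re-runnable SQL script.

    The tuples are assumed to mirror the NAME, TYPE, LINE and TEXT
    columns of user_source; how they were fetched is out of scope here.
    """
    objects = defaultdict(list)
    for name, obj_type, line_no, text in rows:
        objects[(obj_type, name)].append((line_no, text))

    chunks = []
    # Sorting by type first puts PACKAGE ahead of PACKAGE BODY.
    for key in sorted(objects):
        # Sort by line number so the source is reassembled in order.
        lines = [text.rstrip("\n") for _, text in sorted(objects[key])]
        # The stored text lacks the leading "create or replace".
        lines[0] = "CREATE OR REPLACE " + lines[0].lstrip()
        # A trailing slash lets SQL*Plus execute each stored unit.
        chunks.append("\n".join(lines) + "\n/\n")
    return "\n".join(chunks)
```

Feeding it the rows for a single function yields one "CREATE OR REPLACE ... /" unit; a full schema produces one unit per stored object.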
The second is that if you are going to do any housecleaning or rearranging of tables and indexes, now is the time to edit the indexfile and update tablespace mappings and storage parameters. If the logical layout is to remain the same, then the third reason comes into play.
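Editing tablespace mappings by hand is error-prone when the indexfile runs to thousands of lines. The following is a rough sketch of automating that edit, assuming the DDL names tablespaces in the form TABLESPACE "NAME". The function name remap_tablespaces and the tablespace names in the usage note are hypothetical.

```python
import re


def remap_tablespaces(ddl, mapping):
    """Rewrite TABLESPACE "OLD" clauses using mapping (old name -> new name).

    Keys of mapping are expected in upper case. Names not present in the
    mapping are left untouched.
    """
    def swap(match):
        old = match.group(1)
        return 'TABLESPACE "%s"' % mapping.get(old.upper(), old)

    # \s+ also tolerates a plain line wrap between the keyword and the name.
    # A wrap that lands inside a commented (REM) statement would not match
    # and would need to be handled separately.
    return re.sub(r'TABLESPACE\s+"([^"]+)"', swap, ddl)
```

For example, remap_tablespaces(text, {"USERS": "APP_IDX"}) would point every object currently in USERS at APP_IDX. Storage parameters could be handled the same way, though reviewing the edited file by eye is still worthwhile.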
The third reason is that the indexfile lets you separate the tables from the indexes, that is, separate the SQL create statements (one script for tables, the other for indexes). Do as much as you can on the target database before it is time to do the actual migration. Part of this includes creating the same/new tablespaces and running the create tables script. Run the create tables script ahead of time for two reasons: one is to validate the logical layout, the other is to help speed up the import (concepts question: how does import work if an object exists or does not exist?).
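Splitting the indexfile into two scripts can be sketched as follows. The sketch rests on an assumption about the file layout that you should check against your own output: table statements appear commented out with a leading REM, row-count notes look like "REM  ... 14 rows", and everything else (CONNECT and CREATE INDEX statements) is uncommented. The function name split_indexfile is hypothetical.

```python
def split_indexfile(text):
    """Return (tables_sql, indexes_sql) from the text of an import indexfile.

    Assumed layout: REM-prefixed lines hold CREATE TABLE statements,
    "REM  ... N rows" lines are row-count notes, and all other lines
    (CONNECT, CREATE INDEX) belong to the index script.
    """
    tables, indexes = [], []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line == "REM" or line.startswith("REM "):
            body = line[3:].strip()
            # Drop empty comments and row-count notes.
            if not body or body.startswith("..."):
                continue
            tables.append(body)
        else:
            indexes.append(line)
    return "\n".join(tables) + "\n", "\n".join(indexes) + "\n"
```

The first returned string can be saved as the create tables script and run ahead of time; the second is held back until the data is loaded.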
The fourth reason comes back to the indexes listed in the indexfile. Performance-wise, when doing bulk inserts, is it better to have indexes or not? What happens when a new record is inserted? One or more indexes have to be updated (assuming there is at least a primary key for that record). Oracle's recommendation is that (for large databases) you should hold off on creating indexes until after all the data has been inserted. Again, this comes back to the importance of the indexfile because it is the link between export using "indexes=n" (the default is y) and your being able to re-create the indexes after the data has been loaded.
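One way to keep non-default settings repeatable between the practice run and the real one is to generate parameter files rather than retype them. The sketch below renders two import parameter sets: one pass that only writes the indexfile, and one pass that loads data. The parameter names (file, log, fromuser, touser, indexfile, ignore, indexes) are those of the original imp utility. The dump file, log file, and APP schema names are hypothetical, and applying indexes=n on the data-loading import pass is an assumption about how the steps are sequenced, not something spelled out above.

```python
def render_parfile(params):
    """Render a dict of parameters as parfile text, one name=value per line."""
    lines = []
    for name, value in params.items():
        # Booleans become the y/n flags the utilities expect.
        if isinstance(value, bool):
            value = "y" if value else "n"
        lines.append("%s=%s" % (name, value))
    return "\n".join(lines) + "\n"


# Hypothetical names throughout: dump file, log files, and the APP schema.
ddl_pass = {
    "file": "app.dmp",
    "log": "imp_ddl.log",
    "fromuser": "APP",
    "touser": "APP",
    "indexfile": "app_ddl.sql",  # write the DDL to a file; no data is loaded
}
data_pass = {
    "file": "app.dmp",
    "log": "imp_data.log",
    "fromuser": "APP",
    "touser": "APP",
    "ignore": True,    # load rows into tables that were created ahead of time
    "indexes": False,  # leave index creation to the separate index script
}
```

Each rendered text would be saved to a file and passed to the utility with parfile=filename, with the credentials supplied separately. The ignore=y setting matters because the tables already exist on the target by the time the data pass runs.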
In the next article about migration, I will provide a checklist/plan covering steps and procedures for the before, during, and after phases. Even if you are forced to bundle together four major changes at the same time, there are proactive measures you can take to mitigate and reduce risk.