In the same way that friendly fire really isn’t, it is not
uncommon to encounter “Complete Guides” on MetaLink which aren’t. You would
think that upon exercising due diligence in researching and following the steps
therein of a process or procedure, especially after reading the official
documentation, README text files, notes on My Oracle Support (MetaLink),
including “complete” FAQs, that your likelihood of success would be very nearly
100%. The less than 100% factor allows for the possibility that you just happen
to stumble across some obscure or new bug. If you rely on that chain of
thought, don’t, because there are cases where Oracle’s information about
installing feature X or migrating product Y is incomplete and downright
incorrect.
As old as STATSPACK is, you would think that by now all
typical errors or problems would be documented in an Oracle-published complete
FAQ. As much as version upgrades are tested, you would think that a “complete”
manual migration guide would be bulletproof in its steps. Another is that a
relatively simple operation (via the installed script) would match the notes on
(formerly) MetaLink. Listed below are two examples of how due diligence was not
complete enough in order to avoid problems. These were not obscure or one-off
operations by any stretch of the imagination. A third is somewhat less known,
but not really out the realm of what others have done.
Installing STATSPACK
In older versions of the RDBMS software (prior to release
10g), running the spcreate.sql script can cause literally hundreds of objects
to go invalid, and not just your objects, but Oracle’s as well. No problem, you
think, I’ll just run utlrp.sql to recompile everything. As you sit there and wait
for the repair script to run, you’ll notice nothing is happening. Why is that?
Because one of the objects (indirectly) invalidated by installing STATSPACK is
the DBMS_UTILITY package body. As one of the first Oracle-owned objects to be
recompiled, it isn’t. Chances are you’ll be able to compile most everything
else – manually – but be left with this package body having an invalid status.
Do you believe that if an Oracle-supplied built-in goes
invalid that you can just re-run the script that installed it in the first
place AND have no other consequences or impacts upon your system? If so, change
that belief right now. In the STATSPACK installation situation, what caused
everything to go haywire? Running spcreate.sql invokes calls to other scripts.
One is the spcusr.sql script. This script in turn calls “@@dbmsjob” which
guessing by the name does what? That’s right – it installs (or at least
attempts to) the DBMS_JOB supplied built-in. Umm, what if you already had
DBMS_JOB on your system, and what if DBMS_JOB happens to be in use?
How can this be? STATSPACK has been around forever. There is
(or was until recently) no mention of dbmsjob causing any problems in the
documentation, the release notes, the README-like file (spdoc.txt) in the rdbms
directory, and even better, in the STATSPACK Complete Reference note. A note
has been updated (149113.1, “Installing and Configuring StatsPack [sic] Package”)
which now mentions a recommendation to comment the call to dbmsjob in the spcusr.sql
script. It also refers to a bug documented in 2002, but was not included in
this document until more than six years later.
In the 10g version of spcusr, released several years ago, the
call to dbmsjob was removed because by now (or then), the use of DBMS_JOB is
much more prevalent. All in all, it was a big gaffe on Oracle’s part to not
have mentioned the dbmsjob versus DBMS_UTILITY much sooner and in a clearer
way.
A manual migration from 9i to 10g
One question appearing with regular frequency on many forums
deals with migrating versions of Oracle. The upgrade or migration guide
(depending on the version) lists several means of migrating, one of which being
a manual approach. The release of 10g was significant and Oracle posted a note
(316889.1) titled, “Complete Checklist for Manual Upgrades to 10gR2.” Overall,
that was a big help in providing a robust step-by-step document detailing what
needs to take place. Unfortunately, there were (at least) two omissions from
the note, one being a show-stopper. The missing step (but known to Oracle
because it is an undocumented bug) was to drop what is essentially a
placeholder table for something related to XML DB. It is a one record table
that if not dropped before the upgrade script is run will likely cause an
unrecoverable error and require restoration of a backup (which you really,
really need to take before starting, trust me on this). A prior version of the
note made mention of dropping the table after the upgrade script had run. If
you waited until then, you were doomed. Why wasn’t the possibility of the
undocumented bug listed in the guide? The least that could have been done was
to address it in the “Known issues” section of the guide.
Another issue with the “complete” guide was some erroneous
information regarding time zone data used within your database. The note said
the issue applied only to 10gR1, but it really applied to R2 as well (and still
does). Looking at the note today, you can see that many revisions have taken
place, and even with a known issue, it still says (in step 5) “Please this step
is only required for the 10gR1.” For that matter, the statement is still
missing a period at the end.
Changing the word size
This migration/upgrade related step is taken care of for you
when moving from 32-bit to 64-bit software (or vice versa) – depending on how
you are migrating or upgrading. If *ALL* you are doing is moving from a 32-bit
version of Oracle to a 64-bit version (and again, vice versa), then the word
change process is done manually. You are required to run a script (utlirp.sql),
and depending on the source of your documentation (to include notes on
MetaLink), you may be told that the script compiles all invalid objects when
done. That is not true.
The change in word size invalidates all PL/SQL within the
database. It is up to you to recompile everything. What enables the word change
is having the code compiled under the correct or new version of the Oracle
software, and nothing else. You can read through the script and see that its
main steps are to update a SYS-owned table and set a status column to 6. Where
is a call to utlrp.sql made?
Communicating the changes
Okay, so maybe you were the lucky one to have first
encountered a new bug. How do you get your experience and feedback incorporated
into the “complete” guides and errata? Don’t count on having any luck with the
analyst working your case. There is a huge lack of ownership on Oracle’s part,
ownership in this case referring to having an employee take ownership of a
customer’s problem and ensuring an issue is resolved. With Oracle Support, the
best you get is “I passed your comments to the analyst who wrote the note. That
analyst owns the note. Can I close the SR now?”
In better customer service oriented companies, that
attitude, policy, or lack of taking responsibility for a customer’s issue does
not fly. Why does it exist within Oracle Corporation? Seriously, I know people
from Oracle read these articles once in a while. Why do you not fix erroneous
notes once they have been pointed out to you? I’ve been to Technology Days and
listened to a presentation by some group or planet-level support manager who
gives out his contact information. Does getting a note fixed really have to go
that level of exception handling within the company?
In Closing
It can be frustrating to have done your homework (research
and testing) on something and wind up stepping on an Oracle landmine during the
implementation phase. It reminds me of the movie, “Dead Zone,” when Christopher
Walken grabs the arm of the deputy’s mother (he was the killer) and through a
vision realizes that the mother knew about her son’s crimes. Walken exclaims,
with some indignation, “You knew. Didn’t you? You knew.” Same thing with
“complete” FAQs and references on MetaLink when someone knew there is an issue
and failed to incorporate it.
There is no real defense against this situation, but what
you can do to mitigate any repercussions against yourself job-wise is to have
done your homework with respect to reading the documentation, release notes,
script comments (although I cannot recall anything saying explicitly to do so),
included README files (even when not named “readme”), and whatever notes are
available on My Oracle Support. With this approach, you can keep an unintended data-altering
event from turning into an employment-altering event. If you stumble across
some missing steps or information, please help others within the community by
asking your assigned analyst to get the note corrected.