Oracle Support Notes - "Complete Guides" Usually Aren't
January 27, 2009
In the same way that friendly fire really isnt, it is not uncommon to encounter Complete Guides on MetaLink which arent. You would think that upon exercising due diligence in researching and following the steps therein of a process or procedure, especially after reading the official documentation, README text files, notes on My Oracle Support (MetaLink), including complete FAQs, that your likelihood of success would be very nearly 100%. The less than 100% factor allows for the possibility that you just happen to stumble across some obscure or new bug. If you rely on that chain of thought, dont, because there are cases where Oracles information about installing feature X or migrating product Y is incomplete and downright incorrect.
As old as STATSPACK is, you would think that by now all typical errors or problems would be documented in an Oracle-published complete FAQ. As much as version upgrades are tested, you would think that a complete manual migration guide would be bulletproof in its steps. Another is that a relatively simple operation (via the installed script) would match the notes on (formerly) MetaLink. Listed below are two examples of how due diligence was not complete enough in order to avoid problems. These were not obscure or one-off operations by any stretch of the imagination. A third is somewhat less known, but not really out the realm of what others have done.
In older versions of the RDBMS software (prior to release 10g), running the spcreate.sql script can cause literally hundreds of objects to go invalid, and not just your objects, but Oracles as well. No problem, you think, Ill just run utlrp.sql to recompile everything. As you sit there and wait for the repair script to run, youll notice nothing is happening. Why is that? Because one of the objects (indirectly) invalidated by installing STATSPACK is the DBMS_UTILITY package body. As one of the first Oracle-owned objects to be recompiled, it isnt. Chances are youll be able to compile most everything else manually but be left with this package body having an invalid status.
Do you believe that if an Oracle-supplied built-in goes invalid that you can just re-run the script that installed it in the first place AND have no other consequences or impacts upon your system? If so, change that belief right now. In the STATSPACK installation situation, what caused everything to go haywire? Running spcreate.sql invokes calls to other scripts. One is the spcusr.sql script. This script in turn calls @@dbmsjob which guessing by the name does what? Thats right it installs (or at least attempts to) the DBMS_JOB supplied built-in. Umm, what if you already had DBMS_JOB on your system, and what if DBMS_JOB happens to be in use?
How can this be? STATSPACK has been around forever. There is (or was until recently) no mention of dbmsjob causing any problems in the documentation, the release notes, the README-like file (spdoc.txt) in the rdbms directory, and even better, in the STATSPACK Complete Reference note. A note has been updated (149113.1, Installing and Configuring StatsPack [sic] Package) which now mentions a recommendation to comment the call to dbmsjob in the spcusr.sql script. It also refers to a bug documented in 2002, but was not included in this document until more than six years later.
In the 10g version of spcusr, released several years ago, the call to dbmsjob was removed because by now (or then), the use of DBMS_JOB is much more prevalent. All in all, it was a big gaffe on Oracles part to not have mentioned the dbmsjob versus DBMS_UTILITY much sooner and in a clearer way.
A manual migration from 9i to 10g
One question appearing with regular frequency on many forums deals with migrating versions of Oracle. The upgrade or migration guide (depending on the version) lists several means of migrating, one of which being a manual approach. The release of 10g was significant and Oracle posted a note (316889.1) titled, Complete Checklist for Manual Upgrades to 10gR2. Overall, that was a big help in providing a robust step-by-step document detailing what needs to take place. Unfortunately, there were (at least) two omissions from the note, one being a show-stopper. The missing step (but known to Oracle because it is an undocumented bug) was to drop what is essentially a placeholder table for something related to XML DB. It is a one record table that if not dropped before the upgrade script is run will likely cause an unrecoverable error and require restoration of a backup (which you really, really need to take before starting, trust me on this). A prior version of the note made mention of dropping the table after the upgrade script had run. If you waited until then, you were doomed. Why wasnt the possibility of the undocumented bug listed in the guide? The least that could have been done was to address it in the Known issues section of the guide.
Another issue with the complete guide was some erroneous information regarding time zone data used within your database. The note said the issue applied only to 10gR1, but it really applied to R2 as well (and still does). Looking at the note today, you can see that many revisions have taken place, and even with a known issue, it still says (in step 5) Please this step is only required for the 10gR1. For that matter, the statement is still missing a period at the end.
Changing the word size
This migration/upgrade related step is taken care of for you when moving from 32-bit to 64-bit software (or vice versa) depending on how you are migrating or upgrading. If *ALL* you are doing is moving from a 32-bit version of Oracle to a 64-bit version (and again, vice versa), then the word change process is done manually. You are required to run a script (utlirp.sql), and depending on the source of your documentation (to include notes on MetaLink), you may be told that the script compiles all invalid objects when done. That is not true.
The change in word size invalidates all PL/SQL within the database. It is up to you to recompile everything. What enables the word change is having the code compiled under the correct or new version of the Oracle software, and nothing else. You can read through the script and see that its main steps are to update a SYS-owned table and set a status column to 6. Where is a call to utlrp.sql made?
Communicating the changes
Okay, so maybe you were the lucky one to have first encountered a new bug. How do you get your experience and feedback incorporated into the complete guides and errata? Dont count on having any luck with the analyst working your case. There is a huge lack of ownership on Oracles part, ownership in this case referring to having an employee take ownership of a customers problem and ensuring an issue is resolved. With Oracle Support, the best you get is I passed your comments to the analyst who wrote the note. That analyst owns the note. Can I close the SR now?
In better customer service oriented companies, that attitude, policy, or lack of taking responsibility for a customers issue does not fly. Why does it exist within Oracle Corporation? Seriously, I know people from Oracle read these articles once in a while. Why do you not fix erroneous notes once they have been pointed out to you? Ive been to Technology Days and listened to a presentation by some group or planet-level support manager who gives out his contact information. Does getting a note fixed really have to go that level of exception handling within the company?
It can be frustrating to have done your homework (research and testing) on something and wind up stepping on an Oracle landmine during the implementation phase. It reminds me of the movie, Dead Zone, when Christopher Walken grabs the arm of the deputys mother (he was the killer) and through a vision realizes that the mother knew about her sons crimes. Walken exclaims, with some indignation, You knew. Didnt you? You knew. Same thing with complete FAQs and references on MetaLink when someone knew there is an issue and failed to incorporate it.
There is no real defense against this situation, but what you can do to mitigate any repercussions against yourself job-wise is to have done your homework with respect to reading the documentation, release notes, script comments (although I cannot recall anything saying explicitly to do so), included README files (even when not named readme), and whatever notes are available on My Oracle Support. With this approach, you can keep an unintended data-altering event from turning into an employment-altering event. If you stumble across some missing steps or information, please help others within the community by asking your assigned analyst to get the note corrected.