IBM DB2 Error Checking
May 17, 2010
Database management systems (DBMSs) have simultaneously simplified and complicated the lives of many IT workers. Error codes passed from the database back to the application can take on more than 1000 values. What level of error checking should developers include in applications?
Gone are the days of file processing. In the past, it was enough to check for end-of-file conditions and key errors. Other errors such as device failure or resource lockups either caused the program to terminate abnormally or forced the system to suspend the program until the condition was fixed. Today, the application is expected to handle all of these conditions and more.
Applications accessing DB2 fall into this category. Error codes passed back to the application can take on more than 1000 values. How many of these are actually fatal? Which should the application check?
The Singleton Select
The term singleton select refers to a SELECT statement embedded in a program. This statement is expected to retrieve only a single row and is typically found in random or direct accessing of data (e.g., retrieving a row from the Customer table based upon a unique customer number).
Possible problems with this statement stem from differences between the application developer's expectations of the data and data model and the definition of the table. What if the data model or database design changes? Consider a table with a datetime column as the key that is defined as unique. Application code is written to select based on this key. Then, at some point, the table definition is changed to allow duplicate datetime values. What will happen in the application? Where does the responsibility lie for error checking in this case?
An application can retrieve rows from a multi-row result set using a cursor. Cursors appear in several different SQL statements including Open, Fetch, Update, Delete and Close. Some languages such as java have constructs that can be used to issue all of these cursor operations automatically. With other languages such as COBOL, it is the responsibility of the developer to code each of these operations explicitly.
In either case, some part of the application must handle error conditions that may occur during cursor operations. Some of the possible conditions that may be encountered are:
1. DB2 is not active
2. The application is not authorized to connect to DB2
3. The application's plan or package is invalid or disabled
4. One or more of the tables being accessed are not available
5. One or more of the result set columns allow Nulls and no indicator variables are specified
6. FETCH was specified for a cursor that was not open
7. OPEN was specified for a cursor that was already open
8. Update or Delete was specified for a cursor that was not open
9. Deadlock or Timeout encountered
Note that some of these issues (such as 1-4 above) may indicate a system, connection, or configuration issue; however, the application is still in control! How does it respond to these errors? Who should it notify?
Other errors (6-8) may indicate application logic issues. This may be possible if application code is changed without thorough testing.
Error 9 in the above list is particularly annoying and can occur under several circumstances. I treat this in a separate section below.
Application logic for row insert may encounter some unique issues. These include:
An application updating a row may encounter authorization issues, including violation of column or referential integrity constraints.
Most errors associated with Delete operations involve referential integrity. When an application attempts to delete a row from a parent table, the delete rules for the foreign keys in all descendent tables determine whether the Delete is successful and, if so, what other rows in other tables will be affected. Regrettably, the application cannot ascertain how many other rows are affected unless this is included in the application logic prior to the Delete.
These involve attempts to attach to DB2, use of the Connect statement in the distributed environment and distributed processing errors. What should the program do when it encounters errors of this kind?
Deadlock, Timeout and Resource Problems
All applications accessing DB2 resources should be prepared for deadlock and timeout conditions. In the typical scenario, the program is several levels deep in if-then-else logic, has several cursors open, and has processed and perhaps updated several DB2 tables. When a resource issue occurs, all is undone.
In this case, DB2 detected a deadlock or timeout condition or an unavailable resource. It is informing the application that all update activities (Insert, Update and Delete) in the current transaction have been rolled back. Should the application report this? Should it re-attempt the transaction from the beginning (and how)?
This can make program logic really nasty. Must the application put special logic after every SQL statement to handle each specific deadlock condition? No. Generally, the program need only be restarted from the beginning of the most recent transaction.
It is the application's responsibility to recover from deadlocks and timeouts in a manner that will restart the current transaction. In some cases, the program can attempt to analyze the reasons for the allocation failure and report them.
One last problem involves exceeding installation-specified resource limits. This normally indicates the application has used excessive CPU time as defined in the resource limit table.
Summary and Best Practices
What level of error checking should developers include in applications? The answer lies somewhere between minimal and too much.
Consider a suite of DB2 applications that do only minimal error checking. Exception conditions that occur in production must now be handled by either users, help desk staff or on-call personnel. How many of the situations described so far can be replicated, let alone fixed? Diagnosis of a problem may require turning traces on, using an on-line monitor or debugging tool, re-creating the application situation and hoping the error occurs again. Minimal error checking leads to execution errors, a larger maintenance staff and disgruntled users.
The alternative is to build as much error-handling logic into applications as possible. This is most easily accomplished by inclusion of standard error processing code, perhaps with an optional subroutine to process "fatal" errors. Such a subroutine would be responsible for analyzing errors, extracting diagnostic data, correcting problems if possible, and notifying the proper authorities.
Return codes from the subroutine would inform the application which steps must be taken.
While completeness has its virtues, there is a practical limit to error checking. Developers will start tearing their hair out if every other line they code is error-checking logic. IT shops must develop installation-wide standard for error checking. These should include a list of error codes or messages that must be checked after:
Standards would include a set of error-processing routines that would be required, including methods for programs to route error and diagnostic information to the proper destination.
The bottom line: Create meaningful standards for error checking and have applications check for as many errors as possible consistent with those standards.