Database Journal
Posted Oct 30, 2003

Oracle: Preventing Corruption Before it's Too Late - Part 2 - Page 2

By Marin Komadina

Database Recommendations

As stated earlier, corruption typically hits an Oracle database block in memory or on disk. Memory corruptions are handled by the Oracle kernel and the operating system's detection mechanisms. This is not a perfect solution; sometimes good blocks are marked as corrupt. Disk corruption is the most frequently seen type. Once written to disk, a block is not checked again until it is requested. Oracle follows block delivery only until the block reaches the operating system; after that, the operating system takes responsibility for handling the block. However, the operating system cannot make an "intelligent" block check, since it has no knowledge of the Oracle block structure.

Oracle provides several possibilities to defend against block corruptions:

a.)   Disk protection

DB_BLOCK_CHECKSUM=TRUE

Oracle's main method for detecting block corruption is the block checksum. After a block modification, the database writer calculates a checksum over the block header bytes and stores it in the block. On the next read of the same block, the checksum is validated. In addition to regular data and rollback blocks, the database also checksums redo log blocks, which hold the change information necessary for transaction recovery. The archive process (ARCH), as part of Oracle's background process structure, calculates a checksum for each redo log block and compares it with the checksum stored in the redo block header before writing the entry to the archive log file.

The checksum covers the logical block structure: if a block has the standard Oracle block format, it is considered good. Checksum validation does not examine the actual block content; therefore, there is still a possibility that the data inside the block is corrupt.

Setting this parameter prevents a block that does not conform to the standard Oracle block format from being written. If block corruption takes the form of a format change, that corruption will be reported as a media error. The disadvantage is that a block whose data portion is still intact may, because of header corruption alone, be marked by Oracle as soft corrupted. Once a checksum has been generated for a block, that block always uses checksums, even if the parameter is later removed.
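As a minimal sketch, the parameter can be set in the init.ora file or, on versions where it is dynamic, changed at runtime; the current value can be checked from V$PARAMETER:

```sql
-- init.ora entry
DB_BLOCK_CHECKSUM = TRUE

-- change at runtime (assuming your Oracle version allows it dynamically)
ALTER SYSTEM SET db_block_checksum = TRUE;

-- verify the current setting
SELECT name, value FROM v$parameter WHERE name = 'db_block_checksum';
```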

DB_BLOCK_CHECKING=TRUE 

Setting this instance parameter instructs Oracle to make a detailed logical check of a block (not only its header) before making any block changes. This self-consistency checking prevents both memory and disk corruptions. By default, the Oracle kernel always executes a detailed logical check on blocks that belong to the SYSTEM tablespace. If corruption is detected inside the block content, a media error such as ORA-00600 [6599], or a variation of it depending on the action performed on the block, results. This condition suspends database activity until the problem is resolved, and all data in the corrupted block is lost. There has been much discussion about how much detailed block checking slows down the system. In my experience the overhead was never more than 3%, even under heavy load; Oracle says to expect between 1% and 10%. It all depends on how much insert/delete activity your database has.
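A short sketch of enabling the parameter, plus a complementary one-off logical check of a single segment with ANALYZE (the table name here is only an example):

```sql
-- enable detailed logical block checking instance-wide
ALTER SYSTEM SET db_block_checking = TRUE;

-- manually validate the logical structure of one segment and its indexes
ANALYZE TABLE scott.emp VALIDATE STRUCTURE CASCADE;
```

ANALYZE ... VALIDATE STRUCTURE is useful for spot checks when the instance-wide parameter is kept off for performance reasons.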

b.)   Memory protection

 _DB_BLOCK_CACHE_PROTECT=TRUE

Database block memory protection is enabled by leaving _DB_BLOCK_CACHE_PROTECT at its default setting (TRUE) and having the DB_BLOCK_CHECKSUM parameter enabled.

Oracle uses a special algorithm to protect all modified blocks in database memory. Any uncontrolled write inside the database memory area is detected, preventing blocks corrupted in memory from reaching the disk. After every successful disk read, the block is checked again in memory: the checksum from the block header is compared with a checksum calculated on the fly. When they do not match, Oracle raises an error indicating a soft corrupted block, which will be reformatted and reused.

Although in-memory corruption detection is helpful, any detected in-memory corruption will crash the database instance. This parameter is hidden and is on by default.
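Because the parameter name starts with an underscore, setting it explicitly needs quoting in some contexts; a sketch, assuming a plain init.ora configuration:

```sql
-- init.ora: hidden parameter set explicitly (normally left at its default of TRUE)
_db_block_cache_protect = TRUE

-- with ALTER SYSTEM, a leading-underscore name must be double-quoted
ALTER SYSTEM SET "_db_block_cache_protect" = TRUE SCOPE = SPFILE;
```

As with any hidden parameter, changing it explicitly is normally done only on the advice of Oracle Support.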

c.)   Additional parameters

_CHECK_BLOCK_AFTER_CHECKSUM - perform a block check after the checksum; set to ON by default, and active only when DB_BLOCK_CHECKSUM=TRUE

_DB_ALWAYS_CHECK_SYSTEM_TS - always perform the block check and checksum for the SYSTEM tablespace; set to ON by default
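Hidden parameters do not appear in V$PARAMETER; their current values can be inspected from the internal X$ views when connected as SYS. A sketch:

```sql
-- run as SYS: show current values of the hidden check-related parameters
SELECT i.ksppinm AS name, v.ksppstvl AS value
  FROM x$ksppi i, x$ksppcv v
 WHERE i.indx = v.indx
   AND i.ksppinm IN ('_check_block_after_checksum',
                     '_db_always_check_system_ts',
                     '_db_block_cache_protect');
```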

Oracle Hardware Assisted Resilient Data (HARD)

Once a data block leaves Oracle's protected memory area, control is handed to the underlying operating system. The information stored in the block will be checked again, but only on the next read; there is no guarantee that the data was not changed before actually being written to disk. Usually this is not a problem, since the operating system applies its own control mechanisms to block handling. However, incomplete writes and corrupted blocks still occur on disk devices from time to time. For this reason, storage vendors began to provide Oracle validation and checking algorithms at the storage level. Imagine an EMC or HP XP storage box that, after writing to disk, reads back the stored Oracle block and performs the Oracle checksum check. This initiative is called HARD, and its main goal is to protect data physically. Storage software products certified to the HARD standard are able to perform:

  • validation of checksum and selected block fields for the datafile blocks
  • validation of checksum and selected block fields for the redo log blocks
  • validation of checksum and selected block fields for the controlfile blocks
  • validation of single block write

With this methodology, it is possible to follow the Oracle block all the way to the physical disk, allowing us to prevent corruptions before they happen.

Conclusion

Corruption cannot be eliminated, but we can make every effort to discover the problem early. Advanced system configurations increase the chance of a block corruption on one of the interconnected system layers. In addition, there are many more possible block corruption scenarios, with even more variations of the same errors, than could be presented in this article.

» See All Articles by Columnist Marin Komadina
