Database Journal
Posted Aug 14, 2003

Protecting DB2 against Non Concurrent Application Code - Page 3

By Marin Komadina

Performance Problem: "Slow Database"

As an example, a customer called about a problem with database performance. The machine was an 8-CPU Sun Solaris server running a version 7 database with FixPack 3, spread over 8 database partitions. It was an OLTP-type database, and the machine had been sized to handle more customer connections than were actually connected. A snapshot of the system resources showed high CPU utilization.

	$ vmstat 1 10
	kthr     memory             page              faults        cpu     
	----- ----------- ------------------------ ------------ -----------
	 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id  wa 
	 0  5 160319 23319   0   0   0   0    0   0 475 5113 920 20 26  0 54
	 2  3 160319 23240   0   0   0   0    0   0 541 4791 929 23 24  0 53
	 1  3 160319 23250   0   0   0   0    0   0 466 5219 842 16 25  0 59
	 2  3 160319 23213   0   0   0   0    0   0 439 4151 868 21 29  0 50
	 5  2 160319 23204   0   0   0   0    0   0 509 6069 829 30 21  0 50
	 0  3 160319 23181   0   0   0   0    0   0 455 6385 865 28 18  0 54
	 0  3 160319 23183   0   0   0   0    0   0 456 6472 849 25 35  0 40
	 3  2 160319 23235   0   0   0   0    0   0 382 5346 605 26 23  0 51
	 2  4 160176 23404   0   0   0   0    0   0 407 6537 624 34 26  0 40

The memory situation was good, with no swapping. Disk I/O was very low, so the database had no problem working with the disk subsystem. The problem was high CPU utilization: on this 8-CPU system, idle time (the id column) was 0. So, what could cause such high CPU usage?
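Averaging the CPU columns over the sampling interval makes the picture explicit. The short script below (a sketch in Python; the data rows are pasted from the vmstat output above) computes the average us/sy/id/wa percentages:

```python
# Average the us/sy/id/wa CPU columns from the vmstat sample above.
VMSTAT_SAMPLE = """\
 0  5 160319 23319   0   0   0   0    0   0 475 5113 920 20 26  0 54
 2  3 160319 23240   0   0   0   0    0   0 541 4791 929 23 24  0 53
 1  3 160319 23250   0   0   0   0    0   0 466 5219 842 16 25  0 59
 2  3 160319 23213   0   0   0   0    0   0 439 4151 868 21 29  0 50
 5  2 160319 23204   0   0   0   0    0   0 509 6069 829 30 21  0 50
 0  3 160319 23181   0   0   0   0    0   0 455 6385 865 28 18  0 54
 0  3 160319 23183   0   0   0   0    0   0 456 6472 849 25 35  0 40
 3  2 160319 23235   0   0   0   0    0   0 382 5346 605 26 23  0 51
 2  4 160176 23404   0   0   0   0    0   0 407 6537 624 34 26  0 40
"""

def cpu_averages(sample: str) -> dict:
    """Return average us/sy/id/wa percentages from vmstat data rows."""
    rows = [line.split() for line in sample.strip().splitlines()]
    # The CPU figures are always the last four columns of a vmstat row.
    return {name: sum(int(r[-4 + i]) for r in rows) / len(rows)
            for i, name in enumerate(("us", "sy", "id", "wa"))}

avgs = cpu_averages(VMSTAT_SAMPLE)
# Idle averages 0% while user+system hover around 50% and I/O wait
# around 50%, confirming a CPU-saturated system.
```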

In most cases, the reason for high CPU utilization is badly written SQL application code or a database locking problem. It is not unusual for an overloaded machine to suffer from both; frequently, non-optimal SQL code will cause a locking problem.

List of the database locking parameters:

	>>  db2 get db cfg for ARTIST | grep LOCK
	
	 Max storage for lock list (4KB)              (LOCKLIST) = 50
	 Percent. of lock lists per application       (MAXLOCKS) = 10
	 Lock timeout (sec)                        (LOCKTIMEOUT) = -1
	 Interval for checking deadlock (ms)         (DLCHKTIME) = 10000

List of the database commit parameters:

	Group commit count                          (MINCOMMIT) = 1
	Percent log file reclaimed before soft chckpt (SOFTMAX) = 100

A locking information snapshot:

	Lock list memory in use (Bytes)          = 2952 bytes
	Lock Escalations                         = 10023
	Exclusive Lock Escalations               = 1322
	Locks Held Currently                     = 43
	Current Applications Waiting on Locks    = 6
	Lock waits                               = 2322333
	Time database waited on locks (ms)       = 38447777
	Average wait time per lock               = 233.00 ms

Connections to the monitored database:

	db2 =>  LIST APPLICATIONS    
	 
	Auth Id  Application    Appl.      Application Id                 DB       # of
	         Name           Handle                                    Name    Agents
	-------- -------------- ---------- ------------------------------ -------- -----
	APPEASY  ARTIST101.exe    46         AA34000A.0BBA.030716115358     ARTIST   1    
	APPEASY  ARTIST101.exe    17         AA34000A.0BCD.030716115401     ARTIST   1    
	APPEASY  ARTIST101.exe    13         AA34000A.0BBE.030716115359     ARTIST   1    
	DB2INST1 db2bp_s          133        *LOCAL.db2inst1.030716082332   ARTIST   1    

Checking database log file db2alert.log for possible errors:

	2003-07-16-10.30.10.642353   Instance:db2inst1   Node:003
	PID:12255(db2agntp (ARTIST) 3)   Appid:*N0.db2inst1.030716072626
	data_management  sqldEscalateLocks   Probe:2   Database:ARTIST
	 
	 -- Lock Count, Target  : 113791, 56895
	 -- Table (ID) Name     : (123;5) APPEASY.VRES111
	 -- Locks, Request Type : 113769, X 
	 -- Result (0 = success): 0
	 
	 -- Lock Count, Target  : 113784, 56892
	 -- Table (ID) Name     : (37;2) APPEASY.TRES222
	 -- Locks, Request Type : 113781, S
	 -- Result (0 = success): 0
	
	2003-07-16-16.13.28.046685   Instance:db2inst1   Node:004
	PID:60794(db2agent (ARTIST))   Appid:*LOCAL.db2inst1.030627061228
	data_management  sqldEscalateLocks   Probe:3   Database:ARTIST
	 
	 -- Lock Count, Target  : 1141, 570
	 -- Table (ID) Name     : (2;39) APPEASY.MESSAGES
	 -- Locks, Request Type : 514, X
	 -- Result (0 = success): FFFF8502 -> DEADLOCK
	 
	2003-07-16-16.15.50.070572   Instance:db2inst1   Node:002
	PID:60794(db2agent (ARTIST))   Appid:*LOCAL.db2inst1.030708151448
	data_management  sqldEscalateLocks   Probe:3   Database:ARTIST
	 
	 -- Lock Count, Target  : 631, 574
	 -- Table (ID) Name     : (2;69) APPEASY.PAYMENTS
	 -- Locks, Request Type : 500, X
	 -- Result (0 = success): FFFF8502 -> DEADLOCK
	 
	2003-07-16-17.17.52.015357   Instance:db2inst1   Node:000
	PID:23998(db2agent (ANLDEB2))   Appid:*LOCAL.db2inst1.030709090853
	data_management  sqldEscalateLocks   Probe:3   Database:ARTIST
	 
	 -- Lock Count, Target  : 1140, 570
	 -- Table (ID) Name     : (2;39) APPEASY.ACCOUNTS
	 -- Locks, Request Type : 509, X
	 -- Result (0 = success): FFFF8502 -> DEADLOCK

The database log file showed numerous deadlock occurrences. The customer had already complained that end users' applications timed out from time to time, with the following error:

	[DB2/6000] SQL0911N  The current transaction has been rolled back because of a deadlock 
	or timeout.  Reason code "2".  SQLSTATE=40001

General conclusion:

The system was overloaded with only 3 database connections. The database log file contained a good deal of information about lock escalations and deadlock situations. A deadlock is an indication that:

  • applications have contention problems caused by lock escalations

  • the application was built with the wrong isolation level

  • application lock ordering is not correctly designed

  • in very rare cases, there is a problem with catalog tables locked for repeatable read

This was my first guess after getting this initial information.

Before taking any further action, we decided to take a deeper look at the database. An agreement was reached with the customer to collect information from the database for one day. The collected information was converted to graphical charts using Quest Central for DB2. The locking information points to the actual problem:

The database had an average of 60 locks across all connected applications, rising at times to 100 locks, with only 3 applications running against the database.

With the more than 30 connected applications the customer originally planned for, it would not be unusual to see a thousand locks held.

The lock information can also be collected at the individual agent level using a lock snapshot. For example, to get lock information for application handle 13:

	db2 =>  LIST APPLICATIONS    
	 
	Auth Id  Application    Appl.      Application Id                 DB       # of
	         Name           Handle                                    Name    Agents
	-------- -------------- ---------- ------------------------------ -------- -----
	APPEASY  ARTIST101.exe    46         AA34000A.0BBA.030716115358     ARTIST   1    
	APPEASY  ARTIST101.exe    17         AA34000A.0BCD.030716115401     ARTIST   1    
	APPEASY  ARTIST101.exe    13         AA34000A.0BBE.030716115359     ARTIST   1    
	DB2INST1 db2bp_s          133        *LOCAL.db2inst1.030716082332   ARTIST   1    
	
	db2 =>  GET SNAPSHOT FOR LOCKS FOR APPLICATION AGENTID 13

This command produces a report of all locks held by agent number 13.

Another useful piece of information, the average lock wait time, was also collected:

The average lock wait time ranged from 250 ms to 4,000 ms, which is very high and unsuitable for a multi-user concurrent system. Locking too many resources overloaded the UNIX machine, and applications blocked while waiting for locks. Even adding CPU units would not solve this problem.

Database and Application Tuning

We set out to solve this problem in conversation with the application developers. They needed to provide the DBA with information about the target database load, and we needed to convince them to change their SQL code and the way they lock database objects.

The statement from the application developers regarding the target database load:

"We suppose that a maximum of 35 concurrent programs will attack the database and each will process around 20000 records from the customer table."

During database observation, we noticed a high number of exclusive escalations, equal to the total number of escalations in the database. An exclusive escalation occurs when locks are escalated from one or several row locks to one exclusive table lock; it is triggered by the database's internal mechanism, whereas normal lock escalations are influenced by the application code. The problem was mainly a lock list too small for the current number of running applications, or one or more applications using an excessive number of locks.
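A rough model of the escalation trigger makes this concrete. The sketch below illustrates only the MAXLOCKS mechanism, and is not DB2's actual engine code; the 72-bytes-per-lock figure is the approximation this article uses for sizing:

```python
# Toy model of DB2's lock-escalation trigger (illustration only):
# when one application's row locks exceed MAXLOCKS percent of the
# LOCKLIST memory, its row locks are traded for one table lock.
BYTES_PER_ROW_LOCK = 72          # approximate per-lock cost used in this article

def should_escalate(app_row_locks: int, locklist_4k_pages: int,
                    maxlocks_pct: int) -> bool:
    locklist_bytes = locklist_4k_pages * 4096   # LOCKLIST is in 4 KB pages
    app_lock_bytes = app_row_locks * BYTES_PER_ROW_LOCK
    return app_lock_bytes > locklist_bytes * maxlocks_pct / 100

# With the original settings (LOCKLIST=50, MAXLOCKS=10), a single
# application's per-lock budget is only about 20 KB, so holding
# 20,000 row locks (about 1.4 MB) forces an escalation.
```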

Very basic calculations for the lock parameters:

1 connection -> 20,000 records = 20,000 row locks x 72 bytes = ~1.5 MB

35 connections -> 1.5 MB x 35 users = ~52 MB, plus additional space for internal processing and other users, up to 60 MB total.

To solve the problem with the high lock escalations, we decided to increase the locklist size from 200 KB to 60 MB.

	Max storage for lock list (4KB)              (LOCKLIST) = 50 -> 15000
	Percent. of lock lists per application       (MAXLOCKS) = 10 -> 5
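The sizing above, converted to the 4 KB pages in which LOCKLIST is configured, can be sketched as follows (the 60 MB target and the rounding down to 15000 pages follow the article's own figures):

```python
# Back-of-the-envelope LOCKLIST sizing from the figures above.
BYTES_PER_LOCK = 72
ROWS_PER_CONNECTION = 20_000
CONNECTIONS = 35
PAGE_BYTES = 4096                # LOCKLIST is configured in 4 KB pages

per_connection = ROWS_PER_CONNECTION * BYTES_PER_LOCK   # 1,440,000 B, ~1.4 MB
all_connections = per_connection * CONNECTIONS          # ~48 MB raw
target_bytes = 60 * 1024 * 1024  # rounded up to 60 MB for internal overhead
locklist_pages = target_bytes // PAGE_BYTES             # 15360 pages
# The article rounds this to LOCKLIST = 15000 (about 58.6 MB).
```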

The MAXLOCKS parameter was changed from 10% (20 KB of the old lock list) to 5% (3 MB of the new one), so one application can now hold 3 MB of lock space before the lock escalation procedure starts. After these changes, we had eliminated lock escalations, in which DB2 replaces several row locks with a single table lock.

Additional improvements were implemented in the application code by the developers:

  • commits are issued more frequently than before
  • updates touching many table rows now take a table lock for a very short time, instead of many row locks
  • the LOCKSIZE parameter for several critical tables was changed from row to table level:
    	db2 =>  ALTER TABLE ARTIST.PAYMENT LOCKSIZE TABLE
  • the application isolation level was changed from Repeatable Read to Cursor Stability, decreasing the number of shared locks

The situation with deadlocks was discussed and managed with the following changes:

	Lock timeout (sec)                        (LOCKTIMEOUT) = -1 -> 60 
	Interval for checking deadlock (ms)       (DLCHKTIME) = 10000

The application will now wait a maximum of 60 seconds to obtain a lock, and the deadlock detector will resolve a deadlock condition after at most 10 seconds. This brings the high CPU utilization down. The application was also changed to handle the exception raised when a lock cannot be acquired within the 60-second limit.
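On the application side, the usual response to SQL0911N is to retry the rolled-back unit of work after a short pause. The sketch below shows the pattern; `run_transaction` and `DeadlockError` are hypothetical stand-ins for whatever the application's database driver actually provides:

```python
import time

class DeadlockError(Exception):
    """Hypothetical driver exception for SQLSTATE 40001 (deadlock/timeout)."""
    sqlstate = "40001"

def with_deadlock_retry(run_transaction, attempts=3, backoff_s=0.1):
    """Run a transaction, retrying a few times if it is rolled back
    because of a deadlock or lock timeout."""
    for attempt in range(1, attempts + 1):
        try:
            return run_transaction()
        except DeadlockError:
            if attempt == attempts:
                raise                        # give up after the last attempt
            time.sleep(backoff_s * attempt)  # brief, growing pause
```

Since the rolled-back transaction has released its locks, the retry usually succeeds once the competing unit of work has committed.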

The commit behavior was also changed with the following parameter:

	Group commit count                          (MINCOMMIT) = 1 -> 3

This change groups three commit requests into one log write, giving optimal writing of COMMIT records to disk without additional overhead. We decided to do this since numerous applications request commits within a very short time.
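Conceptually, MINCOMMIT behaves like the toy model below (an illustration only; real DB2 also flushes any pending commits after at most about one second, which this sketch omits):

```python
# Conceptual model of MINCOMMIT-style group commit (illustration only):
# commit requests are held until `mincommit` of them accumulate, then
# the whole batch is flushed to the log with a single write.
class GroupCommitLog:
    def __init__(self, mincommit: int):
        self.mincommit = mincommit
        self.pending = []
        self.log_writes = 0

    def commit(self, txn_id):
        self.pending.append(txn_id)
        if len(self.pending) >= self.mincommit:
            self.flush()

    def flush(self):
        if self.pending:
            self.log_writes += 1    # one physical log write for the batch
            self.pending.clear()

log = GroupCommitLog(mincommit=3)
for txn in range(9):
    log.commit(txn)
# Nine commits cause only three physical log writes instead of nine.
```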

The developers also changed the application code for one critical batch process. They now execute the job over additional DB2 connections, separating work between connections and keeping rows locked for less time than before.

The database was restarted and statistics were collected again. There were no more deadlocks, and the average lock wait time dropped drastically, from 4,000 ms to 250 ms. The newly generated statistics were much better:

The average lock wait time is still high, but no longer critical. CPU utilization dropped to a normal level of 25%, and all connected users were satisfied with performance.

Lock situation overview after changes:

The locks are held for a shorter time and the application runs faster. The commit handling implemented in the program code works much better: commits are made more frequently, reducing the time locks are held on updated rows. Packages were rebound with isolation level CS, and LOCKSIZE PAGE was changed to LOCKSIZE ROW. The batch programs now bind with RELEASE(COMMIT) instead of the RELEASE(DEALLOCATE) option. The database log file db2alert.log has no new entries regarding the deadlock problem, just a regular row lock escalation from time to time. The customer is satisfied and the problem is solved.

Conclusion

Is it realistic to expect the database engine to solve all of our application problems? The database engine has recently been enhanced with self-tuning and auto-management features, but it still has no ability to rewrite inefficient user application code. It is amazing how many sites running versatile DB2 applications are still not capable of running them concurrently. The only solution I see is close cooperation between the application team and the production DBA.
