Monitoring an ERP application suite involves using several basic measurement techniques. Since it probably co-exists with other applications within a DB2 subsystem or data sharing group, a means must be found to separate ERP package performance measures from those of other applications. Additionally, when measuring subsystem performance you must isolate the effects of the package from those of other applications.
Monitoring an ERP Application in DB2 for z/OS
Creating Measurement Processes
There are three things to remember when creating measurement processes:
- Implement Automation. One-time measures are certainly useful – performance ‘snapshots’ are sometimes essential for debugging performance problems; however, an ERP package contains many thousands of objects and hundreds of processes (or more). You need a process that develops meaningful measures that allow you to get overall status information as well as the ability to drill down to the details. To do this you must find the means to automate both the data gathering and measurement processes. Many performance measurement products have features for automating either data gathering or performance reporting.
- Save Historical Data. It takes time to develop a complete picture of ERP package behavior. It is unrealistic to believe that you will be able to do a good job of monitoring and tuning within a few weeks – or months. You must gather data on a regular basis and store it for later processing. (Relational databases are great for this!) Then, over the course of time, you can do trending analysis and capacity planning. Historical data is essential for regression testing; for example, if you are considering making major performance changes you will want a set of measures both before and after the changes to see if there was any improvement.
- Develop graphical reports. A spreadsheet filled with numbers will not be enough. You must decide on the best way to present your data in order to interpret it correctly. Bar charts, scatter diagrams, and pie charts can all be used to good effect.
What to Monitor; What to Measure
There are two general categories of measurements: system-related and application-related. These categories overlap somewhat, but generally speaking system-related measurements are at a high, or global level. Applications demand work and resources from the DB2 subsystem (such as data storage and retrieval) and the subsystem then performs work on the applications’ behalf.
Some common system-related measurements are:
Virtual pool threshold attainment Sortpool, EDM Pool, Rid Pool filling or failure Logging rate and Log Write performance Global buffer pool usage Subsystem address space CPU usage
Application-related measurements are more detailed and specific. They deal with suites of SQL statements (either separately or taken in sets). Here, the most common measure deal with I/O:
Buffer Pool hit rate (or page residency times) Waits for synchronous I/O Prefetch rates Hardware Cache usage Index efficiency Data access paths
Measurement Classes
There are also two classes of measurements that you will need: measurements of resources (which may be overused, underutilized, or constrained) and measurements of behaviors (which may be symptoms of underlying problems).
Resources
Memory DBM1 Address Space (Virtual Storage constraints) EDM Pool, Sort Pool, RID Pool Virtual Pools (Page Residency) CPU I/O Subsystem Network
Behaviors
Locking Compression Work File Extents Deadlocks, Timeouts
The difficulty in dealing with behaviors is that they are symptoms only; they are not the problems themselves. It is easy to fall into the trap of thinking that you have isolated a problem to a few SQL statements. True, making changes (adding OPTIMIZE FOR n ROWS, re-designing indexes, running multi-column RunStats) may increase statement execution speed; however, the net result may be that you have obscured or totally missed the underlying problem(s).
This is why you need to gather measurements from all classes: resources and behaviors, system and application.
In order to better understand what you are measuring, here’s an example from one installation in the early stages of an ERP package tuning effort. Table 1 shows a cross-reference of their measurement categories and classes with specific measurements. Table 2 shows some typical tactics and tuning efforts for each quadrant.
Table 1: Measurement Categories and Classes for an ERP Installation
|
System-Related |
Application-Related |
Resources
|
Memory DBM1 Addr Space (total) SortPool RID Pool EDM Pool CPU Addr Space Usage I/O Subsystem Log Writing Pageset Placement Global BP Sizing |
CPU CPU used by SQL stmts Class 1 & Class 2 Times I/O Subsystem Pageset Placement Hardware Cache Usage "xxLARGE" Tablespaces Network 3-Tier Configuration DB2 Connect Settings |
Behaviors
|
Compression Hardware Compression Work Files Total Size Bpool Assignment Bpool Thresholds Physical Separation Locking Behaviors IRLM Parms Incidence of Lock Escalation |
Compression Selection of Tablespaces Access Paths Access Path Analysis Index Selection Locking Behaviors Incidence of Deadlocks Incidence of Lock Escalation LOCKSIZE settings |
Table 2: Typical Tuning Efforts by Measurement Category and Class
|
System-Related |
Application-Related |
Resources
|
Memory Pool Re-sizing CPU Forbid Parallelism Remove Compression I/O Subsystem Catalog/Directory Reorg Consider HiperPools BP Thresholds LOGLOAD/CHKFREQ Network Re-Configure DB2Connect |
CPU No Compression for "small" tables Access Path Tuning I/O Subsystem Implement Data Spaces FREEPAGE, PCTFREE |
Behaviors
|
Work Files Enlarge Locking Behaviors NUMLKUS, NUMLKTS IRLMRWT, DEADLOK |
Compression OFF for Small Tables Access Paths BIND Acquire and Release Partitioning Optimizer Hints Parallelism RunStats (multi-column) Locking Behaviors Row-level Locking Large PCTFREE; Reorg |
So, What Should I Measure?
For ERP systems there are three initial areas to emphasize, depending upon data and transaction volumes. These relate to the package’s use of memory and the I/O subsystem:
- DBM1 Address Space total real storage
- the EDM Pool
- the virtual pools (buffer pools)
The DBM1 Address Space
The DBM1 address space contains many areas of memory called pools essential to proper ERP functioning. Among these are the Virtual Pools, EDM Pool, Sort Pool, RID Pool, and others. You should construct a spreadsheet that lists your various storage allocations.
You can estimate usage based on ZParm settings for most of these pools; however, in a memory-constrained environment you should use whatever tools you have available to get real-time storage measurements. For example, it is possible to use DSNWDMP (described briefly in the IBM DB2 Diagnosis Guide) to dump the DBM1 address space and get actual pool size measurements. Other third-party tools may have summary or detail statistics you can use.
The important things to do (as mentioned at the beginning of this article) are to:
- Automate the measurements
- Retain historical data
- Implement graphical reports
So, you can begin with a simple daily (or hourly) measurement of total DBM1 address space real storage usage. Store the measurements over time, and graph the result. You will probably find that memory usage increases sharply on DB2 subsystem startup. This should be no surprise; many of the pools are allocated only on first use; also, some control blocks stored in DBM1 are related to datasets that are not open yet.
Ensure that ZParms and other changes are controlled (SRTPOOL, MAXRBLK, MAXKEEPD, EDMBSPAC, EDMPOOL, DSMAX, CTHREAD, MAXDBAT, buffer pools, etc.). The settings for these and other ZParms will affect memory allocation in the DBM1 address space.
As the production week rolls on your DBM1 real storage graph should stabilize, particularly as ERP package functions are executed. For example, if your package includes a payroll function you should see real storage allocations jump as payroll begins running.
Over time if you do not see real storage usage stabilize then it is possible that there is a DB2 subsystem problem. Check your maintenance level and refer to IBMLink for APARs related to storage issues.
The EDM Pool
The EDM Pool contains database objects such as database descriptors (DBDs) and, if you have implemented dynamic SQL caching, the dynamic SQL cache. Most third-party tools have a process for measuring the EDM pool contents, including the percentage of the pool taken up by each object category.
Begin with this and (as above) map your EDM Pool usage over time. Pay particular attention to the ratio of Request for Pages to Page Already in Pool for each object category. If these ratios are all above 99% it indicates that, in general, your EDM Pool is functioning well. Whenever DB2 needed a database object or a prepared SQL statement, 99% of the time the object was already in the EDM pool and no I/O to disk was required to fetch it.
The dynamic SQL cache needs to be treated a bit differently. ERP packages typically implement much of their functionality with dynamic SQL. Good performance will necessitate that you cache the prepared versions of these statements. The Global Dynamic Cache is allocated when ZParm CACHEDYN is set to YES. Based on your DBM1 storage allocation report you may consider your environment to be virtual storage constrained. In that case, consider setting ZParm EDMDSPAC to YES. This will place the global cache in a Data Space, thus outside of the DBM1 address space. (Note that you must also set additional ZParms that control the size of this data space. These ZParms all begin with “EDM”.)
One final note on the EDM Pool. There seems to be some difference of opinion on the Pages in Use statistic, especially when it is near 100%. Many experts recommend sizing the EDM Pool to allow a certain percentage (say 10%) of the EDM Pool to be unused. Theoretically, this allows for a cushion in case a long-running (and not committing) transaction is responsible for holding object definitions in the EDM Pool for an extended period.
Continue to measure your EDM Pool usage over time. As the maximum size stabilizes, note the high-water mark. If the pool is near 100% used, increase its size gradually to allow for about 10% non-used. Continue to monitor over time. A sudden jump to 100% used may signal the appearance of a rogue application or process.
The Virtual Pools
With most ERP package processes being I/O-constrained, the virtual pools deserve special attention
First, it is assumed that you have assigned objects to the virtual pools based on activity (random access, sequential access, etc.). (You can verify these assignments by a simple query against SYSTABLESPACE and SYSINDEXSPACE.) Now you will need to collect measurements of virtual pool use. You can begin with the -DISPLAY BPOOL . . . DETAIL command if you wish; alternatively, there are several IBM and third-party tools available for measuring and tuning buffer pools.
Your initial measurements should concentrate on when and how often thresholds were exceeded. After that, you can continue with hit ratios. Threshold settings are described in the Administration Guide, including how best to determine them for random and sequential workloads.
A complete discussion of virtual pool measurements and tuning is beyond the scope of this article. I recommend you begin with the materials on virtual pool tuning in the Administration Guide.
In the next article, I’ll cover tuning an ERP application.
Wikipedia – Enterprise Resource Planning – Jan 01, 2011 –
IBM – IBM InfoCenter – Jan 13, 2011 –
IBM – System z Parallel Sysplex Best Practices – Jan 01, 2011
IBM – Infrastructure Solutions: SAP IS Automotive on an IBM Platform – Dec 01, 2008