Synopsis. Oracle 11gR2 introduced significant features that have made it easier than ever to isolate and identify potential root causes of poor I/O performance. This article – the eighth and last in this series – explores how solid-state devices (SSDs) can make an incredible difference when applied judiciously to an apparently intractable I/O performance bottleneck.
The prior article in this series explored how the methodologies I’d discussed throughout this series could be applied to:
- Improve I/O response time via intra-database solutions
- Improve I/O response time through enhancements to the I/O subsystem
And even though I was able to bring most of those methodologies to bear against an I/O performance issue, I’m pretty certain there are alternatives for improving the application workload’s I/O performance other than simply migrating the tablespaces’ datafiles of the hottest segments to ASM storage. So in the final article in this series, I’ll conclude my I/O performance tuning investigations by exploring how solid-state devices (SSDs) can be leveraged effectively to overcome I/O performance issues … and just as importantly, where they may make absolutely no difference in I/O performance.
Solid State Device (SSD) Architecture: It’s All About the Cells
Solid state devices, colloquially known as SSDs, or solid state drives, have been around for some time now, but especially with the onset of the Oracle Database Machine X2-2 and its corresponding Exadata Storage Server, interest in utilizing SSDs has increased exponentially. In truth, the advantages of SSDs are not that difficult to grasp because the IT community has actually been using them for at least a decade, but in a much smaller format: USB (“thumb”) drives incorporate the same technology as does any SSD, albeit on a much smaller scale.
SSDs are essentially composed of NAND “flash memory.” However, note that this is most definitely not the same type of memory that’s used for cache memory SAN platforms or DRAM in database servers. SSDs are manufactured in two basic architectures, single-level cell (SLC) and multiple-level cell (MLC). In Table 8-1 below I’ve compared and contrasted some of the common characteristics of SLC and MLC SSDs:
Table 8-1. Single-Level Cell (SLC) vs. Multiple-Level Cell (MLC) Solid State Storage Devices |
||
Type |
Single-Level Cell |
Multiple-Level Cell |
Size |
Lower in density, so SLCs trend toward smaller sizes (< 300GB) but they have slowly been getting larger |
Higher in density, so MLCs typically trend toward larger sizes (300 GB and larger) |
Speed |
Less dense, so they are typically faster |
Higher density, so they are typically slower |
Life Cycle |
Less density means a longer life |
Higher density means a shorter life |
Best Use |
Extremely short retention period; perfect for smaller I/Os |
Longer data retention period; larger size I/Os (e.g. code tables, indexes) |
In recent years it appears that most manufacturers of solid-state disk are trending toward the SLC architecture, mainly because of their higher speed and longer life expectancy. For my testing, I used a relatively inexpensive 30 GB SSD manufactured by OCZ that I’d purchased about 18 months back for use in one of my older laptops. This SSD is rated by the manufacturer to last for 2,000,000 hours between failures, and this compares favorably to most hard disk storage devices, which are generally manufacturer-rated at somewhere near 1,000,000 hours mean time between failure (MTBF).
Table 8.2 below compares the sizes and speeds of the storage devices I’ve used throughout my experiments in this article series.
Table 8-2. Storage Devices Used For Testing and Demonstration |
||||
Manufacturer |
Model |
Capacity (GB)
|
Rotational Speed
|
Cache Size (MB)
|
OCZ Flash Drive |
Vertex |
30
|
N/A
|
64
|
Samsung HDD |
HD753LJ |
750
|
7200
|
32
|
Western Digital HDD |
800JD-00JN |
80
|
7200
|
8
|
To SSD or Not to SSD: Is This a Question?
Most Oracle DBAs would not dispute the fact that SSDs are “the next new thing” in terms of faster, energy-efficient, and reliable storage. But as wondrous as they are for increasing I/O performance, they do still have some drawbacks that must be considered, especially from the viewpoint of database I/O processing.
Reading from SSDs is wicked fast … but writing to SSDs is significantly slower. SSD I/O rates, especially for random reads, are significantly higher – usually at least a full order of magnitude – than traditional hard disk devices (HDDs). There’s really no way to overcome this disadvantage because it’s simply inherent in the way NAND flash memory operates: Reads from flash memory can occur at a single byte at a time, but writes to flash memory require that all existing data first be erased and then rewritten. Since NAND flash memory architecture actually requires this erase-and-rewrite cycle happens at the individual block level, writes generally take significantly longer than reads, and sequential writes take much longer than random writes.
SSDs do wear out. Eventually, all hardware does fail, and while SLC SSDs have a significantly higher MBTF than HDDs, the individual cells do fail over time. This too is inherent in NAND flash memory architecture, a direct result of what’s called the program-erase (PE) cycle that I mentioned above. SLC cells can usually withstand over one million such PE cycles before failure occurs. The good news is that several sophisticated methods for bad block management (BBM) have already been built into flash memory to insure that bad cells are bypassed automatically, so the failure of an entire SSD device is extremely unlikely.
SSDs are still significantly more expensive than traditional hard disk devices (HDDs). One of my most recent purchases of consumer-market HDDs brought home to me how cheap disk storage has become. The per-gigabyte price of the HDDs I purchased had already fallen below $0.10, and it continues to fall as larger capacity HDDs are manufactured in greater quantities. SSDs, on the other hand, are still roughly ten to 15 times more expensive on a per-gigabyte basis, but this price continues to drop as the major manufacturers compete in an IT environment that’s driven more by hardware prices than any time in our industry’s history.
The key concern of most Oracle DBAs, therefore, is to focus on where the application of SSD and flash memory technology will generate the “biggest bang for the buck” in terms of Oracle database I/O performance improvement. Based on the intrinsic architecture of NAND-based devices, it should be readily apparent that random access database application workloads – especially OLTP – will probably benefit most. It’s equally important to recognize that sequential read application workloads might be able to benefit from SSDs, but that sequential write workloads may not benefit anywhere near as much.
Evaluating SSD vs. HDD I/O Performance: Preparations
To demonstrate these concepts, I’ll need to prepare sufficiently-sized targets for both random and sequential I/O operations. I used the code in Listing 8.1 to construct the necessary logical and physical structures in my test Oracle 11gR2 (11.2.0.1) database:
First, I allocated a 10 GB partition on the OCZ Flash Drive and a second 10 GB partition on one of my HDDs.
- Next, I formatted both partitions using the Linux native EXT3 file system and mounted these two partitions at mount points /u01/app/oracle/ssdonly and /u01/app/oracle/oradata, respectively.
- I then created two new tablespaces, SSDONLY and HDDONLY, and placed an 8GB datafile for each tablespace on these two respective mount points.
- In each tablespace I created an identical table – SH.SSD_SALES_COPY and SH.HDD_SALES_COPY – using the CTAS (Create Table As Select) method to initially populate data from the SH.SALES table. I also used a sequence to provide a unique key value for each table entry, and created a primary key index on this new column in each table.
- Finally, to force the database to read from or write to the storage subsystem directly a majority of the time, I set the size of the database instance’s database buffer cache almost absurdly low – only 12 MB – so that I/O would tend to be forced to disk more frequently.
To generate application workloads of sufficient intensity and duration, I’ll utilize Dominic Giles’s excellent open-source workload testing tool, Swingbench. Dominic has obviously invested quite a bit of time to develop a relatively sophisticated tool that makes extremely effective use of the Oracle JDBC thin client interface. Swingbench can construct and execute simulated application workloads as well as monitor in real time exactly how an application workload is directly impacting database performance. I’ve included the Swingbench application workload profile as well as the PL/SQL objects that implement my customized application workloads against the SSD-based and HDD-based test objects.
When Will SSDs Improve Performance Most Dramatically?
Based on the strengths of SSD architecture, my expectation is that random access application workloads – especially those that feature random reads and random writes – will benefit the most. To demonstrate this, I constructed a random read workload that executes a simple query against each of my test tables. I likewise constructed a random write workload that executes a simple UPDATE DML statement against each of my test tables. I implemented the DBMS_RANDOM function to randomly select the beginning and ending ranges for the bind variable values in each statement’s selection criteria, thus insuring the result set chosen for each iteration of the workload was kept relatively small.
I then executed these two simulated application workloads as a 50-user Swingbench benchmark over a 15-minute time frame. I’ve reproduced a sample of the tool’s main execution panel in Figure 8-1 below.
Figure 8.1. Swingbench Benchmark Execution Panel.
Since the HDD-based table and index reside in the TSP_HDD tablespace and the SSD-based table and index reside in the TSP_SSD tablespace, the comparison in I/O performance is simple yet striking. Here’s a portion of the Tablespace IO statistics from the complete AWR report I generated for this application workload simulation:
Tablespace
|
Reads
|
Av Reads/s
|
Av Rd(ms)
|
Av Blks/Rd
|
Writes
|
Av Writes/s
|
Buffer Waits
|
Av Buf Wt(ms)
|
TSP_HDD |
33,977 |
37 |
11.22 |
1.00
|
1,602
|
2
|
1
|
10.00
|
TSP_SSD |
33,368 |
37 |
1.46 |
1.00
|
1,739
|
2
|
1
|
0.00
|
SYSAUX
|
5,486
|
6
|
2.03
|
1.17
|
1,528
|
2
|
0
|
0.00
|
For the simulated random read workload, each tablespace encountered approximately the same number of physical reads at the same transaction rate. However, note that on average the SSD tablespace outperformed the HDD-based tablespace by almost 670% (1.46 reads/ms versus 11.22 reads/ms, respectively).
For the simulated random write application workload, however, the results are not quite as striking. Below I’ve reproduced a subset of the Tablespace IO statistics from the complete AWR report I generated for this simulation:
Tablespace
|
Reads
|
Av Reads/s
|
Av Rd(ms)
|
Av Blks/Rd
|
Writes
|
Av Writes/s
|
Buffer Waits
|
Av Buf Wt(ms)
|
TSP_HDD |
44,719
|
49
|
10.28
|
1.00
|
15,037 |
17 |
3 |
10.00 |
TSP_SSD |
42,422
|
47
|
1.09
|
1.00
|
14,027 |
16 |
0 |
0.00 |
SYSAUX
|
4,005
|
4
|
3.35
|
1.09
|
1,480
|
2
|
0
|
0.00
|
Note that there was a measurable but slight increase in I/O performance for the number of average writes per second, and no buffer waits were encountered when writing to the SSD device. Again, this result is not totally unexpected because SSDs are not necessarily likely to improve I/O performance for physical writes.
When May SSDs Fail to Improve Performance Dramatically?
Since the nature of SSD architecture favors read activity more so than write activity, it’s not unusual to encounter improved I/O performance for sequential read workloads. I’ve illustrated this concept with a Swingbench simulation of a ten-user workload that performs a single iteration of full table scans against each of the two test tables over a 15-minute time frame. I’ve excerpted the results from the IOStat by Function/Filetype Summary sections of the corresponding AWR reports for the HDD workload and SSD workload in Table 8.3 below.
Table 8-3. Data File Performance Improvements |
||||||
Test Case:
|
Datafiles on HDD
|
Datafiles On SSD
|
||||
Statistic
|
Reads: Data
|
Requests Per Second
|
Data Per Second
|
Reads: Data
|
Requests Per Second
|
Data Per Second
|
Direct Reads (Data File)
|
53.1G
|
1926.4
|
60.1846 |
57.2G
|
2072.19
|
64.7419 |
There was indeed some modest improvement in sequential read performance when the identical data was placed on HDD versus SSD storage. The I/O throughput performance for the SSD-based table improved by approximately 7.5% versus the HDD-based table based on the Data Per Second metric in each AWR report section (64.7419 MBPS versus 60.1846 MBPS). Again, this result was not unexpected because SSDs perform best when handling random access workloads.
If you’ve been keeping score, you already know that there’s only one IO performance category left, and that’s sequential writes. As I mentioned earlier, the nature of SSD architecture simply can’t help but improve random read and random write activity, but larger writes will almost certainly not be able to benefit from SSD technology. Interestingly, I’ve talked to many Oracle DBAs who believe that SSDs will dramatically benefit the “hottest” files in their databases, and those are usually the online redo logs, especially for an OLTP application workload. This may be counter-intuitive, but SSDs may not be able to relieve this performance bottleneck because it’s implicit in the nature of the LogWriter (LGWR) background process to write out online redo log entries as sequential writes.
To illustrate this paradox, I first migrated all of my database’s online redo logs to SSD storage, as shown in the code in Listing 8.2. I then reran the 50-user random-write workload I used in the first set of scenarios. I’ve captured the pertinent redo log statistics from the AWR reports from both test runs in Table 8.4 below.
Table 8-4. Online Redo Log Performance Improvements |
||||||
Test Case:
|
Redo On HDD
|
Redo On SSD
|
||||
Statistic
|
Total
|
per Second
|
per Trans
|
Total
|
per Second
|
per Trans
|
redo blocks written
|
239,509
|
264.9
|
8.55
|
238,390
|
263.48
|
8.54
|
redo entries
|
265,946
|
294.14
|
9.49
|
257,560
|
284.67
|
9.23
|
redo size
|
111,990,896
|
123,861.81
|
3,996.53
|
111,345,276
|
123,065.27
|
3,989.87
|
redo size for direct writes
|
0
|
0
|
0
|
57,596
|
63.66
|
2.06
|
redo wastage
|
6,503,176
|
7,192.51
|
232.07
|
6,574,024
|
7,265.99
|
235.57
|
redo write time |
9,974 |
11.03 |
0.36 |
3,986 |
4.41 |
0.14 |
redo writes
|
27,974
|
30.94
|
1
|
28,521
|
31.52
|
1.02
|
I was actually pleasantly surprised to find a moderate performance improvement for this scenario. While almost all of the redo log performance statistics compare pretty closely, the redo write time did decrease by approximately 250% (9,974 ms vs. 3,986 ms). However, this performance improvement is more likely due to the fact that my application workload scenarios committed extremely frequently – essentially, after each UPDATE statement was issued – and this means that the LogWriter background process was writing to the online redo logs much more frequently than during a less homogenous or “hybrid” database application workload. As a result, the SSD media processed these LGWR transactions as random writes instead of sequential writes.
Conclusion
In this article series, I’ve done my best to cover the basic best practices behind I/O performance tuning for Oracle databases, especially in an Oracle 11gR2 environment. Hopefully I’ve shed some light on the pratfalls and difficulties of tuning individual storage devices and the related storage networks they comprise, especially when trying to obtain the best level of application workload performance in “hybrid” Oracle database environments.
References and Additional Reading
Before you proceed to experiment with any features I’ve described in this article series, I strongly suggest that you first look over the corresponding detailed Oracle documentation before trying them out for the first time. I’ve drawn upon the following Oracle Database 11g Release 2 documents for this article’s technical details:
E16508-05 Oracle Database 11gR2 Concepts
E16638-05 Oracle Database 11gR2 Performance Tuning Guide
E17110-08 Oracle Database 11gR2 Reference
E17120-07 Oracle Database 11gR2 Administrator’s Guide
E17128-04 Oracle Database 11gR2 New Features