By John Schultz
Overview
In early 2013, while working at AOL, we discovered a serious problem with an application built on MongoDB. The application's database would consist of billions of small documents representing pieces of user data. While no single document was large, at 1,200-1,400 bytes, the sheer number of documents required many terabytes of persistent storage, far more than would ever cost-effectively fit into a database host's memory. Any one user's data could easily fit into a host's memory, but the way documents were written over time meant that all users' documents ended up jumbled together on any particular memory page. In practice, an attempt to retrieve all of a single user's documents could, and often did, touch every page in the database. This was problematic from a cost point of view, since persistent-storage bandwidth quickly became a major bottleneck.
Efforts were made to reduce document sizes by shortening field names and eliminating fields that could assume a default value, but that was not nearly enough. What we needed was either a way to compress the database so it would fit into memory, or a storage mechanism under MongoDB that would cluster each user's documents together into pages, minimizing the storage I/O needed to retrieve all of a user's documents.
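As a sketch of that first effort (the field names and defaults below are invented for illustration; they are not the application's real schema), shortening field names and omitting default-valued fields can trim a small document noticeably:

```python
# Hypothetical example: field names and defaults are invented, not the
# application's real schema.
DEFAULTS = {"read": False, "starred": False, "source": "rss"}
SHORT_NAMES = {"subject_line": "s", "published_date": "d", "body_text": "b"}

def shrink(doc):
    """Shorten verbose field names and drop fields equal to their default."""
    out = {}
    for key, value in doc.items():
        if key in DEFAULTS and value == DEFAULTS[key]:
            continue  # omit fields that can assume a default value
        out[SHORT_NAMES.get(key, key)] = value
    return out

doc = {"subject_line": "Hello", "published_date": "2013-02-01",
       "body_text": "...", "read": False, "starred": True, "source": "rss"}
print(shrink(doc))
# {'s': 'Hello', 'd': '2013-02-01', 'b': '...', 'starred': True}
```

The savings are real but bounded: the payload fields still dominate, which is why this alone was not nearly enough.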
At this point we became aware of TokuMX, a fork of the 2.2 version of the MongoDB database server with many 2.4 features added. Tokutek claims that its server compresses the typical MongoDB database by 4x or more. It also supports clustered indexes to minimize I/O when scanning through related documents.
Those two features together sounded very promising, so we decided to run a benchmark comparing MongoDB to TokuMX on exactly those points. The rest of this article tells the story of our experiences with that benchmark.
The Use Case
As outlined above, the application database we wanted to benchmark contains billions of relatively small documents belonging to users. As an example of the kinds of data we were planning to store, think of each document as representing a single email, RSS item, Facebook post, or Twitter post for a user. In this environment each user will have many thousands of documents in the database. To make sure we have all of a user's data, we periodically sync the data stored for each user with its origin sources. This essentially means scanning through all of a user's data in the local database, comparing it to the data stored in the origin services, and then applying any changes. The result is a regular pass over user documents to keep the local store in sync with the origin services.
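A minimal sketch of that sync pass, assuming both sides can be keyed by a document id (the names and shape of the data are illustrative, not the application's real interfaces):

```python
# Hypothetical sync pass: reconcile a user's local documents against the
# origin service. Names and data shapes are invented for illustration.
def sync_user(local_docs_by_id, origin_docs_by_id):
    """Return the inserts, updates, and deletes needed to match the origin."""
    inserts = {k: v for k, v in origin_docs_by_id.items()
               if k not in local_docs_by_id}
    updates = {k: v for k, v in origin_docs_by_id.items()
               if k in local_docs_by_id and local_docs_by_id[k] != v}
    deletes = [k for k in local_docs_by_id if k not in origin_docs_by_id]
    return inserts, updates, deletes

local = {1: {"t": "a"}, 2: {"t": "b"}}
origin = {2: {"t": "B"}, 3: {"t": "c"}}
print(sync_user(local, origin))  # ({3: {'t': 'c'}}, {2: {'t': 'B'}}, [1])
```

The expensive part in practice is the left-hand side: enumerating `local_docs_by_id` means scanning every one of the user's documents in the database, which is exactly the access pattern the benchmark exercises.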
Our benchmark test used this use case to compare MongoDB to TokuMX.
The Test Environment
The test environment consisted of three separate application-driver/database-host sets. For each database host tested we used two application driver hosts. The driver software was written in Python, using the PyMongo library to connect to the database hosts.
In all there were six driver hosts, two per environment, and three database hosts. The database hosts are shown in the table below:
| Description | Manufacturer | Cores | Memory (GB) | Disk Type | Disk Space |
|---|---|---|---|---|---|
| Low-end host | Penguin | 6 | 36 | SATA | 1 TB |
| Virtual host | Virtual host | 8 | 64 | SAS | 1.2 TB |
| High-end host | HP DL/380 | 12 | 144 | SSD | 1.2 TB |
In order to create our test data we took 16.8 million documents from one of the production database shards and slightly altered them to make the data effectively anonymous. We also created six thousand synthetic data owners to which the test documents would be assigned. This gave us the potential of roughly 100 billion total posted documents if we were to post all 16.8 million documents to every owner. We had no intention of going that far; in any of the tests, the most documents posted to any one owner was 166,000.
To get the clustering effect we desired, we changed the default _id that the application used so that the _id contained the owner and a fabricated logical id. If clustering helped, we would now have all documents for a given owner stored together in the database.
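A sketch of that _id change (the exact id layout below is an assumption; the article does not specify the format used):

```python
# Hypothetical _id layout: owner id plus a fabricated, monotonically
# increasing logical id, so one owner's documents sort together. Under a
# clustered index, sorting together means being stored together.
def make_clustered_id(owner_id, logical_id):
    # Zero-padding keeps lexicographic order equal to numeric order.
    return f"{owner_id}:{logical_id:012d}"

ids = [make_clustered_id("user42", n) for n in (3, 1, 2)]
print(sorted(ids))
# All of user42's ids are adjacent in index order, so a scan of that
# owner's documents becomes a single contiguous range scan.
```

With the default MongoDB ObjectId, by contrast, insertion time dominates the key order, so one owner's documents end up interleaved with everyone else's.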
The software and versions used in the test are listed below.
| Description | Version |
|---|---|
| MongoDB | 2.2.3 |
| TokuMX | 1.0 RC |
| Operating system | CentOS 6.2 |
| Python | 2.7 |
| PyMongo | 2.5.2 |
The Test
Posting enough documents to consume all of memory and beyond on these hosts, the smallest of which had 36 gigabytes of memory, was going to take some time. To make the process go as fast as possible, we ran many copies of the test driver at the same time on both driver hosts. During the load phase we skipped the scan part of the use case and simply loaded data as fast as we could. After the load phase, once the database had exceeded the size of the database host's memory, we switched the drivers to perform a scan for each document inserted.
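The driver's two phases can be sketched as follows, run here against a tiny in-memory stand-in for a MongoDB collection (all names are illustrative; the real drivers used PyMongo against the database hosts):

```python
# Sketch of the benchmark driver's two phases. FakeCollection is an
# in-memory stand-in for a MongoDB collection, for illustration only.
class FakeCollection:
    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        self.docs[doc["_id"]] = doc

    def count_prefix(self, owner):
        # Stands in for a range scan over the owner's clustered _id prefix.
        return sum(1 for _id in self.docs if _id.startswith(owner + ":"))

col = FakeCollection()

# Load phase: insert as fast as possible, no scanning.
for n in range(1000):
    col.insert({"_id": f"user42:{n:012d}", "body": "..."})

# Scan phase: once the database outgrows memory, each insert is paired
# with a full scan of that owner's documents.
col.insert({"_id": f"user42:{1000:012d}", "body": "..."})
scanned = col.count_prefix("user42")
print(scanned)  # 1001
```

In the real test the scan phase is what separates the two servers: with a clustered index the per-insert scan touches a handful of pages, while without one it can touch pages all over the database.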
The Results
After running the tests in an attempt to grow each database to at least twice the size of the host's memory, we got the following results. The tables below show averages, which tell only part of the story; the graphs that follow show the degradation in performance as the number of documents increased.
| Host | TokuMX Documents | Mongo Documents | TokuMX Disk Size | Mongo Disk Size | TokuMX Effective Stored Doc Size (bytes) | Mongo Effective Stored Doc Size (bytes) | Ratio |
|---|---|---|---|---|---|---|---|
| Penguin | 103M | 36M | 50 GB | 57 GB | 490 | 1,559 | 3.18 |
| Virtual 8 CPU | 648M | 90M | 239 GB | 126 GB | 369 | 1,404 | 3.80 |
| HP DL380 | 1,034M | 206M | 367 GB | 274 GB | 356 | 1,330 | 3.74 |
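The "effective stored doc size" columns are simply on-disk size divided by document count; working through the virtual-host row as an example (GB taken as 10^9 bytes; small differences from the table come from the rounded document counts):

```python
# Effective stored document size = bytes on disk / documents stored.
# Figures are the virtual-host row of the table above.
tokumx_bytes, tokumx_docs = 239e9, 648e6
mongo_bytes, mongo_docs = 126e9, 90e6

tokumx_eff = tokumx_bytes / tokumx_docs  # ~369 bytes per document
mongo_eff = mongo_bytes / mongo_docs     # ~1,400 bytes per document
print(round(tokumx_eff), round(mongo_eff), round(mongo_eff / tokumx_eff, 2))
```

The ratio column is the Mongo effective size divided by the TokuMX effective size, i.e., the compression advantage per stored document.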
Data Loads

| Host | TokuMX Inserts per Second | Mongo Inserts per Second | TokuMX Speedup |
|---|---|---|---|
| Penguin | 3,211 | N/A | N/A |
| High Tier 8 CPU | 8,159 | 1,189 | 6.86 |
| IOPS3 | 12,837 | 3,228 | 3.98 |
Sync Scan Rates

| Host | TokuMX Docs per Second | Mongo Docs per Second | TokuMX Speedup |
|---|---|---|---|
| Penguin | 261,685 | 137,911 | 1.90 |
| High Tier 8 CPU | 266,122 | 160,953 | 1.65 |
| IOPS3 | 559,764 | 358,263 | 1.56 |
The graphs below are for the HP DL380-hosted database only; the graphs for the other two hosts are very similar. It is interesting to note that MongoDB's effective scan rate declines rapidly, while TokuMX's does not. The glitch near the right end of the TokuMX line happened because one set of drivers quit before the test completed and had to be restarted.
Documents Scanned Per Second
Both TokuMX and MongoDB show decreasing insert performance over time, but MongoDB's drop is steeper at first. Since each owner's document set grows over the course of the test, some degradation is to be expected. Again, note the dip where one of the drivers stopped during the TokuMX test.
Document Inserts Per Second
The next graph shows "disk" I/O utilization. MongoDB's "disk" utilization continues to grow as the database gets larger, eventually becoming saturated. TokuMX's does not, instead remaining fairly steady throughout the test. The only noticeable increase for TokuMX takes place when one of the drivers is shut down; during that period the TokuMX server takes advantage of the reduced workload to catch up on "disk" housekeeping activities.
Persistent Storage Device Utilization
Finally, we compare CPU utilization. Note that here it is TokuMX that consumes all of the available resources: managing the Fractal Tree structure, in addition to compressing and decompressing the data, makes CPU the likely bottleneck for most TokuMX installations.
TokuMX vs Mongo CPU Utilization
Summary
TokuMX does in fact significantly decrease the size of a MongoDB database, by a factor of roughly 3 to 4. In addition, even when the database exceeds the size of memory, TokuMX does a much better job of managing the space consumed by the working set thanks to its clustered indexes.
About the Author
John Schulz is an independent database consultant who specializes in evaluation and proof-of-concept projects. With over 30 years of experience in database technology, he has designed and deployed applications scaling from hundreds to millions of users, while maintaining high availability and consistent response times. Prior to becoming an independent consultant in February of 2014 John was a database developer and architect with AOL.