The open source NoSQL Apache CouchDB database is set to get a major performance boost thanks to commercial services vendor Couchbase.
Couchbase this week announced multiple performance contributions to CouchDB that will first appear in the Couchbase Single Server 2.0 distribution of CouchDB. Among the enhancements are disk write and indexing performance gains that offer the promise of a 3x boost.
“We added a feature to the core storage module of CouchDB to allow compression of the data we are storing on disk,” Couchbase co-founder Jan Lehnardt told InternetNews.com. “The feature allows the database to pick a compression algorithm or to disable the compression completely.”
Lehnardt noted that Couchbase also added support for the gzip and Google’s new Snappy compression algorithms. The compression approach is modular, so CouchDB developers can add more compression types down the road.
“We made Snappy the default because it resulted in the best CPU time / compression ratio,” Lehnardt said. “The result of the compression is that the CPU has to shove around less data when writing to disk, allowing for more actual data throughput.”
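The selectable, modular compression Lehnardt describes can be illustrated with a small sketch. This is not CouchDB's Erlang storage code: the registry, the function names, and the one-byte name header are all assumptions made for illustration, and Python's standard-library zlib stands in for gzip and Snappy.

```python
import zlib
from typing import Callable, Dict, Tuple

# Each codec is a (compress, decompress) pair. The registry itself is a
# hypothetical construct for this sketch, not a CouchDB data structure.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

CODECS: Dict[str, Codec] = {
    "none": (lambda data: data, lambda data: data),  # compression disabled
    "zlib": (zlib.compress, zlib.decompress),        # stand-in for gzip/Snappy
}


def register_codec(name: str, compress: Callable[[bytes], bytes],
                   decompress: Callable[[bytes], bytes]) -> None:
    """Add another compression type without touching the storage code."""
    CODECS[name] = (compress, decompress)


def encode(data: bytes, codec: str = "zlib") -> bytes:
    """Compress data and prefix it with the codec name so it can be read back."""
    compress, _ = CODECS[codec]
    name = codec.encode("ascii")
    return bytes([len(name)]) + name + compress(data)


def decode(blob: bytes) -> bytes:
    """Read the codec name from the header and decompress the payload."""
    name_len = blob[0]
    name = blob[1:1 + name_len].decode("ascii")
    _, decompress = CODECS[name]
    return decompress(blob[1 + name_len:])
```

Tagging each stored block with its codec name means data written under one setting stays readable after the default changes, which is one plausible way to keep such a scheme modular.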
Couchbase developers also found and removed a data serialization bottleneck, which further improves performance. Another part of the speed improvement, Lehnardt noted, comes from making the file write subsystem work asynchronously.
“That way, if the disk subsystem is still sending data to the disk, the CouchDB end can prepare data to be stored in the meantime,” Lehnardt said. “This ensures optimal and even throughput in high-update situations. We’re making full use of Erlang’s great support for concurrency here.”
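The asynchronous write pattern Lehnardt describes can be sketched roughly as follows. CouchDB does this with Erlang processes; the Python thread and queue below, along with the `AsyncWriter` name and its `max_pending` parameter, are assumptions chosen only to illustrate the idea of overlapping data preparation with disk writes.

```python
import queue
import threading


class AsyncWriter:
    """Hypothetical append-only writer that hands chunks to a background thread."""

    def __init__(self, path: str, max_pending: int = 64) -> None:
        self._file = open(path, "ab")
        # A bounded queue applies back-pressure if the disk falls far behind.
        self._queue: "queue.Queue" = queue.Queue(maxsize=max_pending)
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self) -> None:
        # Single consumer: chunks reach the file in the order they were queued.
        while True:
            chunk = self._queue.get()
            if chunk is None:  # sentinel posted by close()
                break
            self._file.write(chunk)

    def write(self, chunk: bytes) -> None:
        """Queue a chunk and return, so the caller can prepare the next one."""
        self._queue.put(chunk)

    def close(self) -> None:
        """Wait for all queued chunks to be written, then close the file."""
        self._queue.put(None)
        self._thread.join()
        self._file.flush()
        self._file.close()
```

A caller would loop over updates, serializing each one and passing it to `write()`, while the background thread keeps the disk busy. The bounded queue is a design choice of this sketch, not something the article attributes to CouchDB.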
Couchbase has gone a step further, reducing disk usage with what Lehnardt referred to as a simple storage technique. According to Couchbase, the new on-disk storage format halves the disk space required for database and index files, further increasing I/O efficiency.
The improvements to CouchDB are now available in the Couchbase Single Server 2.0 distribution of CouchDB, which is in developer preview. Lehnardt noted that all of the patches Couchbase has made have either already been included upstream or been posted to the Apache CouchDB contribution review process.
Couchbase itself is a commercial company that was formed in February through the merger of CouchOne and Membase. CouchOne was the home of CouchDB founder Damien Katz, who is now with Couchbase.
The core Apache CouchDB project recently updated to version 1.1, which does not include the new improvements.
“The source for the improvement has been out for weeks already, the Apache CouchDB 1.1 release was branched way before we started our efforts, so we couldn’t have gotten these into 1.1.0 without subverting the process that is known to produce reliably stable software,” Lehnardt said. “We didn’t feel like doing that. We hope the Apache CouchDB developer community finds the time to consider our contributions for the 1.2.0 release.”
Lehnardt explained that the Couchbase Single Server 2.0 Developer Preview is based on Apache CouchDB trunk (currently slated to be the basis for Apache CouchDB 1.2 later this year). In addition, Couchbase added an auto-compaction feature that allows more hands-off operation of CouchDB; it too has been submitted to the community review process.
While Couchbase has numbered its upcoming release 2.0, that number has no direct relation to the Apache CouchDB project’s version numbering.
“CouchDB 2.0 is not yet on any roadmap, we just chose to differentiate with our version numbers to avoid confusion with Apache CouchDB releases as we’ve seen in the past,” Lehnardt said. “We have a version matrix on our website so our users can find out what Apache CouchDB release is underlying a respective Couchbase release.”