The Lure of Open Source Databases
July 2, 2007Alex Polishchuk and Michael Procopio
Wouldn't it great to have a database to store all your data, not have to pay for the software licensing and get a low annual support cost per machine not per CPU, or users, or company revenue? Absolutely perfect! Better yet, such things do exist, they are called Open Source Databases (OSDB), but as with everything else, it comes at another type of price that you need to pay. It may be low transaction volume, lack of features, limited GUI management, or some other issue compared to commercial databases.
The major commercial databases are Oracle, Microsoft SQL Server, IBM DB2, and Sybase Adaptive SQL Server. On the open source side, there are three major databases: MySQL, PostgreSQL, and Ingres with a combined installed base in the tens of millions. All these database management systems (DBMS) have existed on the market for over a decade. Both categories have proven themselves as reliable for data storage and management. Companies ranging from the smallest startups to Fortune 500 companies and government organizations around the world use a mix of these products every day and trust these systems with their data.
This paper will cover our interviews with customers and vendors of open source and commercial databases. Well look at various aspects of what open source databases are offering today at a high level and how they stack up against their commercial counterparts from a business perspective. Further we will discuss the reasons open source databases are chosen (or not) for various use cases. Our goal is to give you a starting point in your investigation on OSDBs or introduce you to the possibility of using them in your business.
Four items became quickly apparent. Most of our interviewees use more than one database. Ease of use, or lack thereof was a key deciding factor for the application being delivered. Management GUIs were mentioned prominently. The depth of features plays an important part in deciding which database to use for each application. And, surprisingly to us, support was not a major differentiator. Cost of the software did factor in for some decisions but in only one interview was it the primary factor.
Overview of Commercial Databases and OSDBs
Oracle is the leader of the commercial pack with a wide variety of features provided, platforms supported and breadth of offerings. In a 2006 IDC study, Oracle was listed with a 44% market share and 14% growth. Its features range from table partitioning and a wealth of statistical functions to high availability in the form of native replication, data mirroring and Real Application Clusters RAC). Another feature is online table and index rebuilding. In many DBMSs rebuilding is an offline operation, that is, no access to the data is allowed while the indexes are being rebuilt because this operation requires exclusive access to the object. This is not a comprehensive list just a sample of the advanced features Oracle offers.
MySQL is leading the OSDBs by the number of installations and is used by small startup companies all the way up to huge web sites utilizing OS clustering. MySQL has a number of pluggable storage engines to allow various required functionalities, instead of having all features available at once. This is another factor that allows it to be the data retrieval performance leader.
We have (since the first release) a storage engine interface. This gives a user the opportunity to choose data store depending on their need. We have both ACID transactional ones (InnoDB, Falcon), non transactional useful for DW (MyISAM), logging engine (no updates or deletes, Archive) and Cluster that is a high availability shared nothing distributed system, said David Axmark, who is one of the MySQL co-founders.
We also have proprietary third party engines like InfoBright (super high compression for DW), Nitro (extremely high update rates with concurrent reading). And the IBM DB2 engine will be available on AS/400 later this year.
We have many more engines for special purposes, although it is not recommended to use all of them. And companies like Google has written their own to speed up their common operations added David.
One of the advantages MySQL has is a very small footprint compared to Oracle. MySQL is also the fastest DBMS for data retrieval, but it does not come even close to the number of features provided by Oracle. David Axmark said It was never trying to compete feature-wise with the other databases. The goal is ease of use, reliability, and performance. MySQL has huge market popularity and developer support.
Emma K. McGrattan of Ingres says that many new features were introduced in the 2006 release. Some of those features are large-scale deployment capability, key range table partitioning, parallel query support. Bread and butter features, such as, stored procedures, triggers, and views have been part of Ingres for a long time. according to Emma. Ingres does not have built-in job support yet.
Microsoft SQL Server
Microsoft SQL Server only runs on MS Windows, unlike any other database vendors mentioned in this article. Because of pervasiveness of the Windows operating system, its lack of multiplatform support is not an inhibitor to its growth. It is the fastest growing in market share and number three overall, next to DB2, with 18.6% according to an IDC 2006 study. Its pricing is disruptive to the database market and is lowering the average selling price of commercial databases. In the 2005 release of SQL Server, a number of new features were added that make it more competitive with Oracle. These include table partitioning, online operations for indexes and numerous other management features. As you will see later in the article, multiple interviews gave kudos to SQL Server for its intuitive, easy to use management interface.
PostgreSQL is known among open source databases for its extensive feature set, including stored procedures, partitioning, multiple procedural languages, and an array of exotic data types and indexes. It's also known for standards compliance and a stringent approach to security. While OLTP is its most popular workload, PostgreSQL is the leading OSDB in data warehousing, for which database sizes range from 2 terabytes up to 32 terabytes, according to Josh Berkus who is the PostgreSQL Lead at Sun Microsystems.
Since PostgreSQL is a community-controlled open source project supported by a variety of companies, support is available from several companies including Sun, Fujitsu, Unisys, SRA and Red Hat as well as numerous start-ups and consulting companies. While PostgreSQL has its largest rate of adoption in Japan (where it's the second most popular RDBMS after Oracle), it's used worldwide.
The latest version of Sybase Adaptive Server has a richer feature set compared to previous releases, but still less than the other top commercial databases, for example lack of clustering support for high availability and online index rebuilding. This may account for it being number four in market share with a small 3.2% market share according to the IDC 2006 study. Where Sybase is strong is in its small footprint and economical use of resources (see section on Embedded database applications). In furtherance of this goal, Sybase 15 has introduced new patented query processing technology that has shown increased performance and reduced hardware resource consumption. according to the Sybase web site.
On the other hand, Sybase is making a push in the mobile arena with their SQL Anywhere database server. One of SQL Anywheres features is its Ultralite database management system which is memory resident and runs on Windows Mobile, Windows, Palm OS and Symbian.
It is an interesting historical note that Microsofts SQL Server was developed from Sybase. Microsoft and Sybase entered into a contract whereby Microsoft was granted rights to resell a branded version of Sybase and to make its own new versions. While Microsoft has made many changes since then, you can still see this heritage in the command structure for SQL Server.
In 2004, DB2 had most of its sales on IBM mainframes and on their AS/400 series systems. The good news for DB2 then was that it was only $30M shy of Oracles multi-billon dollar sales according to a Gartner report. The bad news is that based on the 2006 IDC study DB2 is selling at less than half the rate of Oracle; although it still holds the number two position. DB2 now runs on Windows, Linux and UNIX platforms in addition to IBM proprietary operating systems. One of the areas that DB2 v9, the current version, focuses on is its capabilities with XML. In a recent project with another database, we had to do a lot of extra work manipulating XML to get it into that database. If this feature had been present in the database we were using, it would have made the job so much easier. IBM is also pushing into the mobile space with DB2 Everyplace that supports PDAs and smart phones.
Embedded Database Applications
A very different use case for databases is the embedded application where a small footprint is of the essence. This is where a database is used by another piece of software or hardware for its internal working without the user interacting directly with the database. We both worked on a performance-monitoring product that was sold to customers, which included an embedded database to hold the collected and summarized data. We know of medical equipment and network appliances that have embedded databases as well. MySQL has been used extensively in embedded systems since its conception. Microsoft offers the SQL Server 2005 Express Edition, which is the next version of Microsoft Database Engine (MSDE) that is used as an embedded database, in part because it is royalty free. Oracle now has TimesTen In-Memory Database that fits entirely in physical memory and utilizes the standard SQL interface.
Linux is the operating system of choice for Open Source Databases, because both are open source and royalty free products. Linux variants are widely spread in consumer electronics these days. For example, many digital video recorders (DVR) use Linux for their operation, which makes Linux an ideal platform for OSDBs, due to its small footprint. In this case, an OSDB potentially can be used as TV guide, usage statistics, and recorded programs storage.
Pick your task, pick your database
Dominique Jean, the Vice President of Technology at PriceGrabber, probably put it best by saying that there is no perfect database today. We found the primary reason for choosing a database was how its features worked for the task it was being used to manage. The tasks our interviewees are using a database for are: web serving, transaction processing, text search, and situations where replication is a key requirement.
For transaction processing, Oracle seems to lead, followed by MS SQL Server in the interviews we conducted. None of the OSDBs seem to have traction here. Other areas where a feature rich environment was required did not bode well for OSDBs. Items mentioned in favor of commercial databases were stored procedures, triggers and security among others. That said, Ingres Emma K. McGrattan told us that triggers and stored procedures have been in Ingres almost since its inception. PostgreSQL similarly has had these features for a while. Stored procedures and triggers have recently been introduced to MySQL.
For web applications, MySQL seems to lead with our interviewees. Web applications are primarily database reads. MySQL was praised for its speed in this area; Ed Presz, the Director of Database Engineering at Ticketmaster said, MySQL provides fast read operations that are as fast if not faster than Oracle. When the task moved from lookup to processing a customer purchase as it did with two of our interviewees, Ticketmaster and Live Nation, the database switches from MySQL to Oracle.
Replication and clustering got mixed results. Ed Presz told us One-way replication is terrific in MySQL. Dominique said that MySQL clustering was not deployed because it is in-memory only.
Ease of Use and Management
Open source DBMS are mostly managed from command line tools. Some GUI tools are available, but those are not as flexible and robust as the one from Oracle 10g, for example.
The winner in ease of management is clearly Microsoft. Microsofts primary GUI is not web based but a native console. This gives it the ability to include a number of powerful capabilities not available to web applications. Dave Kochbeck, the Senior Vice President Interactive Technology Group at Live Nation said, I wish administration in Oracle was as good as SQL Server. From our experience, you could put any database name in place of Oracle in that sentence.
One of the biggest questions we hear while discussing open source is where will I get support? MySQL provides its own customer support 24 x 7 with 30 minutes response time. There are various levels of paid support up to unlimited. As noted above, PostgreSQL support is available from a variety of platform vendors. Support terms, costs, geographic areas and bundling requirements vary by vendor.
Source code for open databases is available for anyone to view and modify! This leads to several potential issues. First, what if a legitimate customer makes changes? Josh Berkus says that Sun will not provide support in such cases, while David Axmark said that MySQL will support it, but under very specific terms and conditions. Furthermore, according to both of them, the number of customers who are doing this is in single digits. This has to do with the product complexity.
Another open source issue is the fact that anyone can modify it and potentially introduce a bug or intentional or unintentional security breach. According to Josh Berkus such things have happened before and customers need to make sure that they download updates from known sources and verify those with MD5 checksums.
None of our interviewees saw support as an issue. When we asked about support, here is what we heard. Dominiques comments on MySQL were, Great support, very helpful. From Dave Kochbeck, We are a Platinum customer with MySQL. They provide development support and support for custom PHP. We get much more support from MySQL than commercial database vendors.
The cost equation in the database world is fairly simple. With OSDBs, you get the license for free but features are sacrificed. With commercial databases, a much broader set of features is available, but there are per copy or per CPU license fees.
On the topic of maintenance support, while it appears you have a choice other than purchasing a support contract from a vendor because you have the source code, in practice your really dont. While modification of the source code is an option, doing so will quickly make your installation unsupportable because as new versions of the database come out your changes would need to be reapplied and may have been made unnecessary or impractical by the new version. It seems to us the only practical other choice is to become part of the developer community for the database, which will require a significant time commitment, which probably outweighs the cost of the support contract. With commercial vendors the only choice, if you want maintenance, is to purchase a contract.
What we found overall is that you pick the database to fit your application or needs. Web serving, which is primarily reading, is the strong suit for MySQL. Transaction processing and complex database feature requirements are the domain of Oracle and Microsoft SQL Server. If you need some of the advanced features of commercial databases but dont want to pay license fees then look into PostgreSQL or Ingres. For embedded database applications, MySQL and Sybase have the small footprint typically required.
If you have stayed away from open source for fear of lack of support, then our interviews show that you can put those fears aside, as the three major open source databases are backed by strong support organizations with response times and professionalism that can match the commercial vendors.
About the authors
Alex Polishchuk is the founder and president of Advanced Computer Consulting (www.advcomputerconsulting.com). Alex has over fifteen years of professional experience designing, developing, and implementing database applications in various industries and companies ranging from small to Fortune 50 corporations. Alexs primary areas of expertise are in database security and performance optimization and tuning. He can be contacted at firstname.lastname@example.org.
Michael Procopio is a Senior Product Manager at HP. He has over 25 years of experience in computer systems and networking. He has held positions in consulting, product management, technical marketing, training and IT management. Michael has been a speaker at numerous IT conferences and is a member of the IEEE. He can be reached at email@example.com.