The Lure of Open Source Databases


Alex Polishchuk and Michael Procopio

Wouldn’t it great to have a database to store all your data, not
have to pay for the software licensing and get a low annual support cost per
machine not per CPU, or users, or company revenue? Absolutely perfect! Better
yet, such things do exist, they are called Open Source Databases (OSDB), but as
with everything else, it comes at another type of price that you need to pay.
It may be low transaction volume, lack of features, limited GUI management, or
some other issue compared to commercial databases.

The major commercial databases are Oracle, Microsoft SQL Server,
IBM DB2, and Sybase Adaptive SQL Server. On the open source side, there are
three major databases: MySQL, PostgreSQL, and Ingres with a combined installed
base in the tens of millions. All these database management systems (DBMS) have
existed on the market for over a decade. Both categories have proven themselves
as reliable for data storage and management. Companies ranging from the
smallest startups to Fortune 500 companies and government organizations around
the world use a mix of these products every day and trust these systems with
their data.

This paper will cover our interviews with customers and vendors of
open source and commercial databases. We’ll look at various aspects of what
open source databases are offering today at a high level and how they stack up
against their commercial counterparts from a business perspective. Further we
will discuss the reasons open source databases are chosen (or not) for various
use cases. Our goal is to give you a starting point in your investigation on
OSDBs or introduce you to the possibility of using them in your business.

Four items became quickly apparent. Most of our interviewees use
more than one database. Ease of use, or lack thereof was a key deciding factor
for the application being delivered. Management GUIs were mentioned
prominently. The depth of features plays an important part in deciding which
database to use for each application. And, surprisingly to us, support was not
a major differentiator. Cost of the software did factor in for some decisions
but in only one interview was it the primary factor.

Overview of Commercial Databases and OSDBs

Oracle

Oracle is the leader of the commercial pack with a wide variety of
features provided, platforms supported and breadth of offerings. In a 2006 IDC
study, Oracle was listed with a 44% market share and 14% growth. Its features
range from table partitioning and a wealth of statistical functions to high
availability in the form of native replication, data mirroring and Real
Application Clusters RAC). Another feature is online table and index
rebuilding. In many DBMSs rebuilding is an offline operation, that is, no access
to the data is allowed while the indexes are being rebuilt because this
operation requires exclusive access to the object. This is not a comprehensive
list just a sample of the advanced features Oracle offers.

MySQL

MySQL is leading the OSDBs by the number of installations and is
used by small startup companies all the way up to huge web sites utilizing OS
clustering. MySQL has a number of pluggable storage engines to allow various
required functionalities, instead of having all features available at once.
This is another factor that allows it to be the data retrieval performance
leader.

“We have (since the first release) a storage engine interface.
This gives a user the opportunity to choose data store depending on their need.
We have both ACID transactional ones (InnoDB, Falcon), non transactional useful
for DW (MyISAM), logging engine (no updates or deletes, Archive) and Cluster
that is a high availability shared nothing distributed system”, said David Axmark,
who is one of the MySQL co-founders.

“We also have proprietary third party engines like InfoBright
(super high compression for DW), Nitro (extremely high update rates with
concurrent reading). And the IBM DB2 engine will be available on AS/400 later
this year.

We have many more engines for special purposes, although it is not
recommended to use all of them. And companies like Google has written their own
to speed up their common operations” added David.

One of the advantages MySQL has is a very small footprint compared
to Oracle. MySQL is also the fastest DBMS for data retrieval, but it does not come
even close to the number of features provided by Oracle. David Axmark said “It
was never trying to compete feature-wise with the other databases. The goal is
ease of use, reliability, and performance. MySQL has huge market popularity and
developer support.”

Ingres

Emma K. McGrattan of Ingres says that many new features were
introduced in the 2006 release. Some of those features are large-scale
deployment capability, key range table partitioning, parallel query support.
“Bread and butter features, such as, stored procedures, triggers, and views
have been part of Ingres for a long time.” according to Emma. Ingres does not
have built-in job support yet.

Microsoft SQL Server

Microsoft SQL Server only runs on MS Windows, unlike any other
database vendors mentioned in this article. Because of pervasiveness of the
Windows operating system, its lack of multiplatform support is not an inhibitor
to its growth. It is the fastest growing in market share and number three
overall, next to DB2, with 18.6% according to an IDC 2006 study. Its pricing is
disruptive to the database market and is lowering the average selling price of
commercial databases. In the 2005 release of SQL Server, a number of new
features were added that make it more competitive with Oracle. These include
table partitioning, online operations for indexes and numerous other management
features. As you will see later in the article, multiple interviews gave kudos
to SQL Server for its intuitive, easy to use management interface.

PostgreSQL

PostgreSQL is known among open source databases for its extensive
feature set, including stored procedures, partitioning, multiple procedural languages,
and an array of exotic data types and indexes. It’s also known for standards
compliance and a stringent approach to security. While OLTP is its most popular
workload, PostgreSQL is the leading OSDB in data warehousing, for which
database sizes range from 2 terabytes up to 32 terabytes, according to Josh Berkus
who is the PostgreSQL Lead at Sun Microsystems.

Since PostgreSQL is a community-controlled open source project
supported by a variety of companies, support is available from several
companies including Sun, Fujitsu, Unisys, SRA and Red Hat as well as numerous
start-ups and consulting companies. While PostgreSQL has its largest rate of
adoption in Japan (where it’s the second most popular RDBMS after Oracle), it’s
used worldwide.

Sybase

The latest version of Sybase Adaptive Server has a richer feature
set compared to previous releases, but still less than the other top commercial
databases, for example lack of clustering support for high availability and
online index rebuilding. This may account for it being number four in market
share with a small 3.2% market share according to the IDC 2006 study. Where
Sybase is strong is in its small footprint and economical use of resources (see
section on Embedded database applications). In furtherance of this goal, Sybase
15 has introduced new “patented query processing technology that has
shown increased performance and reduced hardware resource consumption.”
according to the Sybase web site.

On the other hand, Sybase is making a push in the mobile arena
with their SQL Anywhere database server. One of SQL Anywhere’s features is its Ultralite
database management system which is memory resident and runs on Windows Mobile,
Windows, Palm OS and Symbian.

It is an interesting historical note that Microsoft’s SQL Server
was developed from Sybase. Microsoft and Sybase entered into a contract whereby
Microsoft was granted rights to resell a branded version of Sybase and to make
its own new versions. While Microsoft has made many changes since then, you can
still see this heritage in the command structure for SQL Server.

DB2

In 2004, DB2 had most of its sales on IBM mainframes and on their
AS/400 series systems. The good news for DB2 then was that it was only $30M shy
of Oracle’s multi-billon dollar sales according to a Gartner report. The bad
news is that based on the 2006 IDC study DB2 is selling at less than half the
rate of Oracle; although it still holds the number two position. DB2 now runs
on Windows, Linux and UNIX platforms in addition to IBM proprietary operating
systems. One of the areas that DB2 v9, the current version, focuses on is its
capabilities with XML. In a recent project with another database, we had to do
a lot of extra work manipulating XML to get it into that database. If this
feature had been present in the database we were using, it would have made the
job so much easier. IBM is also pushing into the mobile space with DB2
Everyplace that supports PDAs and smart phones.

Embedded Database Applications

A very different use case for databases is the embedded
application where a small footprint is of the essence. This is where a database
is used by another piece of software or hardware for its internal working
without the user interacting directly with the database. We both worked on a performance-monitoring
product that was sold to customers, which included an embedded database to hold
the collected and summarized data. We know of medical equipment and network
appliances that have embedded databases as well. MySQL has been used extensively
in embedded systems since its conception. Microsoft offers the SQL Server 2005
Express Edition, which is the next version of Microsoft Database Engine (MSDE)
that is used as an embedded database, in part because it is royalty free. Oracle
now has TimesTen In-Memory Database that fits entirely in physical memory and
utilizes the standard SQL interface.

Linux is the operating system of choice for Open Source Databases,
because both are open source and royalty free products. Linux variants are
widely spread in consumer electronics these days. For example, many digital
video recorders (DVR) use Linux for their operation, which makes Linux an ideal
platform for OSDBs, due to its small footprint. In this case, an OSDB
potentially can be used as TV guide, usage statistics, and recorded programs
storage.

Pick your task, pick your database

Dominique Jean, the Vice President of Technology at PriceGrabber,
probably put it best by saying that “there is no perfect database today.” We
found the primary reason for choosing a database was how its features worked
for the task it was being used to manage. The tasks our interviewees are using
a database for are: web serving, transaction processing, text search, and
situations where replication is a key requirement.

For transaction processing, Oracle seems to lead, followed by MS
SQL Server in the interviews we conducted. None of the OSDBs seem to have
traction here. Other areas where a feature rich environment was required did
not bode well for OSDBs. Items mentioned in favor of commercial databases were
stored procedures, triggers and security among others. That said, Ingres’ Emma
K. McGrattan told us that triggers and stored procedures have been in Ingres
almost since its inception. PostgreSQL similarly has had these features for a
while. Stored procedures and triggers have recently been introduced to MySQL.

For web applications, MySQL seems to lead with our interviewees.
Web applications are primarily database reads. MySQL was praised for its speed
in this area; Ed Presz, the Director of Database Engineering at Ticketmaster said,
“MySQL provides fast read operations that are as fast if not faster than
Oracle.” When the task moved from lookup to processing a customer purchase as it
did with two of our interviewees, Ticketmaster and Live Nation, the database
switches from MySQL to Oracle.

Replication and clustering got mixed results. Ed Presz told us
“One-way replication is terrific in MySQL.” Dominique said that MySQL clustering
was not deployed because it is in-memory only.

Ease of Use and Management

Open source DBMS are mostly managed from command line tools. Some
GUI tools are available, but those are not as flexible and robust as the one
from Oracle 10g, for example.

The winner in ease of management is clearly Microsoft. Microsoft’s
primary GUI is not web based but a native console. This gives it the ability to
include a number of powerful capabilities not available to web applications.
Dave Kochbeck, the Senior Vice President Interactive Technology Group at Live
Nation said, “I wish administration in Oracle was as good as SQL Server”. From
our experience, you could put any database name in place of Oracle in that
sentence.

Support

One of the biggest questions we hear while discussing open source
is “where will I get support?” MySQL provides its own customer support 24 x 7
with 30 minutes response time. There are various levels of paid support up to
unlimited. As noted above, PostgreSQL support is available from a variety of
platform vendors. Support terms, costs, geographic areas and bundling
requirements vary by vendor.

Source code for open databases is available for anyone to view and
modify! This leads to several potential issues. First, what if a legitimate
customer makes changes? Josh Berkus says that Sun will not provide support in
such cases, while David Axmark said that MySQL will support it, but under very
specific terms and conditions. Furthermore, according to both of them, the
number of customers who are doing this is in single digits. This has to do with
the product complexity.

Another open source issue is the fact that anyone can modify it
and potentially introduce a bug or intentional or unintentional security
breach. According to Josh Berkus “such things have happened before and
customers need to make sure that they download updates from known sources and
verify those with MD5 checksums”.

None of our interviewees saw support as an issue. When we asked
about support, here is what we heard. Dominique’s comments on MySQL were,
“Great support, very helpful.” From Dave Kochbeck, “We are a Platinum customer
with MySQL. They provide development support and support for custom PHP. We get
much more support from MySQL than commercial database vendors.”

Cost

The cost equation in the database world is fairly simple. With OSDBs,
you get the license for free but features are sacrificed. With commercial databases,
a much broader set of features is available, but there are per copy or per CPU
license fees.

On the topic of maintenance support, while it appears you have a
choice other than purchasing a support contract from a vendor because you have
the source code, in practice your really don’t. While modification of the
source code is an option, doing so will quickly make your installation
unsupportable because as new versions of the database come out your changes
would need to be reapplied and may have been made unnecessary or impractical by
the new version. It seems to us the only practical other choice is to become
part of the developer community for the database, which will require a
significant time commitment, which probably outweighs the cost of the support
contract. With commercial vendors the only choice, if you want maintenance, is
to purchase a contract.

Conclusion

What we found overall is that you pick the database to fit your
application or needs. Web serving, which is primarily reading, is the strong
suit for MySQL. Transaction processing and complex database feature requirements
are the domain of Oracle and Microsoft SQL Server. If you need some of the
advanced features of commercial databases but don’t want to pay license fees
then look into PostgreSQL or Ingres. For embedded database applications, MySQL
and Sybase have the small footprint typically required.

If you have stayed away from open source for fear of lack of
support, then our interviews show that you can put those fears aside, as the
three major open source databases are backed by strong support organizations
with response times and professionalism that can match the commercial vendors.

About the authors

Alex Polishchuk is the founder and president of Advanced Computer
Consulting (www.advcomputerconsulting.com). Alex has over fifteen years of
professional experience designing, developing, and implementing database
applications in various industries and companies ranging from small to Fortune
50 corporations. Alex’s primary areas of expertise are in database security and
performance optimization and tuning. He can be contacted at alex@advcomputerconsulting.com.

Michael Procopio is a Senior Product Manager at HP. He has over 25
years of experience in computer systems and networking. He has held positions
in consulting, product management, technical marketing, training and IT management.
Michael has been a speaker at numerous IT conferences and is a member of the
IEEE. He can be reached at michael@mprocopio.com.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles