Introduction
This month I take a brief
look at Open Source Database Management Systems. I do not aim to find the
‘best’ one, or perform an in-depth feature-by-feature analysis, but rather a
introduce you to the variety of solutions out there. I am presuming readers of
my columns by now already have an understanding of what Open Source is, and why
one would use it. If not, see the resources at the end of the article.
So which are the Open
Source databases? Most readers have probably heard of MySQL. Its byline is The
World’s most popular Open Source database after all. There is also
PostgreSQL (The World’s Most Advanced Open Source Database), as well as
Firebird (The Relational Database for the New Millenium). However, there
are others too. BerkeleyDB, although not directly comparable, deserves a
mention, while Computer Associates recently released Ingres, and IBM did the
same to Cloudscape (under the name Derby).
So, in brief then, let’s
take a brief look at each of these products.
Berkeley DB
Unlike most of the others
listed here, Berkeley DB is not a full DBMS, complete with graphical reporting
tools and the like. It does not even contain a query-processing layer. Instead,
it is a lightweight embedded relational storage engine, meant for applications
that do not require a separate DBMS installation. Users of MySQL have the
option to use the Berkeley DB storage engine (in the early days it was the only
way to ensure transactional capability).
Derby
If Derby has to have a byline,
perhaps it should be World’s Most Popular Embedded Java Database. That
should give you an idea of the focus of this niched product. IBM released the
commercial Cloudscape database as Derby under an Open Source license to the
Apache foundation, where it remains under consideration as an incubator
project, shortly after Computer Associates released Ingres. Cloudscape is a
Java-oriented embedded database, meaning that the DBMS becomes part of the Java
application, has a very light footprint and is designed to run without the need
for a dedicated DBA. It does not aim to be ‘enterprise quality’, but rather to
serve the needs of Java developers in particular.
Firebird
Inprise (now Borland)
were one of the first commercial companies to release a database under an Open
Source license when they did so with their Interbase product in early 2000. Interbase
6 was released under a variant of the Mozilla license. Borland have since
released version 6.5, 7, 7.1 and 7.5 of the commercial product, but Firebird
has continued to develop based on the Interbase 6.0 source code. Firebird has a
small but active user base, and until recently, it was usually forgotten in the
MySQL
vs PostgreSQL debates. Firebird has always been more fully featured than
MySQL, and has, unlike PostgreSQL, it has always worked well on Windows as well
as Linux and other ‘Nix variants. Firebird was however slightly hamstrung by
Borland upon its release, as they did not release all the elements, and the
build scripts did not even work! One important lack was a replication engine
(though Firebird does have a shadowing feature, which keeps an identical copy
of the live database). There are projects afoot, many commercial, but the Open
Source ones do not seem that mature, or integrated into the main Firebird code.
Firebird development has also been relatively slow, with the recently released
1.5 aimed mainly at rewriting the code in C++ (from C), and not adding many new
features. However, Firebird was a relatively mature DBMS to start with, and its
aim remains replacing Oracle and SQL server in commercial applications.
There are two flavors
available: Classic and Super Server, with the Super server being less
resource-intensive, and more suitable for high-volume environments (as well as
where most of the development seems to be focused). There are also three
variants of SQL, varying in differing degrees in their adherence to ANSI
standards,
Firebird has a loyal
user-base, and is a good product to work on, and if it can increase its
development speed, as well as improve its marketing, it will be a strong
contender.
Ingres
Ingres began as a
relational database project at the University of California, Berkeley (the same
American university directly or indirectly responsible for so many
contributions to the Open Source world, including BSD Unix, Postgres, vi and Tcl.
It became one of the most influential projects in the database world. With code
released under a variant of the BSD license, the proprietary products Sybase,
SQL Server and Informix all have their roots in Ingres, as does PostgreSQL. In
1982, Michael Stonebraker, one of the founders of the project, started Ingres
Corporation, and attempted to commercialize the product. ASK and, eventually,
Computer Associates bought the product, and continued to market and develop it.
However, with Open Source databases making serious inroads in the commercial
database world, they released it under an Open Source license in 2004.
Ingres is known as a
reliable and scalable database, but suffers from a reputation that it lags
behind in features, with Computer Associates seen as not having done much
development since they bought it in 1994. Arguably, once ahead of Oracle, it is
still deployed in more enterprise environments than any other Open Source
database, and is ahead of MySQL in features, although online documentation is
poor in comparison to the others. Important unusual features include:
-
Parallel
queries (allowing superb scalability, as a single query can get broken into
components that run across all available resources). -
online table
reorganization (no need to take the database offline to perform routine
maintenance)
CA also
put some financial commitment behind the project in the form of the Million
Dollar Challenge, whereby the Open Source community was invited to create
solutions enabling Oracle, SQL Server, Sybase, DB2, Informix, and/or MySQL
users to migrate to Ingres. Six winners will be announced at CA’s annual user
conference, April 17-21, 2005. Once at the top of the pile, releasing Ingres
under an Open Source license has given it a new lease of life, and I expect to
see some rapid growth in the product.
MySQL
MySQL’s byline of most
popular is certainly correct, as, fuelled by the Internet boom, it has
become widely-used for websites. Initially fast and easy-to-use, but with
limited functionality, MySQL has worked on adding features to the product, and
although it still lags the enterprise-level databases, its popularity, and huge
user-base mean there are lots of tools and applications that work with MySQL,
with lots of unofficial support to add to MySQL’s official support. MySQL lacks
views, triggers and stored procedures, important features for many users. Have
a look at the full list of features, as well as what is lacking in the article "MySQL,
Still Tomorrow’s Contender?".
MySQL achieved an
important coup when SAP gave MySQL their SAPDB database. SAPDB is a
full-featured, mature DBMS that, although not yet well integrated into the rest
of MySQL, gives MySQL an enterprise-ready answer to their critics. MySQL has
tremendous momentum behind them, and if they can quickly add the missing
features, whilst maintaining the product’s ease of use and speed, they may have
an unbeatable combination.
PostgreSQL
PostgreSQL first existed
as a name in 1996, when Postgres95 was renamed, reflecting the importance of
the recently added SQL interpreter. Before that Postgres, also a project of the
famous Michael Stonebraker and designed as a successor to Ingres, used QUEL,
which was very similar to SQL, and arguably more consistent in structure.
However, the dominance of the SQL-based Oracle and DB2 meant that QUEL fell from
favor. Fortunately, Postgres was released under a BSD license, and even though
the Berkeley-led project had ended, the code was freely available, and the
project has continued to evolve into what it is today. Until the recently
released version 8.0, PostgreSQL was not available for Windows, but I expect
its market share to increase quickly as Windows users start to discover it. Its
claim of most advanced is not lightly made. It has almost all of the
features required by enterprise-level databases, as well as a number of other
unusual and potentially useful features:
-
User-defined
types and operators -
Table
inheritance (where a table inherits all of the columns from its parent, but
other columns can still be defined) -
Partial and
expressional indexes (indexes defined on only part of a table, as limited by a
condition, or created on the output of expressions) -
Multiple
stored-procedure languages (including the native PL/PgSQL, PL/PHP, PL/Perl and
PL/Python)
PostgreSQL was slow in
not providing native Windows binaries, expecting their full feature-set alone
to boost them. Now that they have rectified this, the momentum they have as the
number two Open Source database may see them experience some rapid growth.