The number of data management products available today is truly staggering. Never have there been so many choices. Besides the traditional commercial relational databases, there are a number of great open source products to choose from, as well as a whole other world of cloud and NoSQL databases to consider.
With so many options, what’s a database purchaser to do? The same thing that you would do before making any major purchase: research the market, narrow down the selection to products that suit your needs and budgetary constraints, and belly up to the bar once you’re ready to make an informed decision. That’s where this guide comes in. Its purpose is to help you narrow down the field of potential candidates to a short list that you can then research in more depth.
We’re going to begin this new Database Buyer’s Guide series by exploring the major database types and important features in parts one and two. Then we’ll be ready to take a look at some specific products.
What Type of Database?
There are three major database types, each with its own strengths and weaknesses. Your business requirements, budget, technical expertise and space limitations will all play a role in choosing your database platform. The three major database classifications are Desktop, Sever, Web-enabled, and Cloud DBaaS.
You’re probably familiar with at least one desktop database product. The market is dominated by brand-names like Microsoft Access, FileMaker Pro and Lotus Approach. These products are relatively inexpensive and are great for single-user or non-interactive Web applications.
If you’re planning a heavy-duty database application like an e-commerce site or a multi-user database, you’re going to need to call on one of the big players. Server databases like Microsoft SQL Server and Oracle provide real power but carry a correspondingly hefty price tag.
Nowadays, almost every database application calls for some kind of Web interaction. Many people assume that if you’re looking to publish your database on the Internet, you need to use a server database. That’s not necessarily true; a far less expensive desktop database could meet your needs.
Cloud Services, including Database as a Service, or DBaaS for short, consist of one or more databases that are hosted by a third party. Within this arrangement, there exist a number of architectural and business models to choose from. Some of the advantages of this route include automatic back ups, scalability, as well as vast computing resources and storage capacity accessible via a simple Web interface. In addition, this setup makes it easy to use replication to enhance availability and reliability for production databases and to scale out beyond the capacity of a single database deployment for heavy traffic. We’ll write more on cloud databases in a future article.
Commercial or Open Source?
As with most software, databases come in both commercial and open source flavors. The primary deciding factor here is the issue of support. Although open source databases have online communities that work together to resolve issues as they come up, for some people, nothing beats the peace of mind that comes with having a formal support contract.
Some specialized features of commercial databases like Oracle are not available in open source databases. Some advanced features include triggers, views, inheritance, sequences, stored procedures, cursors and user-defined data types, just to name a few common ones. Commercial products often have a more established reliability record as well.
Databases don’t all store their data in the same way. There are actually several types of storage mechanisms, including ISAM/MyISAM, BDB (BerkeleyDB), MERGE (also known as the MRG_MyISAM), as well as the more advanced InnoDB. When choosing a storage mechanism, make sure you read up on all of the features you plan on implementing. While researching this article, I found that some features exist in certain storage mechanisms, but not in others. Most notably, InnoDB and BDB are two of the few storage types that are transaction-safe. For more information on database storage engines, there is an excellent summary on the MySQL Dev site.
One of the critical features of any database engine is data integrity. “Atomic, Consistent, Isolated, Durable” (ACID) compliance is a qualification that assures data integrity. ACID essentially means that when a transaction is performed within a database, either the whole transaction is successful and the information is written to the database, or nothing is written. PostgreSQL, MySQL, Oracle, and Sybase are all examples of databases that support ACID-compliant transaction functionality.
Those databases also support partial rollbacks of transactions, and can take corrected measures in the event of a deadlock. For instance, MySQL uses traditional row-level locking. Multi Version Concurrency Control (MVCC) is another type of record locking. MVCC is a little different from row-level locking in that transactions on the database are performed on a snapshot of the data and then serialized.
SQL is the basic ANSI standard for accessing data in a relational database. Many databases support additional language features and variations that provide specialized capabilities. For example, T-SQL is the proprietary form of SQL used by Microsoft SQL Server. It includes useful functions like cast, convert, date, etc., that are not part of the ANSI standard. Transact-SQL/T-SQL was developed by Sybase for their database server, which Microsoft then licensed. You may have also come across plSQL, which is Oracle’s own version of SQL. There are others as well, such as Jet SQL, which is used by Microsoft Access.
It is important to note that even the ANSI standard for SQL has different releases, named after the year in which it was released, such as 92 or 99. Different database engines will advertise themselves as “mostly ANSI-92” compliant or “fully ANSI-99” compliant, with any exceptions documented. The rule of thumb is: the more recent the ANSI release, the more powerful the language.
Procedures and triggers can be written in other languages as well, such as PL/TCL, PL/Perl, and PL/python. These additional languages come in two basic flavors, safe and unsafe. Safe limits the programming language’s access to host system resources, such as the file system.
As you can see, there is much to ponder before investing in a DBMS; so much in fact, that there are more factors to consider still. We’ll be taking a look at those next!