Many IT organizations are just now evaluating a move from SQL Server 2000 to SQL Server 2008 and are even further away from embracing a non-relational database. A number of database administrators don’t really even know how they would use a NoSQL database — would it replace the RDBMS, work alongside it, or have a completely different function?
NoSQL databases are designed to scale out easily as data grows, and that is the primary reason IT shops choose them. They do a good job of performing analytical queries, and many have exceptional write performance. NoSQL databases are also designed to handle hardware failure without requiring IT staff to build their own redundant solutions. They offer a lot of flexibility in how data is stored for performance and in how easily the queries being run can be changed. This flexibility also facilitates rapid development: developers can iterate quickly through changes to their data model.
Here are five things a relational database administrator needs to know about NoSQL databases:
1. Analytics
A good reason for adding a NoSQL database to your corporate infrastructure is that many are well suited to performing analytical queries. Developers can use the same query languages they use for atomic queries to perform analytical queries with a NoSQL database. Typically this will be some variation of a MapReduce query, but it’s also possible to query data using Pig or Hive.
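To make the MapReduce idea concrete, here is a minimal in-memory sketch of the pattern. It is not the API of any particular NoSQL product: the sample records and the names `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are assumptions made up for illustration. In a real cluster the framework would run the map and reduce steps in parallel on the nodes that hold the data.

```python
from collections import defaultdict

# Hypothetical sample data: one record per page view.
page_views = [
    {"user": "alice", "page": "/home"},
    {"user": "bob", "page": "/home"},
    {"user": "alice", "page": "/pricing"},
    {"user": "carol", "page": "/home"},
]

def map_phase(record):
    """Emit one (key, value) pair per record: (page, 1)."""
    yield record["page"], 1

def shuffle(pairs):
    """Group emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)

def map_reduce(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

views_per_page = map_reduce(page_views)
print(views_per_page)
```

The same three-step shape underlies the higher-level tools mentioned above; Pig and Hive generate jobs of this kind from a script or a SQL-like query.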
Many NoSQL systems boast phenomenal write performance. The combination of high write performance and batch processing makes it easy to pre-aggregate data, summarize results, and still guarantee ad hoc query performance.
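One way to picture pre-aggregation is a write path that updates a summary counter along with each event, so that later reads are simple key lookups. The sketch below uses a plain in-memory `Counter` as a stand-in for a counter table; the function names and the hourly bucket format are assumptions for illustration, not features of a specific database.

```python
from collections import Counter
from datetime import datetime

# Stands in for a counter table keyed by (page, hour bucket).
hourly_views = Counter()

def record_view(page, timestamp):
    """Write path: bump the pre-aggregated counter for this page and hour."""
    bucket = timestamp.strftime("%Y-%m-%d %H:00")
    hourly_views[(page, bucket)] += 1

def views_in_hour(page, bucket):
    """Read path: a single key lookup, with no scan over raw events."""
    return hourly_views[(page, bucket)]

record_view("/home", datetime(2011, 3, 1, 9, 15))
record_view("/home", datetime(2011, 3, 1, 9, 45))
record_view("/home", datetime(2011, 3, 1, 10, 5))

print(views_in_hour("/home", "2011-03-01 09:00"))
```

The trade-off is extra work on every write, which is acceptable only when writes are cheap.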
2. Scale
While it’s possible to scale out with a relational database, it is not easy: it introduces significant limitations and requires considerable engineering prowess. NoSQL databases, by contrast, are designed to scale out easily as data grows. Typically, with a relational database like SQL Server or Oracle, you scale by purchasing larger and faster servers and storage, or by employing specialists to provide additional tuning. With a NoSQL database, data is natively partitioned and balanced across multiple nodes in a cluster, and aggregate queries are distributed across those nodes. Scaling can be as simple as racking a new server and executing a few commands to add it to the cluster.
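The article does not say how data is partitioned across nodes. One common scheme is consistent hashing, and the sketch below assumes it purely for illustration. The `HashRing` class, the node names and the number of virtual nodes are all hypothetical. The point it shows is that adding a server moves only a fraction of the keys, and every moved key lands on the new server.

```python
import bisect
import hashlib

def _token(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring; each node owns several points (virtual nodes)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (token, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place the node's virtual nodes on the ring, keeping it sorted."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_token(f"{node}#{i}"), node))

    def node_for(self, key):
        """Return the node owning the first ring position at or after the key."""
        index = bisect.bisect_right(self._ring, (_token(key), ""))
        return self._ring[index % len(self._ring)][1]

ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # the "rack a new server" step
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

A real cluster would also stream the moved data to the new node in the background; this sketch only shows how ownership changes.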
3. Redundancy
In addition to rapid scalability, NoSQL databases are designed with redundancy in mind. These databases were originally designed and built at massive scales, where even the rarest hardware problems go from being freak events to eventualities. Hardware will fail. Rather than treat hardware failure as an exceptional event, NoSQL databases are designed to handle it and continue operations. While hardware failure is still a serious concern, it is addressed at the architectural level of the database, and does not require developers, DBAs and operations staff to build their own redundant solutions.
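As a rough illustration of redundancy handled inside the database, the sketch below writes each value to several replicas and reads from whichever replica is still alive. The `Node` and `ReplicatedStore` classes, the replica count of three and the placement rule are assumptions for the example; real systems add consistency levels, repair and failure detection on top of this idea.

```python
class Node:
    """A toy storage node that can be marked as failed."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}

class ReplicatedStore:
    """Write every key to several nodes; read from the first live replica."""

    def __init__(self, nodes, replicas=3):
        self.nodes = nodes
        self.replicas = min(replicas, len(nodes))

    def _replica_set(self, key):
        # Deterministic placement: pick a start node, then take its neighbours.
        start = sum(key.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, value):
        written = 0
        for node in self._replica_set(key):
            if node.alive:
                node.data[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no live replica available for write")
        return written

    def get(self, key):
        for node in self._replica_set(key):
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)

cluster = [Node(f"node-{i}") for i in range(4)]
store = ReplicatedStore(cluster, replicas=3)
store.put("order:1001", {"total": 42})

# Simulate a hardware failure on the first replica for this key.
store._replica_set("order:1001")[0].alive = False
print(store.get("order:1001"))
```

The application code calling `get` does not change when a node fails, which is the sense in which failure is handled at the architectural level.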
4. Flexibility
What use is a database if it’s not flexible? While the data modeling issues are completely different in NoSQL, there is significant flexibility in the way data is stored for performance. It’s important to remember that not all NoSQL databases are the same: there are many options available, and they have different properties. Databases modeled on Google’s Bigtable, such as Cassandra or HBase, provide flexibility for storing data on disk. It’s possible to create derived column families, so the database can be designed to duplicate frequently accessed data for rapid query response, as long as writes and storage space are cheap.
Databases based on Bigtable have an additional benefit: outside of key structure, it’s possible to store a variety of disparate data in the same table. Structure is largely irrelevant in this type of database. While relational databases have adopted features to solve similar problems — such as sparse columns in SQL Server — they carry overhead. The cost of storing vastly different columns in multiple rows of the same column family is practically invisible in many NoSQL databases.
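The column family ideas above can be modeled loosely with nested dictionaries: a row key maps to whatever columns that row happens to have, and a second, derived family duplicates data under a different key for fast lookups. This is a conceptual sketch only. The names `users`, `users_by_city` and `save_user` are hypothetical, and it is not the client API of Cassandra or HBase.

```python
from collections import defaultdict

# Column family: row key -> {column name -> value}. Rows need not share columns.
users = defaultdict(dict)

# Derived column family: the same data duplicated under a key chosen for a
# different query ("who lives in this city?").
users_by_city = defaultdict(dict)

def save_user(user_id, **columns):
    """Write to the base family and keep the derived family in step.

    Simplification: a real design must also remove the old derived entry
    when a user's city changes.
    """
    users[user_id].update(columns)
    city = columns.get("city")
    if city is not None:
        users_by_city[city][user_id] = columns.get("name", "")

save_user("u1", name="Ada", city="London", twitter="@ada")
save_user("u2", name="Grace", city="London", phone="555-0100")
save_user("u3", name="Linus", employer="Example Corp")

# Each row stores only the columns it actually has.
print(sorted(users["u3"]))
# The derived family answers the city query with one row read.
print(users_by_city["London"])
```

Every write now touches two families, which is why this approach depends on writes and storage being cheap.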
Key-value stores also provide an incredible level of flexibility. Data is arbitrarily stored as a value. Key-value databases make it possible to store images, Word documents, strings, integers and serialized objects within the same database. While this requires more responsibility and creative thinking from application developers and architects, it also lets those who design the system build a completely custom solution to meet their needs.
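A small sketch can show what storing arbitrary values means in practice. Here a dictionary stands in for the key-value database, and the application decides how each value is serialized. The helper names and the key naming scheme are assumptions for illustration.

```python
import json

# Stands in for a key-value database: every value is an opaque byte string.
store = {}

def put_json(key, obj):
    """Serialize a Python object to JSON bytes before storing it."""
    store[key] = json.dumps(obj).encode("utf-8")

def get_json(key):
    """Read bytes back and decode them; the database never inspects them."""
    return json.loads(store[key].decode("utf-8"))

def put_bytes(key, blob):
    """Store binary content such as an image or document as-is."""
    store[key] = bytes(blob)

put_json("user:42", {"name": "Ada", "logins": 3})   # serialized object
put_json("counter:signups", 1207)                    # integer
put_bytes("image:logo", b"\x89PNG\r\n\x1a\n")        # PNG file signature bytes

print(get_json("user:42")["name"])
```

Because the store treats values as opaque, knowing which keys hold JSON and which hold raw bytes is the application's responsibility.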
5. Rapid Development
Everyone wants their application to be faster and have more features, and they want it yesterday. NoSQL databases make it easy to change the way data is stored, or change the queries being run. Huge changes to data can be accomplished with simple refactoring and batch processing, rather than complex migration scripts and outages. It’s even easier to take nodes in a cluster offline for changes and add them back as the new master server: replication features will take care of syncing up data and propagating the new data design out to the other servers in a cluster.
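As one possible picture of changing a data design through batch processing, the sketch below rewrites records from an old shape to a new one in small batches. The record layout, the `schema` version field and the function names are assumptions made up for this example. Old and new shapes coexist while the job runs, so no outage is needed.

```python
# Hypothetical records: "u1" uses the old shape, "u2" was already upgraded.
records = {
    "u1": {"name": "Ada Lovelace"},
    "u2": {"first_name": "Grace", "last_name": "Hopper", "schema": 2},
}

def upgrade(record):
    """Convert one record to the new shape; already-upgraded records pass through."""
    if record.get("schema", 1) >= 2:
        return record
    first, _, last = record["name"].partition(" ")
    return {"first_name": first, "last_name": last, "schema": 2}

def migrate_in_batches(store, batch_size=100):
    """Rewrite the store a batch at a time instead of in one big migration."""
    keys = list(store)
    for start in range(0, len(keys), batch_size):
        for key in keys[start:start + batch_size]:
            store[key] = upgrade(store[key])

migrate_in_batches(records, batch_size=1)
print(records["u1"])
```

Because `upgrade` is safe to run more than once, the batch job can be stopped and restarted without harm.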
Non-relational databases have been around for a long time. The recent NoSQL movement has brought many new products to the market. Far from a passing fad, these new databases are being adopted across the board, by young companies like Foursquare, Facebook and Netflix as well as by traditional companies like Yahoo, Google, Bank of America and Comcast. These companies are already deploying NoSQL databases in production to handle data analysis, fraud detection and bulk data processing.
Jeremiah Peschka is an emerging technology expert with Quest Software. Jeremiah works with different software to identify new techniques and trends in the world of data storage. Over the course of his career, Jeremiah has worked as a systems administrator, developer and DBA. Previously, Jeremiah spent two years at Cass Information Systems, a utility billing provider. Jeremiah is involved in the Professional Association for SQL Server. When he is not volunteering with the development community, he can be found blogging about data storage, relational databases and software development at Facility9. He can also be found on Twitter at @peschkaj and via email at [email protected].