By Seth Proctor, CTO, NuoDB
Today, organizations are spending a lot of time trying to simplify, align development with operations, and manage commodity or virtualized resources on-demand. This helps to reduce costs, improve efficiency and evolve solutions in a more agile fashion.
Over the past few years, I’ve been having conversation after conversation with architects, developers, and CTOs around Docker and container computing. As a long-time Sun Microsystems engineer, I find these topics familiar, but their recent popularity is still impressive. These conversations are with large companies and small, in virtually every industry, at many different levels.
And of course, as CTO for a database company, the conversation inevitably comes to the role the database can – or should – play within a container-based environment. To address that, you need to ask yourself a few questions:
- Do you really need a database, or is there some other kind of data management solution you can rely on?
- Can your (new) architecture support a non-container-based database? Should it?
- Are you interested in containers for production or for dev-test self-service and efficiency?
- What are your production application needs around scale, resilience, and durability? To what extent must those needs be reflected in your database deployment?
Answering these questions will help you make sound architectural choices.
Let’s break down each of these questions.
So Why a Database Anyway?
Operational databases are core to many applications and services. By their nature, they tend to be carefully resourced and deployed, somewhat unchanging and upgraded infrequently.
Containers, on the other hand, are lightweight and transient – well-suited for stateless scale-out services. Those services, of course, may need to be layered on stateful services like operational databases. They may also rely on configuration, credentials, shared data or other state that is not typically provided by an operational database.
Because of the lightweight, on-demand nature of containers, they work well in support of microservices architectures where monolithic and tightly coupled components are broken down into their smallest elements. Containers offer agility and flexibility.
This agility is especially well-suited for scaling out front-end components like web servers or caches, or spinning up compute on-demand for AI, analytics or other resource-intensive tasks. Often these classes of application simply don’t need an operational database.
On the other hand, if you scale web-server instances to help scale rich application logic provided through a web service, then it’s likely the application is relying on a backing database. Ditto if the compute-intensive operation is doing ingest.
The first question you should ask is what kind of data service your application actually relies on.
What Are Your Architectural Constraints?
Let’s assume that your application needs an operational database.
Containers are being adopted at different rates and at different levels of the stack. So the next question you need to consider is – do you want or need to deploy your database in containers from the start?
For instance, you may be in a Platform as a Service (PaaS) environment where everything must run in a container. Or, as an organization, you may have made an architectural choice to deploy only in containers. In these cases you need to deploy your database the same way.
Alternatively, you may be in a cloud, like Amazon, where you’re choosing to use a database as a service (DBaaS) independent of the application tier. The choice to deploy your database within or outside of containers brings a number of trade-offs you should consider.
Choosing to deploy your database on bare metal or a long-lived virtual machine is familiar, and in many cases is simpler than running within a container. Databases tend to be long-lived services that need consistent and controlled disk and network I/O access. Traditionally, resources are provisioned so that the database has clearly defined mean time between failures (MTBF) expectations. Contention for physical resources can significantly reduce database throughput and make latency less predictable. It may be easier to let your application containers be flexible while your database tier stays fixed.
As both container-based platforms and distributed databases evolve, however, it’s becoming easier to think about running databases within containers while still getting the predictable performance that your system of record needs to provide. The advantage to deploying and running your application and database tiers using the same tools is pretty obvious. As I’ll discuss in the next few sections, there are a number of additional advantages as well.
The key considerations are resource management and predictable behavior. Ask what latencies, throughput, and sustainable spikes you need to support, and prefer databases that are designed to run on commodity resources and don’t assume tight coupling with hardware. If this is your first attempt at deploying a database within containers, start with a simpler workload and use that to get comfortable with the trade-offs.
Let’s assume that you want to deploy your database in a container.
Is This for Production or for a Test / Development Environment or Both?
If you’re only deploying your database for production, you can move along to the next question.
If it’s for a development or testing environment, containers are a way to make your team more self-sufficient and apply resources more efficiently. Tools like Docker provide a single, repeatable way to deploy and run your database across systems. Containers also make it simpler to pool and share resources.
In that case, you may not actually care if the database state is durable. You may just need a scratch database that can be spun up quickly for testing and then thrown away just as quickly. Containers are a great tool for this kind of efficiency, where you don’t need production-level persistence and reliability.
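As a rough illustration of the scratch-database idea, the sketch below builds a `docker run` command for a container that is discarded when it stops. The article does not name a database or image; the official `postgres:16` image, the password value, the container name, and the helper name `scratch_db_command` are all assumptions chosen for the example. The code only assembles the command, it does not launch anything.

```python
import shlex
import uuid


def scratch_db_command(image="postgres:16", password="test-only", name=None):
    """Build a `docker run` argv for a throwaway database container.

    Hypothetical helper: the image and password are placeholder assumptions.
    """
    name = name or f"scratch-db-{uuid.uuid4().hex[:8]}"
    return [
        "docker", "run",
        "--rm",        # remove the container and its anonymous volumes on exit
        "--detach",    # run in the background
        "--name", name,
        "--env", f"POSTGRES_PASSWORD={password}",  # the official postgres image expects this
        "--publish", "5432",  # map container port 5432 to a random free host port
        image,
    ]


cmd = scratch_db_command(name="scratch-db-demo")
print(shlex.join(cmd))  # a test harness could pass `cmd` to subprocess.run instead
```

Because no named volume or host directory is mounted, nothing outlives the container, which is the point for a test run and exactly what you would not want in production.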
What Capabilities Do You Want Your Production Application to Have That Need to Be Reflected in the Database?
On the other hand, your production applications have their own characteristics and service level agreements (SLAs). Containers are increasingly a key element for meeting these application needs. The question, then, is how those needs impact database deployment.
For example, if elastic scale is a primary goal, but your application is read-mostly, then you might use a caching tier and avoid scaling the database. If the application is write-intensive, you may need to scale the database itself (through sharding, replication or other techniques) and containers will help. In both cases, in-memory database technologies will be a good fit.
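To make the read-mostly case concrete, here is a minimal sketch of a caching tier sitting in front of a database. Everything in it is illustrative: `ReadThroughCache` and `fake_db_lookup` are hypothetical names, the in-process dictionary stands in for a real cache service, and the list `db_reads` simply records how often the "database" is consulted.

```python
class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load = load_from_db   # callable that performs the real lookup
        self._entries = {}

    def get(self, key):
        if key not in self._entries:
            self._entries[key] = self._load(key)  # only misses reach the database
        return self._entries[key]

    def invalidate(self, key):
        """Drop a cached entry, for example after the row has been written."""
        self._entries.pop(key, None)


db_reads = []  # records every lookup that actually hits the stand-in database


def fake_db_lookup(key):
    db_reads.append(key)
    return f"row-for-{key}"


cache = ReadThroughCache(fake_db_lookup)
for _ in range(1000):
    cache.get("user:42")

print(len(db_reads))  # number of reads that reached the stand-in database
```

With a read-mostly workload, most requests are absorbed by the cache, so the application tier can scale out in containers without the database having to scale with it. A write-heavy workload invalidates entries constantly, which is why it pushes the scaling problem back onto the database itself.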
Instead of scale, your focus may be service availability, resilience, and failure handling. In that case you might use containers to manage your database tier, replicating data and responding automatically to failure.
Regardless, fully understanding your priorities and requirements will help you determine the right database architecture for your needs. Databases that can scale out while providing a single, logical view of your data will greatly enhance the value of deploying within containers.
Putting It All Together
All of the considerations outlined above will help you decide if and when you should deploy databases within containers.
If you’re making dev-test more effective, or if you can replicate across multiple container-local filesystems, then deploying your database inside containers with no external, durable storage may be the right approach.
If you want to keep the database tier operating in a familiar fashion and are willing to sacrifice some of the benefits of container deployment, then you may want your database tier operating independent of your containerized application.
If you need a durable database that can scale out with your application and respond to failure efficiently, then you should be looking at memory-centric solutions that fit well with the dynamic nature of container lifecycles.
Ultimately, there isn’t one right answer, but understanding how different requirements factor into your decision is critical when deciding how to deploy and operate databases in container-based architectures.
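The three rules of thumb in this section can be restated as a small decision helper. This is only a sketch of the guidance above, not a formal method: the names `Deployment` and `suggest_deployment`, the boolean inputs, and in particular the order in which the conditions are checked are assumptions made for the example.

```python
from enum import Enum


class Deployment(Enum):
    EPHEMERAL_IN_CONTAINERS = "in containers, no external durable storage"
    OUTSIDE_CONTAINERS = "database tier independent of the containerized application"
    SCALE_OUT_IN_CONTAINERS = "durable, scale-out database running in containers"
    UNDECIDED = "requirements need more analysis"


def suggest_deployment(dev_test_only=False,
                       replicates_across_containers=False,
                       needs_durable_scale_out=False,
                       prefers_familiar_operations=False):
    """Map a few yes/no answers to one of the options discussed above.

    The precedence of the checks is an assumption of this sketch.
    """
    if dev_test_only or replicates_across_containers:
        return Deployment.EPHEMERAL_IN_CONTAINERS
    if needs_durable_scale_out:
        return Deployment.SCALE_OUT_IN_CONTAINERS
    if prefers_familiar_operations:
        return Deployment.OUTSIDE_CONTAINERS
    return Deployment.UNDECIDED


choice = suggest_deployment(needs_durable_scale_out=True)
print(choice.value)
```

Real decisions rarely reduce to four booleans, so treat the helper as a checklist for the conversation rather than an answer.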
About Seth Proctor
Seth has 15+ years of experience in the research, design and implementation of scalable systems. That experience includes work on distributed computing, networks, security, languages, operating systems and databases, all of which are integral to NuoDB. His particular focus is on how to make technology scale and how to make users scale effectively with their systems.
Prior to NuoDB, Seth worked at Nokia on their private cloud architecture. Before that he was at Sun Microsystems Laboratories and collaborated with several product groups and universities. His previous work includes contributions to the Java security framework, the Solaris operating system and several open source projects, in addition to the design of new distributed security and resource management systems. Seth developed new ways of looking at distributed transactions, caching, resource management and profiling in his contributions to Project Darkstar. Darkstar was a key effort at Sun which provided greater insights into how to distribute databases.
Seth holds eight patents for his cutting-edge work across several technical disciplines. He has several additional patents awaiting approval related to achieving greater database efficiency and end-user agility.