Database Journal

Posted Aug 18, 2010

Introducing Amazon SimpleDB - Page 3

By DatabaseJournal.com Staff

Other Pieces of the Puzzle

In the world of cloud computing, there is a growing number of companies and services from which to choose. Each service provider seeks to align its offerings with a broader strategy. With Amazon, that strategy includes providing very basic infrastructure building blocks for users to assemble into customized solutions. AWS encourages you to use more than one service by making the different services work well together and by offering fast and free data transfer between services in the same region. This section describes three other Amazon Web Services, along with some ways you might find them useful in conjunction with SimpleDB.

Adding Compute Power with Amazon EC2

AWS sells computing power by the hour via the Amazon Elastic Compute Cloud (Amazon EC2). This computing power takes the form of virtual server instances running on top of physical servers within Amazon data centers. These server instances come in varying amounts of processor horsepower and memory, depending on your needs and budget. What makes this compute cloud elastic is the fact that users can start up, and shut down, dozens of virtual instances at a moment’s notice.

These general-purpose servers can fulfill the role of just about any server. Some of the popular choices include web server, database server, batch-processing server, and media server. The use of EC2 can result in a large reduction in ongoing infrastructure maintenance when compared to managing private in-house servers. Another big benefit is the elimination of up-front capital expenditures on hardware in favor of paying for only the compute power that is used.

The sweet spot between SimpleDB and EC2 is high-bandwidth data applications. For apps that need fast access to high volumes of data in SimpleDB, EC2 is the platform of choice. The free same-region data transfer can add up to sizable cost savings for large data sets, but the biggest win comes from the consistently low latency. AWS does not guarantee any particular latency numbers, but round-trip times are typically in the neighborhood of 2 to 7 milliseconds between EC2 instances and SimpleDB in the same region. These numbers are on par with the latencies others have reported between EC2 instances. For contrast, additional latencies of 50 to 200 milliseconds or more are common when using SimpleDB across the open Internet. When you need fast SimpleDB, EC2 has a lot to offer.
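To get a feel for what those round-trip numbers mean, the following back-of-the-envelope sketch multiplies them out for a batch of sequential requests. The workload size of 10,000 requests is a hypothetical figure chosen for illustration. The sketch also assumes that requests are issued one at a time and that the Internet figures are added on top of the same-region figures; a real application can overlap requests to hide some of this latency.

```python
def total_wait_seconds(requests: int, round_trip_ms: float) -> float:
    """Return the time spent waiting on the network for sequential requests."""
    return requests * round_trip_ms / 1000.0


REQUESTS = 10_000  # hypothetical workload size, not a figure from AWS

# Same-region round trips of 2 to 7 ms, as quoted in the text.
same_region = (
    total_wait_seconds(REQUESTS, 2),
    total_wait_seconds(REQUESTS, 7),
)

# Open-Internet access adds 50 to 200 ms to each round trip (assumed additive).
internet = (
    total_wait_seconds(REQUESTS, 2 + 50),
    total_wait_seconds(REQUESTS, 7 + 200),
)

print(f"same region:   {same_region[0]:.0f} to {same_region[1]:.0f} seconds")
print(f"open Internet: {internet[0]:.0f} to {internet[1]:.0f} seconds")
```

Under these assumptions, the same workload waits roughly 20 to 70 seconds when run from EC2, versus roughly 9 to 35 minutes across the open Internet.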

Storing Large Objects with Amazon S3

Amazon Simple Storage Service (Amazon S3) is a web service that enables you to store an unlimited number of files and charges you (low) fees for the actual storage space you use and the data transfer you use. As you might expect, data transfer between S3 and other Amazon Web Services is fast and free. S3 is easy to understand, easy to use, and has a multitude of great uses. You can keep the files you store in S3 private, but you can also make them publicly available from the web. Many websites are using S3 as a media-hosting service to reduce the load on web servers.

EC2 virtual machine images are stored in and loaded from S3. EC2 copies storage volumes to S3 and loads storage volumes from S3. The Amazon CloudFront content delivery network can serve frequently accessed web files in S3. The Amazon Elastic MapReduce service runs MapReduce jobs stored in S3. Publicly visible files in S3 can be served up via the BitTorrent peer-to-peer protocol. The list of uses goes on and on. S3 is really a common-denominator cloud service.

SimpleDB users can also find good uses for S3. Because of the high speed within the Amazon cloud, S3 is an obvious storage location choice for SimpleDB import and export data. It is also a solid location to place SimpleDB backup files.
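As a rough illustration of what an export file destined for S3 might look like, the sketch below serializes a domain's items to one JSON object per line and reads them back. The file layout, the function names, and the in-memory shape of a domain are all assumptions made for this example; SimpleDB does not prescribe a backup format, and the actual download from SimpleDB and upload to S3 are left out.

```python
import json
from typing import Dict, List

# Hypothetical in-memory shape: item name -> attribute name -> list of values.
Domain = Dict[str, Dict[str, List[str]]]


def export_domain(items: Domain) -> str:
    """Serialize a domain as one JSON object per line, sorted by item name."""
    lines = []
    for name in sorted(items):
        record = {"name": name, "attributes": items[name]}
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines)


def import_domain(text: str) -> Domain:
    """Rebuild the in-memory domain from text produced by export_domain."""
    items: Domain = {}
    for line in text.splitlines():
        if line.strip():
            record = json.loads(line)
            items[record["name"]] = record["attributes"]
    return items


videos: Domain = {
    "clip-002": {"format": ["mp4"], "tag": ["demo", "short"]},
    "clip-001": {"format": ["avi"]},
}
backup_text = export_domain(videos)   # this text is what would be stored in S3
restored = import_domain(backup_text)
```

A line-oriented format like this one can be written and read incrementally, which matters once a domain is too large to hold in memory.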

Queuing Up Tasks with Amazon SQS

Amazon Simple Queue Service (Amazon SQS) is a web service that reliably stores messages between distributed computers. Placing a robust queue between the computers allows them to work independently. It also opens the door to dynamically scaling the number of machines that push messages and the number that retrieve messages.

Although there is no direct connection between SQS and SimpleDB, SQS does have some complementary features that can be useful in SimpleDB-based applications. When there are multiple concurrent SimpleDB clients, the semantics of reliable messaging can make it easier to coordinate them through an SQS queue than through SimpleDB alone. For example, you might have multiple servers that are encoding video files and storing information about those videos in SimpleDB. SimpleDB makes a great place to store that data, but it would be cumbersome to use for telling each server which file to process next. The reliable message delivery of SQS would be much more appropriate for that task.
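The video-encoding scenario can be sketched as a queue that hands out work and a separate store that records results. In the sketch below, Python's standard `queue.Queue` stands in for an SQS queue and a plain dictionary stands in for a SimpleDB domain; neither is the real service, and names such as `run_encoders` and `server-0` are hypothetical. The stand-in queue is also simpler than SQS, where a consumer has to delete each message after processing it.

```python
import queue
import threading
from typing import Dict, List


def run_encoders(video_files: List[str], worker_count: int = 3) -> Dict[str, dict]:
    """Distribute files to workers through a queue and record the results."""
    tasks: "queue.Queue[str]" = queue.Queue()   # stand-in for an SQS queue
    metadata: Dict[str, dict] = {}              # stand-in for a SimpleDB domain
    lock = threading.Lock()

    for name in video_files:
        tasks.put(name)

    def worker(worker_id: str) -> None:
        while True:
            try:
                name = tasks.get_nowait()       # each file goes to exactly one worker
            except queue.Empty:
                return
            # Real encoding work would happen here.
            with lock:
                metadata[name] = {"status": "encoded", "encoded_by": worker_id}

    threads = [
        threading.Thread(target=worker, args=(f"server-{i}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return metadata


results = run_encoders(["a.avi", "b.avi", "c.avi", "d.avi", "e.avi"])
```

Which server handles which file varies from run to run, but every file is processed once, and no server needs to query the metadata store to find out what to do next.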

Comparing SimpleDB to Other Products and Services

Numerous new types of products and services are now available or will soon be available in the database/data service space. Some of these are similar to SimpleDB, and others are tangential. A few of them are listed here, along with a brief description and comparison to SimpleDB.

Windows Azure Platform

The Windows Azure Platform is Microsoft’s entry into the cloud-computing fray. Azure defines a raft of service offerings that includes virtual computing, cloud storage, and reliable message queuing. Most of these services are counterparts to Amazon services. At the time of this writing, the Azure services are available as a Community Technology Preview. To date, Microsoft has been struggling to gain its footing in the cloud services arena.

There have been numerous, somewhat confusing, changes in product direction and product naming. Although Microsoft’s cloud platform has been lagging behind AWS a bit, it seems that customer feedback is driving the recent Azure changes. There is every reason to suspect that once Azure becomes generally available, it will be a solid alternative to AWS.

Among the services falling under the Azure umbrella, there is one (currently) named Windows Azure Table. Azure Table is a distributed key-value store with explicit support for partitioning across storage nodes. It is designed for scalability and is in many ways similar to SimpleDB. The following is a list of similarities between Azure Table and SimpleDB:

  • All access to the service is in the form of web requests. As a result, any programming language can be used.
  • Requests are authenticated with encrypted signatures.
  • Consistency is loosened to some degree.
  • Unique primary keys are required for each data entity.
  • Data within each entity is stored as a set of properties, each of which is a name-value pair.
  • There is a limit of 256 properties per entity.
  • A flexible schema allows different entities to have different properties.
  • There is a limit on how much data can be stored in each entity.
  • The number of entities you can get back from a query is limited and a query continuation token must be used to get the next page of results.
  • Service versioning is in place so older versions of the service API can still be used after new versions are rolled out.
  • Scalability is achieved through the horizontal partitioning of data.
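The paging behavior in the list above, where each response carries a continuation token for the next page, can be sketched generically. The `fetch_page` callable and the fake service below are hypothetical stand-ins; the real request and token formats differ between Azure Table and SimpleDB.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page is a list of results plus the token for the next page (None when done).
Page = Tuple[List[str], Optional[str]]


def iterate_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield every result by following continuation tokens until none is left."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if token is None:
            return


def make_fake_service(data: List[str], page_size: int) -> Callable[[Optional[str]], Page]:
    """Build a fake paged query whose token is simply the next start offset."""
    def fetch_page(token: Optional[str]) -> Page:
        start = int(token) if token is not None else 0
        end = start + page_size
        next_token = str(end) if end < len(data) else None
        return data[start:end], next_token
    return fetch_page


service = make_fake_service(["a", "b", "c", "d", "e", "f", "g"], page_size=3)
all_items = list(iterate_all(service))
```

Wrapping the loop in a generator keeps the token handling in one place, so calling code can treat a paged query as an ordinary sequence.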

There are also differences between the services, as listed here:

  • Azure Table uses a composite key composed of a partition key followed by a row key, whereas SimpleDB uses a single item name.
  • Azure Table keeps all data with the same partition key on a single storage node. Entities with different partition keys may be automatically spread across hundreds of storage nodes to achieve scalability. With SimpleDB, items must be explicitly placed into multiple domains to get horizontal scaling.
  • The only index in Azure Table is based on the composite key. Any properties you want to query or sort must be included as part of the partition key or row key. In contrast, SimpleDB creates an index for each attribute name, and a SQL-like query language allows query and sort on any attribute.
  • To resolve conflicts resulting from concurrent updates with Azure Table, you have a choice of either last-write-wins or resolving on the client. With SimpleDB, last-write-wins is the only option.
  • Transactions are supported in Azure Table at the entity level as well as for entity groups with the same partition key. SimpleDB applies updates atomically only within the scope of a single item.
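Because SimpleDB items must be placed into multiple domains explicitly, the client has to decide which domain each item belongs to. One common approach, sketched below, is to hash the item name and use the result to pick a domain. The domain naming scheme, the shard count of 4, and the choice of MD5 as the hash are all assumptions for illustration, not something SimpleDB mandates.

```python
import hashlib


def domain_for(item_name: str, base: str = "videos", shards: int = 4) -> str:
    """Map an item name to one of a fixed number of domains."""
    # A stable hash is used so that every client computes the same domain;
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.md5(item_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % shards
    return f"{base}_{index}"


placement = {name: domain_for(name) for name in ["clip-001", "clip-002", "clip-003"]}
```

The trade-off is that a lookup by item name goes straight to one domain, while a query on any other attribute has to be sent to every domain and the results merged on the client.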

Windows Azure Table overall is very SimpleDB-like, with some significant differences in the scalability approach. Neither service has reached maturity yet, so we may still see enhancements aimed at easing the transition from relational databases.

It is worth noting that Microsoft also has another database service in the Windows Azure fold. Microsoft SQL Azure is a cloud database service with full replication across physical servers, transparent automated backups, and support for the full relational data model. This technology is based on SQL Server, and it includes support for T-SQL, stored procedures, views, and indexes. This service is intended to enable direct porting of existing SQL-based applications to the Microsoft cloud.

Google App Engine

App Engine is a service offered by Google that lets you run web applications, written in Java or Python, on Google’s infrastructure. As an application-hosting platform, App Engine includes many non-database functions, but the App Engine data store has similarities to SimpleDB. The non-database functions include a number of different services, all of which are available via API calls. The APIs include service calls to Memcached, email, XMPP, and URL fetching.

App Engine includes an API for data storage based on Google Bigtable and in some ways is comparable to SimpleDB. Although Bigtable is not directly accessible to App Engine applications, there is support in the data store API for a number of features not available in SimpleDB. These features include data relations, object mapping, transactions, and a user-defined index for each query.

App Engine also has a number of restrictions, some of which are similar to SimpleDB restrictions, like the limit on query run time. By default, the App Engine data store is strongly consistent. Once a transaction commits, all subsequent reads will reflect the changes in that transaction. It also means that if the primary storage node you are using goes down, App Engine will fail any update attempts you make until a suitable replacement takes over. To alleviate this issue, App Engine has recently added support for the same type of eventual consistency that SimpleDB has had all along. This move in the direction of SimpleDB gives App Engine apps the same ability as SimpleDB apps to run with strong consistency, with the option to fall back on eventual consistency and continue with a degraded level of service.
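The fallback pattern described above can be sketched independently of either service. In the sketch below, `strong_read` and `eventual_read` are hypothetical callables supplied by the application, and `ServiceUnavailable` is an exception type invented for the example; neither App Engine nor SimpleDB exposes an API with these names.

```python
from typing import Callable, Optional, Tuple


class ServiceUnavailable(Exception):
    """Hypothetical error raised when a strongly consistent read cannot be served."""


def read_with_fallback(
    key: str,
    strong_read: Callable[[str], Optional[str]],
    eventual_read: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Prefer a strongly consistent read; degrade to an eventual one on failure."""
    try:
        return strong_read(key), "strong"
    except ServiceUnavailable:
        # The value returned here may be stale, so the caller is told which
        # consistency level was actually used.
        return eventual_read(key), "eventual"


def failing_strong_read(key: str) -> Optional[str]:
    raise ServiceUnavailable("primary storage node is down")


replica = {"user:42": "old-profile"}
value, level = read_with_fallback("user:42", failing_strong_read, replica.get)
```

Returning the consistency level alongside the value lets the application decide how to present possibly stale data, for example by disabling edits while the service is degraded.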

Apache CouchDB

Apache CouchDB is a document database where a self-contained document with metadata is the basic unit of data. CouchDB documents, like SimpleDB items, consist of a group of named fields. Each document has a unique ID in the same way that each SimpleDB item has a unique item name. CouchDB does not use a schema to define or validate documents. Different types of documents can be stored in the same database. For querying, CouchDB uses a system of JavaScript views and map-reduce. The loosely structured data in CouchDB documents is similar to SimpleDB data, but CouchDB does not place limits on the amount of data you can store in each document or on the size of the data fields.
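To show the shape of a map-reduce view, here is a small imitation in Python. CouchDB views are written in JavaScript and run inside the database, so this is only an analogy: `run_view`, `tags_map`, and the sample documents are hypothetical, and the code does not use any CouchDB API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def run_view(
    docs: Iterable[dict],
    map_fn: Callable[[dict], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[List[int]], int],
) -> Dict[str, int]:
    """Apply map_fn to every document, group emitted values by key, then reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


def tags_map(doc: dict) -> Iterator[Tuple[str, int]]:
    """Emit one (tag, 1) pair per tag; documents without tags emit nothing."""
    for tag in doc.get("tags", []):
        yield tag, 1


documents = [
    {"_id": "post-1", "tags": ["cloud", "database"]},
    {"_id": "post-2", "tags": ["cloud"]},
    {"_id": "photo-1", "camera": "X100"},   # a different document type, no tags
]
tag_counts = run_view(documents, tags_map, sum)
```

Note how the schema-free documents are handled: the map function simply emits nothing for documents that lack the field it is interested in.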

CouchDB is an open-source product that you install and manage yourself. It allows distributed replication among peer servers and has full support for robust clustering. CouchDB was designed from the start to handle high levels of concurrency and to maintain high levels of availability. It seeks to solve many of the same problems as SimpleDB, but from the standpoint of an open-source product offering rather than a pay-as-you-go service.

Dynamo-Like Products

Amazon Dynamo is a data store used internally within Amazon that is not available to the public. Amazon has published information about Dynamo that includes design goals, run-time characteristics, and examples of how it is used. From the published information, we know that SimpleDB has some things in common with Dynamo, most notably the eventual consistency.

Since the publication of Dynamo information, a number of distributed key-value stores have been developed that are in the same vein as Dynamo. Three open-source products that fit into this category are Project Voldemort, Dynomite, and Cassandra. Each of these projects takes a different approach to the technology, but when you compare them to SimpleDB, they generally fall into the same category. They give you a chance to have highly available key-value access distributed across machines. You get more control over the servers and the implementation, but that control comes with the maintenance cost of managing the setup and the machines. If you are looking for something in this class of data storage, SimpleDB is a touch-free hosted option, and these projects are hands-on self-hosted alternatives.

Compelling Use Cases for SimpleDB

SimpleDB is not a replacement for relational databases. You need to give careful consideration to the type of data storage solution that is appropriate for a given application. This section includes a discussion of some of the use cases that match up well with SimpleDB.

Web Services for Connected Systems

IT departments in the enterprise are tasked with delivering business value and support in an efficient way. In recent years, there has been movement toward both service orientation and cloud computing. One of the driving forces behind service orientation is a desire to make more effective use of existing applications. Simple Object Access Protocol (SOAP) has emerged as an important standard for message passing between these connected systems as a means of enabling forward compatibility. For new services deployed in the cloud, SimpleDB is a compelling data storage option.

Data transfer between EC2 instances and the SimpleDB endpoint in the same region is fast and free. The consistent speed and high availability of SimpleDB are helpful when defining a Service Level Agreement (SLA) between IT and business units. All this meshes with the ability of EC2 to scale out additional instances on demand.

Low-Usage Application

There are applications in the enterprise and on the open web that do not see a consistent heavy load. They can be low usage in general with periodic or seasonal spikes—for instance, at the end of the month or during the holidays. Sometimes there are few users at all times by design or simply by lack of popularity.

For these types of applications, it can be difficult to justify an entire database server for the one application. The typical answer in organizations with sufficient infrastructure is to host multiple databases on the same server. This can work well but may not be an option for small organizations or for individuals. Shared database hosting is available from hosting companies, but service levels are notoriously unpredictable. With SimpleDB, low-usage applications can run within the free tier of service while maintaining the ability to scale up to large request volumes when necessary. This can be an attractive option even when database-sharing options are available.

Clustered Databases Without the Time Sink

Clustering databases for scalability or for availability is no easy task. If you already have the heavy data access load or if you have the quantifiable need for uptime, it is obviously a task worth taking on. Moreover, if you already have the expertise to deploy and manage clusters of replicated databases, SimpleDB may not be something you need. However, if you do have the experience, you know many other things as well: you know the cost to roll the clusters into production, to roll out schema updates, and to handle outages. This information can actually make it easier to decide whether new applications will provide enough revenue or business value to merit the time and cost. You also have a great knowledge base to make comparisons between in-house solutions and SimpleDB for the features it provides.

You may have a real need for scalability or uptime but not the expertise. In this case, SimpleDB can enable you to outsource the potentially expensive ongoing database maintenance costs.

Dynamic Data Application

Rigid and highly structured data models serve as the foundation of many applications, while others need to be more dynamic. It is becoming much more important for new applications to include some sort of social component than it was in the past. Along with these social aspects, there are requirements to support various types of user input and customization, like tagging, voting, and sharing. Many types of social applications require community building and can benefit from a platform that allows data to be stored in new ways without breaking the old data. Customer-facing applications, even those without a social component, need to be attentive to user feedback.

Whether it is dynamic data coming from users or dynamic changes made in response to user feedback, a flexible data store can enable faster innovation.

Amazon S3 Content Search

Amazon S3 has become a popular solution for storing web-accessible media files. Applications that deal with audio, video, or images can access the media files from EC2 with no transfer costs and allow end users to download or stream them on a large scale without needing to handle the additional load. When there are a large number of files in S3, and there is a need to search the content along various attributes, SimpleDB can be an excellent solution.

It is easy to store attributes in SimpleDB, along with pointers to where the media is stored in S3. SimpleDB creates an index for every attribute for quick searching. Different file types can have different attributes in the same SimpleDB domain. New file types or new attributes on existing file types can be added at any time without requiring existing records to be updated.
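The idea of keeping searchable attributes in one place and the media itself in S3 can be modeled in a few lines. The class below is an in-memory stand-in that indexes every attribute, loosely imitating the behavior described above; it does not call SimpleDB or S3, and the class name, method names, and S3 keys are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class MediaCatalog:
    """Toy attribute store: every attribute is indexed, no schema is declared."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, str]] = {}
        # attribute name -> attribute value -> names of matching items
        self.index: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))

    def put(self, item_name: str, s3_key: str, attributes: Dict[str, str]) -> None:
        """Store attributes plus a pointer to where the file lives in S3."""
        self.delete(item_name)                      # drop stale index entries
        record = dict(attributes, s3_key=s3_key)
        self.items[item_name] = record
        for name, value in record.items():
            self.index[name][value].add(item_name)

    def delete(self, item_name: str) -> None:
        for name, value in self.items.pop(item_name, {}).items():
            self.index[name][value].discard(item_name)

    def find(self, name: str, value: str) -> List[str]:
        """Return the S3 keys of all items whose attribute equals value."""
        matches = self.index.get(name, {}).get(value, set())
        return sorted(self.items[item]["s3_key"] for item in matches)


catalog = MediaCatalog()
catalog.put("song-1", "media/audio/song-1.mp3", {"type": "audio", "artist": "Ada"})
catalog.put("clip-1", "media/video/clip-1.mp4", {"type": "video", "resolution": "720p"})
catalog.put("clip-2", "media/video/clip-2.mp4", {"type": "video", "artist": "Ada"})
by_artist = catalog.find("artist", "Ada")
```

The audio and video items carry different attributes, yet both are found by the same `artist` lookup, and adding a new attribute later does not require touching existing items.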

Empowering the Power Users

For a long time, databases have been just beyond the edge of what highly technical users can effectively reach. Many business analysts, managers, and information workers have technical aptitude but not the skills of a developer or DBA. These power users make use of tools like spreadsheet software and desktop databases to solve problems. Unfortunately, these tools work best on a single workstation, and attempts at sharing or concurrent use frequently cause difficulty and frustration; enterprise-capable database software requires a level of expertise and time commitment beyond what these users are willing to spend.

The flexibility and scalability of SimpleDB can be a great boon to a new class of applications designed for power users. SimpleDB itself still requires programming on the client and is not itself directly usable by power users. However, the ability to store data directly without a predefined schema and create queries is an enabling feature. For applications that seek to empower the power users, by creating simple, open-ended applications with dynamic capabilities, SimpleDB can make a great back end.

Existing AWS Customers

Earlier, this chapter pointed out the benefits of using EC2 for high-bandwidth applications. More generally, if you are already using one or more of the Amazon Web Services, SimpleDB can be a strong candidate for queryable data storage across a wide range of applications. Of course, running a relational database on an EC2 instance is also a viable and popular choice, and you would do well to consider both options. SimpleDB requires you to make certain trade-offs, but if those trade-offs provide a net benefit to your application, you will have gained some great features from AWS that are difficult and time consuming to develop on your own.

Summary

Amazon SimpleDB is a web service that enables you to store semi-structured data within Amazon’s data centers. The service provides automatic, geographically diverse data replication and internal routing around failed storage nodes. It offers high availability and enables horizontal scalability. The service allows you to offload hardware maintenance and database management tasks.

You can use SimpleDB as a distributed key-value store using the GetAttributes, PutAttributes, and DeleteAttributes API calls. You also have the option to query for your data along any of its attributes using the Select API call. SimpleDB is not a relational database, so there are no joins, foreign keys, schema definitions, or relational constraints that you can specify. SimpleDB also has limited support for transactions, and updates propagate between replicas in the background. SimpleDB supports both strong consistency, where read operations immediately reflect the results of all completed writes, and eventual consistency, where storage nodes are updated asynchronously in the background.
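As an illustration of querying along an attribute, the sketch below assembles the expression string that would be passed to a Select call. It only builds text and does not contact SimpleDB. The helper names are hypothetical, and the quoting rules (backticks around names, single quotes around values, with embedded quote characters doubled) reflect one reading of the Select syntax that should be checked against the SimpleDB documentation before use.

```python
from typing import Dict


def quote_name(name: str) -> str:
    """Quote a domain or attribute name with backticks, doubling any inside."""
    return "`" + name.replace("`", "``") + "`"


def quote_value(value: str) -> str:
    """Quote an attribute value with single quotes, doubling any inside."""
    return "'" + value.replace("'", "''") + "'"


def build_select(domain: str, conditions: Dict[str, str]) -> str:
    """Build a select expression matching items where every condition holds."""
    query = f"select * from {quote_name(domain)}"
    if conditions:
        clauses = [
            f"{quote_name(attribute)} = {quote_value(value)}"
            for attribute, value in conditions.items()
        ]
        query += " where " + " and ".join(clauses)
    return query


expression = build_select("media", {"type": "video", "artist": "O'Brien"})
print(expression)
```

Escaping values in one helper rather than formatting strings inline reduces the chance that user-supplied text changes the meaning of the query.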

The normal window of time for all storage nodes to reach consistency in the background is typically small. During a server or network failure, consistency may not be reached for longer periods of time, but eventually all updates will propagate. SimpleDB is best used by applications able to deal with eventual consistency and benefit from the ability to remain available in the midst of a failure.
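The difference between the two kinds of read can be demonstrated with a toy model of replicated storage. The class below is purely illustrative: the replica count, the method names, and the rule that a consistent read always goes to the replica that accepted the write are simplifying assumptions, not a description of how SimpleDB is implemented.

```python
from typing import Dict, List, Optional, Tuple


class ReplicatedStore:
    """Toy store where writes land on one replica and spread to the rest later."""

    def __init__(self, replica_count: int = 3) -> None:
        self.replicas: List[Dict[str, str]] = [{} for _ in range(replica_count)]
        self.pending: List[Tuple[int, str, str]] = []   # (replica index, key, value)

    def put(self, key: str, value: str) -> None:
        self.replicas[0][key] = value                   # write is accepted immediately
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))    # queued for background update

    def consistent_read(self, key: str) -> Optional[str]:
        """Always reflects completed writes (served by the accepting replica)."""
        return self.replicas[0].get(key)

    def eventual_read(self, key: str, replica: int) -> Optional[str]:
        """May return stale data if the chosen replica has not caught up."""
        return self.replicas[replica].get(key)

    def propagate(self) -> None:
        """Simulate the background process that brings replicas up to date."""
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = ReplicatedStore()
store.put("status", "encoded")
before = store.eventual_read("status", replica=2)   # None: update not yet propagated
strong = store.consistent_read("status")            # "encoded"
store.propagate()
after = store.eventual_read("status", replica=2)    # "encoded"
```

In this model a failure corresponds to `propagate` being delayed: eventual reads keep working against any replica, but they return older data until the updates finally arrive.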

A Developer's Guide to Amazon SimpleDB
By Mocky Habeeb
Published Feb 9, 2010 by Addison-Wesley Professional. Part of the Developer's Library series.
ISBN-10: 0-321-68597-0
ISBN-13: 978-0-321-68597-1

Additional Resources

Hands-on Tutorial for Getting Started with Amazon SimpleDB
Amazon Web Services: A Developer Primer


