DynamoDB: A Short History

DynamoDB is one of the most capable NoSQL databases on the planet. As such, it has found its way into several of the core services at Amazon as well as those of the many customers on AWS. Over the years, it has grown into one of Amazon’s most important services. So much so that the online retail giant can no longer function without it.

DynamoDB wasn’t always the superhero database that it is now, and it wasn’t always around to save the day.

Amazon was eighteen years old before the release of DynamoDB. Ever since, the database has seen steady improvement, learning new tricks almost every month. In future posts we will talk about all of the features that make DynamoDB great. For now, let’s focus on how it got there.

2004

2004 was a big year for Amazon. The company had just turned ten and was stronger than ever. It was experiencing record sales and that holiday season was no exception.

Shoppers were out in full force. As people flooded online to get their last-minute gifts, Amazon was experiencing punishingly heavy traffic and was having difficulty keeping up.

Finally, on Monday, December 6th, the site reached its breaking point. Early that morning, Amazon experienced outages that lasted several hours. Roughly twenty percent of users could not reach the site at all, while the rest faced slowdowns of up to 60 seconds.

Going down for a few hours might not seem like much; however, if you are Amazon, going down for even a few minutes during the most important time of the year can mean millions in lost revenue. They needed to do something.

Dynamo

In a post-mortem of the December outages, engineers at Amazon traced the cause back to commercial technologies that had been scaled beyond their capabilities. The site had grown so popular, and the scale of its operations was so large, that no commercial technology was spec’d to support it.

It was time for an in-house solution: Dynamo.

Dynamo was a precursor to the DynamoDB that we know today. It was born out of necessity as Amazon needed a database to support the incredible traffic that the site was generating.

Two core services were the main focus as Dynamo was being developed: the shopping cart and the session service. These were particularly important because any downtime would directly impact Amazon’s finances. Many meetings later, the engineers and managers at Amazon decided that the database would need to satisfy a few critical requirements.

  • Scalability: As more and more people turned to online shopping, the site would continue to grow. With an eye on the future, they wanted a database that could scale to any size. Not only that, but it needed to scale incrementally up and down to support the ebb and flow of traffic throughout the year.
  • Availability: Since core services would be relying on this database, it was critical that it stayed online no matter what.
  • Predictability: The shopping cart and session service are user-facing, which means customers on the site can feel when they run slower than normal. To account for this, the database needed predictable performance at the median as well as at the 99.9th percentile.

Any one of these goals would be tough for a new database to get right; all three together seemed nearly impossible. Regardless, the researchers and engineers at Amazon nailed it, and Dynamo was born.

A little while later, in 2007, Dynamo was introduced to the world in a paper now famously known as The Dynamo Paper. You can download and read the full PDF here. In future posts we will break down and explore its individual components so that we can understand and appreciate all that went into the making of Dynamo.

Due to the careful planning and demanding requirements, Dynamo was an incredibly useful tool. Many of the ideas published in The Dynamo Paper were incorporated into the design of future databases, including Cassandra, Voldemort, and Riak. Despite this adoption in other databases, Dynamo itself would not see much adoption within Amazon. There were a few reasons for this, and after receiving feedback from many of his engineers, Amazon CTO Werner Vogels summarized them as follows:

While Dynamo gave them a system that met their reliability, performance, and scalability needs, it did nothing to reduce the operational complexity of running large database systems. Since they were responsible for running their own Dynamo installations, they had to become experts on the various components running in multiple data centers. Also, they needed to make complex tradeoff decisions between consistency, performance, and reliability. This operational complexity was a barrier that kept them from adopting Dynamo.

Even though Dynamo was more than capable, it was too complicated to manage. Many engineers within Amazon realized this and decided to go in a different direction.

SimpleDB

SimpleDB was another NoSQL database created by Amazon; however, unlike Dynamo, it was a service. This meant that engineers could get the benefits of using a NoSQL database without the headache of managing a distributed system. For many, this was a no-brainer.

SimpleDB had many things going for it. One of these was a table interface. Tables were more intuitive for most engineers at the time, so this, combined with a restricted query language, was favored over a simple key/value store. It also provided a flexible data model, which meant there was no need to worry about schema updates. As a service, SimpleDB could also provide operational benefits that would otherwise be complex to set up and maintain, such as multi-data-center replication and high availability.

Overall, SimpleDB was a rock-solid choice. There were just a couple of downsides.

  • Scalability: SimpleDB required users to store their data in containers called Domains. The issue was that each domain had a relatively small storage capacity of just 10 GB as well as a finite request throughput. To achieve scalability, engineers had to partition their data across several domains and manage the mapping of data to domains themselves (see the sketch after this list).
  • Predictability: To keep things simple, SimpleDB would index all attributes of every item you stored in a domain. This made for a flexible query system, but it came at the cost of performance predictability. As datasets grew, the working set of indexes grew too, and in many cases it became so large that it no longer fit in memory.
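
To make the first tradeoff concrete, here is a minimal sketch (in Python) of the kind of manual sharding engineers had to bolt on themselves: route each item to one of several pre-created domains by hashing its key. The domain names and the domain_for helper are hypothetical, not part of any SimpleDB API.

    import hashlib

    # Hypothetical sharding scheme for SimpleDB's 10 GB domain limit.
    # Assumes domains "users_0" through "users_7" were created ahead of time.
    NUM_DOMAINS = 8

    def domain_for(item_key: str) -> str:
        """Route an item to a domain by hashing its key."""
        bucket = int(hashlib.md5(item_key.encode()).hexdigest(), 16) % NUM_DOMAINS
        return f"users_{bucket}"

    print(domain_for("customer-42"))  # e.g. "users_5"

Note that changing NUM_DOMAINS later means migrating data by hand, which is exactly the kind of operational burden a managed partitioning layer removes.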

These were pretty big tradeoffs, but most people just worked around them to get the benefits of using a service. Surely there had to be a better option.

DynamoDB

If we look at the pros of Dynamo against the cons of SimpleDB, it’s striking how much they overlap; scalability and predictability appear in both. So, can we take the best parts of each database and combine them?

The answer is DynamoDB.

DynamoDB is a managed service that is scalable to any size with predictable performance.
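
To see what “managed” means in practice, here is a minimal sketch using boto3, the AWS SDK for Python. The carts table and its user_id key are assumptions for illustration; the point is that there are no servers, partitions, or replicas for you to operate.

    import boto3  # AWS SDK for Python

    # Minimal sketch: assumes a table named "carts" with partition key
    # "user_id" already exists. Both names are illustrative.
    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("carts")

    # Write an item. Partitioning, replication, and indexing all happen
    # behind the scenes, no matter how large the table grows.
    table.put_item(Item={"user_id": "customer-42", "items": ["book", "kindle"]})

    # Read it back with a predictable low-latency primary key lookup.
    response = table.get_item(Key={"user_id": "customer-42"})
    print(response.get("Item"))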

Over the years, development on DynamoDB has continued. Today, it is bristling with features like DynamoDB Streams for handling change events, auto-scaling, TTL support, completely transparent encryption at rest, item- and attribute-level access control, nearly instantaneous backups, an in-memory caching layer called DAX, global replication, local secondary indexes, global secondary indexes, and more.

We will be going over each of these features and how to get the most bang for your buck with DynamoDB in future posts.

Make sure to subscribe for all of the updates!

Let me know in the comments if you liked this content or if you have any questions.
