AWS DynamoDB at a high level
Amazon Web Services’ DynamoDB is a beast, and if one wanted to write an ultimate guide to it, the post would be quite lengthy. Instead, I’ll cover some basic features in this article, and the more advanced stuff (e.g. DAX) will be covered in different posts.
1. What is DynamoDB?
DynamoDB is a fully managed, highly available NoSQL database, which provides a storage facility for document and key/value type data with single-digit millisecond latency. It’s secure, enables backups and restores, and has an in-memory caching solution.
1.1. Fully managed
DynamoDB is fully managed by AWS, i.e. we don’t have to worry about provisioning and managing the underlying server instances. AWS is responsible for the health of the infrastructure, and they update and patch the software on the database instance.
As such, we don’t have access to the instance: for example, we can’t SSH into it and can’t select its size.
1.2. Highly available
Data in DynamoDB are saved across multiple data centres and availability zones (AZs). AZs are geographically separated from each other within the same region, and this ensures that data can be retrieved in the unlikely event of an AZ going down.
DynamoDB is a highly available and durable database solution, which is confined to a region. One account can have two tables of the same name in different regions without any problems.
It’s possible though to have global tables, but this concept is beyond the scope of this post.
1.3. NoSQL database
DynamoDB is a NoSQL, i.e. non-relational, database. Rules for the records are not nearly as strict as in a relational (e.g. MySQL or Postgres) database.
DynamoDB is schemaless, which means that we don’t need to pre-define any of the keys or data types (except one, the primary key, see below). We can add new attributes on the fly, and the structure of each item doesn’t have to match.
This gives us a lot of flexibility; however, it has its downside, as DynamoDB is not as suitable for complex queries as a relational database such as RDS Postgres.
1.4. Low latency
DynamoDB provides very low latency, and we are talking about single-digit milliseconds (typically 5-6 ms) here. Latency depends on the type of read consistency (more on that below).
The good thing is that DynamoDB can sustain this low latency at any scale. AWS automatically scales DynamoDB as the size of the database grows, so the latency stays at a steady, low level.
1.5. JSON and HTTPS
DynamoDB supports documents using JSON. A valid item can look like this:
{
"FruitName": "Banana",
"QuantityInGrams": 100,
"Protein": 1.1,
"Calories": 89,
"Carbs": 22.9,
"Sugar": 12.2,
"Fiber": 2.6,
"Fat": 0.3
}
Data source: https://www.healthline.com/nutrition/foods/bananas#section1
Another item can contain additional attributes, and can lack some of the above ones.
DynamoDB uses HTTPS for data transfer.
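One detail worth knowing is that the low-level DynamoDB API doesn’t take plain JSON: each attribute value is wrapped in a type descriptor, such as S for strings and N for numbers (which travel as strings). The sketch below shows roughly what that wrapping looks like. The function names are made up for this post, only three types are handled, and in practice SDKs usually offer helpers that do this conversion for us.

```python
def to_attribute_value(value):
    """Wrap a plain Python value in DynamoDB's typed attribute-value format."""
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        # numbers are sent as strings in the low-level format
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def to_dynamodb_item(plain_item):
    """Convert every attribute of a plain dict (hypothetical helper)."""
    return {name: to_attribute_value(value) for name, value in plain_item.items()}


banana = {"FruitName": "Banana", "QuantityInGrams": 100, "Protein": 1.1}
typed_banana = to_dynamodb_item(banana)
# {'FruitName': {'S': 'Banana'}, 'QuantityInGrams': {'N': '100'}, 'Protein': {'N': '1.1'}}
```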
2. Building blocks
Let’s go over the building blocks of a DynamoDB record.
2.1. Tables
The biggest unit in the database is the table. This is similar to relational databases, in the sense that they also have tables. The similar concept is called a collection in MongoDB. For example, the above entry may be found in a table called Fruits.
The number of tables is limited by an account quota that applies per region: the default was 256 for a long time, and AWS has since raised it to 2,500.
2.2. Items
In any table, we can have an unlimited number of items, although a single item can’t be larger than 400 KB. For example, the following records can be two items in the Fruits table:
{
"FruitName": "Banana",
"QuantityInGrams": 100,
"Protein": 1.1,
"Calories": 89,
"Carbs": 22.9,
"Sugar": 12.2,
"Fiber": 2.6,
"Fat": 0.3
},
{
"FruitName": "Orange",
"QuantityInGrams": 100,
"Protein": 0.9,
"Calories": 47,
"Carbs": 11.8,
"Sugar": 9.4,
"Fiber": 2.4,
"Fat": 0.1
},
Data source: https://www.healthline.com/nutrition/foods/bananas#section1, https://www.healthline.com/nutrition/foods/oranges#section1
Although these two items have the same keys, it’s not a requirement for the items to look exactly the same.
Items correspond to rows in relational databases or documents in MongoDB.
2.3. Attributes
The individual key entries are called attributes. FruitName is an attribute of both items above. Attributes correspond to columns in relational databases and to fields in MongoDB.
The only compulsory attribute in an item is the primary key. For example, for the above records, the primary key can be FruitName. When a table is created, the primary key must be defined.
The primary key uniquely identifies the item in the table, so no two items can have the same primary key. When the primary key is a single attribute, it is called the partition key: DynamoDB hashes its value to determine the partition (not visible or accessible to us) where the item is stored.
It’s also possible to add a second attribute, the sort key (I’ll call it the sorting key in this post), which together with the partition key makes up a composite primary key.
When a sorting key is defined, two items can have the same partition key as long as their sorting keys differ. In this case, these items are stored in the same partition, and are sorted by the sorting key (hence the name).
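To make the idea of “the key value decides the partition” a bit more tangible, here is a minimal sketch. DynamoDB’s real hash function and partition count are internal and not something we can see, so the MD5 hash, the count of 4 and the function name below are all assumptions picked purely for illustration.

```python
import hashlib


def pick_partition(key_value, partition_count=4):
    """Map a key value to a partition number (illustrative only)."""
    # Hash the key value, then fold the hash into the available partitions.
    digest = hashlib.md5(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % partition_count


# The same key value always lands in the same partition,
# which is why all the bananas end up stored together.
first = pick_partition("Banana")
second = pick_partition("Banana")
```

The point is only that the mapping is deterministic: items sharing a key value are kept together, while different values are spread across partitions.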
In the picture above, the partition key (the first part of the primary key) is FruitName, and the sorting key is FruitType. In the case of bananas, it’s possible to have two or more bananas on the list. I can easily imagine that different types of bananas have different nutrition data (Lady Finger usually contains more sugar than Cavendish):
{
"FruitName": "Banana",
"FruitType": "Cavendish",
...
},
{
"FruitName": "Banana",
"FruitType": "Lady Finger",
...
},
When the items are queried, querying for Banana as FruitName won’t be enough; the return value of this query will include both the Cavendish and the Lady Finger type. If we want to return only one item, we’ll need to provide the sorting key as well.
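The difference between querying by FruitName alone and supplying both keys can be mimicked with a small in-memory model. This is not how DynamoDB is implemented, and the class and method names are made up for this post; it only imitates the behaviour described above.

```python
from collections import defaultdict


class FruitsTableSketch:
    """Toy stand-in for a table keyed by FruitName plus FruitType."""

    def __init__(self):
        # FruitName value -> {FruitType value -> item}
        self._groups = defaultdict(dict)

    def put_item(self, item):
        self._groups[item["FruitName"]][item["FruitType"]] = item

    def query(self, fruit_name):
        """Return every item with this FruitName, ordered by FruitType."""
        group = self._groups.get(fruit_name, {})
        return [group[fruit_type] for fruit_type in sorted(group)]

    def get_item(self, fruit_name, fruit_type):
        """Return the single item identified by both keys, or None."""
        return self._groups.get(fruit_name, {}).get(fruit_type)


table = FruitsTableSketch()
table.put_item({"FruitName": "Banana", "FruitType": "Lady Finger"})
table.put_item({"FruitName": "Banana", "FruitType": "Cavendish"})

bananas = table.query("Banana")                     # both items, Cavendish first
one_banana = table.get_item("Banana", "Cavendish")  # exactly one item
```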
3. Consistency
When the Fruits (or any other) table is queried, we make a read operation. But, as mentioned above, DynamoDB replicates data across multiple AZs within a region, and saving data to multiple locations takes time.
Replication usually takes less than 1 second. But it can happen that a recently written item is queried (read) straight away. If the item hasn’t been replicated across the AZs yet, we might not get the same item we have just written to the database, which can be a problem.
Because of the way DynamoDB works, AWS supports two types of read consistency, and one of them brings the solution to the end-of-the-world issue from the last paragraph.
3.1. Eventual consistency
This is the default read type. It means that the queried item might not reflect the latest completed write operation. This is the sad situation I wrote about in more detail above.
In this case, all copies of the data will eventually become consistent, usually within 1 second.
3.2. Strong consistency
It’s possible to request strong consistency on a read operation, in which case the result reflects the most recent completed write operation.
With strong consistency, DynamoDB returns a response that reflects all the writes that completed successfully before the read, so a stale copy is never served. This read type guarantees that fast hands will get their data back in the exact same way as it was written to the database.
The trade-off is that strongly consistent reads can have higher latency, and they use twice as much read capacity as eventually consistent reads. Due to the nature of data replication, a strongly consistent read might not be available in case of a network issue.
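In the API, the choice between the two read types comes down to a single flag: read operations such as GetItem accept a ConsistentRead parameter, which is false (eventual consistency) unless we say otherwise. The sketch below only builds the request parameters, assuming a Fruits table whose primary key is FruitName alone; the function name is made up, and the resulting dictionary is the kind of thing one would hand to an SDK call such as boto3’s get_item.

```python
def build_get_item_request(fruit_name, strongly_consistent=False):
    """Build GetItem parameters for the (assumed) Fruits table."""
    return {
        "TableName": "Fruits",
        # Key values use the typed format: S marks a string
        "Key": {"FruitName": {"S": fruit_name}},
        # False -> eventually consistent (default), True -> strongly consistent
        "ConsistentRead": strongly_consistent,
    }


default_read = build_get_item_request("Banana")
strong_read = build_get_item_request("Banana", strongly_consistent=True)
```

Because it’s a per-request setting, we can keep the cheaper eventually consistent reads as the norm and ask for strong consistency only where reading stale data would really hurt.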
4. Capacity mode
Read and write operations are measured in capacity units (in provisioned mode) or request units (in on-demand mode). One unit covers a pre-defined amount of data (up to 4 KB of item size for a read and up to 1 KB for a write).
When creating a DynamoDB table, we can choose between provisioned and on-demand capacity modes.
4.1. Provisioned capacity
This is the default (see the picture above) setting.
When this mode is selected, we plan the read and write capacities our application needs ahead of time. By default, it’s 5 read and 5 write capacity units, but this can be changed at table creation, or even on the fly. The way the capacity units are calculated is beyond the high-level approach I strive to follow in this post, but if you are interested, you can read more about it here.
The capacity units are chargeable even if they are not used, so careful planning is definitely needed. The provisioned capacity mode is suitable for applications with consistent and predictable traffic. AWS free tier applies, and free stuff is always good.
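Without going into the full details, a rough feel for the numbers can be had from a few lines of Python. The sketch assumes the commonly documented rules: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one write capacity unit covers one write per second of an item up to 1 KB. The function names are mine, and transactions and other special cases are left out.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """Estimate the read capacity units needed for a steady read rate."""
    # Each read is billed in 4 KB blocks, rounded up.
    blocks_per_read = math.ceil(item_size_bytes / 4096)
    units = blocks_per_read * reads_per_second
    if not strongly_consistent:
        units = units / 2  # eventually consistent reads cost half
    return math.ceil(units)


def write_capacity_units(item_size_bytes, writes_per_second):
    """Estimate the write capacity units needed for a steady write rate."""
    # Each write is billed in 1 KB blocks, rounded up.
    return math.ceil(item_size_bytes / 1024) * writes_per_second


strong = read_capacity_units(6 * 1024, 10)                               # 20
eventual = read_capacity_units(6 * 1024, 10, strongly_consistent=False)  # 10
writes = write_capacity_units(1536, 5)                                   # 10
```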
4.2. On-demand capacity
Here we pay for what we use, so no planning is necessary. This capacity mode costs more per request, but in return we don’t have to pay for unused capacity.
On-demand capacity suits new tables with unknown workload or tables with inconsistent traffic.
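When a table is created through the API rather than the console, the capacity mode is expressed with the BillingMode parameter of CreateTable: PROVISIONED (together with the read and write capacity units) or PAY_PER_REQUEST for on-demand. The sketch below only assembles the request parameters for a Fruits table keyed by FruitName; the function name is an invention of mine, and the dictionary is what one would pass to an SDK call such as boto3’s create_table.

```python
def build_create_table_request(on_demand=False, read_units=5, write_units=5):
    """Build CreateTable parameters for the (assumed) Fruits table."""
    request = {
        "TableName": "Fruits",
        # Only key attributes are declared up front; the rest is schemaless
        "AttributeDefinitions": [
            {"AttributeName": "FruitName", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "FruitName", "KeyType": "HASH"},
        ],
    }
    if on_demand:
        request["BillingMode"] = "PAY_PER_REQUEST"
    else:
        request["BillingMode"] = "PROVISIONED"
        request["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        }
    return request


provisioned_table = build_create_table_request()
on_demand_table = build_create_table_request(on_demand=True)
```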
5. Transaction mode
Similarly to relational databases, DynamoDB provides a way to make read and write transactions. We can group multiple operations (up to 100; the limit was lower in the past) together into one unit and submit them as one transaction: either Get operations for a read transaction, or Put, Update and Delete operations for a write transaction.
It’s an all-or-nothing type of operation. If one operation in the group is unsuccessful, the whole transaction will also be unsuccessful.
Transactions themselves don’t cost anything extra to enable, but the downside is that each operation in a transaction consumes twice the usual read/write capacity: once for preparing the transaction, and once for actually performing it.
This page has more detail about transactions, and how they work.
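The all-or-nothing behaviour can be imitated in a few lines with an ordinary dictionary standing in for a table. This is a toy model with made-up names, not DynamoDB’s actual mechanism; in particular, treating “delete something that isn’t there” as a failure is my assumption, used here as a stand-in for any operation that can’t be carried out.

```python
import copy


class TransactionCancelled(Exception):
    """Raised when any operation in the group cannot be applied."""


def transact_write(table, operations):
    """Apply all (action, key, item) operations, or none of them."""
    working_copy = copy.deepcopy(table)  # the real table stays untouched for now
    for action, key, item in operations:
        if action == "Put":
            working_copy[key] = item
        elif action == "Delete":
            if key not in working_copy:
                raise TransactionCancelled(f"nothing to delete for {key!r}")
            del working_copy[key]
        else:
            raise TransactionCancelled(f"unknown action {action!r}")
    # Every operation succeeded, so the changes become visible together.
    table.clear()
    table.update(working_copy)


fruits = {"Banana": {"Calories": 89}}
try:
    transact_write(fruits, [
        ("Put", "Orange", {"Calories": 47}),
        ("Delete", "Kiwi", None),  # fails, so the Put above is discarded too
    ])
except TransactionCancelled:
    pass
# fruits still contains only the Banana item
```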
6. Security
DynamoDB offers encryption at rest, which means that data are encrypted in the database. It’s enabled on every table and can’t be switched off; what we choose when a table is created is the type of key.
DynamoDB uses the Key Management Service (KMS) to encrypt data. By default, DynamoDB owns and manages the key with which data are encrypted, but it’s also possible to create a customer managed key in KMS, and use that for encryption (extra charges apply).
7. Backups
If the high availability (data replication across multiple AZs) is not enough, more paranoid users can create backups of their DynamoDB tables.
Only complete tables can be backed up; parts of a table can’t. Creating a backup doesn’t use any capacity units; neither does restoring data from one. A backup is always restored into a new table.
8. Conclusion
DynamoDB is a non-relational database that supports document and key/value type data.
It can serve millions of users with extremely low latency. It’s highly available, and scales without limits.
Typical use cases are storing web sessions and JSON documents. It can also be used in gaming applications (where a large amount of data needs to be handled with low latency), and it is a very important building block for serverless architectures and mobile backends.
Thanks for reading, and see you next time.