Let’s start by assuming your architecture is on Amazon Web Services (AWS), and you need a fast key-value DB. You do a cursory search and find the super simple DynamoDB and the great performing Aerospike. Then you check out some benchmarks and feature set comparisons, and you start to get an idea of which is best for your business or project. However, once you’re up and running, you discover certain shortcomings or pitfalls of the technology you chose and wonder why no one noted them in at least one of the many articles you read. It could have saved you a lot of time and energy, right?
At Eyeview, we’ve used DynamoDB for three years and Aerospike for one. In that time, we’ve explored, tested and tried a number of capabilities with both technologies. Below we will share our findings and offer an in-depth comparison of the two databases to help you out when making your decision.
As an AWS service, DynamoDB claims “unlimited scale,” and that has been largely true in our experience. Setting up auto-scaling with a third-party library like dynamic-dynamodb makes scaling even easier.
However, potential users should understand DynamoDB’s sharding model. The throughput capacity you buy is divided among all of your table’s shards, and the key of each record determines which shard it lands in. To keep things running smoothly, the best practice is to spread reads and writes evenly across the key space. If your table is not modeled “right,” expect throttling even when your overall usage is well below your provisioned throughput capacity.
A common solution for hot keys (e.g. a time series where you are always hitting today’s key) is to append a suffix from a predefined set to the key, such as a number from 1–200, so that traffic for one logical key spreads over many physical keys. These shortcomings show that DynamoDB’s scalability does not come for free and hides quite a few traps.
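To make the suffix approach concrete, here is a minimal sketch in Python. The `#` separator and the function names are our own illustrative choices, not part of any DynamoDB API; the 1–200 range is the one mentioned above.

```python
import random

NUM_SUFFIXES = 200  # size of the predefined suffix set (1-200, as above)


def write_key(hot_key: str, num_suffixes: int = NUM_SUFFIXES) -> str:
    """Pick one of the suffixed keys at random so writes spread out."""
    suffix = random.randint(1, num_suffixes)  # inclusive on both ends
    return f"{hot_key}#{suffix}"  # "#" separator is an assumption


def read_keys(hot_key: str, num_suffixes: int = NUM_SUFFIXES) -> list:
    """A read of the logical key must fan out over every suffixed key."""
    return [f"{hot_key}#{i}" for i in range(1, num_suffixes + 1)]
```

The trade-off is visible in `read_keys`: writes become cheap to distribute, but every read of the logical key now has to fetch and merge up to 200 physical keys.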
Whereas DynamoDB makes you think about “hot shards,” Aerospike only suffers from hot keys, which show up as increased latencies. Other than that, you should probably over-provision your Aerospike cluster so that you do not hit its throughput or network limits. Having said that, Aerospike’s throughput is impressive and scaling it out is easy: adding a node is the only action you need to take to scale your DB linearly. Keep in mind that adding a node triggers data migrations, but unlike other databases, Aerospike balances migrations against normal traffic pretty well.
DynamoDB allegedly requires no maintenance. But what if you don’t want to keep your data forever? If you are like us, your data is only relevant for a certain period of time until its usefulness expires. Data regarding cookies or device IDs is usually relevant for 1-3 months, and therefore, deleting data is necessary.
TTL is the ideal feature for that kind of data, but DynamoDB does not support TTL. The best practice for that scenario is to have period-level DynamoDB tables, such as “MyTable_August”, “MyTable_September”, and so on. In this setup, writes go only to the current month’s table, while reads, such as getting all of your data for a certain cookie from the past 60 days, go to every table within the period. Once a table is no longer being read, you can drop it entirely. If you haven’t set things up that way, you can delete records by scanning the table, but that is expensive and not recommended. In general, scanning, exporting and analyzing the data within a table are all extremely hard operations.
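The routing logic for period-level tables can be sketched in a few lines of Python. The helper names are hypothetical, and the naming follows the “MyTable_August” pattern above; note that month-only names are reused a year later, so this only works if old tables really are dropped (or if you add the year to the name).

```python
from datetime import date, timedelta

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def table_for(day: date, base: str = "MyTable") -> str:
    """Table that receives writes for the given day."""
    return f"{base}_{MONTHS[day.month - 1]}"


def tables_to_read(today: date, window_days: int = 60,
                   base: str = "MyTable") -> list:
    """Every monthly table touched by the last `window_days`, newest first."""
    start = today - timedelta(days=window_days)
    names = []
    year, month = today.year, today.month
    # Walk backwards month by month until we pass the start of the window.
    while (year, month) >= (start.year, start.month):
        names.append(f"{base}_{MONTHS[month - 1]}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return names
```

A 60-day window usually spans three monthly tables, so each logical read turns into up to three queries whose results have to be merged in the application.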
On the other hand, Aerospike needs a more classic style of maintenance. Unlike DynamoDB, Aerospike runs on EC2 instances that you manage, and those will occasionally need to go down for maintenance. Instance-level failures are much more likely to happen than a shutdown of an AWS-managed service like DynamoDB. You will also probably need to upgrade versions for bug fixes and enhanced capabilities, something DynamoDB delivers transparently.
The most common and maintenance-heavy issue we have had with an Aerospike cluster is a “split brain” (network partitioning). In some cases, one or more nodes decide that they belong to a different cluster, which creates data inconsistency. This happened more often than we expected, and the Aerospike team attributed it to an unstable EC2 network. This is also why a cluster should be hosted in a single zone: split brain happens frequently when the cluster is spread across Availability Zones.
DynamoDB is automatically replicated, and you do not need to worry about an Availability Zone failure. DynamoDB is a region-level service, so you only need to implement replication if you want protection from a full region failure or if you actually need the data in another region. Nowadays, DynamoDB update streams let you do cross-region replication fairly easily.
With Aerospike on AWS, you need four copies of the data to have a truly reliable solution. It is recommended that a single Aerospike cluster be deployed in a single EC2 Availability Zone. Since you don’t want to lose data on the cluster, you would probably use a replication factor of at least 2.
But what if a zone goes down? Well, you should probably plan for that and have an additional (hot or cold) cluster in another zone, which, again, will have a replication factor of 2. This is an important piece as we get to the cost section.
Once you have two clusters, you will need to keep them in sync. If you have the enterprise solution for Aerospike, you can use XDR. Otherwise, you will need to find your own solution.
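The four-copies figure is simple arithmetic: two clusters, each with a replication factor of 2. Here is a small sketch of what that does to the storage you have to provision; the 10 TB input is a made-up number for illustration, not our actual data size.

```python
def total_copies(replication_factor: int = 2, clusters: int = 2) -> int:
    """Copies of each record across all clusters."""
    return replication_factor * clusters


def provisioned_storage_tb(unique_data_tb: float,
                           replication_factor: int = 2,
                           clusters: int = 2) -> float:
    """Raw storage needed to hold every copy (ignores any overhead)."""
    return unique_data_tb * total_copies(replication_factor, clusters)


# Hypothetical example: 10 TB of unique data needs 40 TB of raw storage.
print(total_copies())              # 4
print(provisioned_storage_tb(10))  # 40
```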
In our workloads, we mostly care about read latency. Building a Real-Time Bidder requires us to respond to auctions in less than 100ms, including network latency. When time is the most important factor, DynamoDB doesn’t fare very well for us: reads average 8ms, and the 90th percentile is 42ms.
An important note around our DynamoDB reads is that we are using “query” rather than “get” as we are trying to get only data in a specific date range, so this might affect our timing compared to others using “get”.
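To illustrate the difference, here is a rough sketch of the request parameters for each call, in the shape the low-level DynamoDB API expects. The table layout is an assumption made for the example: a partition key named `cookie_id` and a sort key named `event_date` holding ISO-8601 date strings (which sort lexicographically). Our real schema may differ.

```python
def get_item_params(table: str, cookie_id: str, event_date: str) -> dict:
    """Parameters for a "get": fetches exactly one item by its full key."""
    return {
        "TableName": table,
        "Key": {
            "cookie_id": {"S": cookie_id},    # hypothetical partition key
            "event_date": {"S": event_date},  # hypothetical sort key
        },
    }


def query_params(table: str, cookie_id: str,
                 start_date: str, end_date: str) -> dict:
    """Parameters for a "query": one partition key plus a sort-key range."""
    return {
        "TableName": table,
        "KeyConditionExpression":
            "cookie_id = :id AND event_date BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":id": {"S": cookie_id},
            ":start": {"S": start_date},
            ":end": {"S": end_date},
        },
    }
```

With boto3, these dictionaries would be passed as `client.get_item(**get_item_params(...))` and `client.query(**query_params(...))`. A query can return many items per call, which is one reason its latency is not directly comparable to a single-item get.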
Read performance is clearly Aerospike’s strength. Reads average 2ms and even the 90th percentile is 14ms, both much faster than DynamoDB. And although it isn’t shown here, updates are also faster in Aerospike (1–2ms versus DynamoDB’s 4–5ms).
It’s all about the money, isn’t it? Not always, but money should be an important part of your decision — it definitely was for us.
To get to the root of the cost equation, you need to ask yourself one question: Do I want my costs driven by storage or by reads/writes?
DynamoDB is perfect when you are starting up: you pay only for what you use, with no servers to set up and no self-managed database to install. If you are just getting started and have little traffic or data, DynamoDB is a great choice.
Beyond that stage, DynamoDB costs are driven by reads and writes, with writes dominating. If your writes correlate strongly with your business’s revenue, DynamoDB could be a great fit.
Otherwise, writes can become prohibitively expensive. Because of the tremendous costs at Eyeview’s scale, we reworked our DynamoDB data model and application stack to split our data and fetch only what is absolutely required at each point in the application. This was a pretty big issue for us, as it increased our total application latency.
Aerospike’s cost is driven mostly by storage though you could reach throughput/network limits. For us, having more than 20TB of data and 250K reads per second, storage is still the bottleneck that will force us to increase our cluster.
If your writes do not correlate with business revenue, Aerospike will probably be the more cost-efficient solution. But keep in mind that storing four copies of the data adds real cost. My advice is to do the math at the outset for both Aerospike and DynamoDB.
Winner: DynamoDB if you’re just starting, but Aerospike as you scale (by a mile!)
I must emphasize that going with DynamoDB is the easiest decision you will make when you are starting out with fast key-value data stores. It is easy to set up, pricing follows demand and it needs very little maintenance. As your usage grows, you might hit the cost factor hard, and you should probably consider Aerospike more seriously. Yes, maintenance will require some effort from your infrastructure team, and you will hit issues around cluster stability, networking and DR planning. Even with those early hiccups, we switched to Aerospike a year ago, and it feels like the right decision for our business. Hopefully after this article, you have a better idea of what is right for yours.