Companies are working hard to build large-scale, distributed, scalable databases. Let’s look at one of the most widely used managed NoSQL databases, Amazon DynamoDB, and how it compares with a best-of-breed open-source database, Apache Cassandra. In this article, we compare the two systems and offer pointers to keep in mind when deciding which database to use for your applications.
DynamoDB and Apache Cassandra are both well-known distributed data stores. Both are used in numerous applications and have proven their efficiency at very large scale. Both let you manage data without a fixed column schema: the concept of tables still exists, but there is no requirement for every row to have the same set of columns, as you would find in MySQL or SQL Server.
Apache Cassandra is open-source software governed by the Apache Software Foundation. Because it is open source, it can run in any cloud or on-premises environment, and it is developed and maintained for the benefit of the community. However, if developers choose to run a Cassandra cluster themselves, there can be significant operational overhead and challenges around deployments, upgrades and patches. The good news is that companies and teams can also buy managed Cassandra clusters. Overall, Cassandra can handle large volumes of data across distributed, decentralised servers. Because every node plays the same role (there is no master) and clusters can span multiple data centers and clouds, a properly replicated Cassandra cluster has no single point of failure.
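The “no single point of failure” property comes from replication: each piece of data lives on several nodes, and a request only needs enough of them to answer. The sketch below is a simplified illustration, assuming a single data center and Cassandra’s `QUORUM` consistency level, of how many replicas must respond (floor(RF / 2) + 1) and how many can be down. It ignores multi-data-center levels such as `LOCAL_QUORUM` and `EACH_QUORUM`, and the function names are illustrative, not part of any driver API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM request: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerable_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM reads and writes still succeed."""
    return replication_factor - quorum(replication_factor)


def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > N."""
    return r + w > n


for rf in (1, 3, 5):
    q = quorum(rf)
    print(f"RF={rf}: QUORUM={q}, survives {tolerable_failures(rf)} replica failure(s), "
          f"R+W>N: {is_strongly_consistent(rf, q, q)}")
```

With the common replication factor of 3, any one replica of a partition can be lost while QUORUM reads and writes keep succeeding and still overlap on at least one up-to-date replica.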
NoSQL databases are designed for scale, but their architectures are sophisticated. DynamoDB is a managed NoSQL database service provided by Amazon Web Services. Because Amazon manages it, users do not have to worry about operations such as hardware provisioning, configuration and scaling. DynamoDB is easy and flexible for developers who need a key-value store with a dynamic schema and no infrastructure to manage.
Unlike a self-managed Cassandra cluster, whose capacity is fixed by the instances you provision, DynamoDB capacity is not fixed: auto-scaling adjusts throughput to the workload and keeps resource usage in check. Because AWS takes care of the complexity of maintaining a highly scalable distributed NoSQL database, developers can focus on building applications rather than managing infrastructure. DynamoDB is only available on AWS, but it is very straightforward to integrate with AWS Lambda and API Gateway.
However, that tight integration can lock your application into the broader AWS ecosystem.
Cassandra and DynamoDB both trace their origins to the same paper, Dynamo: Amazon’s Highly Available Key-value Store. Many of the differences between Cassandra and Dynamo stem from the fact that Dynamo’s data model is a key-value store, whereas Cassandra is designed as a column-family data store, a model it borrowed from Google’s Bigtable.
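The Dynamo paper’s central partitioning idea, which Cassandra also adopted, is consistent hashing: nodes own positions (tokens) on a hash ring, and a key is stored on the first few distinct nodes found by walking clockwise from the key’s hash. Here is a minimal sketch with virtual nodes. It uses MD5 only to stay dependency-free (Cassandra’s default partitioner uses Murmur3), `HashRing` and its parameters are hypothetical names rather than any library’s API, and it illustrates the paper’s design, not DynamoDB’s internals.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (Dynamo-style partitioning sketch)."""

    def __init__(self, nodes, vnodes_per_node=8):
        self._ring = []  # sorted list of (token, physical node)
        for node in nodes:
            for i in range(vnodes_per_node):
                self._ring.append((self._token(f"{node}#{i}"), node))
        self._ring.sort()
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(key: str) -> int:
        # MD5 keeps the sketch dependency-free; Cassandra's default partitioner uses Murmur3.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def replicas(self, key: str, n: int = 3):
        """Walk clockwise from the key's token, collecting n distinct physical nodes."""
        n = min(n, len({node for _, node in self._ring}))
        i = bisect.bisect_right(self._tokens, self._token(key))
        result = []
        while len(result) < n:
            node = self._ring[i % len(self._ring)][1]  # wrap around the ring
            if node not in result:  # skip extra virtual nodes of an already chosen node
                result.append(node)
            i += 1
        return result


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42"))  # three distinct nodes; which ones depends on the hashes
```

Adding a node only moves the keys whose tokens fall next to the new node’s virtual nodes, which is why this design can scale out without reshuffling all of the data.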
Dynamo’s data model is simple: values are opaque binary objects identified by a key. If your application deals with unstructured data, DynamoDB may be the better choice. Cassandra is the opposite case: it was primarily designed for structured data, organised as a multi-dimensional sorted map, and it allows applications to access data using multiple attributes.
Both data stores provide similar functionality, but they handle data storage differently, which leads to differences in how data is managed, stored and distributed across the two systems. They also treat their unspecified columns differently, even though both are sometimes described as wide-column systems. Cassandra is a wide-column store that manages data in column families. Rows in a wide-column database don’t need to have the same columns, so developers can add and remove columns without rewriting the existing rows in the table.
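Cassandra’s “multi-dimensional sorted map” can be pictured as partition key → clustering key → columns, where rows within a partition are kept in clustering-key order and each row stores only the columns written to it. The toy model below is a hypothetical in-memory illustration of that shape (real Cassandra keeps rows sorted on disk rather than sorting on every read); the class name and sensor data are made up.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy model: partition key -> (clustering key -> {column: value})."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def upsert(self, partition_key, clustering_key, **columns):
        # Rows are sparse: each row stores only the columns actually written to it.
        row = self._partitions[partition_key].setdefault(clustering_key, {})
        row.update(columns)

    def slice(self, partition_key, start=None, end=None):
        """Yield one partition's rows in clustering order, optionally within [start, end]."""
        rows = self._partitions.get(partition_key, {})
        for ck in sorted(rows):
            if (start is None or ck >= start) and (end is None or ck <= end):
                yield ck, rows[ck]


t = WideColumnTable()
t.upsert("sensor-1", "2024-01-02", temp=21.5)
t.upsert("sensor-1", "2024-01-01", temp=20.0, humidity=40)
t.upsert("sensor-1", "2024-01-03", battery="low")
for ck, row in t.slice("sensor-1", start="2024-01-02"):
    print(ck, row)
```

The range read on the clustering key is the access pattern this layout is built for: fetching a contiguous, ordered slice of a single partition.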
DynamoDB, in comparison, lets users store dynamic data using a JSON-like document model. Instead of storing columns separately, DynamoDB stores all of an item’s attributes, including nested maps and lists, together as a single item. In the original Dynamo design, the local persistence engine was pluggable, so storage could be chosen to suit the application; with the managed DynamoDB service, AWS handles storage for you.
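DynamoDB’s low-level API makes the “everything in one item” model visible: every attribute, including nested maps and lists, is sent as a typed AttributeValue (`S`, `N`, `BOOL`, `NULL`, `M`, `L`, among others). The sketch below hand-rolls a converter for a subset of those types just to show the shape; in practice boto3’s `boto3.dynamodb.types.TypeSerializer` and the resource-level `Table` API do this for you. Floats are deliberately rejected, mirroring boto3, which expects `Decimal` for non-integer numbers. The example record is hypothetical.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Convert a Python value to DynamoDB's typed AttributeValue form (subset of types)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # DynamoDB transmits numbers as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def to_item(record: dict) -> dict:
    """A whole item: every attribute, nested ones included, is stored together."""
    return {name: to_attribute_value(v) for name, v in record.items()}


item = to_item({"pk": "user#42", "age": 31,
                "address": {"city": "Oslo", "zip": "0150"}, "tags": ["a", "b"]})
print(item["address"])
```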
DynamoDB’s pricing is complex, whereas managed Cassandra pricing is simple to estimate, and as you scale out you can expect the average cost per node to drop. DynamoDB’s cost depends on a range of variables, from data transfer to read and write throughput and storage. For several use cases, Apache Cassandra can offer significant cost savings over DynamoDB, especially for write-heavy workloads, since in Cassandra writes are cheaper than reads.
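Part of DynamoDB’s pricing complexity is that throughput is measured in capacity units whose size depends on the operation. In provisioned mode, one read capacity unit (RCU) covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), while one write capacity unit (WCU) covers one write per second of an item up to 1 KB, with item sizes rounded up. The sketch below assumes standard (non-transactional) operations and a hypothetical workload, and computes units only: it does not estimate dollars, since prices vary by region, and it ignores storage and data-transfer charges.

```python
import math


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """Each 4 KB chunk (rounded up) of a strongly consistent read costs 1 RCU;
    eventually consistent reads cost half."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_kb: float, writes_per_second: int) -> int:
    """Each 1 KB chunk (rounded up) of a standard write costs 1 WCU."""
    return math.ceil(item_size_kb) * writes_per_second


# Hypothetical workload: 6 KB items, 100 reads/s and 100 writes/s.
print(read_capacity_units(6, 100))                             # 200
print(read_capacity_units(6, 100, strongly_consistent=False))  # 100
print(write_capacity_units(6, 100))                            # 600
```

For the same 6 KB item and rate, writes need three times the capacity of strongly consistent reads, which illustrates why write-heavy workloads can become expensive on DynamoDB.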
Developers continuously add new features, and those features often require changes to an application’s underlying database, so the learning curve matters. In Cassandra, data is queried with CQL, a SQL-like language, which keeps straightforward storage and retrieval requirements simple. In DynamoDB, data is queried through a proprietary AWS API (AWS has since also added support for PartiQL, a SQL-compatible query language).
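To make the contrast concrete, here is the same hypothetical lookup, all orders for one customer, expressed both ways. The Cassandra side is a CQL string with a `%s` bind marker, as used for simple statements by the DataStax Python driver’s `session.execute`; the DynamoDB side is the keyword arguments for the low-level boto3 `client.query` call. The keyspace, table and column names are invented, and both functions only build requests, so nothing here connects to a database.

```python
def cql_orders_for_customer(customer_id: str):
    """Cassandra: a SQL-like CQL statement plus its bind parameters."""
    statement = "SELECT order_id, total FROM shop.orders WHERE customer_id = %s"
    return statement, (customer_id,)


def dynamodb_orders_for_customer(customer_id: str) -> dict:
    """DynamoDB: keyword arguments for the low-level boto3 client.query() call."""
    return {
        "TableName": "orders",
        "KeyConditionExpression": "customer_id = :cid",
        # Values are passed as typed AttributeValues, here a string ("S").
        "ExpressionAttributeValues": {":cid": {"S": customer_id}},
    }


stmt, params = cql_orders_for_customer("c-1")
request = dynamodb_orders_for_customer("c-1")
print(stmt, params)
print(request)
```

With live connections you would pass these to `session.execute(stmt, params)` and `boto3.client("dynamodb").query(**request)` respectively; both queries assume `customer_id` is the table’s partition key.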
The pros and cons of a database engine for your business will likely depend on your development team and the applications you build. If you’re already using the AWS stack and need a NoSQL database, you should first review what DynamoDB has to offer and how well it fits your use case. The biggest question is whether you want to set up, maintain and monitor your own cluster, or have AWS do that for you.