Cassandra 4 Improves Performance
Written by Kay Ewbank
Tuesday, 03 August 2021
Version 4.0 of Apache Cassandra, the open source NoSQL distributed database, has finally arrived with improvements including much faster performance and a promise that this release is the most stable yet.
Apache Cassandra handles massive amounts of data across load-intensive applications with high availability and no single point of failure.
High-profile users include Apple, which has over 160,000 instances storing over 100 petabytes of data across 1,000+ clusters; Huawei; and Netflix, where Cassandra handles over 1 trillion requests per day. Cassandra was developed by Facebook in 2008, and became an Apache Top-Level Project in 2010.
The improvements in the new version start with increased speed and scalability. Cassandra 4 streams data up to five times faster during scaling operations, and delivers up to 25% faster throughput on reads and writes.
Consistency is better; the new release keeps data replicas in sync to optimize incremental repair. Security and manageability are also improved, with better audit logging that tracks users' access and activity, and a new capture-and-replay capability to help ensure regulatory and security compliance.
Latency has been improved through work on the garbage collector to reduce pause times as heap sizes increase, and compression is also more efficient, which improves read performance.
The last major release of Cassandra was version 3.0 in 2015. There have been 3.x releases since then, but the project team says the long gap until 4.0 is because they decided to become uncompromising on one important feature: quality. The intention is to prevent the situation where users steer clear of x.0 releases because of quality issues.
The Cassandra team says that the scale that Cassandra clusters can reach means that there is an enormous surface area for potential bugs or data corruption, so they put in place tools including property-based / fuzz testing, replay testing, performance testing and fault injection to ensure this and future releases maintain a high level of quality and correctness. The testing resulted in over 1,000 bugs being identified and fixed, many of which were only found in the largest scale production workloads.
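For readers unfamiliar with the term, property-based testing generates many random inputs and checks that a general rule holds for all of them, rather than checking a handful of hand-picked cases. The sketch below is purely illustrative and is not code from the Cassandra project: the `merge` function is a hypothetical toy "last write wins" reconciliation of two replicas, and the names and data layout are assumptions made for the example.

```python
import random


def merge(a, b):
    """Toy last-write-wins merge of two replicas.

    Each replica is a dict mapping key -> (timestamp, value).
    The cell with the larger (timestamp, value) tuple wins.
    """
    out = dict(a)
    for key, cell in b.items():
        if key not in out or cell > out[key]:
            out[key] = cell
    return out


def random_replica(rng, keys="abcd"):
    """Build a small random replica holding a random subset of the keys."""
    return {
        k: (rng.randint(0, 5), rng.randint(0, 9))
        for k in keys
        if rng.random() < 0.7
    }


def check_merge_properties(trials=1000, seed=0):
    """Check properties that should hold for any pair of replicas."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    for _ in range(trials):
        a, b = random_replica(rng), random_replica(rng)
        merged = merge(a, b)
        assert merged == merge(b, a)       # order of merging does not matter
        assert merge(a, a) == a            # merging with itself changes nothing
        assert merge(merged, b) == merged  # re-applying a replica is a no-op
    return trials


if __name__ == "__main__":
    print("trials passed:", check_merge_properties())
```

If any property failed, the fixed seed would let the failing pair of replicas be regenerated and inspected, which is the main practical benefit of this style of testing.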
The aim was for Cassandra 4.0 to be ready at release for major users to run in production. Cassandra 4.0 is already running in production today at Apple, DataStax, Instaclustr, Netflix, Orange, Pythian, Sky UK, and Yelp.