RocksDB on Steroids
Written by Kay Ewbank
Thursday, 30 April 2015
Yahoo researchers have made changes to RocksDB that, they claim, have put it on steroids.
RocksDB is an embeddable persistent key-value store for fast storage based on Google’s LevelDB and open-sourced by Facebook in 2013.
It is also the library that is used by Sherpa, Yahoo’s cloud Key-Value store that in turn is used for content personalization systems to:
“optimize your experience by ranking the available content by relevance for you”.
When you look at the Yahoo home page or one of Yahoo’s digital magazines, what you see depends on what you’ve read in the past. The system is based on a huge persistent map, the Sherpa key-value (KV) store that associates user ids (keys) with their properties (values).
Sherpa supports over a billion users, and runs on thousands of nodes spread across multiple, geo-distributed datacenters. Local KV-stores are used to power a single node, and Sherpa uses the open-source RocksDB library for this.
Obviously, the faster the KV-store can deal with reads and writes, the happier users are, so in an attempt to improve the performance of local KV-stores, the researchers set out to scale them up on modern multi-core hardware. KV-stores such as RocksDB have a log-structured-merge (LSM) design that is optimized for write-heavy workloads. An LSM store buffers writes in a fast in-memory segment, and carries out batch I/O to a collection of on-disk segments. When the in-memory segment fills up, it is flushed to disk, either by being merged with an existing disk segment or by creating a new one.
Reads look for the latest version of the key in the in-memory segment, and if it is not found, also search the on-disk segments. According to the researchers, RocksDB has become so efficient at optimizing I/O speed that in many applications, its in-memory operations have become the performance bottleneck. On Yahoo Labs the researchers say that nowadays:
“the data access rate is usually limited by the speed of reads and writes to RAM. Today’s most popular KV-stores were designed for hardware with a relatively small number of cores, which were common in data-serving farms until recently. We posited that pushing the envelope of KV-store serving rates would involve harnessing more cores, and allowing read and write operations to execute concurrently on the in-memory data structure. This is where the real fun started!”
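To make the LSM write and read paths described above concrete, here is a minimal single-threaded sketch in Python. It is a toy under stated assumptions, not RocksDB's implementation: the class `ToyLSM`, its methods and the `memtable_limit` parameter are all hypothetical names, the "on-disk" segments are just sorted lists held in memory, and a flush always creates a new segment, with merging left to a separate `compact` step.

```python
from bisect import bisect_left


class ToyLSM:
    """Toy log-structured-merge store; every name here is hypothetical."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # flush threshold, in keys
        self.memtable = {}                    # fast in-memory segment
        self.segments = []                    # stand-in for disk, newest last

    def put(self, key, value):
        # Writes only ever touch the in-memory segment.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the in-memory segment out as one sorted batch (a new segment).
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Latest version first: memory, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            i = bisect_left(segment, (key,))  # binary search on sorted keys
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None

    def compact(self):
        # Merge all segments into one, keeping the newest value per key.
        merged = {}
        for segment in self.segments:  # oldest first, so newer overwrite
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []


db = ToyLSM(memtable_limit=2)
db.put("alice", "sports")
db.put("bob", "finance")   # memtable is full, so it is flushed
db.put("alice", "tech")    # the newer version lives in memory
print(db.get("alice"), db.get("bob"))
```

Even in this toy, every `put` and `get` goes through the in-memory segment first, which is why that structure becomes the hot spot once disk I/O is well optimized.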
The difficulty in scaling KV-stores on multi-core hardware lies in synchronization bottlenecks, which arise when the data in memory is synchronized with that on disk and redundant disk elements are merged. By using new research on multiprocessor-friendly lock-free data structures, the Yahoo team thought they could improve hardware utilization, increase throughput and reduce latency. Research in this area has until now focused on applying lock-free data structures to memory-intensive applications. Instead, the Yahoo researchers concentrated on optimizing big-data platforms that combine RAM access with disk I/O. They say they have infused lock-free data structures into the RocksDB internals, adding that this was nontrivial due to the library’s rich API (get/put/snapshot scan/atomic read-modify-write) and the need to coordinate the memory and disk accesses.
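To see why that API is demanding, it helps to spell out what each operation has to guarantee. The Python sketch below is only an illustration of those semantics and is emphatically not cLSM's algorithm: it protects everything with a single lock, which is exactly the kind of synchronization point the researchers set out to remove. The class `ToyKV` and its method names are hypothetical and are not RocksDB's API.

```python
import threading


class ToyKV:
    """Lock-based stand-in showing API semantics only; not cLSM."""

    def __init__(self):
        self._lock = threading.Lock()  # one global lock: simple but serial
        self._data = {}

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def snapshot_scan(self, start, end):
        # Copy under the lock so the scan sees one consistent point in time,
        # even if writers keep going while the result is being built.
        with self._lock:
            frozen = dict(self._data)
        return [(k, frozen[k]) for k in sorted(frozen) if start <= k < end]

    def read_modify_write(self, key, fn, default=None):
        # The read and the write happen as one indivisible step.
        with self._lock:
            new_value = fn(self._data.get(key, default))
            self._data[key] = new_value
            return new_value


kv = ToyKV()


def bump_many():
    for _ in range(1000):
        kv.read_modify_write("clicks", lambda v: v + 1, default=0)


threads = [threading.Thread(target=bump_many) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(kv.get("clicks"))
```

Because every operation queues on the same lock, adding cores to this version adds little throughput. The work the article describes is about keeping these guarantees while letting reads and writes proceed concurrently.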
The work was carried out jointly by Guy Gueta, Eddie Bortnikov, Eshcar Hillel, and Idit Keidar at Yahoo Labs. Many of the details of the algorithm they developed, called cLSM (concurrent LSM), are included in a research paper that was recently presented at the EuroSys conference and is now available to download as a pdf.
The results of the research are that cLSM outperforms the state-of-the-art LSM implementations (including RocksDB and LevelDB), improving throughput by 1.5x to 2.5x. It demonstrates superior scalability with the number of cores, successfully exploiting twice as many as RocksDB could scale up to previously.