Apache DataSketches Reaches Top Level Status
Written by Kay Ewbank
Thursday, 11 February 2021
Apache DataSketches has reached top-level project status. The data analysis software was originally developed at Yahoo, and has been an Apache incubator project for the last two years.
DataSketches is an open source, high-performance library of stochastic streaming algorithms commonly called "sketches" in the data sciences. Sketches are small, stateful programs that process massive data as a stream and can provide approximate answers, with mathematical guarantees, to computationally difficult queries orders-of-magnitude faster than traditional, exact methods.
The developers of DataSketches say such sketches are important for any system that needs to extract useful information from big data, and that sketches should be tightly integrated into the analysis capabilities of such systems. Sketches implement algorithms that can extract information from a stream of data in a single pass, also known as “one-touch” processing.
The DataSketches technology has helped Yahoo (Verizon Media) successfully reduce data processing times from days or hours to minutes or seconds on a number of its internal platforms. The DataSketches project is dedicated to providing a broad selection of sketch algorithms of production quality.
The usefulness of sketches comes down to the fact that businesses don't always need answers that are pinpoint accurate. If an approximate answer is acceptable, then sketch algorithms allow you to answer these queries orders of magnitude faster, with much lower resource utilization.
Instead of requiring the data analysis system to keep enormous amounts of data on hand, sketches use small data structures that are usually kilobytes in size. Sketches are also streaming algorithms, in that they only need to see each incoming item once.
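To make the idea concrete, here is a minimal illustration of a sketch that counts distinct items in a single pass while holding only a bounded amount of state. It is a textbook K-Minimum-Values estimator written for this article, not code from the DataSketches library; the class name `KMVSketch`, the parameter `k` and the choice of hash function are all assumptions made for the example.

```python
import hashlib
import heapq


class KMVSketch:
    """Toy K-Minimum-Values sketch for approximate distinct counting.

    Illustrative only: this is not the DataSketches implementation.
    """

    def __init__(self, k=1024):
        self.k = k
        self._heap = []         # max-heap (negated values) of the k smallest hashes
        self._members = set()   # the same hashes, for fast duplicate checks

    @staticmethod
    def _hash01(item):
        # Map the item to a pseudo-uniform number in [0, 1).
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") / 2**64

    def update(self, item):
        # Each incoming item is examined once and then discarded.
        h = self._hash01(item)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest retained one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen: the count is exact.
            return float(len(self._heap))
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


# Usage: 200,000 stream items containing 50,000 distinct values.
sketch = KMVSketch(k=1024)
for i in range(200_000):
    sketch.update(f"user-{i % 50_000}")
approx = sketch.estimate()
print(f"approximate distinct count: {approx:.0f} (true: 50000)")
```

The sketch never stores more than `k` hash values however long the stream is, and for this estimator the relative error shrinks roughly as one over the square root of `k`, so a larger `k` buys accuracy at the cost of memory.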
The DataSketches library has been specifically designed for production systems that must process massive data. It includes adaptors for Apache Hive, Apache Pig, and PostgreSQL (C++), which are also intended to serve as examples for writing adaptors for other systems. The sketches in this library are also designed to have compatible binary representations across languages (Java, C++, Python) and platforms.