Cloudera Extends Apache HBase To Use Amazon S3
Written by Kay Ewbank
Friday, 04 October 2019
Cloudera has updated Cloudera Data Platform to provide a way for Apache HBase deployments to use Amazon Simple Storage Service (S3) as their main persistence layer for saving table data.
The advantage this offers is that Amazon S3 is billed on a pay-per-use basis and has no server-side component for you to run or manage. Cloudera Data Platform (CDP) is described as combining the best of Hortonworks' and Cloudera's technologies to create an enterprise data cloud that includes cloud-native services for data warehousing, machine learning, streaming ingest, and operational data stores.
Apache HBase is Hadoop's open-source, distributed, versioned, non-relational database, modeled after Google's BigTable, which offers random, realtime read/write access to big data. Apache's goal for this project is for it to host very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.
Amazon Simple Storage Service (S3) is designed to offer secure, durable, highly scalable object storage at a low cost.
Until now, it has not been possible to use S3 directly from HBase because HBase requires a consistent and atomic file system, whereas S3 provides an eventually consistent object store. This means that HBase has been limited to using HDFS rather than being able to use S3 natively. Cloudera has now created a solution that is being offered via CDP. When you launch an Operational Database (HBase) cluster on CDP, HBase StoreFiles (the backing files for HBase tables) are stored in S3, while HBase write-ahead logs (WALs) are stored, as usual, in an HDFS instance run alongside HBase.
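The split between the two storage layers can be pictured with a small sketch. It assumes the layout is expressed through HBase's `hbase.rootdir` and `hbase.wal.dir` properties; the article does not say how CDP configures this, and the bucket and host names are placeholders rather than anything taken from Cloudera.

```python
from urllib.parse import urlparse

# Assumed layout for illustration only: the property values below are
# hypothetical, and CDP may set this up differently.
LAYOUT = {
    "hbase.rootdir": "s3a://example-bucket/hbase",               # StoreFiles
    "hbase.wal.dir": "hdfs://namenode.example:8020/hbase-wals",  # write-ahead logs
}

BACKENDS = {
    "s3a": "Amazon S3 (object store)",
    "hdfs": "HDFS (file system)",
}


def backing_store(prop):
    """Return a description of the storage layer behind a configured location."""
    scheme = urlparse(LAYOUT[prop]).scheme
    return BACKENDS[scheme]


for prop in LAYOUT:
    print(prop, "->", backing_store(prop))
```

The point of the sketch is only that table data and write-ahead logs live on different storage layers, distinguished here by the URI scheme.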
Under the covers, this relies on the Hadoop S3A filesystem adapter, which accesses data in S3 via the standard FileSystem APIs. Hadoop's S3Guard is also used to provide directory listings and object status for the S3A adapter, so that HBase sees when new StoreFiles are added to an HBase table.
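To see why a consistent listing matters, here is a toy model in Python. It is not Hadoop's code: `LaggingObjectStore` and `GuardedStore` are hypothetical classes that mimic an object store whose listings lag behind writes, and a wrapper that records each write in a consistent metadata table, which is roughly the role the article attributes to S3Guard.

```python
class LaggingObjectStore:
    """Toy object store: writes land at once, but listings lag until sync()."""

    def __init__(self):
        self._objects = {}
        self._listed = set()

    def put(self, key, data):
        self._objects[key] = data  # the object itself is stored immediately

    def list(self, prefix):
        # Only keys that the (stale) listing index already knows about.
        return sorted(k for k in self._listed if k.startswith(prefix))

    def sync(self):
        self._listed = set(self._objects)  # the listing eventually catches up


class GuardedStore:
    """Wrapper that records every write in a consistent metadata table."""

    def __init__(self, store):
        self._store = store
        self._metadata = set()

    def put(self, key, data):
        self._store.put(key, data)
        self._metadata.add(key)

    def list(self, prefix):
        # Merge the consistent metadata with whatever the raw store reports.
        known = {k for k in self._metadata if k.startswith(prefix)}
        return sorted(known | set(self._store.list(prefix)))


raw = LaggingObjectStore()
guarded = GuardedStore(raw)
guarded.put("table/region/storefile-1", b"cells")

print(raw.list("table/"))      # [] because the raw listing is stale
print(guarded.list("table/"))  # ['table/region/storefile-1']
```

In this model a reader that trusts the raw listing would miss the new StoreFile until the store catches up, whereas the guarded listing shows it straight away.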
The new element is HBase Object Store Semantics (HBOSS), a new software project that has been added to the Apache HBase project to handle the gap between S3Guard and HBase. HBOSS is a facade on top of the S3A adapter and S3Guard which uses a distributed lock to ensure that HBase can manipulate its files on S3 atomically.
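The idea of a locking facade can be sketched in a few lines of Python. This is an illustration under stated assumptions and not HBOSS itself: `ObjectStore` and `LockingFacade` are hypothetical names, and an in-process `threading.Lock` stands in for the distributed lock the article mentions.

```python
import threading
from contextlib import contextmanager


class ObjectStore:
    """Toy store where rename is copy-then-delete, so not atomic by itself."""

    def __init__(self):
        self.objects = {}

    def rename(self, src, dst):
        self.objects[dst] = self.objects[src]  # step 1: copy
        del self.objects[src]                  # step 2: delete the original


class LockingFacade:
    """Wraps the store so each operation runs while holding per-path locks."""

    def __init__(self, store):
        self._store = store
        self._locks = {}
        self._guard = threading.Lock()  # protects the lock table itself

    @contextmanager
    def _locked(self, *paths):
        # Acquire locks in sorted order so two renames cannot deadlock.
        with self._guard:
            locks = [self._locks.setdefault(p, threading.Lock())
                     for p in sorted(set(paths))]
        for lock in locks:
            lock.acquire()
        try:
            yield
        finally:
            for lock in reversed(locks):
                lock.release()

    def rename(self, src, dst):
        with self._locked(src, dst):
            self._store.rename(src, dst)

    def exists(self, path):
        # Readers take the same lock, so they never observe a half-done rename.
        with self._locked(path):
            return path in self._store.objects


store = ObjectStore()
store.objects["tmp/storefile"] = b"cells"
fs = LockingFacade(store)
fs.rename("tmp/storefile", "table/storefile")
print(fs.exists("table/storefile"), fs.exists("tmp/storefile"))  # True False
```

Because readers and writers go through the same facade and take the same locks, a two-step rename looks like a single step to every other caller, which is the property HBase needs from its file system.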
Last Updated ( Friday, 04 October 2019 )