|AsterixDB – Big Data Management System|
|Written by Kay Ewbank|
|Wednesday, 17 December 2014|
A new database designed specifically for managing semi-structured information has been made available in beta form.
AsterixDB Big Data Management System (BDMS) has been developed over the last four years by researchers at UC Irvine, UC Riverside, and UC San Diego. The project has been sponsored by the National Science Foundation (NSF), and is designed for ingesting, storing, managing, indexing, querying, and analyzing vast quantities of semi-structured information.
The researchers have taken ideas from three distinct areas (semi-structured data, parallel databases, and data-intensive computing, a.k.a. today’s Big Data platforms) and put them together to create what the developers describe as “a next-generation, open-source software platform that scales by running on large, shared-nothing commodity computing clusters”.
The semi-structured information the project aims to manage ranges from well-typed, highly regular data to more irregular data, where values may be textual and the eventual schema for the various data types involved may be hard to anticipate up front.
The team has been concentrating on solutions to the problems that big data sets give rise to, such as the need for highly scalable data storage and indexing. It has also been researching semi-structured query processing on very large clusters. Another area of research has been how to combine parallel database techniques with modern data-intensive computing techniques in the hope of finding solutions to the problem of storing and analyzing semi-structured information effectively.
The team has now released a beta version of the AsterixDB system that encapsulates their research.
AsterixDB has a semi-structured, NoSQL-style data model (ADM) that results from extending JSON with object database ideas. It offers basic transactional capabilities for concurrency and recovery, akin to those of a NoSQL store.
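To make the idea of a semi-structured data model more concrete, here is a minimal Python sketch of records that share a few declared, typed fields while remaining free to carry extra ones. This is an illustration only, not AsterixDB code or ADM syntax: the `USER_TYPE` mapping, the `conforms` function, and the "open"/"closed" labels are hypothetical names chosen for this example.

```python
from typing import Any

# Hypothetical declared part of a record type: field name -> expected Python type.
USER_TYPE = {"id": int, "name": str}


def conforms(record: dict[str, Any], declared: dict[str, type],
             open_type: bool = True) -> bool:
    """Return True if the record has every declared field with the right type.

    With open_type=True, extra undeclared fields are allowed.
    With open_type=False, any undeclared field makes the check fail.
    """
    for field, expected in declared.items():
        if field not in record or not isinstance(record[field], expected):
            return False
    if not open_type and set(record) - set(declared):
        return False
    return True


# A highly regular record and a more irregular one with unanticipated fields.
regular = {"id": 1, "name": "Ann"}
irregular = {"id": 2, "name": "Bob", "location": [33.64, -117.84], "tags": ["db"]}
```

Under the "open" check both records are accepted, whereas the "closed" check rejects the irregular one. That is the trade-off a semi-structured model has to offer: enough typing to index and query the regular parts, without forcing the whole schema to be known up front.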
The query language (AQL) is described as expressive and declarative, and it supports a broad range of queries and analysis over semi-structured data. Queries can access externally stored data (e.g., data in HDFS) as well as data stored natively by AsterixDB.
The parallel runtime query execution engine, Hyracks, has been scale-tested on 1000+ cores and 500+ disks. AsterixDB also supports partitioned LSM-based data storage and indexing, which is designed to enable efficient ingestion and management of semi-structured data. Secondary indexing options include B+ trees, R-trees, and inverted keyword (exact and fuzzy) index types, and you can also run fuzzy and spatial queries. The data types supported include spatial and temporal data in addition to integer, floating point, and textual.
Writing about the beta, the researchers say they “are hoping that the arrival of AsterixDB will mark the beginning of the ‘BDMS era’”. They hope AsterixDB will be useful for a much broader class of problems than can be addressed with any one of today’s Big Data platforms and related technologies such as Hadoop, Pig, Hive, HBase, MongoDB, and so on. That’s quite a big ambition, and it’ll be interesting to see whether they succeed.
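For readers unfamiliar with LSM (log-structured merge) storage, the following Python sketch shows the general technique in miniature. Writes go to an in-memory buffer, full buffers are flushed as sorted immutable runs, and runs are merged later. It is a toy under stated assumptions, not AsterixDB's implementation: the `TinyLSM` class, its method names, and the `memtable_limit` threshold are all hypothetical, and partitioning, deletes, and on-disk files are left out.

```python
import bisect


class TinyLSM:
    """Toy LSM store: an in-memory buffer plus sorted immutable runs."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable: dict[str, str] = {}
        self.runs: list[list[tuple[str, str]]] = []  # oldest first, newest last

    def put(self, key: str, value: str) -> None:
        # Writes only touch memory, which is what makes ingestion cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Freeze the buffer as a sorted run (standing in for a file on disk).
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key: str):
        # Check the newest data first: buffer, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        # Merge all runs into one, letting newer values overwrite older ones.
        merged: dict[str, str] = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

The design choice to note is that a write never modifies an existing run in place. This is why the approach suits heavy ingestion, at the cost of reads that may have to consult several runs until a merge consolidates them.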
|Last Updated ( Wednesday, 17 December 2014 )|