Apache Arrow Adds DataFusion Rust-Native Engine
Written by Kay Ewbank   
Tuesday, 16 April 2019

Apache Arrow has been updated with the addition of the DataFusion Rust-Native query engine for the Arrow columnar format.

Apache Arrow is a columnar in-memory analytics layer that permits random access. It is language independent, can be used for flat and hierarchical data, and organizes the data store for efficient analytic operations. It also provides computational libraries. Languages currently supported are C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.
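To make the idea of a columnar layout concrete, here is a minimal sketch in plain Python. It does not use Arrow or its memory format; the field names and the helpers `value_at` and `column_sum` are invented for illustration. It only shows why keeping each field in its own contiguous buffer suits both random access and analytic scans.

```python
from array import array

# Row-oriented view: one record per entry (illustrative data only).
rows = [
    {"city": "Oslo", "temp_c": 4.5},
    {"city": "Lima", "temp_c": 19.0},
    {"city": "Pune", "temp_c": 31.5},
]

# Column-oriented view: one buffer per field. The numeric column is a
# contiguous, typed array of doubles rather than a list of objects.
columns = {
    "city": [r["city"] for r in rows],
    "temp_c": array("d", (r["temp_c"] for r in rows)),
}

def value_at(cols, name, i):
    """Random access: row i of a column is a single index operation."""
    return cols[name][i]

def column_sum(cols, name):
    """An analytic scan reads only the one column it needs."""
    return sum(cols[name])

second_city = value_at(columns, "city", 1)
total_temp = column_sum(columns, "temp_c")
```

In this toy version the aggregate never touches the `city` values at all, which is the property a columnar format is built around.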

Arrow isn’t a standalone piece of software. It is used as a component within systems to accelerate analytics and to allow Arrow-enabled systems to exchange data with low overhead. It is sufficiently flexible to support most complex data models.

The new Rust-Native query engine has been included after DataFusion was donated to the Apache Arrow project. DataFusion supports SQL queries against iterators of RecordBatch and has support for CSV files. There are plans to add support for Parquet files.

At the moment the SQL support is limited to Select, Where, and simple aggregates in the form of Min, Max, and Sum with an optional Group By clause. Supported expressions are identifiers, literals, simple math operations (+, -, *, /), binary expressions (And and Or), equality and comparison operators (=, !=, <, <=, >=, >), and Cast.
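As a rough illustration of the kind of query that fits inside this subset, the sketch below runs one through Python's built-in `sqlite3` module. SQLite is only a stand-in here, not DataFusion, and the `trips` table and its columns are made up for the example. DataFusion's own dialect and type rules may differ, so treat this as showing the shape of the query rather than its exact behavior in Arrow.

```python
import sqlite3

# SQLite stands in for the query engine; the table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, distance_km REAL, fare TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("Oslo", 2.0, "10"),
        ("Oslo", 6.0, "25"),
        ("Lima", 4.0, "12"),
        ("Lima", 1.0, "5"),
    ],
)

# Uses only features named in the article: Select, Where, Min/Max/Sum,
# Group By, a math operation, And, comparison operators, and Cast.
query = """
    SELECT city,
           MIN(distance_km),
           MAX(distance_km),
           SUM(CAST(fare AS INTEGER) * 2)
    FROM trips
    WHERE distance_km >= 2.0 AND city != 'Pune'
    GROUP BY city
"""
results = sorted(conn.execute(query).fetchall())
conn.close()
```

The one-kilometre Lima trip is filtered out by the Where clause before grouping, so each city's aggregates cover only the remaining rows.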

The developers of Arrow say that in the current release they have made significant progress on Arrow Flight, an Arrow-native data messaging framework. Flight now has integration tests to check C++ and Java compatibility, and Python bindings have been added for the C++ library. Flight is designed to overcome the problem that Apache Arrow's primary medium is in-memory data, but not all systems can be co-located. Arrow needs an RPC layer, and that's what Arrow Flight adds.

Flight provides stream management. Data is handled as 'flights', each of which is a stream of Arrow record batches that you can interact with using Get Stream and Put Stream methods. Flight also supports a simple Generic Messaging Framework. Arrow Flight clients can be written without knowledge of the internals of the data handling, or developers could simply use existing JSON tooling on top of the generic Flight API.
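The put and get pattern described here can be pictured with a toy, in-process model. This is not the Flight API and involves no networking or RPC. The class `ToyFlightStore`, its method names, and the dict-of-columns batch shape are all hypothetical, chosen only to mirror the idea of named streams of record batches.

```python
from typing import Dict, Iterable, Iterator, List

# A record batch is modeled as a dict of equal-length columns (hypothetical shape).
RecordBatch = Dict[str, List[int]]

class ToyFlightStore:
    """In-process stand-in for a Flight service: named streams of record batches."""

    def __init__(self) -> None:
        self._flights: Dict[str, List[RecordBatch]] = {}

    def put_stream(self, name: str, batches: Iterable[RecordBatch]) -> int:
        """Append batches to the named flight and return how many were stored."""
        stored = self._flights.setdefault(name, [])
        before = len(stored)
        stored.extend(batches)
        return len(stored) - before

    def get_stream(self, name: str) -> Iterator[RecordBatch]:
        """Yield the flight's batches one at a time, in arrival order."""
        yield from self._flights.get(name, [])

store = ToyFlightStore()
count = store.put_stream("readings", [{"id": [1, 2]}, {"id": [3]}])
ids = [i for batch in store.get_stream("readings") for i in batch["id"]]
```

A consumer of `get_stream` sees whole batches rather than individual rows, which is the unit of transfer the article describes for flights.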

More Information

Apache Arrow Website

Related Articles

Apache Arrow Adds Streaming Binary Format

Databricks Delta Adds Faster Parquet Import

Apache Kudu 1.9 Adds Location Awareness

Apache Kudu Improves Web Interface 

 
