TileDB Improves Sparse Array Support
Written by Kay Ewbank   
Monday, 25 May 2020

Designed to give data scientists a more powerful way to store, update, analyze, and share large sets of data, TileDB lets you model data as either a dense or a sparse multi-dimensional array. New features in version 2 include support for heterogeneous dimensions and string dimensions in sparse arrays.
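The distinction between dense and sparse arrays is easy to miss in the abstract. The following is a minimal conceptual sketch in plain Python, not TileDB's on-disk format or API: a dense array keeps a slot for every cell in its domain, while a sparse array keeps only the non-empty cells as coordinate/value pairs. The names `dense`, `sparse` and `read_sparse` are hypothetical, chosen for illustration only.

```python
# Conceptual illustration only: NOT TileDB's storage format or API.
ROWS, COLS = 1000, 1000

# Dense: one slot per cell, most of them holding the fill value 0.0.
dense = [[0.0] * COLS for _ in range(ROWS)]
dense[3][7] = 1.5
dense[500][2] = 2.0
dense[999][999] = 4.25

# Sparse: only the three non-empty cells are stored, keyed by coordinates.
sparse = {(3, 7): 1.5, (500, 2): 2.0, (999, 999): 4.25}

dense_cells = sum(len(row) for row in dense)
sparse_cells = len(sparse)
print(dense_cells, sparse_cells)  # 1000000 3


def read_sparse(arr, r, c, fill=0.0):
    """Look up a cell, returning the fill value for cells that were never written."""
    return arr.get((r, c), fill)
```

Both representations answer the same reads, but the sparse one stores three cells instead of a million, which is why a sparse layout suits data where most coordinates are empty.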

TileDB consists of a multi-dimensional array data format, a fast, embeddable, open-source C++ storage engine with data science tooling integrations, and its own proprietary cloud service, TileDB Cloud, which was launched in November 2019. While the Developer Edition of the database engine is free, TileDB Cloud charges pay-as-you-go for compute time. There is also TileDB Enterprise, which is TileDB Cloud delivered as licensable software with support. TileDB Enterprise can be deployed on-premises or on a private cloud with LDAP and SAML authentication.

TileDB's embeddable C++ library includes APIs for C, C++, Python, R, Java and Go. The library is integrated with Spark, Dask, PrestoDB, MariaDB, Arrow and geospatial libraries such as PDAL, GDAL and Rasterio. The new release includes a completely revamped TileDB R API developed by Dirk Eddelbuettel, who is known for his work on several popular R packages and is a board member of the R Foundation. The TileDB team says:

"We want to make TileDB an integral part of the R ecosystem and are just getting going on integrations with other key R packages, such as the tidyverse and Bioconductor."

TileDB 2.0 also adds support for Google Cloud Storage and Azure Blob Storage alongside the existing AWS S3 support. The other main improvement in the new release is support for heterogeneous and string dimensions. The previous release of TileDB supported only homogeneous dimensions, that is, dimensions that all share the same data type. This worked well for some data, but the developers realized it was limiting for dataframes whose columns have different data types, such as Date (Datetime) and Price (Double). In addition, many dataframes have string columns, such as Name, that users need to slice. With heterogeneous and string dimensions, TileDB 2.0 now fully supports dataframes.
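To make the idea of heterogeneous and string dimensions concrete, here is a minimal conceptual sketch in plain Python. It is not TileDB's API: it simply treats a small, hypothetical dataframe as a sparse array whose coordinates mix a date, a float and a string, and slices it with an inclusive range on each dimension, including a lexicographic range on the string dimension. The data values and the `slice_cells` helper are invented for illustration.

```python
from datetime import date

# Hypothetical dataframe rows modelled as sparse-array cells:
# (Date, Price, Name) are the dimensions, Volume is the stored attribute.
cells = [
    ((date(2020, 5, 1), 101.5, "ACME"), 300),
    ((date(2020, 5, 4), 99.0, "Globex"), 120),
    ((date(2020, 5, 5), 102.25, "Initech"), 75),
    ((date(2020, 5, 6), 98.75, "Acme Labs"), 40),
]


def slice_cells(cells, date_range, price_range, name_range):
    """Return the cells whose coordinates fall inside all three inclusive ranges.

    Each dimension is compared with its own type's ordering: dates
    chronologically, floats numerically, strings lexicographically.
    """
    (d0, d1), (p0, p1), (n0, n1) = date_range, price_range, name_range
    return [
        (coords, value)
        for coords, value in cells
        if d0 <= coords[0] <= d1 and p0 <= coords[1] <= p1 and n0 <= coords[2] <= n1
    ]


result = slice_cells(
    cells,
    (date(2020, 5, 1), date(2020, 5, 5)),  # Date dimension
    (99.0, 105.0),                         # Price dimension
    ("A", "H"),                            # Name dimension (string range)
)
print([coords[2] for coords, _ in result])  # ['ACME', 'Globex']
```

With homogeneous dimensions only, the Date and Name columns would have to be encoded into the same type as Price before they could serve as coordinates; allowing each dimension its own type lets the slice be expressed directly.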

The new release is available on GitHub or from the TileDB website.

More Information

TileDB Homepage

TileDB On GitHub

Last Updated ( Monday, 25 May 2020 )