Apache Kylin 2.5 Adds All-in-Spark Cubing Engine
Written by Kay Ewbank   
Tuesday, 02 October 2018

There's a new release of Apache Kylin with improvements including an all-in-Spark cubing engine, and support for using MySQL for the Kylin metastore.


Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Apache Hadoop. It was originally developed at eBay before becoming an Apache project.

The Kylin OLAP Engine is made up of a metadata engine, a query engine, a job engine and a storage engine. It also includes a REST Server to service client requests. The query engine is based on Apache Calcite.
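Clients talk to Kylin through that REST server. The following minimal sketch builds, but does not send, a SQL query request using only Python's standard library. It assumes a default local installation (port 7070, the default ADMIN/KYLIN credentials) and the learn_kylin sample project with its kylin_sales table; the helper name build_kylin_query is hypothetical, and host, project and credentials should be checked against your own deployment.

```python
import base64
import json
import urllib.request

def build_kylin_query(host: str, project: str, sql: str,
                      user: str = "ADMIN", password: str = "KYLIN",
                      limit: int = 50000) -> urllib.request.Request:
    """Build, but do not send, a POST request for Kylin's query REST endpoint."""
    body = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # Kylin's REST API uses HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"http://{host}:7070/kylin/api/query",  # 7070 is the default port
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

req = build_kylin_query(
    "localhost", "learn_kylin",
    "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt",
)
print(req.get_method(), req.full_url)
# Sending it needs a running Kylin server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything touches the network.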

The new all-in-Spark cubing engine means that Kylin’s Spark engine now runs all distributed jobs in Spark, including fetching distinct dimension values, converting cuboid files to HBase HFiles, merging segments and merging dictionaries. The developers say that the default configurations are tuned to give users a good out-of-the-box experience, and that performance is expected to improve. Job management for Spark has also been improved: the job link appears in the web console as soon as Spark starts running. If you discard a job, Kylin kills the Spark job to release its resources, and if Kylin is restarted, it can resume the previous job instead of submitting a new one.
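Merging dictionaries is one of the steps that now runs in Spark. Kylin encodes dimension values through dictionaries that map each distinct value to an integer ID, so when segments are merged their dictionaries must be combined and the old IDs remapped. The toy sketch below illustrates that idea in memory with order-preserving dictionaries; it is not Kylin's trie-based dictionary implementation, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Dictionary = Dict[str, int]

def build_dictionary(values) -> Dictionary:
    """Order-preserving dictionary: the i-th smallest distinct value gets ID i."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

def merge_dictionaries(dicts: List[Dictionary]) -> Tuple[Dictionary, List[Dict[int, int]]]:
    """Merge segment dictionaries; also return an old-ID -> new-ID map per input."""
    merged = build_dictionary(v for d in dicts for v in d)  # iterating a dict yields its keys
    remaps = [{old_id: merged[value] for value, old_id in d.items()} for d in dicts]
    return merged, remaps

seg1 = build_dictionary(["CN", "US", "DE"])  # {'CN': 0, 'DE': 1, 'US': 2}
seg2 = build_dictionary(["FR", "US"])        # {'FR': 0, 'US': 1}
merged, remaps = merge_dictionaries([seg1, seg2])
print(merged)  # {'CN': 0, 'DE': 1, 'FR': 2, 'US': 3}
print(remaps)  # [{0: 0, 1: 1, 2: 3}, {0: 2, 1: 3}]
```

The remap tables are what let already-encoded cuboid rows from each old segment be rewritten against the merged dictionary; at real data volumes that rewriting is the kind of work worth distributing.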

In previous versions, the only choice for storing Kylin metadata was HBase, but from this release you can choose MySQL instead. This overcomes a problem with HBase replication: a replicated HBase cluster is read-only, so it doesn't really work as the metastore when Kylin is deployed in a clustered, high-availability setup. MySQL works correctly in this scenario, though the feature is currently in beta.
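Kylin's metadata is essentially a set of documents (cube, model and project descriptions, for example) stored under path-like keys, which is why a relational database can take HBase's place. The sketch below is a toy illustration of that idea, not Kylin's JDBC metastore code: sqlite3 stands in for MySQL, and the class name and table layout (res_key, res_ts, res_content) are hypothetical. Writes check a timestamp so that, in this sketch, two writers sharing the store cannot silently overwrite each other.

```python
import sqlite3
import time
from typing import Optional, Tuple

class SqlResourceStore:
    """Toy metadata store: path-like keys mapped to (timestamp, content) rows."""

    def __init__(self, conn: sqlite3.Connection, table: str = "kylin_metadata"):
        self.conn = conn
        self.table = table  # hypothetical table layout, not Kylin's real schema
        with conn:
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table} ("
                "res_key TEXT PRIMARY KEY, res_ts INTEGER NOT NULL, res_content BLOB)"
            )

    def get(self, key: str) -> Optional[Tuple[int, bytes]]:
        return self.conn.execute(
            f"SELECT res_ts, res_content FROM {self.table} WHERE res_key = ?", (key,)
        ).fetchone()

    def check_and_put(self, key: str, content: bytes, old_ts: int) -> int:
        """Write only if the stored timestamp still equals old_ts (0 = must not exist yet)."""
        new_ts = max(int(time.time() * 1000), old_ts + 1)
        with self.conn:  # commit on success, roll back on error
            if old_ts == 0:
                cur = self.conn.execute(
                    f"INSERT OR IGNORE INTO {self.table} VALUES (?, ?, ?)",
                    (key, new_ts, content))
            else:
                cur = self.conn.execute(
                    f"UPDATE {self.table} SET res_ts = ?, res_content = ? "
                    "WHERE res_key = ? AND res_ts = ?",
                    (new_ts, content, key, old_ts))
        if cur.rowcount != 1:
            raise RuntimeError(f"concurrent modification of {key}")
        return new_ts

store = SqlResourceStore(sqlite3.connect(":memory:"))
ts = store.check_and_put("/cube/sales_cube.json", b'{"status": "READY"}', old_ts=0)
ts = store.check_and_put("/cube/sales_cube.json", b'{"status": "DISABLED"}', old_ts=ts)
try:
    store.check_and_put("/cube/sales_cube.json", b"{}", old_ts=0)  # stale writer
except RuntimeError as err:
    print(err)
```

Because every write is a single conditional INSERT or UPDATE inside a transaction, the database itself arbitrates between concurrent writers.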

The next improvement is the ability to create Hybrid models in the web GUI. A Hybrid is an advanced model that combines multiple cubes, and it can be used to handle changes to a cube's schema. Because this feature had no GUI in the past, only a small proportion of Kylin users made use of it; this version adds a web GUI to make it easier to use.
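A common way to use a Hybrid for a schema change is to keep the old cube for historical data and build a new cube, with the changed schema, for data from some date onward. The toy sketch below shows how a query's date range could be routed to the member cubes it overlaps; the cube names, dates and the cubes_for_query helper are hypothetical, and this is an illustration of the idea rather than Kylin's routing code.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class CubeRange:
    name: str
    start: date  # inclusive
    end: date    # exclusive

def cubes_for_query(members: List[CubeRange], q_start: date, q_end: date) -> List[str]:
    """Return the member cubes whose data range overlaps the query range [q_start, q_end)."""
    return [c.name for c in members if c.start < q_end and q_start < c.end]

hybrid = [
    CubeRange("sales_cube_v1", date(2017, 1, 1), date(2018, 7, 1)),  # old schema
    CubeRange("sales_cube_v2", date(2018, 7, 1), date(2019, 1, 1)),  # new schema
]
print(cubes_for_query(hybrid, date(2018, 6, 1), date(2018, 8, 1)))
# ['sales_cube_v1', 'sales_cube_v2']
print(cubes_for_query(hybrid, date(2018, 9, 1), date(2018, 10, 1)))
# ['sales_cube_v2']
```

A query spanning the changeover date needs partial results from both cubes, which then have to be combined; a query that falls entirely after it only touches the new cube.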

Another cube-related improvement is that the cube planner, a feature added in Kylin 2.3 to optimize the cube structure, is now enabled by default. By optimizing the cube structure (which cuboids are precalculated), the cube planner can reduce the computing and/or storage resources needed and improve query performance. The algorithm automatically optimizes the cube based on your data statistics on the first build.
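To make the idea of cuboid selection concrete, here is a toy version of the textbook greedy "benefit per unit space" heuristic for choosing which cuboids to precalculate. It is offered as an illustration of the general technique, not as Kylin's cube planner code. The row counts stand in for first-build statistics and are made up, and all names are hypothetical. A query on a cuboid is assumed to be answered by the smallest selected cuboid whose dimensions are a superset of its own.

```python
from typing import Dict, FrozenSet, List

Cuboid = FrozenSet[str]

def query_cost(q: Cuboid, selected: List[Cuboid], rows: Dict[Cuboid, int]) -> int:
    """Rows scanned to answer a query on q: the smallest selected superset of q."""
    return min(rows[c] for c in selected if q <= c)

def benefit_per_unit_space(c: Cuboid, selected: List[Cuboid], rows: Dict[Cuboid, int]) -> float:
    """Rows saved over every query c could answer, divided by c's own size (rows > 0)."""
    saved = sum(max(0, query_cost(q, selected, rows) - rows[c]) for q in rows if q <= c)
    return saved / rows[c]

def greedy_select(rows: Dict[Cuboid, int], base: Cuboid, k: int) -> List[Cuboid]:
    """Start from the base cuboid and add up to k cuboids, best ratio first."""
    selected = [base]  # the base cuboid can answer every query
    for _ in range(k):
        candidates = [c for c in rows if c not in selected]
        if not candidates:
            break
        selected.append(max(candidates, key=lambda c: benefit_per_unit_space(c, selected, rows)))
    return selected

# Row counts per cuboid, as first-build statistics might report them (made-up numbers).
stats = {"ABC": 1000, "AB": 500, "AC": 800, "BC": 900,
         "A": 50, "B": 100, "C": 400, "": 1}
rows = {frozenset(dims): n for dims, n in stats.items()}
chosen = greedy_select(rows, base=frozenset("ABC"), k=2)
print(["".join(sorted(c)) or "()" for c in chosen])  # ['ABC', '()', 'A']
```

Small, frequently useful cuboids win early because they save many scanned rows for little storage, which is the trade-off between resources and query speed described above.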

This release also offers better segment pruning to reduce disk and network I/O. Until now, Kylin only pruned segments by the partition column’s value, but this version records the minimum and maximum values of every dimension for each segment, so a segment can be skipped whenever a query filters on a value outside the recorded range.
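Min/max pruning itself is simple to sketch. The toy example below keeps only the segments whose recorded range for a dimension could contain the value in an equality filter, and conservatively keeps any segment with no statistics. Segment names, ranges and the prune helper are hypothetical, and plain string comparison stands in for however Kylin actually compares encoded dimension values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Segment:
    name: str
    # Per-dimension (min, max) recorded when the segment was built.
    ranges: Dict[str, Tuple[str, str]] = field(default_factory=dict)

def prune(segments: List[Segment], dim: str, value: str) -> List[str]:
    """Return the segments that must be scanned for the filter `dim = value`."""
    kept = []
    for seg in segments:
        bounds = seg.ranges.get(dim)
        # No statistics, or value inside [min, max]: the segment might match.
        if bounds is None or bounds[0] <= value <= bounds[1]:
            kept.append(seg.name)
    return kept

segments = [
    Segment("2018Q1", {"country": ("AU", "FR")}),
    Segment("2018Q2", {"country": ("DE", "US")}),
    Segment("2018Q3"),  # no min/max recorded, so it can never be skipped
]
print(prune(segments, "country", "JP"))  # ['2018Q2', '2018Q3']
```

Before this change only the partition column had such bounds, so a filter on country alone could not rule out any segment.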

Other improvements include the ability to carry out dictionary merges on YARN rather than in Kylin’s JVM; better estimation of cube size for TOPN and COUNT DISTINCT measures; and support for Hadoop 3 and HBase 2.


More Information

Kylin Website


Last Updated ( Tuesday, 02 October 2018 )