Apache Kylin 2.5 Adds All-in-Spark Cubing Engine

Written by Kay Ewbank

Tuesday, 02 October 2018

There's a new release of Apache Kylin with improvements including an all-in-Spark cubing engine, and support for using MySQL for the Kylin metastore.


Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Apache Hadoop. It was originally developed at eBay before becoming an Apache project.

The Kylin OLAP Engine is made up of a metadata engine, a query engine, a job engine and a storage engine. It also includes a REST Server to service client requests. The query engine is based on Apache Calcite.

The new all-in-Spark cubing engine means that Kylin’s Spark engine now runs all distributed jobs in Spark, including fetching distinct dimension values, converting cuboid files to HBase HFiles, merging segments and merging dictionaries. The developers say the default configurations are tuned to give users a good out-of-the-box experience, and that performance is expected to improve. Job management for Spark has also been improved: the job link appears in the web console as soon as Spark starts running. If you discard a job, Kylin kills the Spark job to release its resources, and if Kylin is restarted, it can resume the previous job instead of submitting a new one.
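The job-recovery behaviour described above can be illustrated with a small Python sketch. Everything in it is hypothetical: FakeCluster stands in for Spark on YARN, and JobStep, run and discard are illustrative names, not Kylin's API. The sketch assumes the Spark application ID is persisted with the job's metadata, so a restarted server can reattach to an application that is still running rather than submit a duplicate.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeCluster:
    """Hypothetical stand-in for a Spark/YARN cluster: application ID -> state."""
    apps: Dict[str, str] = field(default_factory=dict)
    counter: int = 0

    def submit(self) -> str:
        self.counter += 1
        app_id = f"application_{self.counter}"
        self.apps[app_id] = "RUNNING"
        return app_id

    def kill(self, app_id: str) -> None:
        self.apps[app_id] = "KILLED"

    def is_running(self, app_id: str) -> bool:
        return self.apps.get(app_id) == "RUNNING"


@dataclass
class JobStep:
    """Hypothetical build step that records the ID of the Spark application it launched."""
    name: str
    app_id: Optional[str] = None

    def run(self, cluster: FakeCluster) -> str:
        # After a server restart, reattach if the recorded application is still running
        if self.app_id is not None and cluster.is_running(self.app_id):
            return self.app_id
        # Otherwise submit a fresh application and record its ID
        self.app_id = cluster.submit()
        return self.app_id

    def discard(self, cluster: FakeCluster) -> None:
        # Discarding the job kills its Spark application so its resources are released
        if self.app_id is not None and cluster.is_running(self.app_id):
            cluster.kill(self.app_id)


cluster = FakeCluster()
step = JobStep("build cuboids")
first = step.run(cluster)
resumed = step.run(cluster)  # simulates the server restarting mid-job
step.discard(cluster)
print(first, resumed, cluster.apps[first])  # prints: application_1 application_1 KILLED
```

The key design choice in this sketch is that the application ID lives in persistent job state rather than in server memory, which is what makes resuming possible.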

In previous versions, the only choice for storing Kylin metadata was HBase, but from this release you can use MySQL instead. This overcomes a problem with using a replicated HBase cluster for Kylin's High Availability in a clustered deployment: the replica is read-only, so it doesn't really work as a metadata store. MySQL works correctly in such setups, though the feature is currently in beta.

The next improvement is the ability to create Hybrid models through the web GUI. Hybrid is an advanced model that combines multiple cubes, and it can be used to handle changes to a cube's schema. Because this feature had no GUI in the past, only a small portion of Kylin users made use of it; this version adds a web GUI to make it easier to use.
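To illustrate the schema-change use case, here is a minimal Python sketch of how a query over a date range might be split between an old cube and its replacement. MemberCube, route and the cube names are hypothetical; the sketch assumes the member cubes cover non-overlapping time ranges, and it is not how Kylin's query engine actually plans hybrid queries.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class MemberCube:
    """Hypothetical member of a hybrid: a cube whose data covers [start, end)."""
    name: str
    start: date  # inclusive
    end: date    # exclusive


def route(cubes: List[MemberCube], q_start: date, q_end: date) -> List[Tuple[str, date, date]]:
    """Split the query range [q_start, q_end) across the member cubes that cover it."""
    plan = []
    for cube in sorted(cubes, key=lambda c: c.start):
        lo, hi = max(cube.start, q_start), min(cube.end, q_end)
        if lo < hi:  # this cube overlaps the queried range
            plan.append((cube.name, lo, hi))
    return plan


# The old schema covers history; the new cube (e.g. with an extra dimension) covers recent data
old = MemberCube("sales_v1", date(2017, 1, 1), date(2018, 7, 1))
new = MemberCube("sales_v2", date(2018, 7, 1), date(2019, 1, 1))
for part in route([old, new], date(2018, 6, 1), date(2018, 8, 1)):
    print(part)
```

A query for June and July 2018 is answered partly from each cube, so users keep querying one logical model while the schema changes underneath.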

Another cube-related improvement is that the cube planner, a feature added in Kylin 2.3 to optimize the cube structure, is now enabled by default. By optimizing the cube structure, the cube planner can reduce computing and/or storage resource usage and improve query performance. The algorithm automatically optimizes the cube based on your data statistics at the first build.
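Kylin's cube planner has its own cost model, which also takes query statistics into account. As a rough illustration of the underlying idea, the sketch below implements one classic approach, the greedy view-selection heuristic of Harinarayan, Rajaraman and Ullman: starting from the base cuboid, it repeatedly adds the cuboid whose materialization most reduces the total rows scanned. The row counts are made up, and greedy_select is not Kylin code.

```python
from typing import Dict, FrozenSet, List

Cuboid = FrozenSet[str]


def query_cost(q: Cuboid, chosen: List[Cuboid], sizes: Dict[Cuboid, int]) -> int:
    # A cuboid is answered from the smallest materialized cuboid containing all its dimensions
    return min(sizes[v] for v in chosen if q <= v)


def greedy_select(sizes: Dict[Cuboid, int], k: int) -> List[Cuboid]:
    """Pick up to k cuboids to materialize in addition to the base cuboid."""
    base = max(sizes, key=len)  # the base cuboid (all dimensions) is always built
    chosen = [base]
    for _ in range(k):
        best, best_benefit = None, 0
        for v in sizes:
            if v in chosen:
                continue
            # Benefit: total row-count saving over every cuboid that v could answer
            benefit = sum(max(0, query_cost(w, chosen, sizes) - sizes[v])
                          for w in sizes if w <= v)
            if benefit > best_benefit:
                best, best_benefit = v, benefit
        if best is None:  # no remaining cuboid reduces the cost
            break
        chosen.append(best)
    return chosen


# Hypothetical estimated row counts for every cuboid of dimensions A, B, C
sizes = {frozenset(d): n for d, n in [
    ("ABC", 100), ("AB", 50), ("AC", 80), ("BC", 90),
    ("A", 20), ("B", 40), ("C", 30), ("", 1),
]}
print(["".join(sorted(c)) for c in greedy_select(sizes, 2)])  # prints: ['ABC', 'AB', 'C']
```

Skipping cuboids that bring little benefit is what saves build time and storage, while the chosen ones still serve most queries cheaply.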

This release also offers better segment pruning to reduce disk and network I/O. Until now, Kylin only pruned segments by the partition column’s value, but this version records the minimum and maximum values of every dimension at the segment level, so filters on other dimensions can also be used to skip segments.
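The principle behind min/max pruning is simple enough to sketch in a few lines of Python. Segment, may_match and prune are hypothetical names, only equality filters are handled, and Kylin's real implementation will differ. A segment is skipped when a filter value falls outside that dimension's recorded range, because the segment then cannot contain a matching row.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Segment:
    name: str
    # Hypothetical per-dimension (min, max) values recorded when the segment was built
    ranges: Dict[str, Tuple[Any, Any]]


def may_match(seg: Segment, dim: str, value: Any) -> bool:
    if dim not in seg.ranges:  # no statistics recorded: the segment cannot be ruled out
        return True
    lo, hi = seg.ranges[dim]
    return lo <= value <= hi


def prune(segments: List[Segment], filters: Dict[str, Any]) -> List[Segment]:
    """Keep only segments whose ranges admit every equality filter (dim = value)."""
    return [s for s in segments
            if all(may_match(s, d, v) for d, v in filters.items())]


aug = Segment("2018-08", {"country": ("CA", "FR"), "channel": ("app", "web")})
sep = Segment("2018-09", {"country": ("DE", "US"), "channel": ("web", "web")})
print([s.name for s in prune([aug, sep], {"country": "US"})])   # prints: ['2018-09']
print([s.name for s in prune([aug, sep], {"channel": "app"})])  # prints: ['2018-08']
```

Neither filter touches the partition (date) column, yet each query reads only one of the two segments.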

Other improvements include the ability to carry out dictionary merges on YARN rather than in Kylin’s JVM; better estimation of cube size for TOPN and COUNT DISTINCT measures; and support for Hadoop 3 and HBase 2.
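Dictionary merging is needed because, when segments are merged, each segment's dictionary-encoded values must be re-encoded against one combined dictionary. The Python sketch below assumes an order-preserving encoding in which sorted distinct values receive consecutive integer IDs; build_dict and merge_dicts are hypothetical helpers that do the work in memory. With very large dictionaries this becomes heavy, which is presumably why moving it out of Kylin's JVM onto YARN helps.

```python
from typing import Dict, Iterable, Tuple


def build_dict(values: Iterable[str]) -> Dict[str, int]:
    """Order-preserving dictionary: sorted distinct values get consecutive integer IDs."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def merge_dicts(d1: Dict[str, int], d2: Dict[str, int]) -> Tuple[Dict[str, int], Dict[int, int], Dict[int, int]]:
    """Merge two segment dictionaries and return old-ID -> new-ID maps for each."""
    merged = build_dict(list(d1) + list(d2))
    # Encoded data from each old segment must be rewritten using these remap tables
    remap1 = {old_id: merged[v] for v, old_id in d1.items()}
    remap2 = {old_id: merged[v] for v, old_id in d2.items()}
    return merged, remap1, remap2


d1 = build_dict(["apple", "cherry"])
d2 = build_dict(["banana", "cherry"])
merged, remap1, remap2 = merge_dicts(d1, d2)
print(merged, remap1, remap2)
# prints: {'apple': 0, 'banana': 1, 'cherry': 2} {0: 0, 1: 2} {0: 1, 1: 2}
```

Note that "cherry" had ID 1 in both source dictionaries but ID 2 in the merged one, so every segment's encoded data has to be remapped, not just concatenated.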


More Information

Kylin Website

Kylin 2.3.0 Adds SQL Server Support

Apache Kylin Gets Table Level ACL Management

Apache Kylin Adds RDBMS Support

Spark BI Gets Fine Grain Security


Last Updated ( Tuesday, 02 October 2018 )
