Apache Kylin 2.5 Adds All-in-Spark Cubing Engine
Written by Kay Ewbank
Tuesday, 02 October 2018
There's a new release of Apache Kylin, version 2.5, with improvements that include an all-in-Spark cubing engine and support for using MySQL as the Kylin metastore.
Kylin is an open-source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Apache Hadoop. It was originally developed at eBay before becoming an Apache project.
The Kylin OLAP Engine is made up of a metadata engine, a query engine, a job engine and a storage engine. It also includes a REST Server to service client requests. The query engine is based on Apache Calcite.
The new all-in-Spark cubing engine means that Kylin's Spark engine now runs all distributed jobs in Spark, including fetching distinct dimension values, converting cuboid files to HBase HFiles, merging segments and merging dictionaries. The developers say that the default configuration is tuned to give users a good out-of-the-box experience, and that they expect performance to improve. Job management for Spark has also been improved: the job link appears in the web console as soon as Spark starts to run. If you discard the job, Kylin kills the Spark job to release its resources, and if Kylin is restarted, it can resume the previous job instead of submitting a new one.
In previous versions, the only choice for storing Kylin metadata was HBase, but from this release you can choose MySQL instead. This overcomes a problem caused by the fact that replicated HBase is read-only, so it doesn't really work as the metastore when Kylin is deployed as a high-availability cluster. MySQL works correctly in such setups, though the feature is currently in beta.
The next improvement is a web GUI for creating Hybrid models. A Hybrid is an advanced model that combines multiple cubes, and it can be used to deal with changes to a cube's schema. Because the feature previously had no GUI, only a small portion of Kylin users took advantage of it; the new GUI is intended to make it easier to use.
Another cube-related improvement is that the cube planner, a feature added in Kylin 2.3 to optimize the cube structure, is now enabled by default. By optimizing the cube structure, the cube planner can reduce the computing and/or storage resources a cube needs and improve query performance. The algorithm automatically optimizes the cube based on statistics from your data on the first build.
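To make the idea of optimizing a cube's structure more concrete, here is a rough sketch of one classic way to decide which cuboids to build: the greedy view-selection heuristic described by Harinarayan, Rajaraman and Ullman. It is not Kylin's cube planner code. The function names, row counts and query frequencies are all hypothetical; each cuboid is modelled as a set of dimensions, and a query is assumed to cost the row count of the smallest built cuboid that contains all of its dimensions.

```python
from typing import Dict, FrozenSet, List

Cuboid = FrozenSet[str]


def workload_cost(chosen: List[Cuboid], rows: Dict[Cuboid, int],
                  workload: Dict[Cuboid, int]) -> int:
    """Estimated cost of a query workload, given the cuboids that are built.

    Each query (a set of dimensions) is answered from the smallest chosen
    cuboid containing all of its dimensions; its cost is that cuboid's row
    count, weighted by how often the query runs.
    """
    return sum(
        freq * min(rows[c] for c in chosen if query <= c)
        for query, freq in workload.items()
    )


def greedy_select(base: Cuboid, rows: Dict[Cuboid, int],
                  workload: Dict[Cuboid, int], budget: int) -> List[Cuboid]:
    """Start from the base cuboid, then repeatedly add the cuboid that
    lowers the estimated workload cost the most, up to `budget` extra cuboids."""
    chosen = [base]  # the base cuboid holds every dimension, so it answers every query
    current = workload_cost(chosen, rows, workload)
    for _ in range(budget):
        best, best_cost = None, current
        for candidate in rows:
            if candidate in chosen:
                continue
            cost = workload_cost(chosen + [candidate], rows, workload)
            if cost < best_cost:
                best, best_cost = candidate, cost
        if best is None:  # no remaining cuboid lowers the cost
            break
        chosen.append(best)
        current = best_cost
    return chosen


# Hypothetical statistics for three dimensions a, b and c.
rows = {frozenset(k): n for k, n in
        {"abc": 1000, "ab": 400, "ac": 500, "bc": 300,
         "a": 20, "b": 50, "c": 30}.items()}
workload = {frozenset("a"): 10, frozenset("bc"): 5, frozenset("ab"): 1}

plan = greedy_select(frozenset("abc"), rows, workload, budget=2)
print(["".join(sorted(c)) for c in plan])  # ['abc', 'a', 'bc']
```

Because a candidate is only added when it strictly lowers the estimated cost, the sketch stops early once extra cuboids would spend storage without helping the (hypothetical) workload.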
This release also offers better segment pruning to reduce disk and network I/O. Until now, Kylin pruned segments only by the value of the partition column, but this version records the minimum and maximum values of every dimension at the segment level, so a query that filters on other dimensions can also skip segments whose recorded range cannot contain a match.
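A minimal sketch shows why per-dimension min/max statistics help. It assumes a hypothetical `Segment` record holding a (min, max) pair for each dimension, and a query filter expressed as inclusive (low, high) bounds per dimension; it illustrates min/max pruning in general and is not Kylin's implementation. A segment can be skipped when, for some filtered dimension, the requested range lies entirely outside the segment's recorded range.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Segment:
    name: str
    # Hypothetical per-segment statistics: dimension -> (min, max)
    ranges: Dict[str, Tuple[Any, Any]]


def prune_segments(segments: List[Segment],
                   filters: Dict[str, Tuple[Any, Any]]) -> List[Segment]:
    """Return the segments that might contain rows matching every filter.

    `filters` maps a dimension to inclusive (low, high) bounds; an equality
    filter such as country = 'US' becomes ('US', 'US').
    """
    kept = []
    for seg in segments:
        may_match = True
        for dim, (low, high) in filters.items():
            if dim not in seg.ranges:
                continue  # no statistics for this dimension, so it cannot be pruned on
            seg_min, seg_max = seg.ranges[dim]
            if high < seg_min or low > seg_max:
                may_match = False  # requested range lies outside the segment's range
                break
        if may_match:
            kept.append(seg)
    return kept


# Hypothetical monthly segments partitioned on `dt`.
segments = [
    Segment("2018-07", {"dt": ("2018-07-01", "2018-07-31"), "country": ("CA", "FR")}),
    Segment("2018-08", {"dt": ("2018-08-01", "2018-08-31"), "country": ("DE", "US")}),
    Segment("2018-09", {"dt": ("2018-09-01", "2018-09-30"), "country": ("AU", "ZA")}),
]

# WHERE country = 'US' places no condition on the partition column.
print([s.name for s in prune_segments(segments, {"country": ("US", "US")})])
# ['2018-08', '2018-09']
```

Pruning by the partition column alone could not skip any segment for the `country = 'US'` query, because that query says nothing about `dt`; the recorded `country` range lets the July segment be skipped.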
Other improvements include the ability to run dictionary merges on YARN rather than in Kylin's JVM; more accurate estimation of cube size for TOPN and COUNT DISTINCT measures; and support for Hadoop 3 and HBase 2.
Last Updated (Tuesday, 02 October 2018)