IBM Big SQL Sandbox
Written by Kay Ewbank   
Tuesday, 19 September 2017

IBM has released a sandbox version of Big SQL for desktop use. The Sandbox comes as a single-node Docker image and is designed to let you get started with Big SQL and the Hortonworks Data Platform.

Each Sandbox download comes preconfigured with sample data, a tutorial and an exercise for you to complete, and IBM says you'll be up and running in 30 minutes.

IBM Big SQL is IBM's SQL engine for Hadoop. IBM has worked with Hortonworks to integrate HDP (Hortonworks Data Platform) with IBM Big SQL. Big SQL 5 extends the capabilities of Hive and makes use of HBase and Spark to provide an integrated analytics option.




Big SQL makes use of IBM Fluid Query to virtualize data from many different data stores, such as Hive, HBase, Spark, DB2, Oracle, SQL Server, Netezza, Informix, Teradata, WebHDFS and object stores.

IBM Fluid Query was introduced in 2015. It is powered by Netezza technology, and can be used to create federated queries where the data is drawn from a variety of sources, without the users of the data needing to deal with managing multiple data stores or query systems. Fluid Query can also be used to carry out and control bulk data movement between data repositories. Netezza created the first data warehouse appliance, and as an independent company also developed advanced analytics applications. It was bought by IBM in 2010.
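
To make the idea of a federated query concrete, here is a minimal sketch in Python. It does not use Big SQL or Fluid Query at all: as an assumption for illustration, it uses SQLite's `ATTACH DATABASE` from the standard `sqlite3` module to stand in for two separate data stores, and the table names, column names and sample rows are all hypothetical. What it shows is the general pattern, where a single SQL statement joins data that lives in two different databases, so the caller never has to query each store separately.

```python
import sqlite3


def build_stores() -> sqlite3.Connection:
    """Create two separate in-memory databases reachable from one connection.

    This is only an analogy for federation: "main" and "warehouse" are
    hypothetical stand-ins for two independent data stores.
    """
    conn = sqlite3.connect(":memory:")  # first store, addressed as "main"
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")  # second store

    conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE warehouse.orders (customer_id INTEGER, amount REAL)")

    # Hypothetical sample rows, one set per store.
    conn.executemany(
        "INSERT INTO main.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO warehouse.orders VALUES (?, ?)",
        [(1, 10.0), (1, 5.5), (2, 7.0)],
    )
    return conn


def total_per_customer(conn: sqlite3.Connection) -> list:
    """Run one query that joins tables held in the two different stores."""
    query = """
        SELECT c.name, SUM(o.amount) AS total
        FROM main.customers AS c
        JOIN warehouse.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_stores()
    for name, total in total_per_customer(connection):
        print(name, total)
```

In a real federated system the stores would be remote and of different types, and the engine would decide how much of the work to push down to each source. The sketch only captures the user-facing side: one query, several stores.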

Big SQL offers bi-directional integration with Spark, and supports synthesis between Spark executors and Big SQL worker nodes. Along with its big data support, it also supports SQL dialects from other offerings such as IBM DB2, IBM Netezza data warehouse appliances and Oracle Database, including built-in support for Oracle’s SQL and PL/SQL dialects. IBM's hope is that applications written against Oracle will be moved to run in Big SQL, because they can be moved across with minimal changes.

Big SQL also offers YARN integration through Slider. YARN (Yet Another Resource Negotiator) is Apache's cluster management technology, while Slider extends Hadoop and YARN to let other databases run in YARN without modification. Obviously thinking they hadn't included enough big data names and technologies, IBM has added a new technology to Big SQL called “Elastic Boost”. IBM says this can improve Big SQL's performance by up to 50% by enabling allocation of multiple workers per node for more efficient CPU and memory utilization.

Big SQL also comes with an ANSI-compliant SQL parser that can run all 99 TPC-DS queries without the need for query modifications, as well as structured streaming with new APIs.



More Information

Big SQL Sandbox


Related Articles

SQL At Hadoop Scale 

Hadoop Adds In-Memory Caching

Apache Spark With Structured Streaming






Last Updated ( Tuesday, 19 September 2017 )