|Grail Open Sources Bigslice Cluster Computing For Golang|
|Written by Kay Ewbank|
|Thursday, 17 October 2019|
GRAIL has open sourced two projects, Bigslice and Bigmachine, which enable distributed computation across large datasets using simple Golang programs.
Bigslice is a system for fast, large-scale, serverless data processing using Go. It exposes a composable API that lets the user express data processing tasks in terms of a series of data transformations that invoke user code.
The developers describe Bigslice as similar to data processing systems like Apache Spark and FlumeJava, but with different aims. Bigslice is built for Go, and is used as an ordinary Go package. Users use their existing Go code, and Bigslice binaries are compiled like ordinary Go binaries.
It is also serverless, and the team says that with nothing more than cloud credentials, Bigslice can be used to process large datasets without the use of any other external infrastructure.
Bigslice programs are regular Go programs, providing users with a familiar environment and tools. A Bigslice program can be run on a single node like any other program, but it is also capable of transparently distributing itself across an ad hoc cluster, managed entirely by the program itself.
The data processing features of Bigslice come in the form of a coherent set of operators that can be used to work with large data sets using ordinary Go code. The operators are familiar data transformation primitives such as map, filter, reduce, and join. While the user’s computations are sequential — they specify how a dataset is to be transformed, step-by-step, into the desired result — Bigslice parallelizes the computation and can distribute it across many processors and over large compute clusters.
Bigslice achieves this by splitting the datasets into many smaller pieces, and performing the transformations individually on each piece so that they can fit in memory, and so they can be performed in parallel across many machines. When transformations require that data be rearranged (for operations like join or reduce), Bigslice arranges that the data are re-shuffled accordingly.
Bigslice uses Bigmachine to manage an ad-hoc cluster of compute nodes to support distribution. Bigmachine is a toolkit for building self-managing serverless applications in Go. It provides an API that lets a driver process form an ad-hoc cluster of machines to which user code is transparently distributed. User code is exposed through services, which are stateful Go objects associated with each machine.
or email your comment to: firstname.lastname@example.org
|Last Updated ( Thursday, 17 October 2019 )|