eBay Open Sources Beam
Written by Alex Denham   
Friday, 10 May 2019

eBay has made its distributed knowledge graph store open source. Beam can be used to store RDF-like data and supports SPARQL-like querying. It uses Apache Kafka for its central log and is largely written in Go.

Knowledge graph stores are designed for modeling highly interconnected data. In a knowledge graph, data is represented as a single table of facts, where each fact has a subject, predicate, and object. Beam uses an RDF-like representation for this data and a SPARQL-like query language.
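
To make the "single table of facts" idea concrete, here is a minimal Go sketch. The Fact type, the Match function, and the sample data are hypothetical names chosen for illustration; they are not Beam's actual types or API.

```go
package main

import "fmt"

// Fact is one row in the single table of facts.
// The type and field names are illustrative, not Beam's real data model.
type Fact struct {
	Subject   string
	Predicate string
	Object    string
}

// Match returns the facts that agree with the pattern.
// An empty string in the pattern acts as a wildcard for that position.
func Match(facts []Fact, pattern Fact) []Fact {
	var out []Fact
	for _, f := range facts {
		if (pattern.Subject == "" || pattern.Subject == f.Subject) &&
			(pattern.Predicate == "" || pattern.Predicate == f.Predicate) &&
			(pattern.Object == "" || pattern.Object == f.Object) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// Made-up sample facts, purely for demonstration.
	facts := []Fact{
		{"iPhone", "brand", "Apple"},
		{"iPhone", "type", "Phone"},
		{"Pixel", "brand", "Google"},
	}
	// Ask: which subjects have the brand Apple?
	for _, f := range Match(facts, Fact{Predicate: "brand", Object: "Apple"}) {
		fmt.Println(f.Subject)
	}
}
```

A linear scan like this only works for tiny data sets; a real store needs indexes to answer such patterns efficiently.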

Beam is distributed because it is designed to store graphs that are too large to fit on a single server. It scales out horizontally to support higher query rates and larger data sets. The developers say that while its write rates don't scale, a typical Beam deployment should still be able to support tens of thousands of changes per second. eBay has run a 20-server deployment of Beam for development purposes for about a year, loaded with a dataset of about 2.5 billion facts, and this hasn't pushed Beam to its limits.

Beam's architecture is based around a central log. All write requests are added to an append-only central log. The log is a network service that is internally replicated for fault-tolerance. Several view servers read the log and apply its entries in sequence. Different view servers maintain different states. An API tier accepts requests from clients. It appends the write requests to the log, and it collects data from the view servers to answer reads.
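
The log-centric design can be sketched in a few lines of Go. This is a simplified illustration under stated assumptions: the Log and View types are hypothetical, and the log is a plain in-memory slice standing in for a replicated network service.

```go
package main

import "fmt"

// Log is an in-memory stand-in for the replicated, append-only central log.
type Log struct {
	entries []string
}

// Append adds an entry to the end of the log and returns its 1-based index.
func (l *Log) Append(entry string) int {
	l.entries = append(l.entries, entry)
	return len(l.entries)
}

// View applies log entries strictly in sequence and remembers how far it got.
// Each view supplies its own apply function, so different views can
// maintain different state from the same log.
type View struct {
	applied int                // number of log entries applied so far
	apply   func(entry string) // updates this view's private state
}

// CatchUp applies every entry the view has not yet seen, in log order.
func (v *View) CatchUp(l *Log) {
	for v.applied < len(l.entries) {
		v.apply(l.entries[v.applied])
		v.applied++
	}
}

func main() {
	central := &Log{}

	// Two views deriving different state from the same entries.
	count := 0
	seen := map[string]bool{}
	counter := &View{apply: func(string) { count++ }}
	distinct := &View{apply: func(e string) { seen[e] = true }}

	central.Append("insert a")
	central.Append("insert b")
	counter.CatchUp(central)
	central.Append("insert a")
	counter.CatchUp(central)
	distinct.CatchUp(central)

	fmt.Println("entries applied:", count, "distinct entries:", len(seen))
}
```

Because every view reads the same entries in the same order, views can fall behind and catch up later without diverging from one another.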

Apache Kafka is used for the log, and the view is implemented as a DiskView, which can run in one of two modes: indexing knowledge graph facts either by subject-predicate or by predicate-object. A typical deployment will run three replicas of multiple partitions of each mode. The DiskViews store their facts in RocksDB.
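
The two indexing modes can be illustrated with a rough Go sketch. It assumes a hypothetical Fact type and uses in-memory maps as a stand-in for RocksDB; the names BuildSP and BuildPO are invented for this example and partitioning and replication are left out.

```go
package main

import "fmt"

// Fact is an illustrative subject-predicate-object triple.
type Fact struct{ Subject, Predicate, Object string }

// Index maps a two-part key to the remaining component of each matching fact.
type Index map[[2]string][]string

// BuildSP indexes facts by (subject, predicate) and stores the objects.
// It answers questions such as "what is the brand of iPhone?".
func BuildSP(facts []Fact) Index {
	idx := Index{}
	for _, f := range facts {
		k := [2]string{f.Subject, f.Predicate}
		idx[k] = append(idx[k], f.Object)
	}
	return idx
}

// BuildPO indexes facts by (predicate, object) and stores the subjects.
// It answers questions such as "which subjects have the brand Apple?".
func BuildPO(facts []Fact) Index {
	idx := Index{}
	for _, f := range facts {
		k := [2]string{f.Predicate, f.Object}
		idx[k] = append(idx[k], f.Subject)
	}
	return idx
}

func main() {
	// Made-up sample facts, purely for demonstration.
	facts := []Fact{
		{"iPhone", "brand", "Apple"},
		{"iPad", "brand", "Apple"},
		{"iPhone", "type", "Phone"},
	}
	sp, po := BuildSP(facts), BuildPO(facts)
	fmt.Println(sp[[2]string{"iPhone", "brand"}]) // objects for a subject and predicate
	fmt.Println(po[[2]string{"brand", "Apple"}])  // subjects for a predicate and object
}
```

Keeping both orderings means a lookup can start from whichever side of a fact is already known, at the cost of storing each fact twice.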

There's also an API server that contains a query processor. This implements a query language that's similar to a subset of SPARQL. It consists of a parser, a cost-based query planner, and a parallel execution engine. The parser transforms an initial set of query lines into an abstract syntax tree (AST). The planner combines the AST with statistics about the data to find an efficient query plan. The executor then runs the plan, using batching and streaming throughout for high performance.
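
As a toy illustration of the first two stages, the Go sketch below parses query lines into patterns and orders them using predicate statistics. Everything here is an assumption made for the example: the three-term line syntax, the Pattern type, and the "rarest predicate first" heuristic are not Beam's query language or planner, and the execution stage is omitted.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Pattern is one parsed query line. Terms starting with "?" are variables.
// A flat list of patterns stands in for a real abstract syntax tree.
type Pattern struct{ Subject, Predicate, Object string }

// Parse turns whitespace-separated query lines into patterns.
func Parse(query string) ([]Pattern, error) {
	var out []Pattern
	for _, line := range strings.Split(query, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 3 {
			return nil, fmt.Errorf("expected 3 terms, got %d in %q", len(fields), line)
		}
		out = append(out, Pattern{fields[0], fields[1], fields[2]})
	}
	return out, nil
}

// Plan returns a copy of the patterns ordered so that the pattern with the
// rarest predicate runs first. stats maps a predicate to the number of facts
// that use it; predicates missing from stats count as zero.
func Plan(patterns []Pattern, stats map[string]int) []Pattern {
	plan := append([]Pattern(nil), patterns...)
	sort.SliceStable(plan, func(i, j int) bool {
		return stats[plan[i].Predicate] < stats[plan[j].Predicate]
	})
	return plan
}

func main() {
	patterns, err := Parse("?item type Phone\n?item brand Apple")
	if err != nil {
		panic(err)
	}
	// Made-up statistics: "type" is far more common than "brand".
	stats := map[string]int{"type": 1000000, "brand": 5000}
	for _, p := range Plan(patterns, stats) {
		fmt.Println(p.Subject, p.Predicate, p.Object)
	}
}
```

Running the most selective pattern first keeps intermediate results small, which is the general motivation for using data statistics in a cost-based planner.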

Beam has been made open source because the team at eBay can't continue working on it full-time to turn it into a finished, production-ready system. The developers say Beam is ready to be used for offline, noncritical, or research applications today. It also has a number of internal packages that may be useful in other projects, such as its fanout and query planner modules. Beam is now available on GitHub.

More Information

Beam On GitHub

Related Articles

Visual Search Adopted by eBay

eBay introduces ql.io

Apache Kylin Adds RDBMS Support

Kafka 2 Adds Support For ACLs

SPARQL Moves Closer

 

Last Updated ( Friday, 10 May 2019 )