Intel GraphBuilder Good For Extracting Knowledge From Big Data
Written by Kay Ewbank   
Friday, 07 December 2012

Intel's Open Source Technology Center has released GraphBuilder, an open source tool that you can use to create web-like structures for analyzing big data.

GraphBuilder is a Java library for constructing graphs out of large datasets for data analytics and structured machine learning applications that exploit relationships in data. The library offloads many of the complexities of graph construction, such as graph formation, tabulation, compression, transformation, partitioning, output formatting, and serialization. It scales using the MapReduce parallel programming model. The major components of the GraphBuilder library, and their relation to Hadoop MapReduce, are shown in this diagram:

 


 

GraphBuilder is designed to reveal hidden structure within big data. Writing on the GraphBuilder blog, Intel principal scientist Ted Willke explains that systems such as Hadoop MapReduce chop big data sets into slices and farm them out to masses of machines for filtering, ordering and transforming. Such systems don’t make it easy to extract knowledge from a different type of structure within the data, one that is best modeled by tree or graph structures. Willke says:

“Imagine the pattern of hyperlinks connecting Wikipedia pages or the connections between Tweeters and Followers on Twitter. In these models, a line is drawn between two bits of information if they are related to each other in some way. The nature of the connection can be less obvious than in these examples and made specifically to serve a particular algorithm.”

The research team at Intel found that there are a number of systems available to process, store, visualize, and mine graphs but not to construct them from unstructured sources. With this in mind, Intel set out to develop a demo of a scalable graph construction library for Hadoop, and this became GraphBuilder, which has been open sourced this week at 01.org.
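As a rough illustration of what constructing a graph from unstructured text involves, here is a minimal single-machine sketch in plain Java. It does not use the GraphBuilder API. The class name LinkGraphSketch, the method names, and the [[Title]] wiki-style link format are all assumptions made for the example. One step extracts link targets from raw page text (the kind of work a map task would do) and a second step groups them by source page and removes duplicates (the kind of work a reduce task would do).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: plain Java, not the GraphBuilder API.
class LinkGraphSketch {

    // Matches wiki-style links such as [[Hadoop]] or [[Hadoop|label]] (an assumed input format).
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    // "Map"-like step: turn one raw page into the link targets it mentions.
    static List<String> extractLinks(String title, String text) {
        List<String> targets = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            String target = m.group(1).trim();
            // Simple cleaning: drop empty targets and self links.
            if (!target.isEmpty() && !target.equals(title)) {
                targets.add(target);
            }
        }
        return targets;
    }

    // "Reduce"-like step: group targets by source page and drop duplicate edges.
    static Map<String, Set<String>> buildAdjacency(Map<String, String> pages) {
        Map<String, Set<String>> adjacency = new TreeMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            String source = page.getKey();
            adjacency.computeIfAbsent(source, k -> new TreeSet<>());
            for (String target : extractLinks(source, page.getValue())) {
                adjacency.get(source).add(target);
                // Make sure every vertex appears, even if it has no outgoing links.
                adjacency.computeIfAbsent(target, k -> new TreeSet<>());
            }
        }
        return adjacency;
    }

    public static void main(String[] args) {
        Map<String, String> pages = Map.of(
                "Hadoop", "See [[MapReduce]] and [[HDFS]]. Also [[MapReduce|the model]].",
                "MapReduce", "Used by [[Hadoop]].");
        System.out.println(buildAdjacency(pages));
    }
}
```

A library aimed at Hadoop would run these steps in parallel across many machines and add compression, partitioning and serialization on top. The sketch only shows the formation and cleaning steps on a toy input.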


 

GraphBuilder not only constructs large-scale graphs rapidly, but also offloads many of the complexities of graph construction, including graph formation, cleaning, compression, partitioning, and serialization. Willke says that this makes it easy for just about anyone to build graphs for interesting research and commercial applications, and that using GraphBuilder, a Java programmer could build an internet-scale graph for PageRank in about 100 lines of code and a Wikipedia-sized graph for LDA in about 130.
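To give a sense of the kind of algorithm such a graph feeds, here is a small textbook PageRank computed by power iteration over an in-memory adjacency map. It is a sketch, not GraphBuilder code and not the program Willke refers to. The class name PageRankSketch, the damping factor of 0.85 and the fixed iteration count are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: textbook PageRank by power iteration, not GraphBuilder code.
class PageRankSketch {

    // graph maps each vertex to its outgoing neighbours.
    // Precondition: every vertex, including link targets, appears as a key.
    static Map<String, Double> pageRank(Map<String, List<String>> graph,
                                        double damping, int iterations) {
        int n = graph.size();
        Map<String, Double> rank = new HashMap<>();
        for (String v : graph.keySet()) {
            rank.put(v, 1.0 / n); // start from a uniform distribution
        }
        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            double dangling = 0.0; // rank held by vertices with no outgoing edges
            for (String v : graph.keySet()) {
                next.put(v, 0.0);
                if (graph.get(v).isEmpty()) {
                    dangling += rank.get(v);
                }
            }
            for (Map.Entry<String, List<String>> e : graph.entrySet()) {
                List<String> out = e.getValue();
                for (String target : out) {
                    // Each vertex splits its current rank evenly across its out-links.
                    next.merge(target, rank.get(e.getKey()) / out.size(), Double::sum);
                }
            }
            for (String v : graph.keySet()) {
                // Teleport term plus damped link and dangling contributions.
                double linked = next.get(v) + dangling / n;
                next.put(v, (1.0 - damping) / n + damping * linked);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "A", List.of("B", "C"),
                "B", List.of("C"),
                "C", List.of("A"));
        System.out.println(pageRank(graph, 0.85, 50));
    }
}
```

On a graph of this size the construction step is trivial. The point of a tool like GraphBuilder is that at internet scale, producing the adjacency data in the first place is the hard part.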

 


 

More Information

GraphBuilder

GraphBuilder: Reveal hidden structure within big data

Related Articles

Microsoft's New Research Center into Social Data

HDInsight - Brings Apache Hadoop to Windows

Twitter Can't Predict Elections Either

New Hadoop connectors

 


You can try out GraphBuilder here: https://01.org/graphbuilder/

Last Updated ( Friday, 07 December 2012 )