Apache MADlib Adds HITS Implementation
Written by Kay Ewbank   
Wednesday, 10 January 2018

There's a new version of Apache MADlib with new features, including an implementation of HITS. MADlib makes it possible to carry out big data machine learning from SQL.

MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. It currently supports PostgreSQL, Greenplum Database, and Apache HAWQ. It started as a collaboration between a team at UC Berkeley and developers at Pivotal. Pivotal was previously known as EMC Greenplum. The project was added to Apache as an incubator project in 2015.

MADlib uses the full compute power of the MPP (Massively Parallel Processing) architecture to process very large data sets, whereas other products are limited by the amount of data that can be loaded into memory on a single node. It runs as a fully parallelized implementation on GPDB (Greenplum Database) and HAWQ for large data sets, meaning it offers much better performance than R or Python libraries. It scales because you can add more nodes to achieve higher performance as your data grows. Greenplum Database is an advanced, fully featured, open source data platform designed for analyzing petabyte-scale data volumes. HAWQ is a Hadoop-native SQL advanced analytics MPP database for enterprises, and is currently an Apache Incubator project.

When MADlib was made a top-level project in August 2017, Joe Hellerstein, Professor of Computer Science at UC Berkeley, Co-Founder and Chief Strategy Officer at Trifacta, and one of the original authors of MADlib, said:

"MADlib was conceived from the outset as an open-source meeting ground for software developers, computing researchers and data scientists to collaborate on scalable, in-database machine learning and statistics."

The new release of MADlib, 1.13, adds a HITS (Hyperlink-Induced Topic Search) link analysis algorithm. HITS rates web pages by analyzing the links between them, giving each page a hub score and an authority score.
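
To give a feel for how HITS rates pages, here is a minimal sketch in plain Python. It is not MADlib's implementation, which runs inside the database from SQL; the function names `hits` and `normalize`, the edge-list input and the fixed iteration count are all assumptions made for illustration. A page earns a high authority score when good hubs link to it, and a high hub score when it links to good authorities.

```python
import math


def normalize(scores):
    """Scale scores so their squares sum to 1 (hypothetical helper)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=50):
    """Return (authority, hub) score dicts for a list of (src, dst) links.

    Illustrative sketch only: a fixed number of iterations is assumed
    instead of a convergence threshold.
    """
    nodes = sorted({n for edge in edges for n in edge})
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: add up the hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: add up the new authority scores of pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub
```

Calling `hits([("a", "c"), ("b", "c"), ("c", "d")])` on this toy graph ranks `c` as the strongest authority, because two pages link to it, and ranks `a` and `b` as the strongest hubs.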

Another improvement in the new release is better handling of k-nearest neighbors classification. k-NN in MADlib now has more distance metrics, and the ability to show a list of neighbors in the output table.
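
The two k-NN additions, a choice of distance metric and a list of neighbors in the output, can be sketched in a few lines of Python. This is an illustration of the idea rather than MADlib's SQL interface; the names `knn_classify`, `squared_l2` and `manhattan`, and the `(id, features, label)` row layout, are assumptions.

```python
from collections import Counter


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(train, point, k=3, dist=squared_l2):
    """Classify `point` by majority vote among its k nearest rows.

    `train` is a list of (row_id, features, label) tuples (assumed layout).
    Returns the predicted label and the ids of the neighbors used.
    """
    ranked = sorted(train, key=lambda row: dist(row[1], point))
    neighbors = ranked[:k]
    label = Counter(lbl for _, _, lbl in neighbors).most_common(1)[0][0]
    return label, [row_id for row_id, _, _ in neighbors]
```

Passing `dist=manhattan` swaps the metric without touching the voting logic, and the returned id list plays the role of the neighbor list in the output.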

Grouping support has been added to MLP (MultiLayer Perceptron), and the quality of results for correlation analysis has been improved by ignoring only a NULL value and not the whole row containing the NULL.
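
The correlation change can be pictured as follows, assuming it means that a NULL excludes a row only from the column pairs that involve the NULL column. This Python sketch is not MADlib code, and the function name `pearson_pairwise` and the use of `None` to stand in for NULL are assumptions.

```python
import math


def pearson_pairwise(rows, i, j):
    """Pearson correlation of columns i and j, skipping a row only when
    one of those two values is None (standing in for SQL NULL).

    Returns (r, n), where n is the number of rows actually used, or
    (None, n) when the correlation is undefined.
    """
    pairs = [(r[i], r[j]) for r in rows
             if r[i] is not None and r[j] is not None]
    n = len(pairs)
    if n < 2:
        return None, n
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None, n  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y), n
```

With rows such as `(1, 2, None)` and `(2, 4, 1)`, the correlation between the first two columns still uses both rows, whereas dropping whole rows would discard the first one because of the NULL in the third column.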


More Information

MADlib site

Related Articles

Apache PredictionIO Reaches Top Level Status

Azure Machine Learning Enhancements

Amazon's Giant Push Into Machine Learning

Spark Gets NLP Library

 

 

 

Last Updated ( Wednesday, 10 January 2018 )
 
 

   
Copyright © 2018 i-programmer.info. All Rights Reserved.