Hadoop 2.9 Adds Resource Estimator
Written by Kay Ewbank   
Friday, 24 November 2017

Apache has released Hadoop 2.9 with new features including YARN federation, HDFS router based federation, and a resource estimator.

The Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. YARN is Hadoop's framework for job scheduling and cluster resource management, and HDFS is its distributed file system.

YARN federation means that it should be possible to scale a single YARN cluster to tens of thousands of nodes by federating multiple YARN sub-clusters. The approach is to divide a large cluster (10,000 to 100,000 nodes) into smaller units called sub-clusters, each with its own YARN resource manager and compute nodes. The federation system stitches these sub-clusters together and makes them appear to applications as one large YARN cluster. The release also includes a new version of the YARN Web UI.
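To make the idea concrete, here is a minimal toy sketch of federation in Python. It is not the YARN API: the class names `SubCluster` and `FederatedCluster`, the sub-cluster names, and the "least loaded per node" placement policy are all assumptions made for illustration. It only shows the shape of the design, where applications talk to one facade that adds up capacity and picks a sub-cluster for them.

```python
from dataclasses import dataclass, field


@dataclass
class SubCluster:
    """One sub-cluster: its own resource manager plus compute nodes (toy model)."""
    name: str
    total_nodes: int
    running_apps: list = field(default_factory=list)


class FederatedCluster:
    """Hypothetical facade that makes several sub-clusters look like one cluster."""

    def __init__(self, sub_clusters):
        self.sub_clusters = list(sub_clusters)

    @property
    def total_nodes(self):
        # Applications see the sum of all sub-cluster capacity.
        return sum(sc.total_nodes for sc in self.sub_clusters)

    def submit(self, app_id):
        # Assumed policy: pick the sub-cluster with the fewest running
        # apps per node. A real federation policy may work differently.
        target = min(self.sub_clusters,
                     key=lambda sc: len(sc.running_apps) / sc.total_nodes)
        target.running_apps.append(app_id)
        return target.name


cluster = FederatedCluster([
    SubCluster("sc-east", total_nodes=4000),
    SubCluster("sc-west", total_nodes=6000),
])
placements = [cluster.submit(f"app-{i}") for i in range(3)]
print(cluster.total_nodes, placements)
```

The caller never names a sub-cluster when submitting, which is the point of the federation layer: the split into smaller units stays an internal detail.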

Router based federation has also been added for HDFS. Until now, HDFS supported partitioned federation, where the filesystem is split into smaller subclusters. This raises the problem of how to maintain the split: users have to connect to multiple subclusters and manage the allocation of folders and files to each of them. Router based federation adds a layer of software responsible for federating the namespaces, while the subclusters continue to manage their own block pools independently. The new Router component has the same interface as a NameNode and forwards client requests to the correct subcluster.
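The forwarding step can be pictured as a lookup from path to subcluster. The sketch below is a toy model only: the `Router` class, the mount-table contents, the subcluster names and the longest-prefix rule are assumptions for illustration, not the HDFS implementation or its API.

```python
class Router:
    """Toy router: resolves a path to the subcluster that owns it."""

    def __init__(self, mount_table):
        # mount_table maps a path prefix to a subcluster name (hypothetical data).
        self.mount_table = dict(mount_table)

    def resolve(self, path):
        # Longest matching prefix wins, so /data/logs beats /data.
        best = None
        for prefix in self.mount_table:
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no subcluster mounted for {path}")
        return self.mount_table[best]


router = Router({
    "/data": "ns1",
    "/data/logs": "ns2",
    "/user": "ns3",
})
print(router.resolve("/data/logs/2017/app.log"))
print(router.resolve("/data/raw/input.csv"))
```

Because the client only ever asks the router, the mapping of folders to subclusters can change without users having to connect to a different subcluster themselves.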

The Resource Estimator estimates the resources a job will need. It relies on the observation that a large proportion of jobs (more than 60%) are recurring, so their requirements can be estimated automatically from the history of their previous runs.
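As a rough illustration of estimating from history, the sketch below takes the peak memory seen in each time interval across past runs and adds a margin. This is a deliberately simple stand-in, not the algorithm the Resource Estimator uses; the function name `estimate_from_history`, the 10% headroom and the sample figures are all made up for the example.

```python
from itertools import zip_longest


def estimate_from_history(runs, headroom_pct=10):
    """Estimate per-interval memory (MB) for the next run of a recurring job.

    runs: past runs, each a list of memory used per time interval.
    headroom_pct: assumed safety margin added on top of the observed peak.
    """
    if not runs:
        raise ValueError("need at least one past run")
    estimate = []
    # Runs may differ in length; treat missing intervals as zero usage.
    for samples in zip_longest(*runs, fillvalue=0):
        estimate.append(max(samples) * (100 + headroom_pct) // 100)
    return estimate


history = [
    [1000, 2000, 1500],
    [1200, 1800, 1600, 400],
    [1100, 2200, 1400],
]
plan = estimate_from_history(history)
print(plan)
```

Taking the per-interval peak is conservative. A percentile or an average would reserve less but risk under-provisioning, which is the kind of trade-off any history-based estimator has to make.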

Another improvement in this version is the addition of opportunistic containers. Unlike existing YARN containers, which are scheduled on a node only if it has unallocated resources, opportunistic containers can be dispatched to a node manager even if their execution at that node cannot start immediately. The container is queued at that node manager until resources become available.
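The difference between the two kinds of container can be sketched with a toy node manager. Everything here is hypothetical (the `NodeManager` class, its method names and the single-slot resource model) and is meant only to show the queueing behaviour described above, not how YARN implements it.

```python
from collections import deque


class NodeManager:
    """Toy node manager with a fixed number of container slots."""

    def __init__(self, slots):
        self.free_slots = slots
        self.running = []
        self.queued = deque()

    def start_guaranteed(self, cid):
        # Existing behaviour: only accepted when resources are unallocated.
        if self.free_slots == 0:
            return False
        self.free_slots -= 1
        self.running.append(cid)
        return True

    def dispatch_opportunistic(self, cid):
        # Always accepted; runs now if possible, otherwise waits in the queue.
        if self.free_slots > 0:
            self.free_slots -= 1
            self.running.append(cid)
        else:
            self.queued.append(cid)

    def finish(self, cid):
        # Freeing a slot lets the oldest queued container start.
        self.running.remove(cid)
        self.free_slots += 1
        if self.queued:
            self.dispatch_opportunistic(self.queued.popleft())


nm = NodeManager(slots=1)
nm.start_guaranteed("g1")
rejected = not nm.start_guaranteed("g2")   # node is full, so this is refused
nm.dispatch_opportunistic("o1")            # accepted, but has to wait
queued_before = list(nm.queued)
nm.finish("g1")                            # o1 starts once g1 frees its slot
print(rejected, queued_before, nm.running)
```

Queueing at the node means a freed slot can be reused straight away, without a round trip to the scheduler.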

The final main change in Hadoop 2.9 is the addition of an API for Scheduler Queue (Re-)configuration for the CapacityScheduler.

 

More Information

Apache Hadoop Site

Related Articles

Hadoop Adds In-Memory Caching

Hadoop SQL Query Engine Launched

Hadoop 2 Introduces YARN 

Last Updated ( Friday, 24 November 2017 )
 
 

   