MOOC On Apache Spark
Written by Alex Denham   
Thursday, 28 May 2015

If you want to apply data science techniques using parallel programming in Apache Spark, you'll be interested in an edX course starting Monday, June 1st that prepares you for the Spark Certified Developer exam.


CS 100.1x: Introduction to Big Data with Apache Spark is a 5-week course at Intermediate level under the auspices of UC BerkeleyX, Berkeley's online course outfit, and sponsored by Databricks, a company founded by the creators of Apache Spark.

It will be taught by Anthony D Joseph, who is both a Professor of Electrical Engineering and Computer Science at UC Berkeley and a Technical Adviser at Databricks.

With a required effort of 5-7 hours per week (around 30 hours in total) students will learn how to:

  • Use Apache Spark to perform data analysis

  • Use parallel programming to explore data sets

  • Apply Log Mining, Textual Entity Recognition and Collaborative Filtering to real-world data questions

  • Prepare for the Spark Certified Developer exam

The Spark Certified Developer exam is offered by Databricks in conjunction with O'Reilly at a cost of $300. It can be taken in person during sessions at Strata events or online from your computer.

This certification enables you to:

 

  • Demonstrate industry-recognized validation of your expertise.
  • Meet global standards required to ensure compatibility between Spark applications and distributions.
  • Stay up to date with the latest advances and training in Spark.
  • Become an integral part of the growing Spark developer community.

Of course, you don't have to take the certification and can use this MOOC simply to extend your knowledge of data science. It is part of a two-module Big Data XSeries, with the other module being CS 190.1x: Scalable Machine Learning, which starts on June 29.

 


According to its rubric:

This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, and Collaborative Filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

Because all exercises use PySpark (part of Apache Spark), you either need experience with Python or to take a free online Python mini-course supplied by UC Berkeley.
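
To give a flavour of the kind of PySpark work the assignments involve, here is a minimal log-mining sketch in the spirit of the course description; it is not taken from the course materials, and the file name and the "ERROR" pattern are purely illustrative assumptions.

from pyspark import SparkContext

# Start a local Spark context; "local[*]" uses all available cores
sc = SparkContext("local[*]", "LogMiningSketch")

# Load a (hypothetical) log file into a resilient distributed dataset (RDD)
logs = sc.textFile("access.log")

# Keep only the lines that mention errors and cache them for repeated use
errors = logs.filter(lambda line: "ERROR" in line).cache()

# Actions such as count() and take() are executed in parallel across partitions
print(errors.count())
for line in errors.take(5):
    print(line)

sc.stop()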
