MOOC On Apache Spark
Written by Alex Denham   
Thursday, 28 May 2015

If you want to apply data science techniques using parallel programming in Apache Spark, you'll be interested in an edX course starting Monday, June 1st that prepares you for the Spark Certified Developer exam.

CS 100.1x Introduction to Big Data with Apache Spark is a 5-week course at Intermediate level under the auspices of UC BerkeleyX, Berkeley's online course outfit, and sponsored by Databricks, a company founded by the creators of Apache Spark.

It will be taught by Anthony D Joseph, who is both Professor in Electrical Engineering and Computer Science and Technical Adviser at Databricks.

With a required effort of 5-7 hours per week (around 30 hours in total), students will learn how to:

  • Use Apache Spark to perform data analysis

  • Use parallel programming to explore data sets

  • Apply Log Mining, Textual Entity Recognition and Collaborative Filtering to real-world data questions

  • Prepare for the Spark Certified Developer exam

The Spark Certified Developer exam is offered by Databricks in conjunction with O'Reilly at a cost of $300. It can be taken in person during sessions at Strata events or online from your computer.

This certification enables you to:

 

  • Demonstrate industry recognized validation for your expertise.
  • Meet global standards required to ensure compatibility between Spark applications and distributions.
  • Stay up to date with the latest advances and training in Spark.
  • Become an integral part of the growing Spark developer community.

Of course, you don't have to take this certification and can use this MOOC simply to extend your knowledge of data science. It is part of a two-module Big Data XSeries, with the other module being CS 190.1x: Scalable Machine Learning, which starts on June 29.

 

According to its rubric:

This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, Collaborative Filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

Because all exercises will use PySpark (part of Apache Spark), you need either experience with Python or to take a free online Python mini-course supplied by UC Berkeley.

 

 

 

 

 


 



 


 

Last Updated ( Wednesday, 17 August 2016 )