MOOC On Apache Spark
Written by Alex Denham   
Thursday, 28 May 2015

If you want to apply data science techniques using parallel programming in Apache Spark, you'll be interested in an edX course starting Monday June 1st that prepares you for the Spark Certified Developer exam.

CS 100.1x Introduction to Big Data with Apache Spark is a 5-week course at Intermediate level under the auspices of UC BerkeleyX, Berkeley's online course outfit, and sponsored by Databricks, a company founded by the creators of Apache Spark.

It will be taught by Anthony D Joseph, who is both Professor in Electrical Engineering and Computer Science and Technical Adviser at Databricks.

With a required effort of 5-7 hours per week (around 30 hours in total), students will learn how to:

  • Use Apache Spark to perform data analysis

  • Use parallel programming to explore data sets

  • Apply Log Mining, Textual Entity Recognition and Collaborative Filtering to real world data questions

  • Prepare for the Spark Certified Developer exam

The Spark Certified Developer exam is offered by Databricks in conjunction with O'Reilly at a cost of $300. It can be taken in person during sessions at Strata events or online from your computer.

This certification enables you to:

  • Demonstrate industry recognized validation for your expertise.
  • Meet global standards required to ensure compatibility between Spark applications and distributions.
  • Stay up to date with the latest advances and training in Spark.
  • Become an integral part of the growing Spark developer community.

Of course, you don't have to take this certification and can use this MOOC simply to extend your knowledge of data science. It is part of a two-module Big Data XSeries, with the other module being CS 190.1x: Scalable Machine Learning, which starts on June 29.

 

According to its rubric:

This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, Collaborative Filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

Because all exercises will use PySpark (part of Apache Spark), you either need experience with Python or to take a free online Python mini-course supplied by UC Berkeley.
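To give a flavour of what a log mining exercise involves, here is a minimal sketch in plain Python rather than PySpark, so it runs without a Spark installation. It is not taken from the course materials. The sample log lines, the function names and the use of a thread pool to stand in for a cluster are all assumptions made for illustration. The data is split into partitions, each partition is counted independently, and the partial counts are merged, which is roughly the map-then-reduce shape that Spark programs tend to follow.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical sample lines in Apache Common Log Format (made up for illustration).
LOG_LINES = [
    '127.0.0.1 - - [01/Jun/2015:10:00:00 -0700] "GET /index.html HTTP/1.1" 200 1043',
    '127.0.0.1 - - [01/Jun/2015:10:00:01 -0700] "GET /missing.html HTTP/1.1" 404 209',
    '10.0.0.7 - - [01/Jun/2015:10:00:02 -0700] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.7 - - [01/Jun/2015:10:00:03 -0700] "POST /login HTTP/1.1" 500 312',
]


def status_code(line):
    """Map step: extract the HTTP status code (the second-to-last field)."""
    return line.split()[-2]


def count_partition(lines):
    """Count status codes within a single partition of the data."""
    return Counter(status_code(line) for line in lines)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def count_status_codes(lines, partitions=2):
    """Count status codes by processing partitions in parallel, then merging."""
    # Split the input into chunks, loosely mimicking how a cluster partitions data.
    chunks = [lines[i::partitions] for i in range(partitions)]
    # A thread pool stands in for worker nodes here; this is an assumption of
    # the sketch, not how Spark schedules work.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partial_counts = list(pool.map(count_partition, chunks))
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    print(dict(count_status_codes(LOG_LINES)))
```

In PySpark the same idea is usually expressed by mapping each line to a (status, 1) pair and combining the pairs by key, with Spark taking care of partitioning and distribution.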

 

 

 

 

 


Last Updated ( Wednesday, 17 August 2016 )