Take The ETH Zürich Big Data Course For Free
Written by Nikos Vaggalis   
Friday, 28 October 2022

This is a great course on everything Big Data, taught at ETH Zürich by Professor Ghislain Fourny. The recorded lectures from fall 2021 are up on YouTube for everyone to enjoy.

The notion of Big Data that this course adopts is the following:

Information society has to turn data into information, information into knowledge, knowledge into value. This has become increasingly complex. Data comes in larger volumes, diverse shapes, from different sources. Data is more heterogeneous and less structured than forty years ago. Nevertheless, it still needs to be processed fast, with support for complex operations.

The course revolves around the database technologies and the most important database design principles that lay the foundations of the Big Data universe: distributed storage, syntax, models, validation, processing, indexing and querying, all fitted to the Big Data setting. In more detail, the topics covered are:

  • physical storage: distributed file systems (HDFS), object storage (S3), key-value stores

  • logical storage: document stores (MongoDB), column stores (HBase), graph databases (neo4j), data warehouses (ROLAP)

  • data formats and syntaxes (XML, JSON, RDF, Turtle, CSV, XBRL, YAML, protocol buffers, Avro)
  • data shapes and models (tables, trees, graphs, cubes)

  • type systems and schemas: atomic types, structured types (arrays, maps), set-based type systems (?, *, +)

  • an overview of functional, declarative programming languages across data shapes (SQL, XQuery, JSONiq, Cypher, MDX)

  • the most important query paradigms (selection, projection, joining, grouping, ordering, windowing)

  • paradigms for parallel processing, two-stage (MapReduce) and DAG-based (Spark)

  • resource management (YARN)

  • what a data center is made of and why it matters (racks, nodes, ...)

  • underlying architectures (internal machinery of HDFS, HBase, Spark, neo4j)

  • optimization techniques (functional and declarative paradigms, query plans, rewrites, indexing)

  • applications
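
To give a flavour of the two-stage parallel processing paradigm mentioned in the list, here is a minimal single-machine sketch of a MapReduce-style word count in Python. It is not taken from the course material, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. A real framework such as Hadoop would run the map and reduce stages across many nodes and do the grouping step for you.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map stage: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as a framework would between the two stages."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce stage: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain the stages: map, then group by key, then reduce."""
    return reduce_phase(shuffle(map_phase(lines)))
```

Calling `word_count(["big data", "Big ideas"])` counts "big" twice and the other two words once each. The map and reduce functions never share state, which is what lets a real system spread them over a cluster.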

Those subjects are spread across 40 recorded lectures, each video up to 45 minutes in length:

  • Introduction
  • Lessons learnt (1/2/3)
  • Object storage (1/2/3)
  • Distributed file systems (1/2/3)
  • Syntax (1/2/3)
  • Wide column stores (1/2/3)
  • Data models (1/2/3)
  • Massive parallel processing I: MapReduce (1/2)
  • Resource management (1/2)
  • Massive parallel processing II: Spark (1/2/3/4)
  • Performance at large scales (1/2)
  • Document stores (1/2/3/4)
  • Querying trees (1/2/3/4)
  • Graph databases (1/2/3)
  • Data warehouses and cubes (1/2/3)
  • Wrap up (1/2)

All the videos are very interesting, but I particularly liked the series on "Wide column stores".

There is also a "Big Data for Engineers" course, which is similar to Big Data but adapted for non-computer scientists. Big Data itself is addressed purely to Computer Science students.

By the end, watching the series through should have given you an overview and understanding of the Big Data landscape. Armed with this knowledge, you should be able to make informed decisions about the data needs of your own projects.


More Information

YouTube playlist

Related Articles

Brand New Data Science Courses on edX

OS-Climate - Open Source To Tackle Climate Change

Google's Cloud Spanner To Settle the Relational vs NoSQL Debate?




Last Updated ( Wednesday, 02 November 2022 )