Data Creation Course Free On edX Opens Today
Written by Sue Gee   
Tuesday, 21 February 2023

A new course from the Delft University of Technology on the edX platform takes a novel approach to generating data for artificial intelligence and machine learning.

The title provides the clue - Data Creation and Collection for Artificial Intelligence via Crowdsourcing


Disclosure: When you make a purchase having followed a link from this article, we may earn an affiliate commission.

Courses on machine learning often skip the step of obtaining the data for training an AI model, so a course that focuses on how to acquire data that accurately represents real application scenarios is welcome. In addition, it introduces crowdsourcing, which could be the answer to the problems of bias and fairness that have been identified in machine learning models.

Noting that creating data is a long, laborious and expensive process, the course introduction argues:

Crowdsourcing offers a viable means of leveraging human intelligence at scale for data creation, enrichment and interpretation with great potential to improve the performance of AI systems and increase the wider adoption of AI in general.

This is a 6-week course requiring 4-5 hours of effort per week, and it covers the following skills:

  • Examine the use of crowdsourcing for gathering data
  • Explain how cognitive biases and other human factors influence data quality
  • Describe the use of active learning in the creation of crowdsourced training data
  • Demonstrate the design of crowdsourcing tasks with quality control mechanisms
  • Discuss the evaluation of ML models with humans in the loop

Its outline is:

Week 1: Crowdsourcing for High-quality Data Collection and The ImageNet Story

  • The intuition behind crowdsourcing
  • The role of crowdsourcing platforms
  • The need for high-quality data for AI models
  • What is ImageNet, the gap it filled, and how it was built

Week 2: Quality Control Mechanisms for Crowdsourcing

  • Workers' motives and behaviors
  • Quality control mechanisms in crowdsourcing
  • Incentives in crowdsourcing (like gamification)
  • Cognitive aspects and psychometric methods

Week 3: Factors Affecting Quality in Crowdsourcing

  • Tradeoff between task pricing and quality of output
  • The role of workers' demographics, qualifications and skills
  • The importance of task clarity and work environments
  • The concepts of task packaging, task framing and task priming

Week 4: Human Input for Data Creation and Model Evaluation in AI

  • The importance of data collection
  • Data generation
  • The role of crowdsourcing in advanced machine learning
  • Taxonomy of microtasks

Week 5: Reducing Worker Effort: Active Learning

  • Approaches to reducing worker effort
  • The implications of reducing labeling effort
  • The key idea of active learning
  • Query strategies for selecting informative instances

Week 6: Interpreting, Evaluating, and Debugging ML models

  • The notion of model interpretability
  • The role of humans in the interpretability process
  • Debugging ML pipelines and related challenges
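
To give a flavor of the Week 5 material, here is a minimal sketch of one common active learning query strategy, least-confidence uncertainty sampling. It is not taken from the course: the function names, the item ids and the probability values are all hypothetical, and the course may well present different strategies. The idea is simply to send crowd workers only those items the current model is least sure about, rather than asking them to label everything.

```python
from typing import Dict, List


def least_confidence(probs: List[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def select_for_labeling(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the ids of the `budget` items the model is least sure about."""
    ranked = sorted(
        predictions,
        key=lambda item_id: least_confidence(predictions[item_id]),
        reverse=True,  # most uncertain first
    )
    return ranked[:budget]


# Hypothetical class probabilities from a binary classifier (illustrative only).
predictions = {
    "img_001": [0.98, 0.02],
    "img_002": [0.55, 0.45],
    "img_003": [0.70, 0.30],
    "img_004": [0.51, 0.49],
}

# With a budget of two crowd tasks, pick the two most ambiguous items.
to_label = select_for_labeling(predictions, budget=2)
print(to_label)
```

In this toy example the two items with near 50/50 predictions are chosen for human labeling, while the confidently classified ones are left alone, which is how active learning reduces worker effort.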

This promises to be an interesting course. It aims to equip learners to understand and apply crowdsourcing methods for gathering high-quality data from humans for machine learning, and to identify biases that enter datasets through the way they are gathered or created. It is designed to impart a set of skills required for a career in the fields of Data Science and Machine Learning, and the broader realms of Artificial Intelligence.

The first presentation of the course starts today, February 21st, 2023, and has a free-of-charge Audit Track, which gives access to the course content for 6 weeks. If you want a certificate on successful completion and access to graded assignments, the cost to upgrade to the Verified Track is $149.

While this course is at Intermediate level, it has minimal prerequisites. The blurb states that some prior experience with a programming language (e.g. Python, Java) is recommended but not required. Even so, you might get more from it after completing an introductory course such as Understanding the World Through Data, another free course on the edX platform.


More Information

Data Creation and Collection for Artificial Intelligence via Crowdsourcing

Understanding the World Through Data

Introduction to Data Science with Python

Principles of Data Science Ethics

Applied Data Science Ethics

Related Articles 

Brand New Data Science Courses on edX

Data Scientists Salary Data 

Data Scientist Best Paying Entry-Level Job Says Glassdoor 

Ethics of AI - A Course From Finland 

Last Updated ( Friday, 24 February 2023 )