Kaggle Contest To Detect Chatbot Essays
Written by Sue Gee   
Friday, 03 November 2023

As LLMs like ChatGPT rapidly improve their ability to generate text similar to human-written content, educators have very real concerns about how to distinguish between students' own work and work generated with undue help from artificial intelligence. A Kaggle contest has just launched to detect whether an essay was written by a student or an LLM.


With its community of over 15 million members, Kaggle is the obvious place to turn for a machine-learning approach to authenticating the work undertaken by conscientious students and deterring this new method of cheating. Kagglers seem enthusiastic to tackle the problem: there are already 320 teams, mostly individuals, making submissions. With almost three months to go before the Final Submission Deadline, there's plenty of time to join in.

The contest comes from Vanderbilt University and the Learning Agency Lab with financial support from the Bill & Melinda Gates Foundation, Schmidt Futures, and Chan Zuckerberg Initiative. 

The challenge is to develop a machine learning model that can accurately detect whether an essay was written by a student or an LLM.
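To give a feel for what such a model involves, here is a minimal sketch of a baseline detector: a multinomial Naive Bayes classifier over word counts, written with the Python standard library only. It is not the competition's method or any participant's solution. The class name `NaiveBayesDetector`, the label encoding (0 for student, 1 for LLM) and the toy essays are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesDetector:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, essays, labels):
        # Assumed label encoding: 0 = student-written, 1 = LLM-generated.
        # Both classes must appear at least once in the training labels.
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(essays, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, label):
        # Log prior plus the sum of smoothed log likelihoods of known words.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict_proba(self, text):
        """Return the estimated probability that the essay is LLM-generated."""
        tokens = tokenize(text)
        s0 = self._log_score(tokens, 0)
        s1 = self._log_score(tokens, 1)
        m = max(s0, s1)  # subtract the max for numerical stability
        e0, e1 = math.exp(s0 - m), math.exp(s1 - m)
        return e1 / (e0 + e1)


# Toy data invented for this sketch; real essays are far longer.
train_essays = [
    "i think cars are bad cuz they make alot of smog in my town",
    "my mom says we should walk more and i kinda agree with her",
    "In conclusion, limiting car usage offers numerous benefits for society.",
    "Furthermore, it is important to note that reduced emissions offer numerous benefits.",
]
train_labels = [0, 0, 1, 1]

detector = NaiveBayesDetector().fit(train_essays, train_labels)
llm_like = detector.predict_proba("It is important to note the numerous benefits.")
student_like = detector.predict_proba("i think my mom kinda says bad")
print(f"LLM-like essay: {llm_like:.3f}, student-like essay: {student_like:.3f}")
```

A word-count model like this is only a starting point, but it is cheap to run on a CPU and gives a reference score against which heavier approaches can be compared.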

The competition dataset comprises about 10,000 essays. All of the essays were written in response to one of seven essay prompts. In each prompt, the students were instructed to read one or more source texts and then write a response. This same information may or may not have been provided as input to an LLM when generating an essay. The competition blurb states: 

Essays from two of the prompts compose the training set; the remaining essays compose the hidden test set. Nearly all of the training set essays were written by students, with only a few generated essays given as examples. You may wish to generate more essays to use as training data.

In fact, one of the participants has already made additional AI-generated essays available.
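Since the training set is so heavily weighted towards student essays, folding extra generated essays into it is a natural first step. The sketch below shows one way this might be done, dropping exact duplicates and checking the resulting class balance. The function names, the (essay, label) row layout and the label encoding (0 for student, 1 for LLM) are assumptions for illustration and do not reflect the actual competition files.

```python
from collections import Counter


def merge_training_data(original, extra_generated):
    """Combine original (essay, label) rows with extra generated essays.

    Assumed label encoding: 0 = student-written, 1 = LLM-generated.
    Essays whose text matches an earlier one (ignoring case and
    whitespace differences) are dropped so repeats do not dominate.
    """
    seen = set()
    merged = []
    for text, label in list(original) + [(t, 1) for t in extra_generated]:
        key = " ".join(text.split()).lower()
        if key not in seen:
            seen.add(key)
            merged.append((text, label))
    return merged


def class_balance(rows):
    """Count how many essays carry each label."""
    return Counter(label for _, label in rows)


# Toy rows invented for this sketch.
original = [
    ("Student essay about car-free cities.", 0),
    ("Student essay about the Electoral College.", 0),
    ("Another student essay about car-free cities.", 0),
    ("Generated essay about car-free cities.", 1),
]
extra_generated = [
    "Generated essay about the Electoral College.",
    "A second generated essay about car-free cities.",
    "generated essay  about car-free cities.",  # duplicate of an original row
]

merged = merge_training_data(original, extra_generated)
balance = class_balance(merged)
print(dict(balance))
```

With these toy rows the merged set ends up with three essays in each class, compared with a three-to-one split in the original.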

This is a Code Competition and submissions must be made through either a CPU or a GPU Notebook and require no more than 9 hours of runtime.

The prize pool of $110,000 will be divided between Leaderboard Prizes, awarded for predictive performance, and Efficiency Prizes, where the runtime required for a submission is also evaluated and which are restricted to CPU only. Winning a Leaderboard Prize does not preclude you from winning an Efficiency Prize. For both prizes 1st Place wins $20,000.

While the immediate concern of the competition is to identify essays written using LLMs in a middle-school or high-school context, in a broader context the models participants devise will help identify telltale LLM artifacts and advance the state of the art in LLM text detection overall.




More Information

LLM - Detect AI Generated Text

Related Articles

Vesuvius Challenge - Progress and Prizes

AI Village Capture The Flag

Kaggle Enveloped By Google Cloud


Last Updated ( Friday, 03 November 2023 )