|Kaggle Contest To Detect Chatbot Essays|
|Written by Sue Gee|
|Friday, 03 November 2023|
As LLMs like ChatGPT rapidly improve their ability to generate text similar to human-written content, educators have very real concerns about how to distinguish between students' own work and work generated with undue help from artificial intelligence. A Kaggle contest has just launched to detect whether an essay was written by a student or an LLM.
With its community of over 15 million members, Kaggle is the obvious place to turn for a machine-learning approach to authenticating the work of conscientious students and deterring this new method of cheating. Kagglers seem enthusiastic to tackle the problem: there are already 320 teams, mostly individuals, making submissions, and with almost three months to go before the Final Submission Deadline there's plenty of time to join in.
The contest comes from Vanderbilt University and the Learning Agency Lab with financial support from the Bill & Melinda Gates Foundation, Schmidt Futures, and the Chan Zuckerberg Initiative.
The challenge is to develop a machine learning model that can accurately detect whether an essay was written by a student or an LLM.
The competition dataset comprises about 10,000 essays, all written in response to one of seven essay prompts. For each prompt, students were instructed to read one or more source texts and then write a response. This same information may or may not have been provided as input to an LLM when generating an essay. The competition blurb states:
Essays from two of the prompts compose the training set; the remaining essays compose the hidden test set. Nearly all of the training set essays were written by students, with only a few generated essays given as examples. You may wish to generate more essays to use as training data.
In fact, one of the participants has already made additional AI-generated essays available.
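To give a sense of what a starting point for the task might look like, here is a minimal baseline sketch, not taken from the contest materials: TF-IDF features fed into logistic regression, trained on labeled essays and producing a probability that an unseen essay is LLM-generated. The toy data and labels are invented for illustration; real submissions would train on the competition CSV instead.

```python
# Hedged baseline sketch (an assumption, not the contest's method):
# bag-of-words TF-IDF features + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented toy dataset so the example is self-contained;
# 0 = student-written, 1 = LLM-generated.
essays = [
    "My summer was fun because we went to the lake with my cousins.",
    "I think the author is right because it happened to me too.",
    "In conclusion, the evidence overwhelmingly demonstrates that the "
    "author's argument rests on several interrelated considerations.",
    "Furthermore, it is essential to acknowledge the multifaceted "
    "nature of the issue under discussion.",
]
labels = [0, 0, 1, 1]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
model.fit(essays, labels)

# predict_proba yields the probability each essay is LLM-generated,
# the kind of score a detection leaderboard can rank.
probs = model.predict_proba(["A new essay to score."])[:, 1]
```

With only four training essays this obviously has no predictive value; the point is the shape of the pipeline, which scales unchanged to the roughly 10,000-essay dataset plus any extra generated essays participants add.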
This is a Code Competition and submissions must be made through either a CPU or a GPU Notebook and require no more than 9 hours of runtime.
The prize pool of $110,000 will be divided between Leaderboard Prizes, awarded for predictive performance, and Efficiency Prizes, for which the runtime required for a submission is also evaluated and which are restricted to CPU-only notebooks. Winning a Leaderboard Prize does not preclude you from winning an Efficiency Prize. For both prizes, 1st Place wins $20,000.
While the immediate concern of the competition is to identify essays written using LLMs in a middle-school or high-school context, more broadly the models participants devise will help identify telltale LLM artifacts and advance the state of the art in LLM text detection overall.
|Last Updated ( Friday, 03 November 2023 )|