The StackExchange Machine Learning Contest, launched this week on Kaggle, has a prize pool of $20,000. Winners will also improve the way that StackOverflow works - which might be reward enough.
StackOverflow, the original site in StackExchange, the network of community-driven Q&A sites, receives more than six thousand new questions every weekday, so it's hardly surprising that it feels it could do with some automated way of dealing with inappropriate postings.
The blog post announcing the contest explains that the current moderation tools include closing, editing, deleting, voting, commenting and more, and it hints that some of these actions can seem pretty unfriendly, especially to new users. To have a post publicly slapped down for infringing rules that are fairly vague and a matter of judgement can feel like a personal attack.
It is therefore looking for an algorithm that can predict which posts (around 6% of the total, which in the case of StackOverflow works out at some 360 per day) are going to end up closed and for what reasons.
The task, which is explained on the contest website, Kaggle, is to build a classifier that predicts whether or not a question will be closed given the question as submitted, along with the reason that the question was closed.
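To make the task concrete, here is a minimal sketch of such a classifier: a bag-of-words Naive Bayes model in plain Python that maps a question's text to either "open" or a close reason. The label names and the six training questions are invented for illustration and are not taken from the contest data, and a real entry would presumably use far more data and richer features than the words of the title alone.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split into simple word tokens.
    return re.findall(r"[a-z0-9+#]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of questions
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set: hypothetical questions and labels, for illustration only.
TRAIN = [
    ("How do I reverse a linked list in C?", "open"),
    ("Why does my Python loop raise an IndexError?", "open"),
    ("What is the best programming language?", "not constructive"),
    ("Which editor is better, vim or emacs?", "not constructive"),
    ("Where can I buy a cheap laptop?", "off topic"),
    ("How do I fix my home wifi router?", "off topic"),
]

texts, labels = zip(*TRAIN)
model = NaiveBayes().fit(texts, labels)
print(model.predict("Which language is the best?"))
```

The same fit/predict shape would carry over to a stronger model; only the features and the learning algorithm would change.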
The advantage of having an automatic algorithm is that not only can the verdict be delivered in private, but users also shouldn't feel that it is the work of a heavy-handed moderator. After all, an AI agent is a dispassionate entity - for now at least.
Given the quantity of training data available, the task shouldn't be too difficult and probably doesn't need deep knowledge of the subject area.
The data for the contest is already available and the deadline for submission is Tuesday 9 October. You can submit a maximum of 2 entries per day but can select only 1 final submission for judging.
The prizes are:
There's also a prize of $1,000 for an informative visualization of the data.
So if you want to make StackOverflow a better environment where innocent users don't get shot down and where off-topic posts are automatically weeded out, now is the time to put machine learning to good use.