Predicting The Oscar Winner With Data Science

Written by Lucy Black

Sunday, 24 February 2019

After successfully predicting the Best Picture Award at last year's 90th Academy Awards, the data science bootcamp Thinkful has repeated the exercise for this year's Oscars and predicts that Roma will win. UPDATE: The winner wasn't Roma. Instead, the award went to Thinkful's second-place prediction, Green Book. What went wrong for data science?

Last year the team at the online coding bootcamp Thinkful used supervised learning to look for patterns in past outcomes of the Best Picture Award in order to predict future ones. Having done the hard work of collecting and cleaning lots of data for the initial exercise in 2018, and encouraged by the accuracy of its prediction that Shape of Water would win, the team found re-running the exercise a relatively simple matter.

In his blog post, Adam Levenson writes:

In order to make our projection, we utilized Random Forest Classifier ... a machine learning algorithm that determines the relationships between variables through the creation and evaluation of decision trees. In the case of our Oscar prediction, these decision trees ask simple Yes/No questions like “Did the film win Best Picture at the Directors Guild?” or “Is the film’s IMDB rating higher than X?” and the Random Forest Classifier figures out their relative importance.
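Levenson's description corresponds to a standard random forest. The sketch below is not Thinkful's code: it is a deliberately tiny, stdlib-only random forest in Python in which each tree asks a single yes/no question (a depth-1 tree, or "stump") and is trained on a bootstrap resample of the data, with each tree restricted to a random subset of the questions. The feature names and the toy `HISTORY` data are invented for illustration. Relative importance is computed as the total impurity decrease credited to each question, normalized to sum to 1.

```python
import random
from collections import Counter

# Yes/No questions the trees may ask. Names loosely follow the article's
# feature labels; the exact encoding is an assumption.
FEATURES = ["won_directors", "won_producers", "won_actors", "won_golden_globe"]


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(rows, labels, candidates):
    """Choose the yes/no question that most reduces Gini impurity.

    Returns (feature, impurity_decrease, win_rate_if_yes, win_rate_if_no).
    """
    n = len(labels)
    base_rate = sum(labels) / n
    best = None
    for f in candidates:
        yes = [y for x, y in zip(rows, labels) if x[f]]
        no = [y for x, y in zip(rows, labels) if not x[f]]
        gain = gini(labels) - (len(yes) * gini(yes) + len(no) * gini(no)) / n
        if best is None or gain > best[1]:
            # An empty branch falls back to the node's overall win rate.
            p_yes = sum(yes) / len(yes) if yes else base_rate
            p_no = sum(no) / len(no) if no else base_rate
            best = (f, gain, p_yes, p_no)
    return best


def fit_forest(rows, labels, n_trees=200, max_features=2, seed=0):
    """Bagged depth-1 trees, each restricted to a random subset of questions."""
    rng = random.Random(seed)
    n = len(rows)
    trees, importance = [], Counter()
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        candidates = rng.sample(FEATURES, max_features)  # random feature subset
        f, gain, p_yes, p_no = fit_stump(
            [rows[i] for i in idx], [labels[i] for i in idx], candidates
        )
        trees.append((f, p_yes, p_no))
        importance[f] += gain  # credit the chosen question with its impurity drop
    total = sum(importance.values()) or 1.0
    return trees, {f: importance[f] / total for f in FEATURES}


def predict_proba(trees, film):
    """Average, over all trees, the win rate of the leaf the film lands in."""
    return sum(p_yes if film[f] else p_no for f, p_yes, p_no in trees) / len(trees)


# Invented toy history (not real Oscar data): features -> won Best Picture?
HISTORY = [
    ({"won_directors": 1, "won_producers": 1, "won_actors": 0, "won_golden_globe": 1}, 1),
    ({"won_directors": 1, "won_producers": 0, "won_actors": 1, "won_golden_globe": 0}, 1),
    ({"won_directors": 1, "won_producers": 1, "won_actors": 1, "won_golden_globe": 0}, 1),
    ({"won_directors": 0, "won_producers": 1, "won_actors": 0, "won_golden_globe": 1}, 0),
    ({"won_directors": 0, "won_producers": 0, "won_actors": 1, "won_golden_globe": 1}, 0),
    ({"won_directors": 0, "won_producers": 0, "won_actors": 0, "won_golden_globe": 0}, 0),
    ({"won_directors": 0, "won_producers": 1, "won_actors": 1, "won_golden_globe": 0}, 0),
    ({"won_directors": 0, "won_producers": 0, "won_actors": 0, "won_golden_globe": 1}, 0),
]
rows = [features for features, _ in HISTORY]
labels = [won for _, won in HISTORY]
trees, importance = fit_forest(rows, labels)
for name, weight in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} {weight:.2f}")
```

A library implementation such as scikit-learn's `RandomForestClassifier` grows much deeper trees and reports the same kind of ranking through its `feature_importances_` attribute; the stump version simply keeps every step visible.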

The relative importance weights had, of course, changed from last year's, since the model now took the 2018 result into account. The top three weights remained in the same order, but Won Directors and Won Producers both increased in importance by a few percent. Won Actors, previously the fourth most important weight, lost a few percent and dropped to sixth, and Won Golden Globe also fell one place, from eighth to ninth. The weightings of IMDB Ratings and Bafta Nominations increased a little, allowing each of them to climb one place.

The final weightings are shown in this chart, which makes it clear that Won Directors is easily the most important feature:

[Chart: relative importance weights of the model's features, 2019]

With the weightings updated, the model could be used to estimate each nominee's probability of winning. This is the prediction:

[Chart: predicted probability of winning Best Picture for each 2019 nominee]
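Thinkful's post does not say exactly how per-film outputs become probabilities "among the nominations". One plausible approach, assumed here, is to score each nominee with the classifier and then rescale the scores so that a single year's nominees sum to 1. The helper `normalise_within_year`, the film names and the raw scores below are all hypothetical.

```python
def normalise_within_year(raw_scores):
    """Rescale one year's per-film scores so they sum to 1."""
    total = sum(raw_scores.values())
    if total == 0:
        # No signal at all: fall back to an even split between nominees.
        return {film: 1 / len(raw_scores) for film in raw_scores}
    return {film: score / total for film, score in raw_scores.items()}


# Hypothetical raw classifier outputs for four nominees (not Thinkful's figures).
raw = {"Film A": 0.60, "Film B": 0.40, "Film C": 0.15, "Film D": 0.05}
probs = normalise_within_year(raw)
for film in sorted(probs, key=probs.get, reverse=True):
    print(f"{film}: {probs[film]:.1%}")
```

Rescaling preserves the ranking, so the predicted winner is the same either way; what changes is how large the margins between nominees appear.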

Last year Thinkful's prediction, and the actual winner, Shape of Water, had a win probability of 47%, well ahead of its one near contender, with the seven other nominees consigned to the outfield with probabilities of less than 0.1. This year things are not as clear-cut. As Levenson puts it:

The gap between Roma and the next closest film - Green Book is 12% (36% → 24%.) That’s a closer predicted margin than last year’s two-picture race between Shape of Water (47%) and Three Billboards Outside Ebbing, Missouri (27%.)
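Levenson's "12%" is a difference of percentage points rather than a relative change. The short Python calculation below uses only the probabilities quoted above to reproduce both margins, and adds the ratio of leader to runner-up, which tells the same story: Roma was predicted to be 1.5 times as likely to win as Green Book, against roughly 1.74 times for Shape of Water over Three Billboards.

```python
# Win probabilities as quoted in Levenson's post.
predictions = {
    2018: {"Shape of Water": 0.47, "Three Billboards Outside Ebbing, Missouri": 0.27},
    2019: {"Roma": 0.36, "Green Book": 0.24},
}


def margin_points(leader, runner_up):
    """Absolute gap between two probabilities, in percentage points."""
    return round((leader - runner_up) * 100, 1)


for year, films in predictions.items():
    # Each year lists exactly two films: the leader and the runner-up.
    (top, p_top), (second, p_second) = sorted(
        films.items(), key=lambda kv: kv[1], reverse=True
    )
    print(
        f"{year}: {top} leads {second} by {margin_points(p_top, p_second)} points "
        f"({p_top / p_second:.2f}x as likely)"
    )
```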

There's only a matter of hours to wait to see if Thinkful has scored a second success.

UPDATE: Roma, Alfonso Cuarón's semi-autobiographical Netflix movie about his childhood in Mexico City, shot in black and white, didn't win Best Picture. It did, however, win three Oscars, for Best Director, Best Foreign Language Film and Best Cinematography, making it only the second black and white movie to do so.

So data science failed to reach the same conclusion as the Academy voters. Did Roma's key characteristics (being on Netflix and showing in only a limited number of cinemas, being in Spanish, and being monochrome) sway the Academy away from awarding it Best Picture? Should the classifier incorporate yes/no questions like "on wide general release", "in full color" and "in English" next time around?

The fact that the Best Picture winner, Green Book, was Thinkful's second choice suggests that the methodology has a lot going for it. That Green Book has a higher IMDB rating than Roma (8.3 compared to 7.9) also suggests one way in which the weightings for 2020 could be tweaked.
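The extra questions suggested above are straightforward to express as binary features. The sketch below uses a hypothetical `Film` record and `to_features` helper to encode them for Roma and Green Book, together with an IMDB-threshold question of the kind Levenson describes; the threshold of 8.0 is an arbitrary choice for illustration, not a value from Thinkful's model.

```python
from dataclasses import dataclass


@dataclass
class Film:
    title: str
    wide_release: bool  # "on wide general release"
    full_color: bool    # "in full color"
    english: bool       # "in English"
    imdb_rating: float


def to_features(film, imdb_threshold=8.0):
    """Answer each yes/no question as 1 (yes) or 0 (no)."""
    return {
        "on_wide_general_release": int(film.wide_release),
        "in_full_color": int(film.full_color),
        "in_english": int(film.english),
        "imdb_above_threshold": int(film.imdb_rating > imdb_threshold),
    }


roma = Film("Roma", wide_release=False, full_color=False, english=False, imdb_rating=7.9)
green_book = Film("Green Book", wide_release=True, full_color=True, english=True, imdb_rating=8.3)

for film in (roma, green_book):
    print(film.title, to_features(film))
```

On these four questions the two films give opposite answers every time. That is exactly the kind of split a decision tree can exploit, but it would only raise Green Book's predicted chances if past winners show a similar pattern in the training data.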

More Information

Data Science Says Roma Will Win Best Picture in 2019

Data Science Predicts Oscar Winner Correctly

Get On The Machine Learning Bandwagon With Google

Earthquake Prediction Using Machine Learning

I Know Who You Are By The Way You Take A Corner

Game Of Thrones Analysed


Last Updated ( Monday, 25 February 2019 )
