|Predicting The Oscar Winner With Data Science|
|Written by Lucy Black|
|Sunday, 24 February 2019|
After successfully predicting last year's Best Picture Award at the 90th Academy Awards, the data science bootcamp Thinkful has repeated the exercise for this year's Oscars and predicts that Roma will win. UPDATE: And the winner wasn't Roma. Instead it went to the second-place prediction, Green Book. What went wrong for data science?
Last year the team at the online coding bootcamp Thinkful used supervised learning to look for patterns in past outcomes of the Best Picture Award to predict future ones. Having done the hard work of collecting and cleaning lots of data for the initial exercise in 2018, and encouraged by the accuracy of the prediction that Shape of Water would win, the team found re-running the exercise a relatively simple matter.
In his blog post Adam Levenson writes:
In order to make our projection, we utilized Random Forest Classifier ... a machine learning algorithm that determines the relationships between variables through the creation and evaluation of decision trees. In the case of our Oscar prediction, these decision trees ask simple Yes/No questions like “Did the film win Best Picture at the Directors Guild?” or “Is the film’s IMBD rating higher than X?” and the Random Forest Classifier figures out their relative importance.
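As a rough illustration of the approach Levenson describes, here is a minimal sketch that assumes scikit-learn's `RandomForestClassifier`. Thinkful's actual code, features and data are not given in the post, so the feature names, the synthetic data and the labeling rule below are all hypothetical stand-ins rather than real Oscar history.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names, loosely modeled on the questions quoted above.
FEATURES = ["won_directors_guild", "won_producers_guild", "imdb_rating"]

rng = np.random.default_rng(0)
n_films = 300

# Synthetic stand-in data: NOT real Oscar history.
won_dga = rng.integers(0, 2, n_films)
won_pga = rng.integers(0, 2, n_films)
imdb = rng.uniform(6.0, 9.0, n_films)
X = np.column_stack([won_dga, won_pga, imdb])

# Made-up rule so that the toy labels depend mostly on the guild awards.
score = 2.0 * won_dga + 1.0 * won_pga + 0.3 * (imdb - 7.5)
y = (score + rng.normal(0, 0.5, n_films) > 2.0).astype(int)

# Fit a forest of decision trees on the toy data.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Relative importance of each feature; the values sum to 1.
importances = dict(zip(FEATURES, model.feature_importances_))
for name, weight in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {weight:.3f}")

# Estimated probability of the "winner" class for two hypothetical nominees.
nominees = np.array([[1, 1, 7.9], [0, 1, 8.3]])
win_probability = model.predict_proba(nominees)[:, 1]
print(win_probability)
```

Note that `predict_proba` scores each film independently, so the values for a set of nominees need not sum to 1. How Thinkful turned its model output into per-film win percentages is not stated.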
The relative importance weights had, of course, changed from last year's to take account of last year's result. While the top three weights remained in the same order, Won Directors and Won Producers both increased in importance by a few percent, while Won Actors, which had been the fourth most important weight, lost a few percent and dropped to sixth. Won Golden Globe also dropped one place, from eighth to ninth. The weightings of IMDB Ratings and Bafta Nominations increased a little, allowing them to ascend by one place each.
The final weightings are shown in this table, which shows graphically that Won Directors is easily the most important feature:
With the weightings updated, the model could be used to provide the probability of winning for each of the nominations, and this is the prediction:
Last year Shape of Water, Thinkful's predicted winner and the actual one, had a win probability of 47%, well ahead of just one near contender, with seven others consigned to the outfield with probabilities of less than 0.1. This year things are not as clear cut. As Levenson puts it:
The gap between Roma and the next closest film - Green Book is 12% (36% → 24%.) That’s a closer predicted margin than last year’s two-picture race between Shape of Water (47%) and Three Billboards Outside Ebbing, Missouri (27%.)
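The comparison Levenson makes is simple arithmetic on the quoted percentages, and a short sketch makes it explicit. Only the four figures come from the post. The dictionary layout and the `margin` helper are illustrative assumptions.

```python
# Win probabilities quoted in the article, as percentages.
races = {
    2018: {"Shape of Water": 47, "Three Billboards Outside Ebbing, Missouri": 27},
    2019: {"Roma": 36, "Green Book": 24},
}

def margin(top_two):
    """Gap in percentage points between the two leading films."""
    first, second = sorted(top_two.values(), reverse=True)
    return first - second

margins = {year: margin(films) for year, films in races.items()}
for year, gap in margins.items():
    print(f"{year}: {gap} percentage points")
```

On these figures the 2019 margin is 12 percentage points against 20 for 2018, which is the sense in which this year's race was predicted to be closer.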
There's only a matter of hours to wait to see if Thinkful has scored a second success.
UPDATE: Roma, Alfonso Cuarón's semi-autobiographical Netflix movie, shot in black and white, about his childhood growing up in Mexico City, didn't win Best Picture, although it did win three Oscars, for best director, best foreign film and best cinematography, only the second black and white movie to do so.
So data science failed to come to the same conclusion as the Academy judges. Did Roma's key characteristics (being on Netflix and available in only a limited number of cinemas, being in Spanish and being monochrome) sway the Academy away from awarding it Best Picture? Should the classifier incorporate yes/no questions like "on wide general release", "in full color" and "in English" next time around?
The fact that the winner of Best Picture, Green Book, was Thinkful's second choice suggests that the methodology has a lot going for it. The fact that Green Book has a higher IMDB rating than Roma, 8.3 compared to 7.9, also suggests one way in which the weightings for 2020 should be tweaked.
|Last Updated ( Monday, 25 February 2019 )|