While it seems that big data can solve more or less any problem, there are some things it just isn't up to. Using social network data to judge public opinion, and specifically to predict the outcome of elections, seems as if it is a rock-solid data mining application, but is it?
Twitter data has been used to predict election results so often that we tend to think that the opinions expressed in tweets are the key to predicting the future. Various papers have been published that claim to predict all sorts of future outcomes, including stock market movements and the occurrence of pandemics. But are these claims justified? Is there predictive knowledge in the massed tweets?
In an imaginatively titled paper “I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper” researcher Daniel Gayo-Avello at the University of Oviedo puts the case that Twitter's predictive power is far from proven. He even goes so far as to say:
“No, you cannot predict elections with Twitter”
He then goes on to point out the flaws in trying to use Twitter in this way. Perhaps the most damning is the observation:
"It’s not prediction at all! I have not found a single paper predicting a future result. All of them claim that a prediction could have been made; i.e. they are post-hoc analysis and, needless to say, negative results are rare to find."
Post-hoc prediction always carries with it the possibility of bias. He also points out that it isn't fair to compare the prediction against an even-chance outcome - after all, it isn't often that all the parties have an equal a priori chance of winning. He goes on to list eight points where previous analyses are flawed and makes recommendations for future work that might prove that Twitter can predict elections. As you can guess, the most important is for researchers to actually make predictions in advance and to pin down what would count as a good prediction result. He also provides an annotated bibliography, complete with the fairly stark verdict that in no case was an election prediction actually made.
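To see why the choice of baseline matters, here is a minimal sketch in Python. Everything in it is invented for illustration: the ten races, the hypothetical predictor and its hit rate are made-up numbers, and using "the incumbent always wins" as the more realistic baseline is an assumption made here, not a method taken from the article.

```python
# Toy illustration only: every race below is invented, not real election data.
from dataclasses import dataclass


@dataclass
class Race:
    incumbent_won: bool      # did the sitting party keep the seat?
    predictor_correct: bool  # did the hypothetical method call the winner?


def accuracy(hits):
    """Fraction of True values in a list of booleans."""
    return sum(hits) / len(hits)


# Ten made-up two-party races: the incumbent keeps 8 of them, and the
# hypothetical predictor calls 7 of them correctly.
races = [Race(incumbent_won=i < 8, predictor_correct=i < 7) for i in range(10)]

coin_flip_baseline = 0.5  # assumes both parties are equally likely a priori
incumbent_baseline = accuracy([r.incumbent_won for r in races])
predictor_accuracy = accuracy([r.predictor_correct for r in races])

print(f"predictor:          {predictor_accuracy:.0%}")
print(f"coin-flip baseline: {coin_flip_baseline:.0%}")
print(f"incumbent baseline: {incumbent_baseline:.0%}")
```

With these invented numbers the predictor looks impressive next to a coin flip, yet it does worse than simply betting on the incumbent every time. A method only earns the word "prediction" if it beats the better-informed baseline, and does so before the result is known.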
Notice that Gayo-Avello hasn't proved that you can't predict election results or anything else using Twitter data - just that no one has yet convincingly proved that you can.
Of course, this has a commercial aspect. If companies are willing to spend big money to get hold of the big data that Twitter offers, they need to be convinced that it has predictive power. If Twitter data can't predict elections, can it predict market trends and public sentiment?
This is a matter that needs clearing up.
I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper arXiv:1204.6441v1