No Comment - Fake News Spreads Like Real News, Flu Detector & NNs Detect Clickbait
|Written by Janet Swift|
|Monday, 23 January 2017|
• It's Always April Fools' Day! On the Difficulty of Social Network Misinformation Classification via Propagation Features
• Flu Detector: Estimating influenza-like illness rates from online user-generated content
• We used Neural Networks to Detect Clickbaits: You won’t believe what happened Next!
Sometimes the news is reported well enough elsewhere and we have little to add other than to bring it to your attention.
No Comment is a format where we present original source information, lightly edited, so that you can decide if you want to follow it up.
It's Always April Fools' Day! On the Difficulty of Social Network Misinformation Classification via Propagation Features
Fake news is an important problem, and there have been promises to detect and remove it, but this might be more difficult to do automatically than you would think. It appears that fake news behaves a lot like real news.
Given the huge impact that Online Social Networks (OSN) have had on the way people get informed and form their opinions, they have become an attractive playground for malicious entities that want to spread misinformation and leverage its effect. In fact, misinformation spreads easily on OSN and is a huge threat to modern society, possibly also influencing the outcome of elections, or even putting people's lives at risk (e.g., spreading "anti-vaccines" misinformation). Therefore, it is of paramount importance for our society to have some sort of "validation" of information spreading through OSN.
The need for wide-scale validation would greatly benefit from automatic tools. In this paper, we show that it is difficult to carry out an automatic classification of misinformation considering only structural properties of content propagation cascades. We focus on structural properties because they would be inherently difficult to manipulate with the aim of circumventing classification systems.
To support our claim, we carry out an extensive evaluation on Facebook posts belonging to conspiracy theories (as representative of misinformation) and scientific news (representative of fact-checked content). Our findings show that conspiracy content actually reverberates in a way that is hard to distinguish from the way scientific content does: for the classification mechanisms we investigated, the classification F1-score never exceeds 0.65 during content propagation stages, and remains below 0.7 even after propagation is complete.
And the conclusion is:
Our findings suggest that on Facebook users interact with different types of content in similar ways, reinforcing the hypothesis of echo chambers. Inside these chambers, strongly polarized by topic, content propagation exhibits very similar structural properties, which are therefore less useful for content classification. These results highlight the necessity of including content-related features, or polarization metrics, in future analysis (i.e., whether particular users and their echo chambers are more polarized towards one type of content).
Unfortunately, misinformation creators can easily control content-related features, in order to avoid algorithmic detection. Moreover, user polarization can be clearly understood from past users’ behaviors, but it takes time to understand polarization of new users. Hence, automatic detection of fake news remains an open challenge.
The wide availability of user-provided content in online social media facilitates the aggregation of people around common interests, worldviews, and narratives. However, the World Wide Web is a fruitful environment for the massive diffusion of unverified rumors. In this work, using a massive quantitative analysis of Facebook, we show that information related to distinct narratives (conspiracy theories and scientific news) generates homogeneous and polarized communities (i.e., echo chambers) having similar information consumption patterns. Then, we derive a data-driven percolation model of rumor spreading that demonstrates that homogeneity and polarization are the main determinants for predicting cascades' size.
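To see why structural features alone struggle here, consider a toy illustration (this is not the paper's pipeline; the synthetic cascades, feature values, and threshold below are all invented for demonstration). If two classes of cascades have heavily overlapping size distributions, any classifier built on size alone is stuck with a low F1-score, much as the paper reports:

```python
# Toy demonstration: when "science" and "conspiracy" cascades have
# overlapping structural statistics, a structural classifier fails.
import random

random.seed(0)

def make_cascade(is_conspiracy):
    # Both classes draw cascade size from heavily overlapping
    # distributions -- mimicking the paper's finding that the two
    # kinds of content propagate in very similar ways.
    mean = 120 if is_conspiracy else 100
    size = max(random.gauss(mean, 60), 1)
    return {"size": size, "label": is_conspiracy}

cascades = [make_cascade(i % 2 == 0) for i in range(1000)]

# A simple threshold classifier on cascade size (a structural feature).
threshold = 110
tp = sum(1 for c in cascades if c["size"] > threshold and c["label"])
fp = sum(1 for c in cascades if c["size"] > threshold and not c["label"])
fn = sum(1 for c in cascades if c["size"] <= threshold and c["label"])

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2f}")  # stays well below a useful level
```

The overlap, not the classifier, is the bottleneck: with these distributions no choice of threshold separates the classes well, which is the intuition behind the paper's sub-0.7 F1-scores.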
Flu Detector: Estimating influenza-like illness rates from online user-generated content
You know that there is real information contained in social media; the problem is quantifying it. Many have tried to extract illness rates, and in particular flu rates, from what people tweet and post. In this case we have an example you can try out:
We provide a brief technical description of an online platform for disease monitoring, titled the Flu Detector (fludetector.cs.ucl.ac.uk). Flu Detector, in its current version (v.0.5), uses either Twitter or Google search data in conjunction with statistical Natural Language Processing models to estimate the rate of influenza-like illness in the population of England.
Its back-end is a live service that collects online data, utilises modern technologies for large-scale text processing, and finally applies statistical inference models that are trained offline. The front-end visualises the various disease rate estimates. Notably, the models based on Google data achieve a high level of accuracy with respect to the most recent four flu seasons in England (2012/13 to 2015/16). This highlights Flu Detector's potential to become a complementary source to the traditional domestic flu surveillance schemes.
It could well end up making Google's flu forecasts more accurate.
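The core idea is to learn a mapping from the frequency of flu-related search or Twitter terms to an influenza-like-illness (ILI) rate. The sketch below is a drastic simplification of that idea, not UCL's actual model: it fits an ordinary least-squares line from a single made-up query frequency to a made-up ILI rate and then "nowcasts" a new week. All numbers are invented for illustration.

```python
# Minimal sketch: fit y = a*x + b from (query frequency, ILI rate)
# pairs, then estimate the rate for a new week's observed frequency.
weeks = [
    # (normalised weekly frequency of a flu-related query, ILI rate per 100k)
    (0.10, 3.1), (0.25, 7.8), (0.40, 12.2), (0.55, 16.9), (0.70, 21.5),
]

# Ordinary least squares, done by hand to keep the sketch dependency-free.
n = len(weeks)
sx = sum(x for x, _ in weeks)
sy = sum(y for _, y in weeks)
sxx = sum(x * x for x, _ in weeks)
sxy = sum(x * y for x, y in weeks)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

new_week_freq = 0.50
estimate = a * new_week_freq + b
print(f"estimated ILI rate: {estimate:.1f} per 100k")
```

The real system works with many thousands of n-gram frequencies and regularised models trained offline, but the shape of the problem, regression from online-text signals to an officially measured rate, is the same.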
We used Neural Networks to Detect Clickbaits: You won’t believe what happened Next!
What a title for a paper - you have to at least read some of it.
In case you don't know what clickbait is:
Clickbaits work by exploiting the insatiable appetite of humans to indulge their curiosity. According to Loewenstein's information-gap theory of curiosity, people feel a gap between what they know and what they want to know, and curiosity proceeds in two basic steps – first, a situation reveals a painful gap in our knowledge (that's the headline), and then we feel an urge to fill this gap and ease that pain (that's the click).
Clickbaits clog up the social media news streams with low-quality content and violate general codes of ethics of journalism. Despite a huge amount of backlash and being a threat to journalism, their use has been rampant and thus it’s important to develop techniques that automatically detect and combat clickbaits.
And how can we do it?
Existing methods for automatically detecting clickbaits rely on heavy feature engineering and domain knowledge. Here, we introduce a neural network architecture based on Recurrent Neural Networks for detecting clickbaits. Our model relies on distributed word representations learned from large unannotated corpora, and character embeddings learned via Convolutional Neural Networks.
Experimental results on a dataset of news headlines show that our model outperforms existing techniques for clickbait detection, with an accuracy of 0.98, an F1-score of 0.98, and a ROC-AUC of 0.99.
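The shape of that architecture, each word represented by a pretrained word vector concatenated with a character-derived vector, the sequence then consumed by a recurrent network, can be sketched in a few lines. Everything below is a toy stand-in: the "embeddings" are fabricated and the recurrent step uses a fixed weight rather than anything learned, so this shows only the data flow, not the paper's model.

```python
# Toy sketch of word + character views feeding a recurrent update.
import math
import random

DIM = 4

def word_vector(word):
    # Stand-in for a distributed word representation learned from a
    # large corpus: a deterministic pseudo-random vector per word.
    rng = random.Random(word)
    return [rng.uniform(-1, 1) for _ in range(DIM)]

def char_vector(word):
    # Stand-in for a character-CNN embedding: crude character statistics
    # (length, capitalisation, punctuation) that clickbait tends to abuse.
    return [len(word) / 10,
            sum(c.isupper() for c in word) / len(word),
            word.count("!"),
            word.count("?")]

def rnn_score(headline):
    # A scalar hidden state updated once per word -- the skeleton of a
    # recurrent network, with all learned weights replaced by constants.
    h = 0.0
    for w in headline.split():
        x = word_vector(w) + char_vector(w)       # concatenate both views
        h = math.tanh(sum(x) / len(x) + 0.5 * h)  # one toy recurrent step
    return h  # a trained model would threshold something like this

print(rnn_score("You Won't Believe What Happened Next!"))
```

In the actual paper the character vectors come from convolutions over character embeddings and the recurrent cell is trained end-to-end on labelled headlines; the point here is just how the two representations are combined per word before the sequence model sees them.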
Last Updated ( Monday, 23 January 2017 )