No Comment - Fake News Spreads Like Real News, Flu Detector & NNs Detect Clickbait

Written by Janet Swift

Monday, 23 January 2017

• It's Always April Fools' Day! On the Difficulty of Social Network Misinformation Classification via Propagation Features

• Flu Detector: Estimating influenza-like illness rates from online user-generated content

• We used Neural Networks to Detect Clickbaits: You won’t believe what happened Next!

Sometimes the news is reported well enough elsewhere and we have little to add other than to bring it to your attention.

No Comment is a format where we present original source information, lightly edited, so that you can decide if you want to follow it up.

It's Always April Fools' Day! On the Difficulty of Social Network Misinformation Classification via Propagation Features

Fake news matters, and there have been promises to detect and remove it, but doing this automatically might be more difficult than you think. It appears that fake news behaves a lot like real news.

Given the huge impact that Online Social Networks (OSN) had in the way people get informed and form their opinion, they became an attractive playground for malicious entities that want to spread misinformation, and leverage their effect. In fact, misinformation easily spreads on OSN and is a huge threat for modern society, possibly influencing also the outcome of elections, or even putting people's life at risk (e.g., spreading "anti-vaccines" misinformation). Therefore, it is of paramount importance for our society to have some sort of "validation" on information spreading through OSN.

The need for a wide-scale validation would greatly benefit from automatic tools. In this paper, we show that it is difficult to carry out an automatic classification of misinformation considering only structural properties of content propagation cascades. We focus on structural properties, because they would be inherently difficult to manipulate with the aim of circumventing classification systems.

To support our claim, we carry out an extensive evaluation on Facebook posts belonging to conspiracy theories (as representative of misinformation), and scientific news (representative of fact-checked content). Our findings show that conspiracy content actually reverberates in a way which is hard to distinguish from the one scientific content does: for the classification mechanisms we investigated, classification F1-score never exceeds 0.65 during content propagation stages, and is still less than 0.7 even after propagation is complete.

And the conclusion is:

Our findings suggest that in Facebook users interact with different types of content in similar ways, reinforcing the hypothesis of echo chambers. Inside these chambers, strongly polarized by topic, content propagation exhibits very similar structural properties, that are therefore less useful in content classification. These results highlight the necessity of including content-related features, or polarization metrics, in future analysis (i.e., whether particular users and their echo chambers are more polarized towards one type of content).

Unfortunately, misinformation creators can easily control content-related features, in order to avoid algorithmic detection. Moreover, user polarization can be clearly understood from past users’ behaviors, but it takes time to understand polarization of new users. Hence, automatic detection of fake news remains an open challenge.
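To make "structural properties of content propagation cascades" concrete, here is a minimal Python sketch that computes three simple shape measures of a share cascade: its size, depth and maximum breadth. The function name, the edge-list input format and the choice of these three measures are assumptions made for illustration; the abstract does not list the features the authors actually used.

```python
from collections import defaultdict, deque


def cascade_features(edges, root):
    """Return size, depth and maximum breadth of a share cascade.

    edges: list of (parent, child) pairs, meaning `child` reshared from `parent`.
    root:  the original post. The cascade is assumed to be a tree (no cycles).
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    # Breadth-first walk, counting how many nodes sit at each depth level.
    level_counts = defaultdict(int)
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        level_counts[depth] += 1
        for child in children[node]:
            queue.append((child, depth + 1))

    return {
        "size": sum(level_counts.values()),
        "depth": max(level_counts),
        "max_breadth": max(level_counts.values()),
    }


# Hypothetical cascade: post "p" is shared by a and b,
# a's share is reshared by c and d, and d's share by e.
example = [("p", "a"), ("p", "b"), ("a", "c"), ("a", "d"), ("d", "e")]
features = cascade_features(example, "p")
print(features)
```

If conspiracy and science posts produce cascades with similar values for measures like these, a classifier that sees only the shape has little to work with, which is the difficulty the paper reports.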

Also see

The spreading of misinformation online

The wide availability of user-provided content in online social media facilitates the aggregation of people around common interests, worldviews, and narratives. However, the World Wide Web is a fruitful environment for the massive diffusion of unverified rumors. In this work, using a massive quantitative analysis of Facebook, we show that information related to distinct narratives––conspiracy theories and scientific news––generates homogeneous and polarized communities (i.e., echo chambers) having similar information consumption patterns. Then, we derive a data-driven percolation model of rumor spreading that demonstrates that homogeneity and polarization are the main determinants for predicting cascades’ size.

Flu Detector: Estimating influenza-like illness rates from online user-generated content

You know that there is real information contained in social media; the problem is quantifying it. Many have tried to extract illness rates, and flu rates in particular, from what people tweet and post. In this case we have an example you can try out:

We provide a brief technical description of an online platform for disease monitoring, titled as the Flu Detector (fludetector.cs.ucl.ac.uk). Flu Detector, in its current version (v.0.5), uses either Twitter or Google search data in conjunction with statistical Natural Language Processing models to estimate the rate of influenza-like illness in the population of England.

Its back-end is a live service that collects online data, utilises modern technologies for large-scale text processing, and finally applies statistical inference models that are trained offline. The front-end visualises the various disease rate estimates. Notably, the models based on Google data achieve a high level of accuracy with respect to the most recent four flu seasons in England (2012/13 to 2015/16). This highlighted Flu Detector as having a great potential of becoming a complementary source to the domestic traditional flu surveillance schemes.

It could well end up making Google's flu forecasts more accurate.
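The abstract describes models that are trained offline and then applied to live data. The Python sketch below shows that split in the simplest possible form: an ordinary least-squares line is fitted to past data, then used to estimate a rate for a new observation. This is not Flu Detector's model, which the abstract does not specify; the single-feature linear fit, the function name and all numbers are assumptions invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Offline" step: made-up weekly frequencies of a flu-related search term
# paired with made-up illness rates for the same weeks.
query_freq = [1.0, 2.0, 3.0, 4.0]
ili_rate = [5.0, 9.0, 13.0, 17.0]
slope, intercept = fit_line(query_freq, ili_rate)

# "Live" step: apply the stored parameters to this week's frequency.
estimate = slope * 5.0 + intercept
print(slope, intercept, estimate)
```

A real system would use many text features and a more robust model, but the two-stage shape is the same: fit once on historical data, then score fresh data as it arrives.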


We used Neural Networks to Detect Clickbaits: You won’t believe what happened Next!

What a title for a paper - you have to at least read some of it.

In case you don't know what clickbait is:

Clickbaits work by exploiting the insatiable appetite of humans to indulge their curiosity. According to the Loewenstein’s information gap theory of curiosity, people feel a gap between what they know and what they want to know, and curiosity proceeds in two basic steps – first, a situation reveals a painful gap in our knowledge (that’s the headline), and then we feel an urge to fill this gap and ease that pain (that’s the click).

And why would we want to get rid of clickbait?

Clickbaits clog up the social media news streams with low-quality content and violate general codes of ethics of journalism. Despite a huge amount of backlash and being a threat to journalism, their use has been rampant and thus it’s important to develop techniques that automatically detect and combat clickbaits.

And how can we do it?

Existing methods for automatically detecting clickbaits rely on heavy feature engineering and domain knowledge. Here, we introduce a neural network architecture based on Recurrent Neural Networks for detecting clickbaits. Our model relies on distributed word representations learned from a large unannotated corpora, and character embeddings learned via Convolutional Neural Networks.

Experimental results on a dataset of news headlines show that our model outperforms existing techniques for clickbait detection with an accuracy of 0.98 with F1-score of 0.98 and ROC-AUC of 0.99.

With the title it has, we are 100% sure that the neural network would have flagged the paper as clickbait. Does this increase or decrease its accuracy stats?
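For readers unsure how the three quoted metrics differ, here is a short Python sketch that computes accuracy, F1-score and ROC-AUC on a toy set of headline scores. The labels, scores and the 0.5 threshold are invented for illustration and have nothing to do with the paper's data or model.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return correct / len(pairs), f1


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = clickbait, 0 = ordinary headline; scores are made up.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(accuracy, f1, auc)
```

Accuracy and F1 depend on where the decision threshold is set, while ROC-AUC measures how well the scores rank clickbait above non-clickbait across all thresholds, which is why the paper can report a slightly higher figure for it.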





Last Updated ( Monday, 23 January 2017 )