An open source project to combat "stylometry", the study of attributing authorship to documents based only on the linguistic style they exhibit, is showing that writing style can be changed so as to evade detection.
Artificial intelligence techniques are routinely used to detect plagiarism, and they were recently employed to reveal that Harry Potter author J.K. Rowling is indeed the author of The Cuckoo's Calling, published under the byline of Robert Galbraith. Now software is tackling the opposite problem: anonymizing writing style to protect the identity of the author.
Students from the Privacy, Security and Automation Lab (PSAL) at Drexel University recently won the Andreas Pfitzmann Best Student Paper Award at the 12th Privacy Enhancing Technologies Symposium for their paper “Use Fewer Instances of the Letter ‘i’: Toward Writing Style Anonymization,” which describes this new framework for anonymizing writing style.
The idea behind Anonymouth, the anonymization tool presented in the paper, is that stylometry can be a threat in situations where individuals want to protect their privacy while continuing to interact with others over the Internet. A presentation about the program cites two hypothetical scenarios:
- Alice the Anonymous Blogger vs. Bob the Abusive Employer
- Anonymous Forum vs. Oppressive Government
and one real-life anecdote from Daniel Domscheit-Berg's book Inside WikiLeaks:
“I nudged Julian with my foot. We exchanged glances and started giggling. If someone had run WikiLeaks documents through such a program, he would have discovered that the same two people were behind all the various press releases, document summaries, and correspondence issued by the project.”
The JStylo-Anonymouth (JSAN) framework is a work in progress at PSAL under the supervision of Dr. Rachel Greenstadt, assistant professor of computer science. It consists of two parts:
- JStylo: an authorship attribution framework that serves as the underlying feature-extraction engine, using a set of linguistic features
- Anonymouth: an authorship evasion (anonymization) framework that suggests the changes needed to disguise the author's style
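To make the division of labour between the two parts more concrete, here is a toy sketch in Python. It is not JSAN code and does not use JStylo's or Anonymouth's actual API or feature sets. The function names `letter_frequencies` and `suggest_changes` are hypothetical. The sketch assumes a single, very small feature set (relative letter frequencies) in place of JStylo's extraction. It also assumes a simple "blend into the crowd" rule in place of Anonymouth's logic: the author's profile is compared with the average profile of other authors, and the letters that deviate most are flagged, in the spirit of the paper's title.

```python
import string
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Hypothetical stand-in for feature extraction: relative frequency of a-z."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters) or 1  # avoid division by zero on empty input
    return {c: counts[c] / total for c in string.ascii_lowercase}


def suggest_changes(author_text: str, other_texts: list[str], top_k: int = 3) -> list[str]:
    """Hypothetical evasion step: suggest moving toward the other authors' average.

    Assumption for illustration only; the real tool uses richer features.
    """
    mine = letter_frequencies(author_text)
    profiles = [letter_frequencies(t) for t in other_texts]
    # Average letter profile of the other authors (the "crowd" to blend into).
    mean = {c: sum(p[c] for p in profiles) / len(profiles) for c in string.ascii_lowercase}
    # Rank letters by how far the author's usage deviates from the crowd.
    ranked = sorted(string.ascii_lowercase, key=lambda c: abs(mine[c] - mean[c]), reverse=True)
    suggestions = []
    for c in ranked[:top_k]:
        if mine[c] == mean[c]:
            break  # no remaining deviation worth reporting
        direction = "fewer" if mine[c] > mean[c] else "more"
        suggestions.append(f'Use {direction} instances of the letter "{c}"')
    return suggestions


if __name__ == "__main__":
    author = "I think I will visit; I insist it is fine."
    others = ["The cat sat on the mat.", "A dog ran far and fast across the yard."]
    for s in suggest_changes(author, others):
        print(s)
```

With these made-up sample texts, the author overuses "i" relative to the other two, so that letter is flagged first. A realistic system would also rerun an attribution classifier after each edit to check whether the author is still identifiable.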
In the small-scale user study (10 participants) reported in the award-winning paper, 80% of participants were able to anonymize their documents, though only to a limited extent. Modifying pre-written documents proved difficult, and the anonymization did not hold up against more extensive feature sets. However, the students point out:
It is important to note that Anonymouth is only the first step toward a tool to achieve stylometric anonymity with respect to state-of-the-art authorship attribution techniques. The topic needs further exploration in order to accomplish significant anonymity.
The JSAN framework is available on GitHub under a GNU AGPLv3 license for any developer who wants to combat the threat of stylometry.