Many websites rely on CAPTCHA to distinguish genuine humans from bots and malware. Now researchers have devised software that can be trained to decipher and defeat audio CAPTCHAs.
CAPTCHAs are those frustrating tests that ask you to type in the characters of words that have been distorted or defaced, as if by graffiti.
The idea is that this task should be easy for humans and impossible for non-humans. To assist the visually impaired, however, many CAPTCHAs offer an audio alternative in which a computerized voice reads out letters or digits distorted by noise, and these have proved vulnerable to machine recognition.
Decaptcha is a system to defeat audio CAPTCHAs based on non-continuous speech (i.e. a series of digits or letters rather than spoken words). Devised by a team of computer scientists who presented their research at this week's IEEE Symposium on Security and Privacy in Oakland, California, it uses audio-processing techniques to remove the noise and identify the digits.
The software has to be trained, a process that takes around 20 minutes for each type of CAPTCHA, but it can then solve CAPTCHAs without human assistance.
The software has proved effective against the systems currently used by eBay (82% accuracy), Microsoft (49%) and Yahoo (45%), but has had much less success with reCAPTCHA, which uses background conversations to obscure the digits.
Even so, with a success rate of 1.5%, a machine that made hundreds of attempts would quite quickly make a correct match - and would probably have more success than humans, who also find reCAPTCHA difficult to decipher.
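To put the 1.5% figure in perspective, the arithmetic can be sketched as follows. This is an illustration only, not part of the researchers' software: it assumes every attempt succeeds independently with the same probability, and the function names are hypothetical.

```python
import math


def p_at_least_one(p: float, attempts: int) -> float:
    """Chance of at least one success in `attempts` tries.

    Assumes each try is independent and succeeds with probability p
    (an assumption made for this sketch, not a claim from the research).
    """
    return 1.0 - (1.0 - p) ** attempts


def attempts_needed(p: float, target: float) -> int:
    """Smallest number of tries whose overall success chance reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Success rate reported against reCAPTCHA in the article: 1.5%
for n in (100, 200, 500):
    print(n, round(p_at_least_one(0.015, n), 3))
```

Under that independence assumption, 100 attempts give roughly a 78% chance of at least one correct answer, and 200 attempts give about 95%.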
The researchers conclude that the use of "semantic noise" - i.e. noise such as background conversations or music - is the least harmful to human understanding while also being the most able to hinder Decaptcha's performance.
Full research report:
The Failure of Noise-Based Non-Continuous Audio Captchas