|Neural Networks - A Better Speller?|
|Written by Mike James|
|Wednesday, 10 August 2016|
A new paper has the title "Robsut Wrod Reocginiton via semi-Character Recurrent Neural Network" and it isn't a misprint. Find out about the Cmabrigde Uinervtisy effect and how it might make a neural network a better spell checker than any you can find today.
It is a well-known fact that language, particularly English, is highly redundant in its written form. If you have ever played the game where you have to guess well-known phrases with the vowels removed, you will know that you can read what at first sight appear to be strings of random letters in a sudden flash of understanding.
On the other hand, machine language understanding and processing doesn't seem to "get it" in the same way that we do. If you have used a spelling package recently then you might well have been frustrated by the fact that it can't seem to offer you the obvious correction to your misspelling. Of course it is only obvious to you because you are applying some very special language processing software of your own.
What is really surprising is that the particular language processing software you run is capable of reading text that is severely distorted by letter transpositions. Consider the classic example known as the Cmabrigde Uinervtisy effect:
"Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe"
If you are a native English speaker/reader then you might well be shocked by how easy this mess is to read. Letter order? Who needs it!
A team of researchers at Johns Hopkins University decided to find out just how much letter order does matter. Previous work revealed that reading difficulty increased as the jumbling moved from the letters in the middle of the word to those at the end, and then to those at the beginning. Put another way, what matters most are the first letters of the word.
Correcting word jumbles, or in other words spelling mistakes, sounds like something that could be taught to a neural network, but most of the neural networks we hear about having such huge successes are feedforward networks. A feedforward network has no memory of earlier inputs, so it has no natural way to take the order of a sequence into account. For this you need a recurrent neural network, where some of the outputs are fed back as inputs. Recurrent neural networks are known to be powerful, but they are more difficult to train. In this case each word was presented to the network as its first character, its last character, and a bag of the remaining characters with no order information, which is what the "semi-character" in the paper's title refers to.
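As a rough illustration of that input encoding, here is a minimal Python sketch that turns a word into a vector made of a one-hot first letter, counts of the inner letters, and a one-hot last letter. The 26-letter lowercase alphabet and the function name `semi_character_vector` are assumptions for illustration, not details taken from the paper's code, and the recurrent network that would consume these vectors is not shown.

```python
import string

# Assumption: only the 26 lowercase ASCII letters are encoded; anything else is ignored.
ALPHABET = string.ascii_lowercase
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
N = len(ALPHABET)


def semi_character_vector(word):
    """Encode a word as [one-hot first letter | inner-letter counts | one-hot last letter].

    The middle block is an unordered bag of letters, so any shuffling of the
    inner letters produces the same vector.
    """
    letters = [ch for ch in word.lower() if ch in INDEX]
    vec = [0] * (3 * N)
    if not letters:
        return vec
    vec[INDEX[letters[0]]] = 1              # first letter, position kept
    for ch in letters[1:-1]:                # inner letters, order discarded
        vec[N + INDEX[ch]] += 1
    vec[2 * N + INDEX[letters[-1]]] = 1     # last letter, position kept
    return vec


print(semi_character_vector("Cmabrigde") == semi_character_vector("Cambridge"))  # True
```

Because the middle section is just a bag of letters, "Cmabrigde" and "Cambridge" get exactly the same vector, which is the whole point. The same property makes "form" and "from" indistinguishable on their own, so telling them apart has to rely on the surrounding words, something a recurrent network reading the sentence is in a position to supply.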
After training, the neural network was pitted against existing spell checkers on the sort of word jumble we have been considering. You might be able to guess that it did quite a bit better.
This might not be an entirely fair test, however, if you are simply interested in methods of creating spell checkers. The traditional approach to spell checking is to compute some sort of distance measure between the misspelled word and the dictionary entries, and then offer the user the closest matches. If the distance measure were designed to treat the first and last characters as correct and to compare only the inner letters, the results might be very different.
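To make the comparison concrete, the following Python sketch sets a standard Levenshtein edit distance alongside a hypothetical "jumble-aware" distance of the kind just described: it requires the first and last letters to match and compares only the bags of inner letters. This is an illustrative alternative, not a method from the paper, and the tiny dictionary is made up for the example.

```python
from collections import Counter


def levenshtein(a, b):
    """Classic edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def jumble_distance(a, b):
    """Hypothetical distance: first and last letters must agree, inner letters compared as bags."""
    a, b = a.lower(), b.lower()
    if len(a) < 2 or len(b) < 2:
        return 0 if a == b else float("inf")
    if a[0] != b[0] or a[-1] != b[-1]:
        return float("inf")
    inner_a, inner_b = Counter(a[1:-1]), Counter(b[1:-1])
    # Count letters that appear in one bag but not the other (with multiplicity).
    return sum((inner_a - inner_b).values()) + sum((inner_b - inner_a).values())


def suggest(word, dictionary, distance):
    """Return the dictionary entry closest to word under the given distance."""
    return min(dictionary, key=lambda entry: distance(word.lower(), entry))


DICTIONARY = ["according", "cambridge", "research", "university", "problem"]  # toy word list
for jumbled in ["Aoccdrnig", "Cmabrigde", "Uinervtisy", "porbelm"]:
    print(jumbled, "->", suggest(jumbled, DICTIONARY, jumble_distance))
```

Plain Levenshtein scores a swapped pair of adjacent letters as two edits, so a heavily shuffled word can look far from its correct form, whereas the jumble distance scores every rearrangement of the inner letters as zero. Real misspellings also include dropped, doubled and wrong letters, which a practical checker would still have to weigh.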
The researchers suggest that the same approach might be useful in normalizing idiosyncratic text such as text speak, for example turning Cooooolll into Cool. Perhaps it could even improve communication across the generations.
It certainly provides food for thought about exactly how we read. Has anyone looked at how dyslexic readers differ, for example? And what will Google make of an article with so many misspelled words?
Robsut Wrod Reocginiton via semi-Character Recurrent Neural Network, by Keisuke Sakaguchi, Kevin Duh, Matt Post and Benjamin Van Durme
|Last Updated ( Wednesday, 10 August 2016 )|