Google Translate now has over 200 million monthly active users and can be used to translate between 64 different languages.
It has come a long way since its beginnings in 2001, when it offered a not-very-good facility for translating eight languages to and from English.
To mark this milestone, Franz Och, who heads Google's machine translation group, wrote on the Google Blog about how the service has been transformed over the years.
Och was recruited by Google in 2004 from DARPA, where he had been working on a new, data-driven approach to machine translation, one he was initially skeptical could be scaled up to Google proportions.
At first the system took 40 hours on 1,000 machines to translate 1,000 sentences, but by the end of April 2006 the online Arabic-English system had launched, and the service has expanded ever since. He sums up the current situation:
In a given day we translate roughly as much text as you’d find in 1 million books. To put it another way: what all the professional human translators in the world produce in a year, our system translates in roughly a single day. By this estimate, most of the translation on the planet is now done by Google Translate.
Google's translation facilities are even more widely used when you take into account the translate widget built into websites like this one, which lets users read web pages in any of over fifty languages.
Moreover, the Translate API, which is now a paid-for service, is one of Google's most popular APIs because it lets products and services reach a global audience without all their users needing to share a common language.
Only a few years ago almost any AI expert would have predicted that the route to machine translation lay in the study of grammar and syntax, and in complex mathematical systems that modelled the underlying regularities of language. It would have come as a huge shock to learn that the most successful translation method in use today is essentially statistical, and that it works so well because there is so much data to feed it.
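To give a flavour of what "essentially statistical" means, here is a minimal sketch of IBM Model 1, a classic textbook word-alignment model from early statistical machine translation research. It is not Google's production system, which the article does not describe. The model is given only sentence pairs, with no grammar rules, and uses expectation-maximisation to turn raw co-occurrence into word-translation probabilities. The function name `train_ibm_model1` and the three-sentence German-English corpus are hypothetical, chosen purely for illustration, and the usual NULL word is omitted for simplicity.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Hypothetical helper: estimate t[(f, e)] = P(foreign word f | English word e)
    with EM, in the style of IBM Model 1 (no NULL word, for simplicity).

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    """
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    # Start uniform: every foreign word is equally likely for each English word.
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links involving e
        for fs, es in pairs:
            for f in fs:
                # E-step: share one unit of credit for f among the English
                # words in the same sentence, in proportion to current t.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise expected counts into probabilities for each e.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# Hypothetical toy parallel corpus (German, English).
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus, iterations=10)

for e in ["the", "house", "book", "a"]:
    # Pick the foreign word most likely to translate e.
    best = max((f for (f, e2) in t if e2 == e), key=lambda f: t[(f, e)])
    print(e, "->", best, round(t[(best, e)], 2))
```

Nobody tells the model that "Haus" means "house"; it works this out because "das" keeps turning up alongside "the" in other sentences and so "explains away" that word. On this toy corpus the most probable pairings settle on das/the, Haus/house, Buch/book and ein/a. The same mechanism, fed vastly more text, is why data volume matters so much to this approach.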
Google Translate is perhaps one of the best examples of "big data" in action.