Microsoft Translator API
Written by Sue Gee   
Saturday, 02 April 2016

Microsoft has released a new version of its Translator API. This provides developers with the same speech-to-speech facilities as those used in the Skype Translator and in the iOS and Android Microsoft Translator apps.


In the blog post announcing the availability of the new Microsoft Translator API, Microsoft describes it as:

the first end-to-end speech translation solution optimized for real-life conversations (vs. simple human to machine commands) available on the market. 

It also explains how it works using AI technologies, such as deep neural networks for speech recognition and text translation, and outlines the following four stages for performing speech translation:

  1. Automatic Speech Recognition (ASR) — A deep neural network trained on thousands of hours of audio analyzes incoming speech. This model is trained on human-to-human interactions rather than human-to-machine commands, producing speech recognition that is optimized for normal conversations.

  2. TrueText — A Microsoft Research innovation, TrueText takes the literal text and transforms it to more closely reflect user intent. It achieves this by removing speech disfluencies, such as “um”s and “ah”s, as well as stutters and repetitions. The text is also made more readable and translatable by adding sentence breaks, proper punctuation and capitalization. (see picture below)

  3. Translation — The text is translated into any of the 50+ languages supported by Microsoft Translator. The eight speech languages have been further optimized for conversations by training on millions of words of conversational data using deep neural networks powered language models.

  4. Text to Speech — If the target language is one of the eighteen speech languages supported, the text is converted into speech output using speech synthesis. This stage is omitted in speech-to-text translation scenarios such as video subtitling. 
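To make the flow of the four stages concrete, here is a minimal Python sketch. It is not Microsoft's implementation and does not call the Translator API: every function name is hypothetical, the "recognition", "translation" and "synthesis" steps are placeholders, and the TrueText-like step is a toy that only drops a fixed list of filler words and immediate repetitions before adding capitalization and a full stop.

```python
import re

# Hypothetical filler list; the real TrueText model is not public and is
# far more sophisticated than a fixed set of words.
FILLERS = {"um", "uh", "ah", "er"}


def recognize(audio_transcript: str) -> str:
    """Stage 1 stand-in: pretend ASR has already produced this raw text."""
    return audio_transcript.lower()


def clean_transcript(raw: str) -> str:
    """Stage 2 stand-in: drop fillers and immediate repetitions, then
    capitalize the first letter and add a full stop."""
    words = re.findall(r"[a-z']+", raw)
    kept = []
    for word in words:
        if word in FILLERS:
            continue  # speech disfluency such as "um" or "ah"
        if kept and kept[-1] == word:
            continue  # stutter or repetition such as "i i"
        kept.append(word)
    if not kept:
        return ""
    sentence = " ".join(kept)
    return sentence[0].upper() + sentence[1:] + "."


def translate(text: str, target: str) -> str:
    """Stage 3 stand-in: tag the text instead of really translating it."""
    return f"[{target}] {text}"


def speech_translate(audio_transcript: str, target: str, speak: bool = False) -> dict:
    """Chain the stages; stage 4 runs only when speech output is wanted."""
    text = translate(clean_transcript(recognize(audio_transcript)), target)
    result = {"text": text}
    if speak:
        # Stage 4 stand-in: a placeholder string instead of synthesized audio.
        result["audio"] = f"<synthesized speech for: {text}>"
    return result
```

Calling `speech_translate("Um I I think ah we should go", "fr")` returns only a `text` entry, which mirrors the speech-to-text case where the final stage is omitted; passing `speak=True` adds the placeholder `audio` entry.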



Microsoft Translator covers two types of API use and integration:

1) Speech-to-speech translation is available for English, French, German, Italian, Portuguese, Spanish, Chinese Mandarin and Arabic.

2) Speech-to-text translation, for scenarios such as webcasts or BI analysis, allows developers to translate any of these eight supported conversation translation languages into any of the supported 50+ text languages.

A two-hour free trial is available. It provides 7,200 transactions, where one transaction is equivalent to one second of audio input, which is the same allowance as the free monthly tier. Beyond this, subscriptions are available.



The prospect of being able to communicate without language barriers is becoming ever more a reality, and the more we use it, the better the facility will become. Ironically, there's an error in the sample Microsoft uses in its artwork above: Gurdeep is the object of the final sentence in the English and becomes the subject in the French. This sort of error will quickly be corrected by machine learning as more data becomes available.



Last Updated ( Saturday, 02 April 2016 )