Alexa Teacher Models Outperform GPT-3
Written by Sue Gee   
Wednesday, 31 August 2022

Researchers at Amazon Alexa AI are making breakthroughs in conversational AI and natural language processing using models that learn new concepts and transfer knowledge from one language or task to another with minimal human input. 

Thanks to its encoder-decoder architecture, as opposed to the decoder-only design of most other large language models, the Alexa Teacher Model outperforms GPT-3 on tasks such as summarization and machine translation.
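To make the architectural contrast concrete, here is a minimal Python sketch of which positions a token may attend to in each design. It is a simplification that ignores padding and the rest of the transformer, and the function names are our own illustration, not anything from Amazon's code. In a decoder-only model the prompt and the generated text share one causal mask. In an encoder-decoder model the encoder reads the whole input bidirectionally, and the decoder generates left to right while reading all of the encoded input.

```python
from typing import List

Mask = List[List[bool]]  # mask[i][j] is True if position i may attend to position j

def causal_mask(n: int) -> Mask:
    # Decoder-only (GPT-3 style): position i sees only positions 0..i,
    # and prompt plus generated text share this single mask.
    return [[j <= i for j in range(n)] for i in range(n)]

def encoder_mask(n_src: int) -> Mask:
    # Encoder of a seq2seq model: every input token sees every other input token.
    return [[True] * n_src for _ in range(n_src)]

def decoder_self_mask(n_tgt: int) -> Mask:
    # Decoder of a seq2seq model: output tokens are still produced left to right.
    return causal_mask(n_tgt)

def cross_mask(n_tgt: int, n_src: int) -> Mask:
    # Cross-attention: every output position can read the whole encoded input.
    return [[True] * n_src for _ in range(n_tgt)]

if __name__ == "__main__":
    n_src, n_tgt = 4, 3
    # Decoder-only: the first prompt token cannot see the last prompt token.
    print(causal_mask(n_src + n_tgt)[0][n_src - 1])
    # Encoder-decoder: the first input token sees the whole input.
    print(encoder_mask(n_src)[0][n_src - 1])
```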

Introducing AlexaTM 20B, a 20-billion-parameter sequence-to-sequence (seq2seq) generative language model, Saleh Soltan, a senior applied scientist with Alexa AI, explains how it fits Alexa AI's move to the new paradigm of "generalizable intelligence", in which models can learn new concepts and transfer knowledge from one language or task to another with minimal human input. Such models allow Alexa AI researchers to develop new features efficiently and to improve Alexa in multiple languages at the same time.

Soltan and his colleagues are presenting a paper about the Alexa Teacher Model at the forthcoming Knowledge Discovery and Data Mining Conference. It shows how the 10-billion- and two-billion-parameter AlexaTM models can improve on state-of-the-art cross-lingual transfer learning and increase Alexa's accuracy in different locales. The team has followed this up with an arXiv paper titled "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2seq Model". The experiments reported in this paper, which use only publicly available data, show that AlexaTM 20B can not only transfer what it learns across languages but also learn new tasks from just a handful of examples, i.e. few-shot learning.

[Figure: AlexaTM 20B generating utterances for a new intent]

In this example, included in the paper, the model is given three examples of different intents, or tasks that the customer wants executed: book-restaurant, play-music, and get-weather. From these it can generalize to the unfamiliar intent get-news-update and generate utterances corresponding to that intent in multiple languages: Spanish, French, German, and Hindi.
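Few-shot learning of this kind means the examples are supplied in the model's input at inference time rather than used to retrain it. The Python sketch below shows how such a prompt might be assembled, assuming a plain-text layout of our own invention: the demonstration utterances, the bracketed tags and the `language` field are all hypothetical and are not the paper's actual prompt format. For a seq2seq model like AlexaTM 20B, the assembled text would be the encoder input and the decoder would generate the new utterance.

```python
from typing import List, Tuple

# Invented demonstration utterances (hypothetical, not taken from the paper).
DEMOS: List[Tuple[str, str]] = [
    ("book-restaurant", "Book a table for two at an Italian place tonight"),
    ("play-music", "Play some jazz in the kitchen"),
    ("get-weather", "Will it rain in Seattle tomorrow?"),
]

def build_few_shot_prompt(demos: List[Tuple[str, str]], new_intent: str, language: str) -> str:
    """Hypothetical layout: one tagged line per demonstration, then an open
    line for the unseen intent that the model is expected to complete."""
    lines = [f"[intent: {intent}] {utterance}" for intent, utterance in demos]
    # Requesting the target language this way is an assumption for illustration.
    lines.append(f"[intent: {new_intent}] [language: {language}]")
    return "\n".join(lines)

if __name__ == "__main__":
    # One prompt per target language; no model is called here.
    for lang in ["Spanish", "French", "German", "Hindi"]:
        print(build_few_shot_prompt(DEMOS, "get-news-update", lang))
        print("---")
```

Keeping the demonstrations fixed and varying only the final line is what lets the same handful of examples serve every target language.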

Another example in the paper shows news summarization by AlexaTM 20B when given only a single example. The input to the encoder is in the yellow box, the decoder’s output in the pink box:

[Figure: one-shot news summarization by AlexaTM 20B]

Soltan states that Amazon will be releasing the model publicly for non-commercial use to aid the development and evaluation of multilingual large language models. Amazon has also implemented a function to enable loading the model on up to eight GPUs with limited GPU memory for running inference on instances of Amazon Web Services’ EC2 computation service, which he says provides a more flexible way for researchers to use AlexaTM 20B in their own work.
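Some back-of-the-envelope arithmetic shows why splitting the model across GPUs matters. Assuming 16-bit weights (2 bytes per parameter), 20 billion parameters need about 40 GB for the weights alone, or about 5 GB per GPU when spread over eight, before counting activations and other runtime memory. The Python sketch below does that arithmetic and assigns contiguous blocks of layers to GPUs, a simple pipeline-style split. It is not Amazon's loading function, whose details the article does not give, and the layer count used is hypothetical.

```python
from typing import List

def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    # Weights only; 2 bytes per parameter assumes 16-bit (fp16/bf16) storage.
    return n_params * bytes_per_param / 1e9

def partition_layers(n_layers: int, n_gpus: int) -> List[range]:
    # Give each GPU a contiguous block of layers; any remainder goes to the
    # first few GPUs so block sizes differ by at most one layer.
    base, extra = divmod(n_layers, n_gpus)
    parts: List[range] = []
    start = 0
    for gpu in range(n_gpus):
        size = base + (1 if gpu < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts

if __name__ == "__main__":
    total = weight_memory_gb(20e9)  # 40.0 GB of weights for 20B parameters
    print(f"total: {total:.1f} GB, per GPU over 8: {total / 8:.1f} GB")
    # 50 is a hypothetical layer count, used only to illustrate the split.
    for gpu, layers in enumerate(partition_layers(50, 8)):
        print(gpu, layers.start, layers.stop - 1)
```

A contiguous split keeps each layer's weights on one device, so only the activations passed between blocks have to cross GPU boundaries.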


More Information

20B-parameter Alexa model sets new marks in few-shot learning

Related Articles

Amazon Invests In Conversational AI

The Unreasonable Effectiveness Of GPT-3

Alexa Prize SocialBot Grand Challenge 5


Last Updated ( Wednesday, 31 August 2022 )