RT-2 A Breakthrough For Robot Control

Written by Mike James

Wednesday, 16 August 2023

RT-2 is the new version of what Google calls its vision-language-action (VLA) model and at last we can start to see the potential in general purpose robots. While everyone is hyping AI chatbots, this is where the real revolution is happening.

Google DeepMind has announced Robotic Transformer 2 (RT-2), a "first-of-its-kind" vision-language-action (VLA) model that uses data scraped from the Internet to enable better robotic control through plain language commands. The ultimate goal is to create general-purpose robots that can navigate human environments.

RT-2Ketchup-Cube

The key idea is that many things can be cast into a form of language and then they can be implemented using Large Language Models LLMs which are driving the success of AI at the moment. In this case robot actions are treated as a language. The language tokens are translated into actions and this how the robot performs a task. The second component is the Vision Language Module VLM which is trained on images presented to a convolutional network and a language model trained using language describing the images. VLMs can be used to caption images, describe images and so on - once trained you present them with an image and they generate text. In this case the text also includes robot instructions.

Today, we announced ????????-????: a first of its kind vision-language-action model to control robots. ????

It learns from both web and robotics data and translates this knowledge into generalised instructions.

Find out more: https://t.co/UWAzrhTOJG pic.twitter.com/U4VW8IsvAD
— Google DeepMind (@GoogleDeepMind) July 28, 2023

RT-1 was based on PaLM-E and one version of RT-2 still uses it and a second version uses PaLI-X - the two have slightly different strengths and weaknesses.

RT-2commandexamples

RT-2’s ability to transfer information to actions shows promise for robots to more rapidly adapt to novel situations and environments. In testing RT-2 models in more than 6,000 robotic trials, the team found that RT-2 functioned as well as the previous model, RT-1, on tasks in its training data, or “seen” tasks. And it almost doubled its performance on novel, unseen scenarios to 62% from RT-1’s 32%.

RT-2Rock Hammer

In other words, with RT-2, robots are able to learn more like we do — transferring learned concepts to new situations. This ability to generalize is very important. Unlike GPT3 and other language models, RT-2 cannot afford to hallucinate and return untruthful results - just think what it would mean. It is why it is very likely that the next big breakthrough in AI will come from "embodied" AI. That is, putting AI agents in control of a physical body ensures that reality rules and hallucinations are quickly beaten out of the system.

Not only does RT-2 show how advances in AI are cascading rapidly into robotics, it shows enormous promise for more general-purpose robots. While there is still work to be done to enable helpful robots in human-centered environments, RT-2 shows us an exciting future for robotics just within grasp. You can't help but wonder what the result of using this sort of system with Spot or Atlas not to mention self driving cars. All of our current line up of top robots started out long before this sort of approach was possible.

More Information

What is RT-2: Speaking robot: Our new AI model translates vision and language into robotic actions

RT-2: New model translates vision and language into action

Paper: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Google's Large Language Model Takes Control

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Google Releases Python Client For Data Commons
01/07/2025

Google has released a new Python client library for Data Commons based on the V2 REST API. They say the new library enhances how data developers can make use of Data Commons.

+ Full Story

Python Parallel? Not Quite Python 14
23/07/2025

Python is a wonderful language, but lack of speed is its main drawback. Can the Python community create a fully threaded Python to be proud of? It's a big challenge.

+ Full Story

More News

Comments

or email your comment to: comments@i-programmer.info

Last Updated ( Wednesday, 16 August 2023 )

Recent Articles

Recent Book Reviews

Popular Articles

More Information

Related Articles

Comments