Google unveils AI with vision-language-action capabilities
Google, Microsoft, Alphabet and AI Artificial Intelligence logos are seen in this illustration taken May 4, 2023. (Reuters Photo)


Google, the U.S.-based global tech giant, has taken another leap forward in artificial intelligence (AI) with the introduction of Robotics Transformer 2 (RT-2). The AI model, revealed on Friday, equips robots with the ability to understand visual input and language, enabling them to perform a range of specific actions.

In a blog post, Google said that RT-2 is a vision-language-action model trained on a vast array of text and images gathered from the internet. From this data, RT-2 gains a grasp of general ideas and concepts, which it can then transfer to guide a robot's behavior.

Unlike simple chatbots, robots require a connection to the real world and an understanding of their own capabilities. Google emphasized that RT-2 serves as a knowledge base, empowering robots to complete tasks such as picking up apples or taking out the trash.

The model's most notable feature is that a single model can carry out complex reasoning while also producing output for robot actions. This allows it to transfer learned concepts to novel situations, bringing the way robots learn closer to the way humans learn. Google believes this advancement signals the rapid convergence of AI and robotics, while also demonstrating the immense potential for developing more general-purpose robots.

In June, Google unveiled another significant breakthrough with its self-improving AI agent for robotics, dubbed RoboCat. This innovative agent possesses the ability to learn diverse tasks across different contexts, thereby generating new training data to enhance its techniques autonomously.

Google emphasized that RoboCat can master a new task with as few as 100 demonstrations, drawing from an extensive and diverse dataset. This capability significantly accelerates robotics research by minimizing the need for human-supervised training. It marks a critical step toward creating versatile and general-purpose robots that can adapt to various scenarios.

The integration of AI models like RT-2 and RoboCat showcases Google's continuous commitment to pushing the boundaries of artificial intelligence and its application in robotics. As these cutting-edge technologies mature, the prospects for smarter, more versatile, and autonomous robotic systems become increasingly promising. The world stands on the cusp of a new era in robotics, driven by the fusion of vision, language and action, and Google is at the forefront of this transformative journey.