
Google’s Latest AI Lets Robots Understand, Plan, and Act in Real Environments

Google’s new AI model improves robot reasoning, enabling spatial awareness, task-completion checks, and instrument reading in real environments.

by Editor

Google has introduced a new AI model designed to help robots better understand and interact with the physical world, addressing one of the biggest challenges in robotics: reasoning beyond instructions.

The model, Gemini Robotics-ER 1.6, focuses on “embodied reasoning,” enabling robots to interpret visual inputs, plan tasks, and determine when a task is complete.

This marks a shift from command-following machines to systems capable of making context-aware decisions.

The update builds on earlier versions by improving spatial reasoning and multi-view understanding, allowing robots to process information from multiple camera feeds and dynamic environments more effectively.

It also introduces new capabilities such as instrument reading, enabling robots to interpret gauges and indicators commonly found in industrial settings.

Bridging the digital-physical gap

A key improvement lies in how the model handles spatial reasoning tasks. Gemini Robotics-ER 1.6 can identify objects, count them, and determine relationships between them with greater accuracy. It can also point to objects as part of its reasoning process, helping it break down complex tasks into smaller steps.
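To make this concrete, here is a minimal sketch of how a developer might request object points through the Gemini API. The model ID string, file name, and response schema below are illustrative assumptions, not details confirmed in the article:

```python
# Minimal sketch: asking the model to point at objects in a camera frame.
# The model ID, file name, and JSON schema are illustrative placeholders.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

frame = types.Part.from_bytes(
    data=open("workbench.jpg", "rb").read(),  # hypothetical camera frame
    mime_type="image/jpeg",
)

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",  # placeholder model ID
    contents=[
        frame,
        "Point to each screwdriver on the bench. Reply as JSON: "
        '[{"label": str, "point": [y, x]}], coordinates normalized to 0-1000.',
    ],
)
print(response.text)  # e.g. [{"label": "screwdriver", "point": [412, 733]}]
```

A pointing response of this kind lets a planner turn "pick up the screwdriver" into a sequence of smaller, verifiable steps.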

This capability is critical in real-world environments where robots must interact with objects, navigate cluttered spaces, and make decisions based on incomplete or changing information.

The model also improves success detection, allowing robots to assess whether a task has been completed correctly. This is particularly important in automation workflows, where systems must decide whether to retry an action or move forward.
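In an automation workflow, that judgment could gate a simple retry loop. The sketch below is purely illustrative: execute_action and capture_frame are hypothetical stand-ins for a robot stack's own motion and camera primitives.

```python
# Illustrative retry loop gated on the model's success judgment.
# execute_action() and capture_frame() are hypothetical stand-ins
# for a robot stack's own motion and camera primitives.
MAX_ATTEMPTS = 3

def attempt_task(client, model_id, instruction):
    for _ in range(MAX_ATTEMPTS):
        execute_action(instruction)      # run the planned motion
        frame = capture_frame()          # re-observe the scene as an image part
        verdict = client.models.generate_content(
            model=model_id,
            contents=[frame,
                      f"Was this task completed successfully: '{instruction}'? "
                      "Answer yes or no."],
        )
        if "yes" in verdict.text.lower():
            return True                  # success: move to the next step
    return False                         # escalate after repeated failures
```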

Multi-view reasoning is another area of advancement. Robots often rely on multiple camera inputs, such as overhead and wrist-mounted views. The model can combine these perspectives to form a more complete understanding of the environment, even in cases of occlusion or poor visibility.
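Since the Gemini API accepts several images in a single request, a simplified sketch of multi-view prompting might look like this (file names, prompt wording, and model ID are again placeholders):

```python
# Sketch: fusing an overhead view and a wrist-camera view in one request.
# File names, prompt, and model ID are illustrative placeholders.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def load_frame(path):
    """Wrap a saved camera frame as an image part for the request."""
    return types.Part.from_bytes(data=open(path, "rb").read(),
                                 mime_type="image/jpeg")

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",  # placeholder model ID
    contents=[
        "Image 1 is the overhead camera; image 2 is the wrist camera.",
        load_frame("overhead.jpg"),
        load_frame("wrist_cam.jpg"),
        "Is the cable connector occluded in the overhead view, and from "
        "which side should the gripper approach to keep it visible?",
    ],
)
print(response.text)
```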

Reading real-world signals

One of the most practical additions is the ability to read instruments such as pressure gauges, sight glasses, and digital displays.

This capability was developed in collaboration with Boston Dynamics, where robots like Spot are used for facility inspections.

“Capabilities like instrument reading and more reliable task reasoning will enable Spot to see, understand, and react to real-world challenges completely autonomously,” said Marco da Silva, Vice President and General Manager of Spot at Boston Dynamics.

The model uses a combination of visual reasoning and code execution to interpret readings. It can zoom into images, identify key elements such as needles and markings, and calculate values with high precision.
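The arithmetic behind a needle reading is ordinary linear interpolation across the gauge's sweep. The sketch below shows the kind of calculation such code execution might perform; the angles and scale range are made-up example values:

```python
# Sketch: converting a detected needle angle into a gauge reading by
# linear interpolation. All angles and ranges are made-up example values.
def gauge_value(needle_deg, min_deg=-135.0, max_deg=135.0,
                min_val=0.0, max_val=10.0):
    """Map a needle angle (degrees) onto the gauge's printed scale."""
    fraction = (needle_deg - min_deg) / (max_deg - min_deg)
    return min_val + fraction * (max_val - min_val)

print(gauge_value(27.0))  # needle at +27 deg on a 0-10 bar gauge -> 6.0 bar
```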

Performance benchmarks show significant gains. Instrument reading accuracy improved from 23 percent in earlier models to as high as 93 percent with agentic vision enabled.

The model also shows better compliance with safety constraints, such as avoiding unsafe object handling.

Google said the model is its safest robotics system so far, with improved ability to detect hazards and follow physical safety rules in both text and visual scenarios.

Gemini Robotics-ER 1.6 is now available to developers through the Gemini API and Google AI Studio, along with tools to test and build applications using embodied reasoning.


Originally written by: Neetika Walter

Source: Interesting Engineering

Published on: 14 April 2026

Link to original article: Google’s latest AI lets robots understand, plan, and act in real environments

