Introducing the Gemini 2.5 Computer Use Model

Google DeepMind has unveiled the Gemini 2.5 Computer Use model, now available in preview via the Gemini API. It is a specialized model built on Gemini 2.5 Pro that lets AI agents interact directly with user interfaces the way people do.

This takes AI automation beyond structured APIs: developers can build agents that click, type, scroll, fill in forms, and navigate complex web apps autonomously, with safety checks built in.

🚀 What is the Gemini 2.5 Computer Use Model?

Earlier this year, Google hinted that computer-use capabilities were coming to the Gemini API. Now, the Gemini 2.5 Computer Use model is officially available in preview.
It’s built to power agents that can understand on-screen interfaces and perform actions — from filling out forms to managing dashboards and interacting with browser-based workflows.

Unlike conventional automation scripts, this model uses Gemini’s visual reasoning to interpret what is displayed on-screen, which Google reports lets it outperform existing solutions on benchmarks such as Online-Mind2Web, WebVoyager, and AndroidWorld.

🧠 How It Works

The model’s capabilities are accessed through the new computer_use tool in the Gemini API. Here’s the process:

The model takes a user request, a screenshot of the interface, and a history of recent actions as input.
It analyzes these inputs and generates a function call representing a UI action, such as a click, scroll, or text entry.
The client executes the action, captures a new screenshot, and sends it back to the model. This loop repeats until the task is complete.
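
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the Gemini API itself: `Action`, `run_agent_loop`, and the three callables (`propose_action`, `execute_action`, `take_screenshot`) are hypothetical names introduced here, and the model is replaced by a scripted stub so the example runs without network access or a browser.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """A UI action proposed by the model (hypothetical shape, not the real API schema)."""
    name: str                                  # e.g. "click", "type_text", or "done"
    args: dict = field(default_factory=dict)


def run_agent_loop(
    request: str,
    propose_action: Callable[[str, bytes, List[Action]], Action],
    execute_action: Callable[[Action], None],
    take_screenshot: Callable[[], bytes],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat propose -> execute -> re-capture until the model signals completion."""
    history: List[Action] = []
    screenshot = take_screenshot()
    for _ in range(max_steps):
        # Inputs mirror the article: request, current screenshot, recent actions.
        action = propose_action(request, screenshot, history)
        if action.name == "done":
            break
        execute_action(action)
        history.append(action)
        # A fresh screenshot feeds the next iteration.
        screenshot = take_screenshot()
    return history


def demo() -> List[str]:
    """Run the loop against a scripted stand-in for the model."""
    scripted = iter([
        Action("click", {"x": 120, "y": 48}),
        Action("type_text", {"text": "hello"}),
        Action("done"),
    ])
    executed: List[Action] = []
    history = run_agent_loop(
        request="Fill in the contact form",
        propose_action=lambda req, shot, hist: next(scripted),
        execute_action=executed.append,
        take_screenshot=lambda: b"fake-screenshot-bytes",
    )
    return [a.name for a in history]


if __name__ == "__main__":
    print(demo())  # ['click', 'type_text']
```

In a real client, `propose_action` would call the model through the computer_use tool and `execute_action` would drive a browser. The `max_steps` cap is a design choice added here so a confused agent cannot loop forever.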

This allows agents to perform dynamic, multi-step tasks, such as logging into web portals, managing spreadsheets, or booking appointments, while asking for user confirmation before sensitive actions like purchases.

Although primarily optimized for web browsers, the model already shows strong promise in mobile UI control and will expand its capabilities further in future releases.

🧩 Real-World Demos

Here are a few demos Google shared to showcase what the Gemini 2.5 Computer Use model can do:

Pet Spa Task Automation:
The agent retrieves data from one site, fills a CRM form on another, and schedules a follow-up — entirely through UI interactions.
Sticky Note Organizer:
The model autonomously organizes digital sticky notes on a web app by dragging and dropping them into categories.

These demos highlight how the model understands layouts and context — not just text — enabling real, interactive workflows.

📊 Performance and Latency

Benchmark results show Gemini 2.5 Computer Use leading in both accuracy and speed.

On Browserbase’s Online-Mind2Web evaluation, it achieves over 70% accuracy at a latency of about 225 seconds, which is significantly lower than that of other browser control agents, and it outperforms them by a large margin.

🔒 Safety and Reliability

Google DeepMind emphasizes a responsible approach to building AI agents.
The Computer Use model includes multiple layers of built-in safety features designed to prevent misuse or unintended actions, including:

Per-step safety service: Evaluates every model action before execution.
System-level instructions: Let developers require confirmation for high-stakes operations.
Hard safety blocks: Prevent the model from performing harmful actions like bypassing CAPTCHAs or controlling sensitive systems.
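
On the client side, these layers might combine into a gate that runs before each action is executed. The sketch below is illustrative only and rests on assumptions: the `Verdict` enum, the action names in `BLOCKED` and `NEEDS_CONFIRMATION`, and the `ask_user` callback are all hypothetical and do not reflect the actual safety service or its response format.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"   # high-stakes: needs explicit user approval
    BLOCK = "block"       # never executed, regardless of user input


# Hypothetical policy tables; a real deployment would define its own.
BLOCKED = {"solve_captcha"}
NEEDS_CONFIRMATION = {"purchase"}


def assess(action_name: str) -> Verdict:
    """Classify a proposed action before it runs (stand-in for a per-step check)."""
    if action_name in BLOCKED:
        return Verdict.BLOCK
    if action_name in NEEDS_CONFIRMATION:
        return Verdict.CONFIRM
    return Verdict.ALLOW


def should_execute(action_name: str, ask_user: Callable[[str], bool]) -> bool:
    """Return True only if the action is allowed or the user approved it."""
    verdict = assess(action_name)
    if verdict is Verdict.BLOCK:
        return False
    if verdict is Verdict.CONFIRM:
        return ask_user(action_name)
    return True


if __name__ == "__main__":
    decline = lambda name: False
    print(should_execute("click", decline))          # True
    print(should_execute("purchase", decline))       # False
    print(should_execute("solve_captcha", decline))  # False
```

Checking for a hard block before asking the user means a blocked action can never be approved by accident, which matches the article's distinction between confirmation prompts and hard blocks.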

Developers are encouraged to use these tools and thoroughly test their workflows before deploying agents into production.

🧪 Early Use Cases

Early adopters and internal teams have already been using the Gemini 2.5 Computer Use model for:

Automated UI testing (cutting development time dramatically)
Workflow automation and virtual assistants
Data collection from web interfaces

Google’s payments platform team reported that the model helped fix over 60% of failed UI test executions, while companies like Autotab and Poke.com praised its speed and contextual accuracy.

🧰 Get Started

You can try the Gemini 2.5 Computer Use model today:

🧑‍💻 Demo it: Available via Browserbase’s hosted environment.
⚙️ Build with it: Access the API through Google AI Studio or Vertex AI.
💬 Join the community: Share your feedback and ideas in the Gemini Developer Forum.

“The Gemini 2.5 Computer Use model marks a turning point — bringing us closer to AI agents that truly understand and navigate digital environments like humans do.”
