AI Agents That Can Use Your Computer: The Next Big Thing in AI?

Written By Derrick Rodriguez

Scientific writer

The Emergence of Computer-Use Agents

In the wake of the generative AI boom kicked off by ChatGPT, tech giants like Anthropic, Google DeepMind, and OpenAI are now unveiling a new breed of AI agents that can navigate and interact with computers much like humans do. These agents can search the web, fill out forms, click buttons, and even perform tasks like ordering groceries, calling a ride-share service, or booking flights and restaurant reservations, all with a bit of guidance from the user.

Anthropic was the first to announce this new capability in October 2024, revealing that its Claude chatbot can now “use computers the way humans do.” Google DeepMind followed suit that December with its Project Mariner, an “early research prototype” built on top of Google’s Gemini 2 language model. Not to be outdone, OpenAI unveiled its own Operator agent in January 2025, calling it a “research preview” available only to premium subscribers for now.

How Computer-Use Agents Work

These computer-use agents rely on chain-of-thought reasoning to break down instructions into a series of tasks they can complete. They navigate by viewing screenshots of the user’s screen and counting pixels to move the cursor and click on various elements. If they need additional information, they pause and ask the user for input. Before taking a final action, like placing an order or making a reservation, the agents request confirmation from the user.
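The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how any vendor's agent is actually built: the `Action` type, `run_agent`, and the five callbacks (`plan_next`, `take_screenshot`, `perform`, `ask_user`, `confirm`) are all hypothetical names introduced here. The model call is stood in for by `plan_next`, which receives the latest screenshot and returns the next step, including pixel coordinates for clicks.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One step proposed by the (hypothetical) planning model."""
    kind: str            # "click", "type", "ask_user", or "done"
    x: int = 0           # pixel coordinates within the screenshot
    y: int = 0
    text: str = ""       # text to type, or the question for the user
    final: bool = False  # True for steps like placing an order


def run_agent(plan_next, take_screenshot, perform, ask_user, confirm,
              max_steps=20):
    """Observe the screen, pick an action, act; return a log of steps."""
    log = []
    answer = ""  # most recent user reply, passed to the planner once
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = plan_next(screenshot, answer)
        answer = ""
        if action.kind == "done":
            log.append("done")
            break
        if action.kind == "ask_user":
            # Pause and ask the user for missing information.
            answer = ask_user(action.text)
            log.append(f"asked: {action.text}")
            continue
        if action.final and not confirm(action):
            # The user declined the final step, so stop without acting.
            log.append("declined")
            break
        perform(action)
        log.append(f"{action.kind}@({action.x},{action.y})")
    return log
```

In a real system `plan_next` would wrap a call to a language model and `perform` would drive the mouse and keyboard. Passing them in as plain functions keeps the sketch self-contained and easy to test with a scripted planner.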

Currently, these agents have limitations: they cannot log into sites, agree to terms of service, solve captchas, or enter payment details. In such cases, control is handed back to the user. Companies like OpenAI and Anthropic have also acknowledged potential safety risks, such as prompt injection attacks, where malicious actors could trick the agents into taking unintended actions.
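One way to picture the handoff is as a simple routing rule applied to each step of a plan. The sketch below is an assumption for illustration only: the step labels, the `RESTRICTED` set, and the `route_step` and `split_plan` functions are hypothetical, and the article does not say how any of these products detect such steps.

```python
# Step kinds the article says agents hand back to the user.
# The label strings themselves are made up for this sketch.
RESTRICTED = {"login", "accept_terms", "captcha", "payment"}


def route_step(step_kind: str) -> str:
    """Return who should carry out a step: "user" or "agent"."""
    return "user" if step_kind in RESTRICTED else "agent"


def split_plan(steps):
    """Pair each step in a plan with whoever should perform it."""
    return [(step, route_step(step)) for step in steps]


plan = ["search", "fill_form", "login", "payment"]
for step, owner in split_plan(plan):
    print(f"{step}: {owner}")
```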

The Future of AI Agents and Human-AI Collaboration

While these computer-use agents are still in their early stages and not yet widely available to consumers, their emergence signals a significant shift in the AI landscape. As Zachary Lipton, an associate professor of machine learning at Carnegie Mellon University, notes, “What’s intriguing here is the possibility of people starting to hand over the keys.”

Yash Kumar, an engineer on OpenAI’s Operator team, sees these agents as a stepping stone toward artificial general intelligence (AGI) and a more collaborative future between humans and AI. “The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks,” he says.

While the full implications of this technology remain to be seen, it’s clear that AI agents capable of using computers like humans could significantly impact how we work, shop, and navigate the digital world. As these agents become more advanced and widely available, they may reshape our relationship with technology, ushering in a new era of human-AI collaboration.

Original Source: https://spectrum.ieee.org/ai-agents-computer-use