The Rise of Computer-Use AI Agents
In the two years since the generative AI boom began with the launch of ChatGPT, we’ve become accustomed to having AI assistants available in our web browsers and phones, ready to answer our questions. However, the next frontier in AI is the emergence of agents that can take action on our behalf. While AI agents have already made inroads for power users like coders, everyday consumers have yet to experience the convenience of these AI assistants.
That’s about to change. Tech giants like Anthropic, Google DeepMind, and OpenAI have recently unveiled experimental models that can navigate computers the way humans do: searching the web, filling out forms, and clicking buttons. With guidance from the user, these AI agents can handle tasks such as ordering groceries, booking rides, comparing prices, and planning travel. Although these early models have limited abilities and aren’t widely available yet, they offer a glimpse into the future of AI-human collaboration.
The Key Players: Anthropic, Google DeepMind, and OpenAI
Anthropic was the first to introduce computer-use capabilities with its Claude chatbot in October 2024. Claude navigates by analyzing screenshots and counting pixels to move the cursor and click. However, the capability is currently available only to developers building on Anthropic’s language models, according to an Anthropic spokesperson.
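The screenshot-then-click cycle described above can be pictured as a simple loop: observe the screen, pick a pixel coordinate, act, and observe again. The sketch below is a toy illustration only, not Anthropic’s implementation. `FakeScreen`, `toy_policy`, and `run_agent` are hypothetical names invented for this example, and the "screenshot" is a small dictionary rather than real image pixels.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    kind: str                              # "click" or "done"
    xy: Optional[Tuple[int, int]] = None   # pixel coordinates for a click


class FakeScreen:
    """Hypothetical stand-in for a desktop: one button at a known rectangle."""

    def __init__(self) -> None:
        self.button_box = (100, 40, 180, 70)  # left, top, right, bottom
        self.clicked = False

    def screenshot(self) -> dict:
        # A real agent would receive image pixels; this toy returns a description.
        return {"button_box": self.button_box, "clicked": self.clicked}

    def click(self, x: int, y: int) -> None:
        left, top, right, bottom = self.button_box
        if left <= x <= right and top <= y <= bottom:
            self.clicked = True


def toy_policy(observation: dict) -> Action:
    """Assumed stand-in for the model: aim at the center of the button."""
    if observation["clicked"]:
        return Action("done")
    left, top, right, bottom = observation["button_box"]
    return Action("click", ((left + right) // 2, (top + bottom) // 2))


def run_agent(screen: FakeScreen,
              policy: Callable[[dict], Action],
              max_steps: int = 5) -> List[Action]:
    """Observe, decide, act, and repeat until the policy reports it is done."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(screen.screenshot())
        history.append(action)
        if action.kind == "done":
            break
        if action.kind == "click" and action.xy is not None:
            screen.click(*action.xy)
    return history
```

Calling `run_agent(FakeScreen(), toy_policy)` produces one click followed by a "done" action. The step cap is a design choice in this sketch so that a confused policy cannot loop forever.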
Next up was Google DeepMind’s Project Mariner, built on the Gemini 2 language model. Mariner, described as an “early research prototype,” is only accessible to “trusted testers” for now and operates solely within the Chrome browser’s active tab.
OpenAI followed in January 2025 with its Operator computer-use agent (CUA). Currently a “research preview” available only to OpenAI’s $200/month premium users, Operator aims to work with any website, according to OpenAI engineer Yash Kumar. Kumar notes that Operator could eventually expand beyond browsers to work with desktop apps.
Safety Concerns and Limitations
While these AI agents offer exciting possibilities, they also raise safety concerns. Anthropic has warned about prompt injection attacks, where malicious prompts could cause the model to take unexpected actions. Zachary Lipton, an associate professor at Carnegie Mellon University, questions the extent of the risks, noting that the companies haven’t revealed much about how the agents work.
Currently, these AI agents cannot log in to sites, agree to terms of service, solve captchas, or enter payment details. If faced with such obstacles, they hand control back to the user. OpenAI also states that Operator doesn’t capture screenshots during login or payment processes.
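One way to picture this hand-off rule is as a fixed list of sensitive step types that always go back to the user. The sketch below is an assumption made for illustration and does not reflect how any of the three companies implement it. The `Step` categories and the `who_acts` and `plan_handoffs` helpers are hypothetical names.

```python
from enum import Enum
from typing import List, Tuple


class Step(Enum):
    NAVIGATE = "navigate"
    FILL_FORM = "fill_form"
    LOGIN = "login"
    ACCEPT_TERMS = "accept_terms"
    CAPTCHA = "captcha"
    PAYMENT = "payment"


# The four obstacles named in the article; everything else stays with the agent.
USER_ONLY = {Step.LOGIN, Step.ACCEPT_TERMS, Step.CAPTCHA, Step.PAYMENT}


def who_acts(step: Step) -> str:
    """Return "user" for sensitive steps and "agent" for everything else."""
    return "user" if step in USER_ONLY else "agent"


def plan_handoffs(steps: List[Step]) -> List[Tuple[str, str]]:
    """Label each step in a task with the party expected to perform it."""
    return [(step.value, who_acts(step)) for step in steps]
```

For a grocery order modeled as `[Step.NAVIGATE, Step.LOGIN, Step.FILL_FORM, Step.PAYMENT]`, `plan_handoffs` assigns the login and payment steps to the user and the rest to the agent.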
The Future of AI Agents and Human-AI Collaboration
While the companies haven’t provided timelines for broader consumer access, it’s likely that we’ll see AI agents become more widely available in 2025, either from the tech giants or from startups creating more affordable alternatives.
OpenAI’s Kumar views Operator as a stepping stone toward artificial general intelligence (AGI), broadening AI’s utility and helping people save time on everyday tasks. As we move closer to a world where AI agents handle routine tasks, the companies will likely shift their focus to developing more advanced, personalized AI assistants, perhaps even reaching the level of sophistication depicted in the movie Her.
Source: IEEE Spectrum