In the field of generative AI, Anthropic is a company that innovates. After launching Artefacts on Claude in June, a feature copied by ChatGPT, the company is adding a new, unprecedented capability to its chat agent: PC control. In addition to this announcement, which is part of an update to Claude 3.5 Sonnet, Anthropic is introducing its new language model: Claude 3.5 Haiku. An overview of what's new.
Claude 3.5 Sonnet: Better computer control and code capabilities
in Blog post On Tuesday, October 22, 2024, Anthropic unveiled a major update to Claude 3.5 Sonnet, the most efficient language model. It includes two major new features: AI computer control and enhanced coding capabilities.
Claude can now control your computer
The Claude 3.5 Sonnet update introduces a particularly innovative option: the ability to perform tasks independently on a PC. This feature, available in public beta on the API, allows you to delegate tasks to AI on your computer using a simple text request.
Developers can ask Claude to use computers as people do, by looking at the screen, moving the cursor, clicking buttons, and typing text, Anthropic said in the press release.
In the example shown in the video (see below), the user asks the AI to fill out an online form using data from a spreadsheet. After scanning the screen, Claude realized that the spreadsheet was not actually the correct one, and he conducted an online search to find the required information, before filling out the form.
However, Claude specifies that the tool is still in the experimental phase, “Sometimes cumbersome and error-prone.”. In the video, all the items needed to perform the task are already open on the desktop, which indicates the shortcomings of Claude 3.5 Sonnet in finding the information itself. The option also seems to display some slowness in performing tasks.
Improved coding capabilities
The Claude 3.5 Sonnet update also allows the model to improve its overall capabilities. According to Anthropic, her programming skills, in particular, have increased tenfold. In SWE-bench Verified, a benchmark tool developed by OpenAI to evaluate performance on complex software development tasks, Claude 3.5 Sonnet improved its accuracy from 33.4% to 49%. A result that now exceeds that shown by OpenAI o1-preview.
An advancement that, according to Anthropic, was highlighted by several first-time users: “GitLab, which tested the DevSecOps task model, found that it enables better reasoning (up to 10% depending on use cases) without additional latency. […] The browser company, which uses the model to automate web workflows, noted that Claude 3.5 Sonnet outperformed all previously tested models.identifies the company.
Cloud 3.5 Haiku: A new model that emphasizes efficiency
Cloud 3.5 Haiku is the successor to Cloud 3 Haiku, which was unveiled last March. Haiku models are more compact and faster, designed for light tasks that require immediate response. The Claude 3.5 Haiku, like the Claude 3.5 Sonnet, offers improved capabilities in programming tasks (it received a score of 40.6% on the SWE-bench Verified test), while offering lower reaction time and better instruction following. Anthropy determines that the model is appropriate “For user-facing products, specialized sub-agent tasks, and creating personalized experiences from massive amounts of data like purchase history, pricing, or inventory records.”. Claude 3.5 Haiku will be available in the coming weeks via the API.
“Certified gamer. Problem solver. Internet enthusiast. Twitter scholar. Infuriatingly humble alcohol geek. Tv guru.”