Learn about a research presentation that introduces Agent S, an open agentic framework that lets computers interact with Graphical User Interfaces (GUIs) the way a human does. Explore how the framework addresses key challenges in GUI task automation through experience-augmented hierarchical planning, which combines external knowledge retrieval with internal experience recall. Discover how an Agent-Computer Interface (ACI) works with Multimodal Large Language Models (MLLMs) to improve reasoning and control. Examine the framework's results on the OSWorld benchmark, where it achieved an 83.6% relative improvement in success rate over existing methods, and its generalization across operating systems, as demonstrated on the WindowsAgentArena benchmark. Access the open-source codebase to explore the framework and contribute to advancing human-computer interaction.
Syllabus
Fellowship: Agent S, An Open Agentic Framework that Uses Computers Like a Human
Taught by
Launchpad