LiteWebAgent
The Open-Source Suite for VLM-Based Web-Agent Applications
Authors: Danqing Zhang, Balaji Rama, Jingyi Ni, Shiying He, Fu Zhao, Kunyu Chen, Arnold Chen, Junyu Cao
PathOnAI.org, University of Texas at Austin
Published at NAACL 2025
Abstract
We introduce LiteWebAgent, an open-source suite for VLM-based web agent applications. LiteWebAgent addresses a critical gap in the web agent ecosystem by providing an extensible core agent framework with planning, memory, and tree search capabilities, alongside a production-ready solution that combines a serverless backend requiring minimal configuration with intuitive user and browser interfaces.
- Extensible Agent Framework: A decoupled action generation and grounding model with support for various agent types and integration with advanced research components.
- Synchronous and Asynchronous APIs: Seamless integration with FastAPI for asynchronous calls and serverless functions, requiring minimal deployment effort.
- Flexible User Interface: A comprehensive system configuration panel and chat interface that features voice integration and task execution visualization.
LiteWebAgent bridges the gap between research frameworks and production systems, providing researchers and developers with a comprehensive toolkit for deploying VLM-based web agents in real-world applications.
Video Demonstration
Watch the demonstration of LiteWebAgent in action, showcasing its lightweight implementation and efficient web automation capabilities.
Demonstration
Full-Stack Web Application

The web application provides an interactive chat interface with voice integration and live-action visualization. The backend features an asynchronous serverless API built with FastAPI and Playwright, allowing for efficient remote browser control and agent execution.
Chrome Extension

The Chrome extension allows users to control their local browser via the Chrome DevTools Protocol (CDP), offering enhanced privacy and personalized browser sessions. The intuitive user interface integrates directly within the browser environment, making it easy to issue commands and monitor agent actions.
System Architecture

The LiteWebAgent system consists of three main components:
Frontend
Provides a user interface with parameter configuration options, a chat interface for issuing commands, and a browser interface that displays the agent's actions in real time.
Backend
Handles action generation and grounding, transforming natural language instructions into executable code that can be run in the browser environment.
Browser Environment
Provides the execution environment for agent actions, with support for remote browsers, Chrome DevTools Protocol, and Chromium instances.
Agent Framework


Decoupled Action Generation and Grounding
LiteWebAgent separates action generation from grounding, enhancing control over web interactions. A VLM produces natural language actions via function calls, while grounding converts these into executable Playwright code using webpage observations.
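The split described above can be illustrated with a minimal sketch. It is not LiteWebAgent's actual code: the `ActionCall` and `Element` schemas, the stubbed `generate_action`, and the word-overlap matching in `ground_action` are all assumptions made for illustration. The sketch only builds Playwright-style code as a string and does not drive a browser.

```python
from dataclasses import dataclass


@dataclass
class ActionCall:
    """A natural language action expressed as a function call (hypothetical schema)."""
    name: str       # "click" or "fill"
    target: str     # natural language description of the element
    text: str = ""  # text to type, used by "fill" only


@dataclass
class Element:
    """One interactive element from a webpage observation (hypothetical schema)."""
    selector: str
    label: str


def generate_action(goal: str) -> ActionCall:
    # Stand-in for the VLM: a real system would obtain this function call from the model.
    return ActionCall(name="fill", target="search box", text=goal)


def ground_action(action: ActionCall, observation: list[Element]) -> str:
    """Match the description to the best-overlapping element and emit Playwright-style code."""
    target_words = set(action.target.lower().split())
    best = max(
        observation,
        key=lambda e: len(target_words & set(e.label.lower().split())),
    )
    if action.name == "click":
        return f"page.click({best.selector!r})"
    if action.name == "fill":
        return f"page.fill({best.selector!r}, {action.text!r})"
    raise ValueError(f"unsupported action: {action.name}")


observation = [Element("#q", "Search box"), Element("#submit", "Submit button")]
code = ground_action(generate_action("dining table"), observation)
print(code)  # page.fill('#q', 'dining table')
```

Because generation never sees selectors and grounding never sees the goal, either half can be swapped (a different model, a different matching strategy) without touching the other.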
Agent Planning
Supports various strategies: function calling agents using LLM planning, high-level planning agents that replan based on execution trajectory, and context-aware planning agents that incorporate environment observations.
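To make the replanning idea concrete, here is a minimal sketch of a loop that requests a fresh plan whenever a step fails, passing the execution trajectory back to the planner. The function name `run_with_replanning`, the `plan` and `execute` callables, and the `max_replans` limit are hypothetical and not taken from LiteWebAgent; a real planner would be an LLM call rather than the toy function used here.

```python
from typing import Callable

Step = str
Trajectory = list[tuple[Step, bool]]  # (step, whether it succeeded)


def run_with_replanning(
    goal: str,
    plan: Callable[[str, Trajectory], list[Step]],
    execute: Callable[[Step], bool],
    max_replans: int = 3,
) -> Trajectory:
    """Execute a plan step by step, replanning from the trajectory after each failure."""
    trajectory: Trajectory = []
    steps = plan(goal, trajectory)
    replans = 0
    while steps:
        step = steps.pop(0)
        ok = execute(step)
        trajectory.append((step, ok))
        if not ok:
            if replans == max_replans:
                break  # give up instead of looping forever
            replans += 1
            steps = plan(goal, trajectory)  # the planner sees what has happened so far
    return trajectory


def toy_plan(goal: str, trajectory: Trajectory) -> list[Step]:
    # Toy planner: switch to an alternative route once any step has failed.
    if any(not ok for _, ok in trajectory):
        return ["click 'Add to cart'", "open cart"]
    return ["open product page", "click 'Buy now'", "open cart"]


def toy_execute(step: Step) -> bool:
    # Toy environment: the 'Buy now' button is missing on this page.
    return step != "click 'Buy now'"


result = run_with_replanning("buy a lamp", toy_plan, toy_execute)
for step, ok in result:
    print("ok  " if ok else "FAIL", step)
```

A context-aware variant would pass the current page observation to `plan` in addition to the trajectory.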
Agent Memory
Incorporates Agent Workflow Memory (AWM) in planning and replanning, enabling the agent to reference relevant workflows and past experiences for more effective task completion and error recovery.
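As a rough illustration of how stored workflows can feed into planning, the sketch below keeps a list of workflows and prepends the most relevant one to a planning prompt. Everything here is an assumption for illustration: the `Workflow` and `WorkflowMemory` classes are hypothetical, and the keyword-overlap retrieval is a simple stand-in, not the retrieval or workflow induction method used by AWM or LiteWebAgent.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    description: str
    steps: list[str]


@dataclass
class WorkflowMemory:
    workflows: list[Workflow] = field(default_factory=list)

    def add(self, description: str, steps: list[str]) -> None:
        self.workflows.append(Workflow(description, steps))

    def retrieve(self, task: str, top_k: int = 1) -> list[Workflow]:
        """Return up to top_k workflows sharing at least one word with the task."""
        task_words = set(task.lower().split())
        scored = [
            (len(task_words & set(w.description.lower().split())), w)
            for w in self.workflows
        ]
        scored = [(score, w) for score, w in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [w for _, w in scored[:top_k]]


def build_planning_prompt(task: str, memory: WorkflowMemory) -> str:
    """Compose a planning prompt that includes any relevant past workflow."""
    lines = [f"Task: {task}"]
    for w in memory.retrieve(task):
        lines.append(f"Relevant workflow: {w.description}")
        lines.extend(f"  {i}. {s}" for i, s in enumerate(w.steps, 1))
    return "\n".join(lines)


memory = WorkflowMemory()
memory.add("search for a product", ["fill the search box", "press Enter"])
memory.add("log in to an account", ["fill username", "fill password", "click Sign in"])
prompt = build_planning_prompt("search for a red lamp", memory)
print(prompt)
```

The same prompt builder can be called again during replanning, so a failed attempt can be compared against a workflow that succeeded before.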
Tree Search Capabilities
Implements various search algorithms (BFS, DFS, MCTS) that enable the agent to explore multiple trajectories and balance exploitation with exploration when navigating complex web environments.
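The following sketch shows, on a toy site graph, how BFS and DFS differ only in which end of the frontier is expanded next, and how an MCTS selection score trades off exploitation against exploration. It is a simplified illustration under stated assumptions: `tree_search`, `uct_score`, and the `site` graph are hypothetical names, states are plain strings rather than live browser pages, and the scoring function is the standard UCB1 formula, which may differ from what LiteWebAgent uses.

```python
import math
from collections import deque


def tree_search(start, expand, is_goal, strategy="bfs", max_nodes=1000):
    """Return the first action path found from start to a goal state, or None."""
    frontier = deque([(start, [])])
    visited = {start}
    explored = 0
    while frontier and explored < max_nodes:
        # BFS takes the oldest entry (queue); DFS takes the newest (stack).
        state, path = frontier.popleft() if strategy == "bfs" else frontier.pop()
        explored += 1
        if is_goal(state):
            return path
        for action, next_state in expand(state):
            if next_state not in visited:
                visited.add(next_state)
                frontier.append((next_state, path + [action]))
    return None


def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward (exploitation) plus a bonus for rarely visited nodes."""
    if visits == 0:
        return math.inf  # unvisited children are tried first
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)


# Toy site: each state maps to the (action, next state) pairs available on that page.
site = {
    "home": [("click Products", "products"), ("click About", "about")],
    "products": [("click Lamp", "lamp")],
    "about": [("click Careers", "careers")],
    "lamp": [("click Add to cart", "cart")],
    "careers": [],
    "cart": [],
}

bfs_path = tree_search("home", lambda s: site[s], lambda s: s == "cart", "bfs")
dfs_path = tree_search("home", lambda s: site[s], lambda s: s == "cart", "dfs")
print(bfs_path)  # ['click Products', 'click Lamp', 'click Add to cart']
```

In a web setting, `expand` would ask the agent for candidate actions on the current page and `is_goal` would be a task-completion check. With equal average reward, `uct_score` ranks a node visited once above a node visited five times, which is the exploration pressure mentioned above.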
Deployed Systems
Vercel-based Web Application
A production-ready full-stack web application that provides users with an agent-controlled remote browser. The application features interactive chat with voice integration, live-action visualization, and an asynchronous serverless API built with FastAPI and Playwright.
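One way to expose the same agent loop through both a synchronous and an asynchronous entry point is sketched below using only Python's standard `asyncio` module. This is an assumption-laden illustration, not LiteWebAgent's API: the names `agent_steps`, `run_agent_async`, and `run_agent` are hypothetical, the steps are stubbed, and no FastAPI route or browser is involved.

```python
import asyncio
from typing import AsyncIterator


async def agent_steps(goal: str) -> AsyncIterator[str]:
    """Yield one status message per agent step (stubbed; no browser is driven)."""
    for step in ("observe page", "generate action", "ground and execute"):
        await asyncio.sleep(0)  # stand-in for awaiting the model or the browser
        yield f"{goal}: {step}"


async def run_agent_async(goal: str) -> list[str]:
    """Asynchronous entry point, the shape an async request handler could await."""
    return [message async for message in agent_steps(goal)]


def run_agent(goal: str) -> list[str]:
    """Synchronous wrapper around the same coroutine for scripts and tests."""
    return asyncio.run(run_agent_async(goal))


messages = run_agent("buy lamp")
print(messages)
```

Keeping the agent loop as an async generator means a web backend could forward each message to the chat interface as it is produced, while command-line callers still get a plain blocking function.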
Chrome Extension
A Chrome extension that leverages LiteWebAgent's API to control an existing Chrome browser via the Chrome DevTools Protocol (CDP). This approach offers enhanced privacy and personalized browser sessions, with an intuitive user interface integrated directly within the browser environment.
Get Started
Interested in trying LiteWebAgent? Our open-source implementation provides researchers and developers with a comprehensive toolkit for deploying VLM-based web agents in real-world applications.