browser-use + Ertas

    Automate any web task with browser-use — the open-source Playwright + LLM agent that navigates, clicks, types, and extracts from web pages, with first-class support for fine-tuned local models via Ertas.

    Overview

    browser-use is the leading open-source browser automation agent, with over 50K GitHub stars by mid-2026 and an MIT license. The framework wraps Playwright with an LLM-driven control loop: the model receives a screenshot or accessibility tree of the current page, decides what action to take (click, type, scroll, navigate, extract), and the framework executes that action in a real browser. This pattern enables agents to operate any web interface — including ones without APIs — by interacting with them the same way humans do.

    The framework supports both vision-based control (where the model sees screenshots) and DOM-based control (where the model reads the accessibility tree). Recent improvements have pushed browser-use to 88%+ accuracy on standard browser-task benchmarks, making it production-viable for use cases like automated form filling, web scraping, account-management workflows, lead enrichment, and end-to-end testing of web applications. The combination of MIT licensing, broad LLM compatibility, and strong benchmark performance has made browser-use the default choice for open-source browser automation in 2026.

    How Ertas Integrates

    Ertas-trained models work with browser-use through any OpenAI-compatible endpoint. After fine-tuning a model on browser-task traces in Ertas Studio (screenshots paired with action sequences and reasoning), you deploy via Ollama, vLLM, or Ertas Cloud and point browser-use at the endpoint. Fine-tuned models can substantially outperform general-purpose models on domain-specific browser tasks: a model fine-tuned on your specific SaaS workflows, dashboard layouts, and form patterns will navigate them more reliably than a frontier general model that has never seen them.
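    As a sketch, serving the fine-tuned model on an OpenAI-compatible endpoint might look like the commands below; the model name and file paths are illustrative, not Ertas-specific:

```shell
# Option A: vLLM — exposes an OpenAI-compatible API on port 8000
vllm serve ertas-browser-agent-7b --port 8000

# Option B: Ollama — import a local model build and serve it
# (Modelfile contains e.g. "FROM ./ertas-browser-agent-7b.gguf")
ollama create ertas-browser-agent-7b -f Modelfile
ollama serve   # OpenAI-compatible API at http://localhost:11434/v1
```

    Either endpoint can then be set as the `base_url` of browser-use's LLM configuration.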

    For cost-sensitive deployments, the Ertas + browser-use combination is particularly valuable. Browser tasks tend to be repetitive within a single product or domain, which means a small fine-tuned model (7B-14B class) can match or exceed frontier model performance on the specific browsing patterns it was trained on. Combined with self-hosted browser-use deployment, this enables web automation at orders of magnitude lower cost per task than using GPT-5.5 or Claude Opus 4.7 via API for the same workflows. Privacy-sensitive applications (anything involving user credentials, internal dashboards, or proprietary data) also benefit from the fully self-hosted pattern.

    Getting Started

    1.

      Collect or generate browser-task training data

      Record successful browser-task traces (screenshots + actions + reasoning) for your domain. Ertas Studio supports this multimodal training data format natively.

    2.

      Fine-tune a vision-capable model in Ertas Studio

      Use a multimodal base (e.g., Gemma 4, Qwen3-VL) and fine-tune on your browser-task corpus to produce a model specialized for your specific web workflows.

    3.

      Deploy to a vision-enabled inference endpoint

      Serve via vLLM, Ollama, or Ertas Cloud with multimodal support enabled. browser-use will call this endpoint with screenshots and prompts.

    4.

      Install browser-use and configure the model

      Install browser-use and configure the LLM provider to point at your Ertas inference endpoint. Choose vision-based or DOM-based control mode based on your tasks.

    5.

      Run automated workflows

      Issue natural-language tasks; browser-use orchestrates the LLM and browser to complete them. Log successful and failed traces for ongoing model refinement.
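    The browser-task traces collected in step 1 can be sketched as simple JSON records. The exact schema Ertas Studio expects may differ; the field names below are illustrative:

```python
import json

# One hypothetical training record: a screenshot paired with the
# action the agent took and the reasoning behind it.
trace_step = {
    "screenshot": "traces/run_001/step_03.png",  # image captured before acting
    "url": "https://admin.example.com/users",
    "reasoning": "The user list is filterable; open the date filter first.",
    "action": {"type": "click", "selector": "button[data-testid='date-filter']"},
}

# Traces are commonly stored as JSONL: one step per line, one file per run.
line = json.dumps(trace_step)
restored = json.loads(line)
print(restored["action"]["type"])  # -> click
```

    Keeping one step per line makes it easy to filter, dedupe, and sample traces before uploading a corpus for fine-tuning.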

    python
    import asyncio

    from browser_use import Agent
    from langchain_openai import ChatOpenAI
    
    # Point browser-use at your Ertas-trained vision-capable model
    llm = ChatOpenAI(
        base_url="http://localhost:8000/v1",  # vLLM with multimodal support
        model="ertas-browser-agent-7b",
        api_key="not-needed",  # local endpoints typically ignore the key
        temperature=0.1,
    )
    
    async def main():
        agent = Agent(
            task="""
                Log into our admin dashboard at admin.example.com,
                navigate to the user management page, and export
                the list of all users created in the last 30 days
                as a CSV file.
            """,
            llm=llm,
        )
        history = await agent.run()
        print(f"Task completed: {history.is_done()}")
        print(f"Final result: {history.final_result()}")
    
    asyncio.run(main())
    Run a browser-use agent backed by an Ertas-trained model specialized on your dashboard workflows. Since agent.run() is a coroutine, it is awaited inside an asyncio entry point.
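    Step 5's feedback loop can be sketched with a small helper that appends each run's outcome to a JSONL log for later curation into training data. The helper and its record schema are illustrative, not part of browser-use:

```python
import json
import time
from pathlib import Path

def log_run(task: str, success: bool, steps: list[dict],
            path: str = "runs.jsonl") -> None:
    """Append one run's outcome so successful traces can be curated
    into the next fine-tuning round, and failures analyzed."""
    record = {
        "timestamp": time.time(),
        "task": task,
        "success": success,  # keep failed runs too: useful for error analysis
        "steps": steps,      # e.g. [{"action": "click", "selector": "..."}]
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example: record the outcome of one automated workflow
log_run("export last-30-days users as CSV", True,
        [{"action": "click", "selector": "#export"}])
```

    Appending rather than overwriting keeps a full history; a periodic job can then filter `success: true` records into a new training corpus.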

    Benefits

    • Automate any web interface — including ones without APIs — through real-browser interaction
    • MIT license with no commercial restrictions on derivative works
    • 88%+ accuracy on standard browser-task benchmarks with frontier models
    • Fine-tuned domain-specific models can match frontier accuracy at a fraction of the inference cost
    • Fully self-hosted deployment for privacy-sensitive credentials and internal dashboards
    • Active community (50K+ GitHub stars) with regular framework improvements
