browser-use + Ertas

    Automate any web task with browser-use — the open-source Playwright + LLM agent that navigates, clicks, types, and extracts from web pages, with first-class support for fine-tuned local models via Ertas.

    Overview

    browser-use is the leading open-source browser automation agent, with over 50K GitHub stars by mid-2026 and an MIT license. The framework wraps Playwright with an LLM-driven control loop: the model receives a screenshot or accessibility tree of the current page, decides what action to take (click, type, scroll, navigate, extract), and the framework executes that action in a real browser. This pattern enables agents to operate any web interface — including ones without APIs — by interacting with them the same way humans do.

    The framework supports both vision-based control (where the model sees screenshots) and DOM-based control (where the model reads the accessibility tree). Recent improvements have pushed browser-use to 88%+ accuracy on standard browser-task benchmarks, making it production-viable for use cases like automated form filling, web scraping, account-management workflows, lead enrichment, and end-to-end testing of web applications. The combination of MIT licensing, broad LLM compatibility, and strong benchmark performance has made browser-use the default choice for open-source browser automation in 2026.

    How Ertas Integrates

    Ertas-trained models work with browser-use through any OpenAI-compatible endpoint. After fine-tuning a model on browser-task traces in Ertas Studio (screenshots paired with action sequences and reasoning), you deploy via Ollama, vLLM, or Ertas Cloud and point browser-use at the endpoint. Fine-tuned models can substantially outperform general-purpose models on domain-specific browser tasks: a model fine-tuned on your specific SaaS workflows, dashboard layouts, and form patterns will navigate them more reliably than a frontier general model that has never seen them.
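    As a sketch, serving the fine-tuned model on an OpenAI-compatible endpoint might look like the commands below; the model name and file paths are illustrative, not Ertas-specific:

```shell
# Option A: vLLM — exposes an OpenAI-compatible API on port 8000
vllm serve ertas-browser-agent-7b --port 8000

# Option B: Ollama — import a local model build and serve it
# (Modelfile contains e.g. "FROM ./ertas-browser-agent-7b.gguf")
ollama create ertas-browser-agent-7b -f Modelfile
ollama serve   # OpenAI-compatible API at http://localhost:11434/v1
```

    Either endpoint can then be set as the `base_url` of browser-use's LLM configuration.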

    For cost-sensitive deployments, the Ertas + browser-use combination is particularly valuable. Browser tasks tend to be repetitive within a single product or domain, which means a small fine-tuned model (7B-14B class) can match or exceed frontier model performance on the specific browsing patterns it was trained on. Combined with self-hosted browser-use deployment, this enables web automation at orders of magnitude lower cost per task than using GPT-5.5 or Claude Opus 4.7 via API for the same workflows. Privacy-sensitive applications (anything involving user credentials, internal dashboards, or proprietary data) also benefit from the fully self-hosted pattern.

    Getting Started

    1.

      Collect or generate browser-task training data

      Record successful browser-task traces (screenshots + actions + reasoning) for your domain. Ertas Studio supports this multimodal training data format natively.

    2.

      Fine-tune a vision-capable model in Ertas Studio

      Use a multimodal base (e.g., Gemma 4, Qwen3-VL) and fine-tune on your browser-task corpus to produce a model specialized for your specific web workflows.

    3.

      Deploy to a vision-enabled inference endpoint

      Serve via vLLM, Ollama, or Ertas Cloud with multimodal support enabled. browser-use will call this endpoint with screenshots and prompts.

    4.

      Install browser-use and configure the model

      Install browser-use and configure the LLM provider to point at your Ertas inference endpoint. Choose vision-based or DOM-based control mode based on your tasks.

    5.

      Run automated workflows

      Issue natural-language tasks; browser-use orchestrates the LLM and browser to complete them. Log successful and failed traces for ongoing model refinement.
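    The browser-task traces collected in step 1 can be sketched as simple JSON records. The exact schema Ertas Studio expects may differ; the field names below are illustrative:

```python
import json

# One hypothetical training record: a screenshot paired with the
# action the agent took and the reasoning behind it.
trace_step = {
    "screenshot": "traces/run_001/step_03.png",  # image captured before acting
    "url": "https://admin.example.com/users",
    "reasoning": "The user list is filterable; open the date filter first.",
    "action": {"type": "click", "selector": "button[data-testid='date-filter']"},
}

# Traces are commonly stored as JSONL: one step per line, one file per run.
line = json.dumps(trace_step)
restored = json.loads(line)
print(restored["action"]["type"])  # -> click
```

    Keeping one step per line makes it easy to filter, dedupe, and sample traces before uploading a corpus for fine-tuning.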

    python
    import asyncio

    from browser_use import Agent
    from langchain_openai import ChatOpenAI
    
    # Point browser-use at your Ertas-trained vision-capable model
    llm = ChatOpenAI(
        base_url="http://localhost:8000/v1",  # vLLM with multimodal support
        model="ertas-browser-agent-7b",
        api_key="not-needed",  # local endpoints typically ignore the key
        temperature=0.1,
    )
    
    async def main():
        agent = Agent(
            task="""
                Log into our admin dashboard at admin.example.com,
                navigate to the user management page, and export
                the list of all users created in the last 30 days
                as a CSV file.
            """,
            llm=llm,
        )
        history = await agent.run()
        print(f"Task completed: {history.is_done()}")
        print(f"Final result: {history.final_result()}")
    
    asyncio.run(main())
    Run a browser-use agent backed by an Ertas-trained model specialized on your dashboard workflows. Since agent.run() is a coroutine, it is awaited inside an asyncio entry point.
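    Step 5's feedback loop can be sketched with a small helper that appends each run's outcome to a JSONL log for later curation into training data. The helper and its record schema are illustrative, not part of browser-use:

```python
import json
import time
from pathlib import Path

def log_run(task: str, success: bool, steps: list[dict],
            path: str = "runs.jsonl") -> None:
    """Append one run's outcome so successful traces can be curated
    into the next fine-tuning round, and failures analyzed."""
    record = {
        "timestamp": time.time(),
        "task": task,
        "success": success,  # keep failed runs too: useful for error analysis
        "steps": steps,      # e.g. [{"action": "click", "selector": "..."}]
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example: record the outcome of one automated workflow
log_run("export last-30-days users as CSV", True,
        [{"action": "click", "selector": "#export"}])
```

    Appending rather than overwriting keeps a full history; a periodic job can then filter `success: true` records into a new training corpus.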

    Benefits

    • Automate any web interface — including ones without APIs — through real-browser interaction
    • MIT license with no commercial restrictions on derivative works
    • 88%+ accuracy on standard browser-task benchmarks with frontier models
    • Fine-tuned domain-specific models can match frontier accuracy at a fraction of the inference cost
    • Fully self-hosted deployment for privacy-sensitive credentials and internal dashboards
    • Active community (50K+ GitHub stars) with regular framework improvements
