browser-use + Ertas
Automate any web task with browser-use — the open-source Playwright + LLM agent that navigates, clicks, types, and extracts data from web pages, with first-class support for fine-tuned local models via Ertas.
Overview
browser-use is the leading open-source browser automation agent, with over 50K GitHub stars by mid-2026 and an MIT license. The framework wraps Playwright with an LLM-driven control loop: the model receives a screenshot or accessibility tree of the current page, decides what action to take (click, type, scroll, navigate, extract), and the framework executes that action in a real browser. This pattern enables agents to operate any web interface — including ones without APIs — by interacting with them the same way humans do.
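The observe-decide-act loop described above can be sketched as follows. This is a minimal illustration with hypothetical `model` and `browser` objects, not the real browser-use API:

```python
# Minimal sketch of the LLM-driven control loop browser-use implements.
# `model` and `browser` are hypothetical stand-ins, not real browser-use classes.

def run_agent(model, browser, task, max_steps=25):
    """Observe the page, let the model pick an action, execute, repeat."""
    for _ in range(max_steps):
        observation = browser.observe()           # screenshot or accessibility tree
        action = model.decide(task, observation)  # e.g. {"type": "click", "target": "#submit"}
        if action["type"] == "done":              # model signals task completion
            return action.get("result")
        browser.execute(action)                   # click / type / scroll / navigate
    raise RuntimeError("agent did not finish within max_steps")
```

The cap on `max_steps` matters in practice: a model that loops on the same page would otherwise burn tokens indefinitely.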
The framework supports both vision-based control (where the model sees screenshots) and DOM-based control (where the model reads the accessibility tree). Recent improvements have pushed browser-use to 88%+ accuracy on standard browser-task benchmarks, making it production-viable for use cases like automated form filling, web scraping, account-management workflows, lead enrichment, and end-to-end testing of web applications. The combination of MIT licensing, broad LLM compatibility, and strong benchmark performance has made browser-use the default choice for open-source browser automation in 2026.
How Ertas Integrates
Ertas-trained models work with browser-use through any OpenAI-compatible endpoint. After fine-tuning a model on browser-task traces in Ertas Studio (screenshots paired with action sequences and reasoning), you deploy via Ollama, vLLM, or Ertas Cloud and point browser-use at the endpoint. Fine-tuned models can substantially outperform general-purpose models on domain-specific browser tasks: a model fine-tuned on your specific SaaS workflows, dashboard layouts, and form patterns will navigate them more reliably than a frontier general model that has never seen them.
For cost-sensitive deployments, the Ertas + browser-use combination is particularly valuable. Browser tasks tend to be repetitive within a single product or domain, which means a small fine-tuned model (7B-14B class) can match or exceed frontier model performance on the specific browsing patterns it was trained on. Combined with self-hosted browser-use deployment, this enables web automation at orders of magnitude lower cost per task than using GPT-5.5 or Claude Opus 4.7 via API for the same workflows. Privacy-sensitive applications (anything involving user credentials, internal dashboards, or proprietary data) also benefit from the fully self-hosted pattern.
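One rough way to reason about the cost gap is a per-task token model: a browser task is many observe-decide-act steps, each spending tokens. The rates below are illustrative placeholders, not real provider prices:

```python
# Back-of-envelope cost model for an LLM-driven browser task.
# The per-million-token rates are illustrative placeholders, NOT real quotes.

def cost_per_task(steps, tokens_per_step, usd_per_million_tokens):
    """Approximate inference cost of one multi-step browser task."""
    return steps * tokens_per_step * usd_per_million_tokens / 1_000_000

# Hypothetical comparison: a frontier API rate vs. a self-hosted 7B model's
# amortized compute rate (both numbers are assumptions for illustration).
frontier = cost_per_task(steps=20, tokens_per_step=3_000, usd_per_million_tokens=10.0)
local = cost_per_task(steps=20, tokens_per_step=3_000, usd_per_million_tokens=0.10)
```

Because screenshots inflate `tokens_per_step`, multi-step browser tasks amplify any per-token price difference, which is why self-hosting a small fine-tuned model changes the economics so sharply.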
Getting Started
1. Collect or generate browser-task training data. Record successful browser-task traces (screenshots + actions + reasoning) for your domain. Ertas Studio supports this multimodal training data format natively.
2. Fine-tune a vision-capable model in Ertas Studio. Use a multimodal base (e.g., Gemma 4, Qwen3-VL) and fine-tune on your browser-task corpus to produce a model specialized for your specific web workflows.
3. Deploy to a vision-enabled inference endpoint. Serve via vLLM, Ollama, or Ertas Cloud with multimodal support enabled. browser-use will call this endpoint with screenshots and prompts.
4. Install browser-use and configure the model. Point the LLM provider at your Ertas inference endpoint, and choose vision-based or DOM-based control mode based on your tasks.
5. Run automated workflows. Issue natural-language tasks; browser-use orchestrates the LLM and browser to complete them. Log successful and failed traces for ongoing model refinement.
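The trace records from step 1 (and the run logging in step 5) might look like this. The schema shown is a hypothetical illustration, not the exact format Ertas Studio requires:

```python
import json
import os

# One recorded step of a browser-task trace: what the agent saw, why it acted,
# and what it did. Field names here are illustrative, not a fixed Ertas schema.
trace_step = {
    "screenshot": "traces/run_042/step_003.png",  # frame captured before the action
    "reasoning": "The user list is behind the 'Users' tab, so click it.",
    "action": {"type": "click", "selector": "a[href='/admin/users']"},
    "success": True,
}

os.makedirs("traces/run_042", exist_ok=True)
with open("traces/run_042/steps.jsonl", "a") as f:
    f.write(json.dumps(trace_step) + "\n")
```

Accumulating records like these over real runs gives you both an initial fine-tuning corpus and an ongoing refinement signal.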
```python
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

# Point browser-use at your Ertas-trained vision-capable model
llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",  # vLLM with multimodal support
    model="ertas-browser-agent-7b",
    api_key="not-needed",
    temperature=0.1,
)

agent = Agent(
    task="""
    Log into our admin dashboard at admin.example.com,
    navigate to the user management page, and export
    the list of all users created in the last 30 days
    as a CSV file.
    """,
    llm=llm,
)

async def main():
    result = await agent.run()
    print(f"Task completed: {result.success}")
    print(f"Output file: {result.artifacts}")

asyncio.run(main())
```
Benefits
- Automate any web interface — including ones without APIs — through real-browser interaction
- MIT license with no commercial restrictions on derivative works
- 88%+ accuracy on standard browser-task benchmarks with frontier models
- Fine-tuned domain-specific models can match frontier accuracy at a fraction of the inference cost
- Fully self-hosted deployment for privacy-sensitive credentials and internal dashboards
- Active community (50K+ GitHub stars) with regular framework improvements
Related Resources
Topics: Fine-Tuning, Inference, LoRA
Guides:
- Getting Started with Ertas: Fine-Tune and Deploy Custom AI Models
- Fine-Tune AI Models Without Writing Code
- Running AI Models Locally: The Complete Guide to Local LLM Inference
Integrations: LangChain, Make.com, n8n, Ollama, vLLM
Use cases: Ertas for Customer Support, Ertas for Data Extraction, Ertas for AI Automation Agencies