    Hermes Agent vs Hermes 4: What's the Difference?


    Two distinct things from Nous Research now share the Hermes name — a model family released in 2025 and a self-improving agent framework released in 2026. Here's how to tell them apart and which to use when.

    Ertas Team

    If you've been keeping up with the open-source AI ecosystem in 2026, you've almost certainly seen "Hermes" mentioned multiple times in different contexts — and you may have noticed that the references don't quite line up. That's because there are now two distinct things from Nous Research sharing the Hermes name: a model family and an agent framework. They're related conceptually but operationally separate, and conflating them produces real confusion when planning deployments.

    This is a quick disambiguation guide. We'll cover what each is, when you'd use one versus the other, and how they relate.

    TL;DR

    • Hermes 4 is an open-weight LLM family released August 2025 — Llama-3.1-based fine-tunes in 14B, 70B, and 405B sizes with hybrid <think> reasoning and neutrally-aligned post-training.
    • Hermes Agent is an open-source agent framework released February 2026 — built around the GEPA self-improvement mechanism where agents create reusable skills from successful task completions.

    You use Hermes 4 when you need a strong reasoning model with minimal refusal training (security research, mature creative work, education on sensitive topics). You use Hermes Agent when you want self-improving agent behavior — typically with Hermes 4 or another base model underneath.

    Hermes 4: The Model Family

    Hermes 4, released August 30, 2025, is the fourth generation of the Hermes model family from Nous Research. The family ships in three sizes — 14B, 70B, and 405B parameters — all derived from Meta's Llama 3.1 base models via Nous's post-training pipeline.

    Three things distinguish Hermes 4 from base Llama 3.1 Instruct:

    Hybrid <think> reasoning. Hermes 4 was trained to support extended chain-of-thought reasoning marked with explicit <think>...</think> tags. The model decides whether to think or respond directly based on query complexity — fast direct responses for simple queries, extended reasoning traces for hard problems. This is similar in spirit to the unified thinking modes in Qwen 3+ and DeepSeek V3.2+, but achieved through targeted post-training rather than from-scratch architectural design.
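In practice, callers need to separate the optional reasoning trace from the visible answer. A minimal sketch (the helper name and sample strings are illustrative, not part of any Hermes API):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Separate an optional <think>...</think> trace from the final answer.

    Hybrid-reasoning models may emit a reasoning block before the visible
    response; simple queries skip the block entirely.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match:
        reasoning = match.group(1).strip()
        answer = output[match.end():].strip()
        return reasoning, answer
    return "", output.strip()

# A hard query yields a trace; a simple one responds directly.
trace, answer = split_reasoning("<think>2+2 is trivial.</think>4")
simple_trace, simple_answer = split_reasoning("4")
```

The same pattern applies to any model that marks its reasoning with explicit tags; only the tag names change.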

    Atropos RL post-training. Nous trained Hermes 4 using their Atropos reinforcement learning framework with approximately 1,000 task-specific verifiers — automated graders that score model outputs on factual accuracy, code correctness, mathematical validity, and other domain-specific signals. The result is measurably better reasoning capability than base Llama 3.1 Instruct: Hermes 4 70B substantially outperforms Llama 3.1 70B Instruct on AIME, GPQA Diamond, and complex code generation.

    Neutral alignment. Nous explicitly avoided heavy-handed RLHF refusal training. Hermes 4 follows instructions without the layered refusal patterns common in mainstream releases. This is significant for legitimate use cases that require the model to engage with content other models reject — security research and CTF challenges, fiction with mature themes, historical content analysis, and educational discussion of sensitive topics.

    Because Hermes 4 is built on Llama 3.1, it inherits the entire Llama deployment ecosystem. It runs in llama.cpp, vLLM, Ollama, LM Studio, and TensorRT-LLM with no special configuration. The 14B variant fine-tunes on consumer GPUs (12-16GB VRAM with QLoRA); the 70B fits on a single 48GB GPU; the 405B requires multi-GPU server infrastructure.

    Weights are available on Hugging Face under NousResearch/Hermes-4-405B, NousResearch/Hermes-4-70B, and NousResearch/Hermes-4-14B. The license is inherited from Llama 3.1 (the Llama Community License), which is commercial-permissive with usage caps and attribution requirements.

    Hermes Agent: The Framework

    Hermes Agent, released February 2026, is something completely different — an open-source agent framework, not a model. The framework's defining capability is its GEPA (Generalized Experience-based Procedural Acquisition) self-improvement mechanism: agents create reusable "skills" from successful task completions, refine them through use, and accumulate a personal skill library that compounds in capability over time.

    The pattern is simple but powerful. When an agent completes a complex task successfully, GEPA writes that solution as a reusable skill — typically as readable code or a structured prompt. The next time a similar task appears, the agent invokes the existing skill rather than re-deriving the solution. Empirical results from Nous show Hermes agents getting approximately 40% faster on repeated tasks after building 20+ self-generated skills, with the speedup coming entirely from skill reuse.

    This is meaningfully different from most agent frameworks where each task starts from scratch. With Hermes Agent, an agent's accumulated experience becomes a first-class artifact: skills are persisted, refined, and reused. The skills themselves are inspectable — readable code or prompts — rather than opaque learned weights, which makes the system debuggable and editable in ways that fine-tuning-based approaches aren't.

    By April 2026, Hermes Agent had crossed 103K GitHub stars, making it one of the fastest-growing open-source agent frameworks. The framework is MIT-licensed and free to self-host; managed infrastructure starts at €5/month.

    Critically: Hermes Agent works with any LLM, not just Hermes 4. The framework calls models through standard OpenAI-compatible endpoints, which means you can use Hermes Agent with Llama 3, Qwen 3.6, DeepSeek V4, GPT-OSS, or any other model served via Ollama, vLLM, or a hosted API. The Hermes 4 model family is a particularly natural fit (the hybrid <think> reasoning pairs well with skill creation), but it's not required.
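Because the interface is the standard chat/completions schema, swapping the underlying model is a one-line change. A minimal sketch of the request shape (the endpoint URL and model names are assumptions for illustration; the payload schema is the standard OpenAI-compatible one):

```python
import json

def build_chat_request(model: str, user_message: str) -> dict:
    """Build a standard OpenAI-style /v1/chat/completions payload.

    Any server speaking this schema (Ollama, vLLM, hosted APIs) can sit
    underneath the agent framework; only the model name changes.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

# Same request shape regardless of backend.
req = build_chat_request("hermes-4-70b", "Summarize this log file.")
body = json.dumps(req)
# POST body to e.g. http://localhost:11434/v1/chat/completions (Ollama's default port).
```

Swapping in Qwen, DeepSeek, or a hosted API means changing the model string and base URL, nothing else.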

    When to Use Which

    The two products serve different needs and aren't substitutes for each other.

    Use Hermes 4 (the model) when:

    • You need a strong reasoning capability and the safety alignment of mainstream models is blocking legitimate use cases
    • You're building security research tools, CTF training environments, or red-team evaluation systems
    • You need a Llama 3.1-compatible model with substantially better reasoning than base Llama 3.1 Instruct
    • You're fine-tuning for specialized reasoning workloads and want a strong starting point
    • Your deployment infrastructure is built around the Llama 3.1 ecosystem

    Use Hermes Agent (the framework) when:

    • You're building production agentic systems and want self-improvement to compound capability over time
    • You need an inspectable skill library rather than opaque learned weights
    • You want agents to get faster on repeated tasks without continuous fine-tuning cycles
    • You're already using LangGraph, CrewAI, or similar frameworks but want to add accumulated-skills behavior
    • You're shipping agent products where users will run similar tasks repeatedly (research, coding, analysis)

    Use both together when:

    • You want the strongest possible self-improving agent stack — Hermes 4's hybrid <think> reasoning pairs naturally with Hermes Agent's skill creation, and the combination produces particularly high-quality skill libraries
    • You're in regulated environments where neutral alignment in the underlying model and inspectable skills in the agent framework together address compliance concerns
    • You want to close the loop with fine-tuning: export GEPA skills as training data and fine-tune Hermes 4 in Ertas Studio on its own self-generated procedural knowledge

    How They Relate Conceptually

    The product strategy connection is real even though the operational separation is clean. Nous's broader thesis is around steerable, capability-first AI systems — models that follow instructions reliably and frameworks that compound capability through use rather than relying solely on the underlying model getting better.

    Hermes 4 (the model) embodies this on the model side: better reasoning capability without imposing additional alignment constraints. Hermes Agent (the framework) embodies it on the system side: agents that improve through accumulated experience rather than only through model retraining.

    Used together, they produce a stack with two complementary improvement loops: the model can be fine-tuned on domain data (improving base capability), and the agent framework accumulates skills from production runs (improving applied capability). The skills themselves can be exported as training data for the next fine-tuning cycle, creating a compounding improvement pattern that neither component achieves alone.

    How Ertas Fits In

    For teams running either or both of these products, Ertas Studio supports the relevant fine-tuning workflows:

    • Fine-tuning Hermes 4 directly. The 14B variant fits on consumer GPUs (12-16GB VRAM), the 70B on a 48GB GPU. Ertas Studio's QLoRA pipeline handles the Llama 3.1 base architecture natively, including preservation of the hybrid <think> reasoning behavior in the fine-tuned output.

    • Distillation from Hermes 4. Use Hermes 4 405B as a teacher to generate synthetic reasoning-trace data, then fine-tune a smaller base model (Qwen 32B, Llama 70B, or DeepSeek-R1 distilled variants) on that data. This produces a domain-specialized model at single-GPU deployment cost while inheriting Hermes 4's reasoning patterns.

    • Skill-library distillation from Hermes Agent. Export the GEPA skill library from production Hermes Agent runs as training data, then fine-tune your underlying base model on its own self-generated procedural knowledge. The fine-tuned model then performs better on the patterns it has seen most, reducing the need for skill-library lookups for common tasks while preserving skill-based handling for novel ones.
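The skill-library export above amounts to converting each skill into a chat-format training row. A hedged sketch of what that conversion could look like — the JSONL chat format shown is a common fine-tuning convention, not a documented Hermes Agent export schema, and the field names are illustrative:

```python
import json

def skills_to_jsonl(skills: list[dict]) -> str:
    """Convert a skill library to chat-format fine-tuning rows (JSONL).

    Each skill becomes one training example: the task description as the
    user turn, the stored procedure as the assistant turn.
    """
    lines = []
    for skill in skills:
        row = {
            "messages": [
                {"role": "user", "content": skill["task"]},
                {"role": "assistant", "content": skill["procedure"]},
            ]
        }
        lines.append(json.dumps(row))
    return "\n".join(lines)

jsonl = skills_to_jsonl([
    {"task": "Summarize a weekly sales CSV",
     "procedure": "read rows; group by week; report totals"},
])
```

The resulting file can feed a standard QLoRA fine-tuning run, closing the loop described above: the model is trained on procedures its own agent runs produced.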

    If you're evaluating either product for production deployment, the right starting point is to clarify which problem you're solving. Hermes 4 the model is the right answer when alignment patterns of mainstream models are the obstacle. Hermes Agent the framework is the right answer when you want compounding capability from agent experience. Both at once is the right answer when you're building self-improving agent products at scale and the model-side and system-side improvements need to work together.

