Letta + Ertas

    Build agents with persistent memory using Letta, the framework that grew out of MemGPT. Letta gives agents stateful memory that survives across sessions, ships an official Vercel AI SDK provider, and works with fine-tuned models served from any OpenAI-compatible endpoint.

    Overview

    Letta is the production successor to the MemGPT research framework, focused on building agents with persistent, stateful memory that survives across sessions. Where most agent frameworks treat memory as ephemeral context within a single run, Letta makes memory a first-class primitive: agents have a structured memory hierarchy (in-context core memory, recall memory, and archival memory) that the agent itself manages and updates over time. This enables long-running agents that genuinely remember interactions weeks or months apart, reason over information that exceeds any single context window, and develop persistent personalities and knowledge bases.

    The framework is built on the architecture pioneered in the MemGPT research — operating-system-style memory paging with the LLM as the 'CPU.' The agent autonomously decides what to keep in working memory, what to flush to archival storage, and what to retrieve when needed. This pattern means that even with a model that has only a 32K context window, a Letta agent can effectively operate over arbitrarily long histories by treating its context like a memory hierarchy. Letta ships with an official Vercel AI SDK provider, making it straightforward to use with TypeScript codebases.
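
    To make the paging pattern concrete, here is a toy sketch of the idea in plain Python. It is illustrative only, not Letta's internal implementation or API: a bounded working context evicts its oldest entries into an archival store, which is searched on demand.

    python
    from collections import deque

    class ToyMemoryHierarchy:
        """Illustrative only: a bounded working context that pages old
        entries out to an archival store, roughly mirroring the MemGPT pattern."""

        def __init__(self, working_limit: int = 8):
            self.working = deque()   # facts that would fit in the prompt
            self.archival = []       # long-term store (a database or vector index in practice)
            self.working_limit = working_limit

        def remember(self, fact: str) -> None:
            self.working.append(fact)
            # "Page out" the oldest facts once the working context is full.
            while len(self.working) > self.working_limit:
                self.archival.append(self.working.popleft())

        def recall(self, query: str) -> list[str]:
            # Naive keyword match stands in for semantic retrieval.
            return [fact for fact in self.archival if query.lower() in fact.lower()]

    memory = ToyMemoryHierarchy(working_limit=2)
    memory.remember("Alex prefers concise responses.")
    memory.remember("Alex works at FinTech Corp.")
    memory.remember("Quarterly review happens every March.")  # pages the first fact out
    print(memory.recall("concise"))  # ['Alex prefers concise responses.']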

    How Ertas Integrates

    Ertas-trained models power Letta agents through the standard model configuration interface. Letta supports any OpenAI-compatible endpoint, so a model fine-tuned in Ertas Studio and deployed via Ollama, vLLM, or Ertas Cloud plugs into a Letta agent with a few lines of configuration. The combination is particularly powerful for use cases where domain-specific knowledge needs to compound over time — Letta's persistent memory captures user-specific facts, preferences, and history while the Ertas-trained model provides domain-specific reasoning at the weights level.
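
    Before pointing Letta at the endpoint, it can be worth a quick sanity check that the deployed model answers on the OpenAI-compatible route. A minimal sketch using the standard openai Python client follows; the base URL and model name are placeholders for whatever your Ollama, vLLM, or Ertas Cloud deployment exposes.

    python
    from openai import OpenAI

    # Placeholder values: substitute your own endpoint and served model name.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-for-local")

    response = client.chat.completions.create(
        model="ertas-personal-assistant-14b",
        messages=[{"role": "user", "content": "Reply with one word: ready?"}],
    )
    print(response.choices[0].message.content)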

    For agent applications that need to evolve, Letta + Ertas creates a continuous improvement loop. Letta's archival memory stores all past interactions; you can periodically extract high-quality conversation traces from Letta's memory store and use them as additional training data in Ertas Studio to refine the model further. The fine-tuned model then performs better on the patterns it has seen most in production, while Letta's persistent memory continues to handle individual context that doesn't generalize. This split between 'baked-in domain knowledge' (model weights via Ertas) and 'per-user persistent state' (memory via Letta) is a clean architecture for long-running personal AI applications.
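
    As a rough sketch of that feedback loop, the snippet below assumes you have already pulled an agent's message history out of Letta as a list of role/content dicts (the exact listing call depends on your letta-client version). It keeps substantive user-assistant exchanges and writes chat-format JSONL that a fine-tuning pipeline such as Ertas Studio could ingest; the filename and threshold are arbitrary.

    python
    import json

    def export_traces(messages, out_path, min_reply_chars=200):
        """Write high-signal user -> assistant exchanges as chat-format JSONL."""
        with open(out_path, "w") as f:
            for prev, curr in zip(messages, messages[1:]):
                # Keep only exchanges where the assistant gave a substantive answer.
                if (
                    prev["role"] == "user"
                    and curr["role"] == "assistant"
                    and len(curr["content"]) >= min_reply_chars
                ):
                    f.write(json.dumps({"messages": [prev, curr]}) + "\n")

    # messages = [...]  # pulled from the agent's history via letta-client
    # export_traces(messages, "letta_traces.jsonl")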

    Getting Started

    1. Fine-tune your domain model in Ertas Studio

      Train a model on your domain corpus. The fine-tuned model captures stable domain knowledge that all your Letta agents will share.

    2. Deploy to an OpenAI-compatible endpoint

      Export to GGUF and serve via Ollama, vLLM, or Ertas Cloud. Letta calls any standard chat-completion endpoint.

    3. Install Letta and configure the model

      Install letta-client (Python) or @letta-ai/letta-client (TypeScript). Configure your model provider to point at the Ertas inference endpoint.

    4. Create a stateful agent with memory

      Define a Letta agent with persistent memory. Letta automatically manages the memory hierarchy as conversations grow beyond the context window.

    5. Operate, evolve, and refine the model

      Run the agent in production. Periodically extract high-signal conversations from Letta's archival memory to feed back into Ertas Studio for ongoing model improvement.

    python
    from letta_client import Letta, LlmConfig
    
    # Point Letta at your Ertas-trained model served via vLLM
    client = Letta(base_url="http://localhost:8283")
    
    llm_config = LlmConfig(
        model="ertas-personal-assistant-14b",
        model_endpoint_type="openai",
        model_endpoint="http://localhost:8000/v1",
        context_window=32000,
    )
    
    # Create a stateful agent with persistent memory
    agent = client.agents.create(
        name="alex-personal-assistant",
        llm_config=llm_config,
        # Seed the agent's core memory blocks
        memory_blocks=[
            {"label": "human", "value": "User is Alex, an enterprise architect at FinTech Corp."},
            {"label": "persona", "value": "I'm a personal AI assistant that learns Alex's preferences over time."},
        ],
    )
    
    # First conversation
    client.agents.messages.create(
        agent_id=agent.id,
        messages=[{"role": "user", "content": "I prefer concise responses, under 100 words."}],
    )
    
    # Weeks later — agent still remembers
    response = client.agents.messages.create(
        agent_id=agent.id,
        messages=[{"role": "user", "content": "Summarize the AWS Q3 financials."}],
    )
    # Response is concise, matching Alex's stated preference from weeks ago

    Create a Letta agent backed by an Ertas-trained model. The agent's persistent memory survives across sessions, weeks, and months without context window limits.

    Benefits

    • Persistent memory survives across sessions, days, and months — true long-running agents
    • Operating-system-style memory paging means small context windows still produce long-context behavior
    • Official Vercel AI SDK provider for first-class TypeScript integration
    • Successor to MemGPT with mature production patterns and stable APIs
    • Pair domain knowledge in fine-tuned weights with per-user state in persistent memory
    • Continuous improvement loop: extract memory traces to fine-tune Ertas models
