LangChain + Ertas
Integrate Ertas-trained models into LangChain pipelines for retrieval-augmented generation, agents, and complex multi-step reasoning workflows.
Overview
LangChain is one of the most widely adopted frameworks for building applications powered by large language models. It provides composable abstractions for prompt management, chain orchestration, tool use, memory, and retrieval-augmented generation (RAG). Developers use LangChain to construct complex AI pipelines in which an LLM reasons over retrieved documents, calls external APIs, and maintains conversational context across multi-turn interactions.
For teams fine-tuning models with Ertas, LangChain is the natural application layer. Rather than relying on generic foundation models through expensive API calls, you can plug your Ertas-trained model — served locally via Ollama, llama.cpp, or vLLM — directly into LangChain chains and agents. This gives you domain-specific intelligence at every step of your pipeline: better retrieval reranking, more accurate tool selection, and fewer hallucinations in the final output because the model already understands your domain vocabulary and reasoning patterns.
How Ertas Integrates
After fine-tuning a model in Ertas Studio, you can deploy it to any OpenAI-compatible inference endpoint — Ollama, vLLM, LM Studio, or Ertas Cloud — and connect it to LangChain using the standard ChatOpenAI or ChatOllama classes. LangChain's provider-agnostic interface means switching from a cloud API to your Ertas-trained local model requires changing only the base URL and model name; your chains, prompts, and retrieval logic remain untouched.
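As a concrete sketch of that swap: the snippet below points the standard ChatOpenAI client at a locally served model. The Ollama default port and the ertas-legal-7b model name are the same placeholders used in the full example further down this page.

from langchain_openai import ChatOpenAI

# Before: a hosted foundation model
# llm = ChatOpenAI(model="gpt-4o-mini", api_key="sk-...")

# After: the same class pointed at an Ertas-trained model served by Ollama.
# Only base_url, model, and api_key change; chains and prompts are untouched.
llm = ChatOpenAI(
    base_url="http://localhost:11434/v1",
    model="ertas-legal-7b",
    api_key="not-needed",  # Ollama ignores the key, but the client requires a value
)

# Equivalent, using the dedicated provider package:
# from langchain_ollama import ChatOllama
# llm = ChatOllama(model="ertas-legal-7b")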
Ertas Hub provides curated prompt templates and chain configurations optimized for common fine-tuning use cases like document Q&A, structured extraction, and multi-step classification. These templates are designed to work with the chat formats and system prompts that Ertas Studio bakes into your fine-tuned model, so the prompt structure in LangChain matches what the model was trained on. This alignment between training-time and inference-time prompting is critical for getting the most out of fine-tuned models; mismatches are a common source of subtle quality bugs when teams assemble their own pipelines.
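A minimal sketch of what that alignment looks like in practice. The system prompt constant here is hypothetical; in a real project it would come from your Ertas Studio training configuration rather than being retyped by hand.

from langchain_core.prompts import ChatPromptTemplate

# Hypothetical: the exact system prompt the model was fine-tuned with.
# Reusing it verbatim keeps inference-time prompting aligned with training.
TRAINING_SYSTEM_PROMPT = (
    "You are a legal assistant. Use the context to answer questions accurately."
)

prompt = ChatPromptTemplate.from_messages([
    ("system", TRAINING_SYSTEM_PROMPT),
    ("human", "Context: {context}\n\nQuestion: {question}"),
])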
Getting Started
1. Fine-tune your model in Ertas Studio
Train a domain-specific model using your JSONL dataset in Ertas Studio. Choose a base model from Ertas Hub and apply LoRA adapters for efficient training on your specific use case.

2. Deploy to an OpenAI-compatible endpoint
Export your model in GGUF format and serve it via Ollama, vLLM, or Ertas Cloud. Any endpoint that exposes the /v1/chat/completions API works with LangChain (a quick connectivity check follows this list).

3. Install LangChain and configure the LLM provider
Install langchain and the relevant provider package (langchain-openai or langchain-ollama). Point the client to your local or cloud inference endpoint.

4. Build your chain or agent
Use LangChain Expression Language (LCEL) to compose retrieval chains, tool-calling agents, or multi-step reasoning pipelines that leverage your fine-tuned model, as in the example below.

5. Iterate with Ertas feedback loops
Collect inference logs and user feedback through LangChain callbacks, then feed them back into Ertas Studio as additional training data to continuously improve model quality.
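Before building chains, it helps to confirm that the endpoint from step 2 is reachable. A minimal check with the openai client, assuming Ollama's default port and the placeholder model name used on this page:

from openai import OpenAI

# Any server exposing /v1/chat/completions will answer this request.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")
reply = client.chat.completions.create(
    model="ertas-legal-7b",
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.choices[0].message.content)

With the endpoint responding, the example below puts steps 3 and 4 together: a minimal RAG-style chain in LCEL around the fine-tuned model.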
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Point to your Ertas-trained model served via Ollama
llm = ChatOpenAI(
    base_url="http://localhost:11434/v1",
    model="ertas-legal-7b",
    api_key="not-needed",
    temperature=0.1,
)

# Build a simple RAG chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a legal assistant. Use the context to answer questions accurately."),
    ("human", "Context: {context}\n\nQuestion: {question}"),
])

chain = prompt | llm | StrOutputParser()

response = chain.invoke({
    "context": "The contract stipulates a 30-day termination notice period...",
    "question": "What is the required notice period for termination?",
})
print(response)

Benefits
- Use fine-tuned models as drop-in replacements in any LangChain chain or agent
- Reduce hallucinations in RAG pipelines with domain-trained retrieval and generation
- Keep all inference local or in your VPC — no data sent to third-party APIs
- Leverage LangChain's ecosystem of 700+ integrations with your Ertas-trained models
- Align training-time and inference-time prompts for maximum model performance
- Continuously improve models by feeding LangChain logs back into Ertas Studio (see the callback sketch after this list)
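On that last point, here is a minimal sketch of log collection with a LangChain callback handler. The FeedbackLogger class name and the JSONL schema are illustrative, not an Ertas format; adapt the fields to whatever your Ertas Studio import expects.

import json

from langchain_core.callbacks import BaseCallbackHandler

class FeedbackLogger(BaseCallbackHandler):
    """Append model completions to a JSONL file for review and
    re-import into Ertas Studio as additional training data."""

    def __init__(self, path: str = "feedback.jsonl"):
        self.path = path

    def on_llm_end(self, response, **kwargs):
        # response.generations holds one list of candidate
        # generations per input prompt.
        with open(self.path, "a") as f:
            for candidates in response.generations:
                f.write(json.dumps({"completion": candidates[0].text}) + "\n")

# Attach per call:
# chain.invoke(inputs, config={"callbacks": [FeedbackLogger()]})

In practice you would also capture the inputs (for example via on_chat_model_start) and any user feedback signal alongside each completion before importing the file into Ertas Studio.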
Related Resources
Fine-Tuning
GGUF
Inference
LoRA
Getting Started with Ertas: Fine-Tune and Deploy Custom AI Models
How to Fine-Tune an LLM: The Complete 2026 Guide
Running AI Models Locally: The Complete Guide to Local LLM Inference
Privacy-Conscious AI Development: Fine-Tune in the Cloud, Run on Your Terms
Fine-Tune AI Models Without Writing Code
Haystack
Hugging Face
LlamaIndex
Ollama
vLLM
Ertas for Healthcare
Ertas for Customer Support
Ertas for Legal
Ertas for Data Extraction