Haystack + Ertas
Integrate Ertas-trained models into Haystack's modular NLP pipelines for document retrieval, question answering, and semantic search at enterprise scale.
Overview
Haystack, developed by deepset, is an open-source NLP framework designed for building production-ready search and question answering systems. Unlike general-purpose LLM frameworks, Haystack is pipeline-first: every component — retriever, reader, generator, ranker — is a modular node that can be swapped, chained, and configured independently. This architecture makes Haystack particularly well-suited for enterprise deployments where reliability, observability, and component-level testing matter more than prototyping speed.
Haystack 2.x introduced a fully redesigned pipeline API with first-class support for LLM-powered generation, making it a strong choice for RAG applications that need to go beyond simple prompt-and-retrieve patterns. Its built-in evaluation framework lets teams measure retrieval recall, answer quality, and faithfulness metrics out of the box — capabilities that are essential when deploying fine-tuned models into production and tracking whether model updates actually improve downstream performance.
How Ertas Integrates
Ertas-trained models slot directly into Haystack pipelines as generator or reader components. After fine-tuning in Ertas Studio, you deploy the model to an OpenAI-compatible endpoint and configure Haystack's OpenAIGenerator or OllamaGenerator to point to your local or cloud inference server. Because Haystack treats the LLM as just another pipeline component, you can A/B test your Ertas-trained model against a generic model by running parallel pipelines and comparing outputs using Haystack's evaluation nodes.
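The A/B pattern described above can be sketched in plain Python. The pipeline objects here are hypothetical stand-ins (simple callables mapping a question to an answer); in a real setup each would wrap a Haystack Pipeline whose generator points at a different model endpoint:

```python
# Hypothetical A/B harness: each "pipeline" is a callable from question
# to answer string, standing in for a full Haystack pipeline.

def exact_match(prediction: str, gold: str) -> bool:
    """Case- and whitespace-insensitive exact match."""
    return prediction.strip().lower() == gold.strip().lower()

def ab_compare(pipeline_a, pipeline_b, eval_set):
    """Return the exact-match accuracy of each pipeline on eval_set,
    a list of (question, gold_answer) pairs."""
    scores = {"a": 0, "b": 0}
    for question, gold in eval_set:
        scores["a"] += exact_match(pipeline_a(question), gold)
        scores["b"] += exact_match(pipeline_b(question), gold)
    n = len(eval_set)
    return {k: v / n for k, v in scores.items()}

# Toy stand-ins for a fine-tuned vs. generic model
fine_tuned = lambda q: "30 days"
generic = lambda q: "Please check our policy page."

eval_set = [("How long do I have to return an item?", "30 days")]
print(ab_compare(fine_tuned, generic, eval_set))  # {'a': 1.0, 'b': 0.0}
```

In practice you would swap exact match for Haystack's evaluation components, but the comparison structure stays the same: one eval set, two pipelines, one score per model.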
The combination of Ertas fine-tuning and Haystack's evaluation framework creates a powerful optimization loop. You can measure exactly how much your fine-tuned model improves retrieval-augmented answers on your domain-specific evaluation set, identify failure patterns, generate targeted training examples from those failures, and retrain in Ertas Studio. This data flywheel approach — where production failures feed directly into training improvements — is the most reliable way to build AI systems that get better over time rather than degrading as edge cases accumulate.
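One way to close that loop is to filter evaluation failures into a file of new training examples. This is a minimal sketch; the record and output schemas here are illustrative, not an Ertas-specified format:

```python
import json

def mine_failures(records):
    """Keep evaluation records where the prediction missed the gold
    answer, reshaped as prompt/completion training examples.

    Each record is a dict with 'question', 'context', 'prediction',
    and 'gold' keys (an illustrative schema, not an Ertas format)."""
    examples = []
    for r in records:
        if r["prediction"].strip().lower() != r["gold"].strip().lower():
            examples.append({
                "prompt": f"Context: {r['context']}\nQuestion: {r['question']}",
                "completion": r["gold"],
            })
    return examples

def write_jsonl(examples, path):
    """Write one JSON object per line, ready for upload as training data."""
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

records = [
    {"question": "How long do I have to return an item?",
     "context": "Returns are accepted within 30 days.",
     "prediction": "60 days", "gold": "30 days"},
    {"question": "Is shipping free?",
     "context": "Shipping is free over $50.",
     "prediction": "Free over $50", "gold": "Free over $50"},
]
# Only the wrong answer becomes a new training example
print(len(mine_failures(records)))  # 1
```

Because only failures are harvested, each retraining round concentrates on exactly the cases the previous model got wrong.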
Getting Started
1. Fine-tune a domain model in Ertas Studio
Train a model on your domain corpus using Ertas Studio. Focus on the specific task your Haystack pipeline will perform — question answering, summarization, or extraction.
2. Deploy to a supported inference backend
Export the GGUF model and serve it through Ollama, vLLM, or any OpenAI-compatible endpoint. Haystack supports multiple generator backends natively.
3. Build your Haystack pipeline
Assemble a Haystack pipeline with your choice of retriever, ranker, and generator components. Point the generator to your Ertas-trained model endpoint.
4. Evaluate with Haystack's built-in metrics
Run your pipeline against a labeled evaluation set and measure answer accuracy, faithfulness, and retrieval recall to quantify the impact of fine-tuning.
5. Iterate and retrain
Analyze pipeline failures, generate new training examples, and retrain in Ertas Studio. Redeploy the improved model without changing your Haystack pipeline configuration.
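Before wiring the endpoint into a pipeline (step 2 above), it helps to confirm the server actually speaks the OpenAI-compatible API. A minimal stdlib check, assuming an Ollama-style server on localhost:11434:

```python
import json
import urllib.request

def models_url(base_url: str) -> str:
    """Build the OpenAI-compatible model-listing URL for a server."""
    return base_url.rstrip("/") + "/v1/models"

def list_models(base_url: str):
    """Fetch the model list; raises URLError if the server is unreachable."""
    with urllib.request.urlopen(models_url(base_url), timeout=5) as resp:
        return json.load(resp)["data"]

if __name__ == "__main__":
    # Requires a running server, e.g. `ollama serve`; your Ertas-trained
    # model should appear in the listing once deployed.
    for model in list_models("http://localhost:11434"):
        print(model["id"])
```

If your model's name appears in the listing, Haystack's generator can reach it with the same base URL.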
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

# Configure the generator to call your Ertas-trained model through
# its OpenAI-compatible endpoint (here, a local Ollama server)
generator = OpenAIGenerator(
    api_base_url="http://localhost:11434/v1",
    model="ertas-support-7b",
    api_key=Secret.from_token("not-needed"),  # local servers ignore the key
)

prompt = PromptBuilder(
    template="""Answer the question based on the context.
Context: {{ context }}
Question: {{ question }}
Answer:"""
)

# Build the pipeline: the prompt builder feeds the generator
pipe = Pipeline()
pipe.add_component("prompt", prompt)
pipe.add_component("generator", generator)
pipe.connect("prompt", "generator")

result = pipe.run({
    "prompt": {
        "context": "Our return policy allows returns within 30 days...",
        "question": "How long do I have to return an item?",
    }
})
print(result["generator"]["replies"][0])

Benefits
- Modular pipeline architecture lets you swap models without rewriting application logic
- Built-in evaluation framework quantifies fine-tuning impact on production metrics
- Enterprise-grade observability with pipeline-level logging and tracing
- A/B test Ertas-trained models against baselines in parallel pipelines
- Production-ready document processing with support for PDF, DOCX, and HTML
- Strong community and enterprise support from deepset for mission-critical deployments
Related Resources
Fine-Tuning
GGUF
Inference
Getting Started with Ertas: Fine-Tune and Deploy Custom AI Models
How to Fine-Tune an LLM: The Complete 2026 Guide
Running AI Models Locally: The Complete Guide to Local LLM Inference
Privacy-Conscious AI Development: Fine-Tune in the Cloud, Run on Your Terms
Fine-Tune AI Models Without Writing Code
Hugging Face
LangChain
LlamaIndex
Ollama
vLLM
Ertas for Healthcare
Ertas for Customer Support
Ertas for Legal
Ertas for Data Extraction