Haystack + Ertas

    Integrate Ertas-trained models into Haystack's modular NLP pipelines for document retrieval, question answering, and semantic search at enterprise scale.

    Overview

    Haystack, developed by deepset, is an open-source NLP framework designed for building production-ready search and question answering systems. Unlike general-purpose LLM frameworks, Haystack is pipeline-first: every component — retriever, reader, generator, ranker — is a modular node that can be swapped, chained, and configured independently. This architecture makes Haystack particularly well-suited for enterprise deployments where reliability, observability, and component-level testing matter more than prototyping speed.

    Haystack 2.x introduced a fully redesigned pipeline API with first-class support for LLM-powered generation, making it a strong choice for RAG applications that need to go beyond simple prompt-and-retrieve patterns. Its built-in evaluation framework lets teams measure retrieval recall, answer quality, and faithfulness metrics out of the box — capabilities that are essential when deploying fine-tuned models into production and tracking whether model updates actually improve downstream performance.

    How Ertas Integrates

    Ertas-trained models slot directly into Haystack pipelines as generator or reader components. After fine-tuning in Ertas Studio, you deploy the model to an OpenAI-compatible endpoint and configure Haystack's OpenAIGenerator or OllamaGenerator to point to your local or cloud inference server. Because Haystack treats the LLM as just another pipeline component, you can A/B test your Ertas-trained model against a generic model by running parallel pipelines and comparing outputs using Haystack's evaluation nodes.
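The A/B comparison described above can be sketched in plain Python. The two "pipelines" here are stand-in callables, not real Haystack components; in practice each would be a Haystack Pipeline whose generator points at a different model endpoint:

```python
# Sketch of A/B-testing two generators over the same question set.
# The pipeline functions are placeholders; in Haystack each would be a
# Pipeline whose generator is configured with a different model endpoint.

def baseline_pipeline(question: str) -> str:
    # Placeholder for a pipeline backed by a generic base model
    return "generic answer to: " + question

def ertas_pipeline(question: str) -> str:
    # Placeholder for a pipeline backed by the Ertas-trained model
    return "domain answer to: " + question

def ab_compare(questions, pipeline_a, pipeline_b):
    """Run both pipelines on every question and pair up the outputs."""
    return [
        {"question": q, "a": pipeline_a(q), "b": pipeline_b(q)}
        for q in questions
    ]

results = ab_compare(["How do returns work?"], baseline_pipeline, ertas_pipeline)
print(results[0]["a"])
print(results[0]["b"])
```

The paired outputs can then be scored side by side with Haystack's evaluation nodes or by human reviewers.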

    The combination of Ertas fine-tuning and Haystack's evaluation framework creates a powerful optimization loop. You can measure exactly how much your fine-tuned model improves retrieval-augmented answers on your domain-specific evaluation set, identify failure patterns, generate targeted training examples from those failures, and retrain in Ertas Studio. This data flywheel approach — where production failures feed directly into training improvements — is the most reliable way to build AI systems that get better over time rather than degrading as edge cases accumulate.
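The failure-to-training-example step of that flywheel is mechanically simple. This sketch converts reviewed failures into JSONL prompt/completion pairs; the record fields are illustrative, not an Ertas schema:

```python
import json

# Sketch of the data flywheel's retraining step: turn logged, human-reviewed
# pipeline failures into fine-tuning examples for the next training run.
# The field names below are illustrative, not an Ertas schema.

failures = [
    {"question": "Can I return a sale item?",
     "bad_answer": "Yes, anytime.",
     "corrected_answer": "Sale items are final sale and cannot be returned."},
]

def failures_to_training_examples(failures):
    """Convert reviewed failures into prompt/completion training pairs."""
    return [
        {"prompt": f["question"], "completion": f["corrected_answer"]}
        for f in failures
    ]

examples = failures_to_training_examples(failures)
# One JSON object per line is a common interchange format for training data
jsonl = "\n".join(json.dumps(e) for e in examples)
print(jsonl)
```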

    Getting Started

    1. Fine-tune a domain model in Ertas Studio

      Train a model on your domain corpus using Ertas Studio. Focus on the specific task your Haystack pipeline will perform — question answering, summarization, or extraction.

    2. Deploy to a supported inference backend

      Export the GGUF model and serve it through Ollama, vLLM, or any OpenAI-compatible endpoint. Haystack supports multiple generator backends natively.

    3. Build your Haystack pipeline

      Assemble a Haystack pipeline with your choice of retriever, ranker, and generator components. Point the generator to your Ertas-trained model endpoint.

    4. Evaluate with Haystack's built-in metrics

      Run your pipeline against a labeled evaluation set and measure answer accuracy, faithfulness, and retrieval recall to quantify the impact of fine-tuning.

    5. Iterate and retrain

      Analyze pipeline failures, generate new training examples, and retrain in Ertas Studio. Redeploy the improved model without changing your Haystack pipeline configuration.
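With Ollama as the backend, step 2 amounts to wrapping the exported GGUF in a Modelfile and registering it. The file and model names here are examples, not Ertas output:

```
FROM ./ertas-support-7b.gguf
```

Register it with `ollama create ertas-support-7b -f Modelfile`; Ollama then serves the model at `http://localhost:11434/v1` through its OpenAI-compatible API, which is the endpoint the pipeline below points at.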

    python
    from haystack import Pipeline
    from haystack.components.builders import PromptBuilder
    from haystack.components.generators import OpenAIGenerator
    from haystack.utils import Secret
    
    # Configure the generator to call the Ertas-trained model through an
    # OpenAI-compatible endpoint (here, a local Ollama server)
    generator = OpenAIGenerator(
        api_base_url="http://localhost:11434/v1",
        model="ertas-support-7b",
        api_key=Secret.from_token("not-needed"),  # local server ignores the key
    )
    
    prompt = PromptBuilder(
        template="""Answer the question based on the context.
    Context: {{ context }}
    Question: {{ question }}
    Answer:"""
    )
    
    # Build the pipeline
    pipe = Pipeline()
    pipe.add_component("prompt", prompt)
    pipe.add_component("generator", generator)
    pipe.connect("prompt", "generator")
    
    result = pipe.run({
        "prompt": {
            "context": "Our return policy allows returns within 30 days...",
            "question": "How long do I have to return an item?",
        }
    })
    print(result["generator"]["replies"][0])
    Use an Ertas-trained model as the generator in a Haystack pipeline for domain-specific question answering.
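Haystack ships evaluator components for metrics like answer exact match and document recall; the arithmetic they report reduces to something like this standalone sketch (the data is illustrative):

```python
# Standalone sketch of two metrics Haystack's evaluation framework reports:
# answer exact match and per-query retrieval recall.

def exact_match(predicted, ground_truth):
    """Fraction of answers that match the reference exactly (case-insensitive)."""
    hits = sum(p.strip().lower() == g.strip().lower()
               for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)

def retrieval_recall(retrieved_ids, relevant_ids):
    """Fraction of relevant documents that the retriever actually returned."""
    return len(set(retrieved_ids) & set(relevant_ids)) / len(set(relevant_ids))

preds = ["30 days", "Final sale items cannot be returned"]
golds = ["30 days", "Sale items are final sale"]
print(exact_match(preds, golds))                      # 0.5
print(retrieval_recall(["d1", "d3"], ["d1", "d2"]))   # 0.5
```

Tracking these numbers before and after each retraining run is what turns fine-tuning from guesswork into a measurable optimization loop.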

    Benefits

    • Modular pipeline architecture lets you swap models without rewriting application logic
    • Built-in evaluation framework quantifies fine-tuning impact on production metrics
    • Enterprise-grade observability with pipeline-level logging and tracing
    • A/B test Ertas-trained models against baselines in parallel pipelines
    • Production-ready document processing with support for PDF, DOCX, and HTML
    • Strong community and enterprise support from deepset for mission-critical deployments
