
    RAG Without LangChain: Building Production Retrieval Pipelines Without a Python Framework

    LangChain became the default starting point for RAG. But production teams are increasingly moving away from it — citing abstraction overhead, debugging difficulty, and vendor lock-in. Here are the alternatives.

Ertas Team

    LangChain has become the default recommendation whenever someone asks "how do I build RAG?" on Reddit, Stack Overflow, or any AI Discord. And for good reason: it got people from zero to a working prototype faster than anything else in 2023-2024. You could wire up a vector database, an embedding model, and a chat completion endpoint in fifty lines of Python.

    But a growing number of production teams are quietly ripping it out. Not because LangChain is bad — it genuinely helped bootstrap the RAG ecosystem. They're removing it because the things that make LangChain great for prototyping become liabilities at production scale.

    If you're evaluating how to build a RAG pipeline without LangChain, or wondering whether you should migrate away, this is the honest breakdown: what's actually wrong, what the alternatives are, and when LangChain is still the right choice.

    The Abstraction Tax

    Every framework imposes a cost. You get convenience in exchange for control. That tradeoff is worth it when the abstractions are stable, well-documented, and map cleanly to your mental model of the underlying system.

    LangChain's abstractions have struggled on all three counts in production environments.

    Version churn. LangChain's API surface has changed aggressively across releases. Teams that built against langchain==0.0.x found their code broken by 0.1.x, then again by the langchain-core / langchain-community split. In a prototyping context, you pin a version and move on. In a production system with CI/CD pipelines, dependency auditing, and security reviews, every breaking change costs engineering hours.

    Debugging black boxes. When a RAG pipeline returns a bad answer, you need to inspect every stage: the query transformation, the embedding, the retrieval step, the reranking, and the prompt assembly. LangChain wraps each of these in layers of abstraction — chains, runnables, callbacks — that make it hard to see what actually happened at each step. Teams report spending more time debugging the framework than debugging the pipeline logic.

    Over-engineering simple pipelines. Most production RAG systems follow a straightforward pattern: embed the query, search a vector store, assemble the context into a prompt, call a model. This is maybe 100 lines of direct code. LangChain introduces concepts like chains, agents, output parsers, retrievers, and memory objects for what is fundamentally a linear data flow. The cognitive overhead is real, especially when onboarding new engineers to the codebase.

    Vendor coupling. LangChain supports dozens of LLM providers, vector stores, and embedding models through its integration layer. But this integration layer means you're depending on LangChain to maintain wrappers for each vendor. When a provider updates their SDK, you wait for LangChain to catch up. When you want to use a provider-specific feature, you fight the abstraction to access it.

    None of this means LangChain is a bad project. It means the framework is optimized for breadth and getting-started speed, not for the operational concerns that matter in production: debuggability, stability, and transparency.

    Alternative 1: Direct SDK Calls

    The most common path teams take when leaving LangChain is the simplest: drop the framework entirely and call provider SDKs directly.

    A production RAG pipeline built with direct calls typically looks like this:

    1. Embed the query using your embedding provider's SDK (OpenAI, Cohere, or a local model via sentence-transformers)
    2. Search your vector store using the store's native client (Pinecone, Weaviate, Qdrant, pgvector)
    3. Optionally rerank the retrieved chunks using a cross-encoder or a reranking API
    4. Assemble the prompt by inserting retrieved context into your prompt template
    5. Call the LLM using the provider's SDK with your assembled prompt

    Each step is a function call. Each function call returns data you can log, inspect, and test independently. There's no framework state to manage, no callback system to configure, no chain abstraction between you and the actual operation.
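
To make this concrete, here's a minimal sketch of that flow, assuming the OpenAI SDK for embeddings and generation and Qdrant's native client for retrieval. The model names, the "docs" collection, and the "text" payload field are placeholders for your own stack, and the optional reranking step is skipped for brevity. Note how every intermediate value is right there to log or assert on:

```python
# Minimal direct-SDK RAG pipeline sketch (assumed stack: OpenAI + Qdrant).
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()                            # reads OPENAI_API_KEY from env
qdrant = QdrantClient(host="localhost", port=6333)  # placeholder deployment

def answer(query: str, top_k: int = 5) -> str:
    # 1. Embed the query
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding

    # 2. Search the vector store with its native client
    hits = qdrant.search(collection_name="docs", query_vector=embedding, limit=top_k)

    # 4. Assemble the prompt from the retrieved context
    context = "\n\n".join(hit.payload["text"] for hit in hits)
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"

    # 5. Call the LLM with the assembled prompt
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```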

When this fits: Teams with strong Python/TypeScript engineering, well-defined pipelines that won't change shape frequently, and a need for full observability at every step. It's also the natural choice for on-prem deployments, since it adds no dependencies beyond the services you already run.

    The cost: You write more code. You build your own retry logic, your own batching, your own prompt management. For a single pipeline this is trivial. For ten pipelines across multiple teams, you'll end up building an internal library — which is, in effect, your own mini-framework.
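
As a taste of what that internal library ends up containing, here's a rough sketch of a retry helper with exponential backoff and jitter. The attempt counts and delays are illustrative defaults, not recommendations:

```python
import random
import time

def with_retries(call, max_attempts: int = 4, base_delay: float = 0.5):
    """Run a flaky provider call, retrying with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # production code would catch provider-specific errors only
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the real error
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.25))

# Usage: with_retries(lambda: openai_client.embeddings.create(...))
```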

    Alternative 2: LlamaIndex

    LlamaIndex (formerly GPT Index) occupies a middle ground. It's still a framework, but one designed specifically for retrieval and indexing rather than general-purpose LLM orchestration.

    The key difference in philosophy: LangChain tries to abstract the entire LLM application stack (agents, tools, memory, chains). LlamaIndex focuses narrowly on the data indexing and retrieval problem — how to get the right context to the model.

    In practice, this means LlamaIndex's abstractions map more closely to what RAG pipelines actually do. Its core concepts — nodes, indices, query engines, and retrievers — correspond directly to stages in a retrieval pipeline. You spend less time fighting the framework because the framework's mental model matches the problem.
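
A few lines show how directly those concepts map to code. This is a minimal sketch using the post-split llama-index-core entry points; the directory path, question, and top-k value are placeholders:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Documents are parsed and chunked into nodes; nodes are embedded into an index
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# The index exposes a query engine (retriever + synthesis) over those nodes
query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("How do we rotate API keys?"))
```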

    LlamaIndex has also been more conservative about API changes, though it has had its own share of restructuring (the llama-index-core split mirrored LangChain's modularization).

    When this fits: Teams that want framework conveniences — prebuilt integrations, sensible defaults, document parsing utilities — without the sprawling abstraction surface of LangChain. LlamaIndex is particularly strong if your pipeline involves complex document processing (hierarchical chunking, multi-document synthesis, structured extraction from PDFs).

    The cost: You're still in a framework. You still depend on the maintainers to keep integrations updated. But the surface area is smaller, so the dependency risk is lower.

    Alternative 3: Visual Pipeline Builders

    A third category has emerged for teams that want to build RAG pipelines without LangChain and without writing pipeline orchestration code at all: visual pipeline builders and low-code platforms.

    Tools like Flowise, Langflow, Haystack Studio, and dedicated enterprise platforms let you assemble retrieval pipelines by connecting nodes in a visual graph. Each node represents a step — embedding, retrieval, reranking, generation — and the platform handles execution, monitoring, and deployment.

    Some of these tools use LangChain or LlamaIndex under the hood but shield you from the framework's complexity. Others are built on their own execution engines.

    When this fits: Teams where the people building RAG pipelines are domain experts (data analysts, product managers, solution engineers) rather than backend engineers. Also useful for rapid experimentation — you can try twenty different chunking strategies in an afternoon by swapping nodes.

    The cost: You trade code-level control for ease of use. Customization beyond what the platform supports requires workarounds or plugins. Performance tuning is harder when you can't see or modify the underlying code. And you add a platform dependency on top of your infrastructure.

    When LangChain Is Still the Right Call

    It would be dishonest to write an article about RAG without LangChain and not acknowledge where LangChain genuinely shines.

    Learning and prototyping. If you're new to RAG and want to understand the moving parts, LangChain's tutorials and community resources are unmatched. You'll get a working prototype faster with LangChain than with any other approach. The abstractions that become liabilities in production are assets when you're learning — they let you focus on concepts without drowning in implementation details.

    Rapid experimentation. When you need to test a dozen different retriever strategies, model providers, and prompt patterns in a week, LangChain's plug-and-play integrations save real time. The framework's breadth is a feature when you're exploring, not yet committed to a specific architecture.

    Agent-heavy architectures. If your system genuinely needs agentic behavior — tool use, multi-step reasoning, dynamic routing — LangChain's agent abstractions (especially via LangGraph) are among the most mature available. Direct SDK calls get complicated fast when you're building autonomous agents.
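
For a sense of what that buys you, here's a minimal LangGraph sketch with stubbed node functions. The state fields and routing condition are purely illustrative; the point is that looping and branching come with the abstraction instead of hand-rolled control flow:

```python
from typing import TypedDict
from langgraph.graph import END, StateGraph

class State(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: State) -> dict:
    return {"context": "..."}   # stub: fetch context from your store

def generate(state: State) -> dict:
    return {"answer": "..."}    # stub: call your LLM with the context

def route(state: State) -> str:
    # Dynamic routing: retry retrieval if nothing came back, else generate
    return "retrieve" if not state["context"] else "generate"

graph = StateGraph(State)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_conditional_edges("retrieve", route)
graph.add_edge("generate", END)
app = graph.compile()
# app.invoke({"question": "...", "context": "", "answer": ""})
```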

    Small teams with broad needs. A three-person startup building an AI feature doesn't need a bespoke pipeline architecture. LangChain gets them to market faster, and the operational costs of its abstractions don't materialize until they reach a scale that's a good problem to have.

    The Decision Framework

    Here's a simple way to think about which approach fits your situation:

| Factor | Direct SDKs | LlamaIndex | Visual Builder | LangChain |
|---|---|---|---|---|
| Pipeline complexity | Low-medium | Medium-high | Medium | Any |
| Team engineering depth | High | Medium-high | Low-medium | Any |
| Need for full observability | Critical | Important | Nice to have | Flexible |
| Integration stability priority | High | Medium | Low | Low |
| Time to first prototype | Slowest | Medium | Fastest | Fast |
| Long-term maintenance cost | Lowest | Low-medium | Medium | Highest |

    The honest answer is that there's no universally best approach. Teams building production retrieval pipelines without a Python framework tend to converge on direct SDK calls for core pipelines and keep a framework around for experimentation. Teams that need to move fast and aren't yet worried about operational maturity are well-served by LangChain.

    The important thing is to make the choice deliberately, based on your team's specific constraints — not because "everyone uses LangChain" or because "frameworks are bad." Both of those statements collapse under the weight of any real engineering decision.

    Where Ertas Fits

    Ertas is designed for teams that have already decided they want control over their AI infrastructure. Whether you're calling provider SDKs directly, using LlamaIndex, or migrating off LangChain, Ertas handles the operational layer underneath: model deployment, data pipeline management, and governance controls.

    You build the RAG pipeline your way. Ertas makes sure it runs reliably, stays compliant, and scales without re-architecture. No framework opinions imposed — just infrastructure that stays out of your way.

