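LangChain + Fine-Tuned Local Models: Building Pipelines with Zero API Fees

LangChain is one of the most widely used frameworks for building AI pipelines: document processing, RAG, agents, and chains. Most LangChain tutorials point at OpenAI's API, and your production bill reflects that.

LangChain also supports local models through Ollama. Because Ollama exposes an OpenAI-compatible endpoint, you can swap the AI backend of a LangChain pipeline with minimal code changes. Combined with a fine-tuned model, this gives you pipelines that are faster on domain tasks, cheaper at scale, and private by default.

LangChain + Ollama Integration Options

LangChain offers two integration paths for Ollama:

Option 1: ChatOllama (LangChain native)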

from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="your-fine-tuned-model",
    base_url="http://localhost:11434",
    temperature=0.3
)

# Called the same way as ChatOpenAI
response = llm.invoke("Generate a listing for this property: ...")
print(response.content)

Option 2: ChatOpenAI with the Ollama base URL (preferred drop-in replacement)

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="your-fine-tuned-model",
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client; Ollama does not validate it
    temperature=0.3
)

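# Same interface as the cloud-backed ChatOpenAI

Option 2 is the cleanest migration path: if you already have LangChain code that uses ChatOpenAI, the only changes are base_url, model, and a placeholder api_key.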
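One way to keep that switch reversible is to build the ChatOpenAI constructor arguments from an environment variable. The sketch below is only an illustration: the helper name llm_kwargs, the LLM_BACKEND variable, and the cloud model choice are assumptions, not part of LangChain or Ollama.

```python
import os

# Hypothetical helper: returns keyword arguments for ChatOpenAI.
# LLM_BACKEND and the model names are illustrative assumptions.
def llm_kwargs(backend=None):
    backend = backend or os.environ.get("LLM_BACKEND", "local")
    if backend == "local":
        return {
            "model": "your-fine-tuned-model",
            "base_url": "http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
            "api_key": "ollama",  # placeholder; Ollama does not validate it
            "temperature": 0.3,
        }
    if backend == "cloud":
        # api_key is omitted so the client reads OPENAI_API_KEY from the environment
        return {"model": "gpt-4o", "temperature": 0.3}
    raise ValueError(f"unknown backend: {backend!r}")

# Usage (requires langchain_openai to be installed):
#   from langchain_openai import ChatOpenAI
#   llm = ChatOpenAI(**llm_kwargs())
```

Keeping the backend choice in one place means the chain definitions never need to know which endpoint they are talking to.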

Common Pipeline Patterns

Pattern 1: Document Processing Chain

Before (GPT-4o API):

from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

llm = ChatOpenAI(model="gpt-4o")  # $0.005 per 1K input tokens

template = PromptTemplate.from_template(
    "Classify this support ticket:\n{ticket}\nOutput: category, priority, suggested_response"
)
chain = LLMChain(llm=llm, prompt=template)

result = chain.invoke({"ticket": ticket_text})

After (fine-tuned local model):

from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Only the LLM constructor changes
llm = ChatOpenAI(
    model="support-classifier-v3",  # your fine-tuned classifier
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

# Chain definition is unchanged
template = PromptTemplate.from_template(
    "Classify this support ticket:\n{ticket}\nOutput: category, priority, suggested_response"
)
chain = LLMChain(llm=llm, prompt=template)

result = chain.invoke({"ticket": ticket_text})

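Processing 10,000 tickets:

Before: 10,000 tickets x $0.005 = $50 in API fees (assuming roughly 1K input tokens per ticket, output tokens not counted)
After: a flat $50/month VPS fee, amortized across everything you process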
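The comparison above can be turned into a small break-even calculation. This is a sketch under the article's own figures; the 1,000 input tokens per ticket is an assumption implied by the arithmetic, and output tokens are ignored.

```python
def api_cost(tickets, tokens_per_ticket=1000, price_per_1k=0.005):
    """Input-token API cost in dollars (output tokens ignored)."""
    return tickets * (tokens_per_ticket / 1000) * price_per_1k

def break_even_tickets(monthly_server_cost=50.0, tokens_per_ticket=1000, price_per_1k=0.005):
    """Tickets per month at which the API bill equals the fixed server fee."""
    cost_per_ticket = (tokens_per_ticket / 1000) * price_per_1k
    return monthly_server_cost / cost_per_ticket

print(round(api_cost(10_000), 2))
print(round(break_even_tickets()))
```

With these numbers the API cost for 10,000 tickets and the monthly server fee are both $50, so 10,000 tickets per month is the break-even point. Above that volume the flat fee wins; below it the API is cheaper on cost alone.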

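Pattern 2: RAG Pipeline with a Fine-Tuned Reader

For retrieval-augmented generation (RAG), you usually want both the retrieval model (embeddings) and the reader model (answer generation) tuned to your domain.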

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain_chroma import Chroma

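# Local embeddings via Ollama (nomic-embed-text works well)
embeddings = OpenAIEmbeddings(
    model="nomic-embed-text",
    base_url="http://localhost:11434/v1",
    api_key="ollama",
    # Send raw strings instead of tiktoken token IDs, which
    # non-OpenAI endpoints such as Ollama generally do not accept
    check_embedding_ctx_length=False
)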

# Fine-tuned reader model
reader_llm = ChatOpenAI(
    model="your-domain-reader-model",
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

# Build the vector store from your documents
vectorstore = Chroma.from_documents(
    documents=your_docs,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# RAG chain: no cloud API calls at inference time
qa_chain = RetrievalQA.from_chain_type(
    llm=reader_llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5})
)

answer = qa_chain.invoke({"query": "What is our return policy for sale items?"})

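Embedding and generation both run locally. The reader model is fine-tuned on your domain Q&A, and the embedding model can be replaced with a domain-tuned one if retrieval struggles with your terminology. No cloud API calls are made.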
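To make the retrieval step concrete, here is a minimal sketch of what a top-k retriever does conceptually: rank stored vectors by similarity to the query vector and keep the best k. It uses cosine similarity on toy two-dimensional vectors as an assumption for illustration; the metric a real vector store uses depends on its configuration, and real embeddings have hundreds of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=5):
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy vectors standing in for document embeddings
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
nearest = top_k([1.0, 0.1], docs, k=2)
print(nearest)
```

The documents at the returned indices are what gets placed into the reader model's prompt, which is why k trades recall against context length.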

Pattern 3: LangGraph Agent with a Local Tool Executor

LangGraph (LangChain's agent framework) works with any LangChain-compatible LLM:

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# The agent uses a local model for orchestration
orchestrator = ChatOpenAI(
    model="your-orchestrator-model",
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

# The tool executor uses a dedicated fine-tuned model
domain_executor = ChatOpenAI(
    model="your-domain-model",
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

def run_domain_task(state):
    task = state["current_task"]
    result = domain_executor.invoke(task)
    return {"result": result.content}

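# Build the graph
graph = StateGraph(dict)
graph.add_node("domain_executor", run_domain_task)
# ... add orchestration logic
# Minimal wiring so compile() succeeds; replace with your own routing
graph.set_entry_point("domain_executor")
graph.add_edge("domain_executor", END)

app = graph.compile()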

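LangChain + Ollama Performance Tuning

Batching: For bulk workloads (classifying 5,000 tickets), use LangChain's batch methods to parallelize calls:

# Process all tickets with at most 10 requests in flight.
# abatch is a coroutine: await it inside an async function, or call
# chain.batch(...) with the same arguments from synchronous code.
results = await chain.abatch(
    [{"ticket": t} for t in tickets],
    config={"max_concurrency": 10}  # up to 10 concurrent Ollama calls
)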
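The effect of a concurrency cap can be illustrated with plain asyncio. This is a sketch of the general bounded-concurrency pattern, not LangChain's internal implementation; bounded_gather and fake_classify are hypothetical names, and the sleep stands in for a model call.

```python
import asyncio

async def bounded_gather(worker, items, max_concurrency=10):
    """Run worker(item) for every item with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(item) for item in items))

# Stand-in for a model call; a real worker would await chain.ainvoke(...)
async def fake_classify(ticket):
    await asyncio.sleep(0.01)
    return f"classified:{ticket}"

results = asyncio.run(bounded_gather(fake_classify, ["a", "b", "c"], max_concurrency=2))
print(results)
```

Raising the cap only helps while the server can actually run requests in parallel; beyond that point extra requests just queue on the Ollama side.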

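Caching: Enable LangChain's LLM cache to avoid redundant model calls. SQLiteCache matches on the exact prompt and model settings; it is not a semantic cache:

from langchain.globals import set_llm_cache
from langchain_community.cache import SQLiteCache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))
# An identical prompt returns the cached result without calling Ollama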
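The following sketch shows the idea behind an exact-match prompt cache using only the standard library. It is an illustration of the technique, not how LangChain's cache is implemented; PromptCache and fake_model are hypothetical names.

```python
import sqlite3

class PromptCache:
    """Exact-match prompt cache backed by SQLite (illustrative only)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "model TEXT, prompt TEXT, response TEXT, "
            "PRIMARY KEY (model, prompt))"
        )

    def get_or_call(self, model, prompt, call):
        row = self.db.execute(
            "SELECT response FROM cache WHERE model = ? AND prompt = ?",
            (model, prompt),
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: the model is not called
        response = call(prompt)
        self.db.execute("INSERT INTO cache VALUES (?, ?, ?)", (model, prompt, response))
        self.db.commit()
        return response

calls = []

def fake_model(prompt):
    # Stand-in for a real model call; records each invocation
    calls.append(prompt)
    return prompt.upper()

cache = PromptCache()
cache.get_or_call("m", "hello", fake_model)
cache.get_or_call("m", "hello", fake_model)   # served from the cache
cache.get_or_call("m", "hello ", fake_model)  # trailing space makes a different key
print(len(calls))
```

Because the key is the literal prompt text, even a one-character difference is a miss, so this kind of cache pays off mainly for repeated identical inputs such as duplicate tickets.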

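Context length: 7B models are typically run with a 4K-8K context window. For long documents, use LangChain's text splitters to chunk the input before passing it to the model.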
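As a rough picture of what chunking involves, here is a minimal character-based splitter with overlap. It is a simplified stand-in, not LangChain's splitter: split_text is a hypothetical name, and real splitters usually cut on separators such as paragraphs or sentences and may count tokens rather than characters.

```python
def split_text(text, chunk_size=1000, overlap=100):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that content cut at a
    boundary still appears whole in one of the chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

chunks = split_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)
```

Each chunk is then sent through the chain separately, so chunk_size should leave room in the context window for the prompt template and the model's output.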

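When to Use Fine-Tuned Local vs. Cloud Models in a LangChain Pipeline

Task	Local fine-tuned	Cloud (GPT-4)
Domain classification	Better and cheaper	Overkill
Domain generation	Better and cheaper	Overkill
Complex reasoning chains	Worth evaluating	Better
Current events / web	Not applicable	Required
High-volume batch	Much cheaper	Expensive
One-off / low volume	Either	Either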
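The table can be encoded as a simple routing rule inside a pipeline. This is a sketch of one possible reading: the task-type keys and the choose_backend helper are hypothetical, and sending complex reasoning to the cloud reflects the table's "Better" entry rather than a fixed rule.

```python
LOCAL, CLOUD = "local", "cloud"

# One possible encoding of the table; task names are illustrative
ROUTES = {
    "domain_classification": LOCAL,
    "domain_generation": LOCAL,
    "high_volume_batch": LOCAL,
    "complex_reasoning": CLOUD,
    "current_events": CLOUD,
}

def choose_backend(task_type, default=LOCAL):
    """Pick a backend for a task type; unlisted tasks fall back to default."""
    return ROUTES.get(task_type, default)

print(choose_backend("domain_classification"))
print(choose_backend("current_events"))
```

One-off and low-volume tasks are left out of the mapping on purpose, since either backend works for them and the default applies.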