
Why 'We Use the API' Means You Have No Control Over Your AI in Production
Every team that depends on a cloud AI API has silently outsourced control of their AI behavior. Here's exactly what you give up when the model lives in someone else's infrastructure.
Most teams building on cloud AI APIs believe they control their AI. They write the prompts. They set the system instructions. They choose the temperature and the context window. They feel in control.
They're not.
Control — real control — means you determine what happens when a given input arrives at your system. That's the model's job. The model decides. And the model isn't yours.
The prompt you wrote is a request. The model decides how to honor it based on training choices, safety filters, and RLHF values that were made by someone else, for purposes that may or may not align with your use case. You're writing suggestions to a black box someone else built and maintains.
This isn't a theoretical complaint. It has concrete operational consequences. Here are the six dimensions of control you give up when the model lives in someone else's infrastructure.
1. Model Updates: Silent Behavior Changes
Cloud AI vendors update their models. Sometimes they announce it; often they don't. When gpt-4-turbo gets updated, every application using that endpoint gets a new model without any deployment action on your part. The change is invisible at the infrastructure level — same endpoint, same API key, different behavior.
What does "different behavior" look like in practice? Shorter outputs. Changed formatting preferences. Shifted classification thresholds. Increased refusal rates on certain topics. Altered summarization style. None of these changes trigger a deployment alert. None of them appear in your application logs as a version change. Your product behavior changed and you probably won't know until a user tells you something is different.
This is not hypothetical. It is documented across the industry. Teams with LLM-powered products build regression test suites specifically because they've been burned by silent model updates.
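What does such a regression suite look like? A minimal sketch, assuming the OpenAI Python SDK and an endpoint like the one named above; the golden prompts, expected substrings, and pass-rate threshold are illustrative placeholders, not a prescribed test plan.

```python
# A minimal behavioral regression check against a hosted model endpoint.
# Golden cases and threshold are hypothetical; substitute your own.
from openai import OpenAI

client = OpenAI()

GOLDEN_CASES = [
    # (prompt, substring the answer is expected to contain)
    ("Classify this support ticket as 'billing' or 'technical': "
     "'I was charged twice this month.'", "billing"),
    ("Reply with only the ISO currency code for US dollars.", "USD"),
]

def run_regression(model: str = "gpt-4-turbo") -> float:
    passed = 0
    for prompt, expected in GOLDEN_CASES:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # remove sampling noise so drift stands out
        )
        answer = resp.choices[0].message.content.lower()
        passed += expected.lower() in answer
    return passed / len(GOLDEN_CASES)

if __name__ == "__main__":
    score = run_regression()
    # Alert if behavior drifts below the pass rate you measured at launch.
    assert score >= 0.95, f"Model behavior drifted: pass rate {score:.0%}"
```

Run it on a schedule and you at least find out about a silent update before your users do. You still can't prevent the update.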
2. Training Data: Choices You Didn't Make
The model's behavior — what it knows, what it emphasizes, what it tends to refuse, how it frames ambiguous topics — reflects choices made during training. Those choices include what data was included, what data was filtered out, how the data was weighted, and what human raters evaluated as good versus bad during RLHF.
You had no input into any of that. The training data reflects the vendor's priorities, legal exposure, geographic considerations, and available datasets — not your domain expertise or your users' needs.
This matters more than it sounds. A model trained predominantly on English internet text has embedded assumptions about language, culture, and context that may not match your deployment context. A model where raters were instructed to prefer shorter answers will produce shorter answers — whether that's appropriate for your use case or not. A model where legal exposure shaped data filtering will have gaps that may be exactly your domain.
You're not configuring a tool with a system prompt. You're inheriting a full set of encoded preferences.
3. Inference Infrastructure: Your SLA Is Their SLA
Your product's availability is bounded above by your AI vendor's uptime. If the API is down, your AI feature is down. If latency spikes, your latency spikes. Your product's performance characteristics are partially outside your control.
Most major providers offer 99.9% uptime SLAs. That works out to roughly 8.8 hours of downtime per year within the SLA — before any planned maintenance or edge cases that fall within SLA language but still cause degradation. If your product is business-critical, you are accepting that your vendor's infrastructure issues become your production incidents.
The vendor outage in November 2024 that took down Claude's API for several hours is a concrete example. Every product depending on that API had a production incident with no mitigation path other than waiting.
4. Pricing: Unilateral Cost Changes
Per-token pricing can change. It has changed. When a vendor updates pricing — either raising rates or changing tier structures — your unit economics shift without any action on your part.
OpenAI changed GPT-4 pricing multiple times. Anthropic updated Claude pricing when new model versions launched. Each change required engineering teams to re-evaluate build-versus-buy decisions, update financial models, and sometimes rearchitect to use cheaper endpoints.
For high-volume production workloads, this exposure is significant. A 20% price increase on a million daily API calls is a material budget impact that you have no contractual protection against beyond the terms you agreed to at signup.
5. Policy Changes: Retroactive Use Case Restrictions
Acceptable use policies evolve. What a vendor permits today, they may restrict tomorrow — particularly as AI regulation advances globally and vendors adjust policies to maintain compliance in different jurisdictions.
If your use case sits near any policy boundary — legal research, medical information, security tooling, financial advice, political content — you carry the risk that a policy update narrows the space your application operates in. The vendor will give you notice, probably. They will not grandfather your use case.
This creates a category of product risk that has no analog in traditional software dependencies. A library doesn't update its acceptable use policy. An API can.
6. Strategic Pivots: Your Vendor's Mission Just Changed
In early 2026, OpenAI signed a contract with the US Department of Defense to provide AI services for military applications. That is a business decision a private company is entitled to make.
Here's what it means for every enterprise building on OpenAI APIs: your AI vendor is now also a defense contractor. The US Department of Defense is an implicit stakeholder in your AI stack. You didn't vote for that. It wasn't in any vendor selection criteria. It happened unilaterally.
Does this change how OpenAI develops models? Does it affect training priorities? Does it change how safety filtering is calibrated? Does it affect what use cases OpenAI prioritizes or deprioritizes? Probably not dramatically, in the short term. But you don't know. You can't see inside the model. You have no audit rights over how vendor priorities affect model behavior.
This is the sharpest version of the control problem: your vendor can make a strategic decision that materially changes what their AI is optimized for, and you will find out about it when it's announced publicly.
The Governance Gap This Creates
Every enterprise AI governance framework has policies, controls, and accountability chains for systems the enterprise controls. The provider boundary is a gap in that framework.
You can document your prompts. You can log your inputs and outputs. You can monitor latency and error rates. But you cannot audit the model's training data. You cannot observe a model update before it reaches production. You cannot pin to an exact model state and guarantee it won't change. You cannot verify that the vendor's internal processes align with your governance requirements.
AI Model Governance in Production covers the full governance framework this gap sits inside. The point here is specific: the gap exists structurally because you don't own the model.
What Model Ownership Actually Looks Like
The alternative is not building your own foundation model from scratch. It's fine-tuning an open-source foundation model on your domain data, owning the resulting weights, and controlling deployment yourself.
Concretely: you take a model like Llama 3, Mistral, or Qwen. You fine-tune it on your proprietary dataset — customer support conversations, domain-specific documents, labeled examples of your task. You now own a model checkpoint that produces outputs calibrated to your domain.
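What that fine-tuning step can look like, as a minimal sketch using Hugging Face transformers and peft with LoRA adapters; the model name, dataset path, and hyperparameters are illustrative assumptions, not a recommended recipe.

```python
# A minimal LoRA fine-tuning sketch. Model, data file, and hyperparameters
# are placeholders; adapt them to your own domain dataset.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Meta-Llama-3-8B"            # or a Mistral / Qwen base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base)
# Attach small LoRA adapters so only a fraction of the weights are trained.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# Your proprietary data: one JSON object with a "text" field per line (hypothetical path).
data = load_dataset("json", data_files="support_conversations.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="checkpoints", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()

model.save_pretrained("my-domain-model")       # the checkpoint is now yours
```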
You export that checkpoint to GGUF, a portable (and typically quantized) model format supported by Ollama, llama.cpp, and LM Studio. You run inference on your own hardware — a workstation, a server, or an edge device. The model does not change unless you decide to retrain. Updates are explicit. Rollback is possible. The training data lineage is yours to document.
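Serving that exported checkpoint is a few lines. A minimal sketch, assuming the llama-cpp-python bindings; the file name is hypothetical.

```python
# Local inference on an exported GGUF checkpoint via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="my-domain-model-q4_k_m.gguf",  # your exported, quantized checkpoint
    n_ctx=4096,                                # context window is your choice, not a vendor default
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Classify this ticket: 'I was charged twice.'"}],
    temperature=0,
)
print(out["choices"][0]["message"]["content"])
```

Nothing in that loop can change underneath you: same file, same weights, same behavior until you ship a new checkpoint.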
This resolves all six control dimensions:
- No silent model updates — the weights are static until you retrain
- Training data is your data — you made those choices
- Inference runs on your infrastructure — your SLA, your uptime
- No per-token pricing — compute is a fixed or predictable cost
- No retroactive policy changes — the base model's license is fixed the moment you adopt it, and the model runs on your hardware
- No vendor strategic pivots — you're not dependent on anyone's mission
The Economics Are Better at Scale
For high-volume workloads, API inference is significantly more expensive than running the same work on hardware you control. The math:

A GPT-4-class API call runs roughly $0.01-0.03 per 1,000 tokens on current pricing. A fine-tuned 7B-parameter model running on a mid-range GPU costs roughly $0.00004-0.0001 per 1,000 tokens in electricity at full utilization. That is a cost reduction on the order of 99% for comparable task performance on domain-specific tasks — where fine-tuned smaller models frequently match or exceed larger general-purpose models.
The hardware cost amortizes quickly at meaningful volume. At 500,000 API calls per month, the savings from locally-run inference pay for a dedicated inference machine in weeks.
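A back-of-the-envelope version of that claim, using the figures above; every number here is an illustrative assumption (including the $3,000 hardware cost), not a vendor quote.

```python
# Rough monthly cost comparison at 500,000 calls/month, ~1,000 tokens per call.
api_cost_per_1k_tokens   = 0.02       # midpoint of the $0.01-0.03 API range
local_cost_per_1k_tokens = 0.00007    # electricity estimate for a local 7B model
tokens_per_call          = 1_000
calls_per_month          = 500_000

api_monthly   = calls_per_month * tokens_per_call / 1_000 * api_cost_per_1k_tokens
local_monthly = calls_per_month * tokens_per_call / 1_000 * local_cost_per_1k_tokens
savings       = api_monthly - local_monthly

print(f"API:   ${api_monthly:,.0f}/month")    # ~$10,000
print(f"Local: ${local_monthly:,.0f}/month")  # ~$35
print(f"A $3,000 inference box pays for itself in about "
      f"{3000 / (savings / 30):.0f} days under these assumptions.")
```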
See how build vs. rent economics work out →
The Path
Fine-tuning requires a labeled dataset, a training run, and an evaluation process. The tooling overhead has been the barrier for most teams — not the concept, but the infrastructure required to execute it.
Ertas Fine-Tuning SaaS is built to remove that barrier. Upload your dataset, configure your fine-tune through a visual interface, run on cloud GPUs, download the resulting GGUF. No MLOps infrastructure required. The resulting model is yours: portable, version-pinned, and deployable anywhere llama.cpp runs.
If you're running high-volume AI workloads on a cloud API, the question isn't whether fine-tuning is worth exploring. It's why you haven't done it yet.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
Keep reading

Who Controls Your AI Model's Behavior in Production? (It Might Not Be You)
Model behavior in production is determined by training data, RLHF choices, and safety filters — decisions made by the vendor, not you. Here's what that means for your business.

When Your AI Vendor Makes a Geopolitical Decision: What Enterprise Buyers Need to Know
OpenAI is now a defense contractor. Anthropic walked away. These are geopolitical decisions with operational consequences for every enterprise that depends on these models.

The Real Cost of API Dependency in Production AI: Beyond the Token Bill
Per-token costs are the visible part of API dependency. The invisible costs — operational risk, migration work, compliance exposure, behavioral lock-in — are usually larger.