
Human-in-the-Loop vs. Human-on-the-Loop vs. Human-out-of-the-Loop: What's the Difference?
Three terms that sound similar but represent fundamentally different risk profiles. Understanding the distinction matters more than ever as AI moves into high-stakes decisions.
In early 2026, OpenAI signed a contract with the US Department of Defense to provide AI services for military applications. Anthropic declined a similar deal, citing concerns about AI autonomy in lethal decision-making. At the center of both choices was a question that sounds academic until it isn't: where does the human sit relative to the decision?
That question — and its answer — is what separates human-in-the-loop from human-on-the-loop from human-out-of-the-loop. The terms get used interchangeably by vendors who should know better. They are not interchangeable. They represent fundamentally different control structures, risk profiles, and regulatory postures.
Precise Definitions
Human-in-the-Loop (HITL)
The human must act before the system proceeds. The AI proposes a decision, a recommended action, or a classification. The system then stops and waits. A qualified human reviews the AI's output and either approves, modifies, or rejects it. Only then does the action execute.
The key word is "stops." The AI cannot proceed without explicit human authorization.
Examples: A radiologist who must sign off on an AI-flagged imaging scan before the result enters the patient record. A credit officer who must approve or deny a loan application after reviewing the AI's score and reasoning. A pharmacist who must confirm a medication dosage recommendation before it's administered.
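To make the control structure concrete, here is a minimal Python sketch of a HITL gate. It is illustrative only: `review_queue` and `executor` (and their methods) stand in for whatever review tooling and action layer your stack actually uses; they are assumptions, not real library APIs.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    REJECT = "reject"

@dataclass
class Proposal:
    action: str        # what the AI wants to do
    confidence: float  # the model's own score, shown for reviewer context
    rationale: str     # explanation surfaced to the human

def hitl_execute(proposal: Proposal, review_queue, executor):
    """HITL: the system stops and waits for explicit human authorization."""
    # Blocks until a qualified human returns a verdict (and, for MODIFY,
    # an amended proposal). Nothing executes before this call returns.
    verdict, amended = review_queue.await_human_review(proposal)
    if verdict is Verdict.REJECT:
        return None  # the action never happens
    return executor.run(amended if verdict is Verdict.MODIFY else proposal)
```

The defining property is the blocking call: there is no code path to `executor.run` that does not pass through a human verdict first.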
Human-on-the-Loop (HOTL)
The AI acts autonomously. A human monitors the system's outputs and has the ability to intervene — but the action happens before the human decides anything. The human's role is surveillance and override, not approval.
Examples: An autonomous trading algorithm that executes orders while a trader watches a dashboard. A content moderation AI that removes posts immediately, with a human moderator able to reverse decisions within a 24-hour window. An automated email campaign system with a human supervisor who can halt it if response metrics look wrong.
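Contrast that with a HOTL sketch using the same hypothetical `executor`: the action runs first, and the human's only lever is a reversal inside a fixed window. As before, `override_channel` is an illustrative stand-in, not a real API.

```python
import threading

def hotl_execute(proposal, executor, override_channel, window_seconds=86400):
    """HOTL: the AI acts immediately; a human may reverse it afterward."""
    receipt = executor.run(proposal)  # executes at machine speed, no approval

    def watch():
        # Surveillance, not authorization: if no override arrives before
        # the window closes, the action simply stands.
        if override_channel.wait_for_override(receipt, timeout=window_seconds):
            executor.reverse(receipt)

    threading.Thread(target=watch, daemon=True).start()
    return receipt
```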
Human-out-of-the-Loop (HOOTL)
Fully autonomous. No human is involved in individual decisions. Humans set the parameters and may review aggregate performance, but the system runs without per-decision human involvement.
Examples: A spam filter that routes email without human review. A real-time fraud detection system that blocks transactions in milliseconds. A product recommendation engine that personalizes content for millions of users simultaneously.
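For completeness, a HOOTL sketch is the same execution call with no per-decision human pathway at all; the only human-facing surface is aggregate metrics (the hypothetical `metrics` object below).

```python
def hootl_execute(proposal, executor, metrics):
    """HOOTL: no per-decision human involvement; oversight is aggregate only."""
    receipt = executor.run(proposal)
    metrics.record(proposal, receipt)  # humans review dashboards, not decisions
    return receipt
```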
Side-by-Side Comparison
| Dimension | HITL | HOTL | HOOTL |
|---|---|---|---|
| Who decides | Human (AI recommends) | AI (human can override) | AI only |
| Decision latency | Human-speed | AI-speed | AI-speed |
| Error recovery | Pre-action: errors blocked | Post-action: errors reversible if caught | Post-action: errors may compound undetected |
| Regulatory standing | Required for high-risk AI (EU AI Act, FDA SaMD Class II/III) | Accepted for medium-risk with audit trail | Accepted only for low-risk, low-consequence decisions |
| Trust required | Lower (human validates each decision) | Higher (human must trust AI behavior) | Highest (full trust in AI system integrity) |
| Appropriate risk level | High-consequence, low-reversibility | Medium-consequence, reversible | Low-consequence or high-frequency, low-stakes |
The OpenAI/DoD Question
The debate over OpenAI's DoD contract and Anthropic's refusal was, at its core, a disagreement about where on this spectrum military AI systems should sit.
Autonomous weapons systems — systems that identify and engage targets without per-target human authorization — are HOOTL by definition. A human sets the rules of engagement; the AI executes. No human approves the individual targeting decision.
HOTL weapons systems have a human watching but not blocking. The human could intervene, but the system fires by default unless overridden. In practice, the latency window for military engagement often makes HOTL functionally equivalent to HOOTL.
Anthropic's public position was that AI systems making lethal decisions require human authorization at the individual action level — HITL. That's not a philosophical nicety. It's a specific architectural requirement.
The reason this matters for enterprise AI buyers: you are often using the same foundation models, the same APIs, and the same vendor relationships as defense applications. The governance frameworks your vendor has chosen to build for their highest-stakes use cases signal how they think about human oversight for all use cases. It's worth understanding.
The Automation Bias Problem
The hardest thing about HOTL is that it's more autonomous in practice than it looks on paper.
Decades of human factors research show that when humans monitor autonomous systems, they systematically over-trust them. Automation bias leads people to:
- Fail to detect system errors they would have caught if doing the task manually
- Accept AI recommendations without engaging their independent judgment
- Respond more slowly and less accurately to anomalies because monitoring is cognitively different from deciding
A 1999 study of automated cockpit systems found that pilots who were "monitoring" automation missed simulated failures they would have caught when flying manually. The same phenomenon shows up in radiology, where readers reviewing AI-flagged images catch fewer cancers than readers who haven't seen the AI's annotation. The AI anchors their perception.
What this means in practice: HOTL systems frequently become de facto HOOTL systems, because the human monitor becomes a passive observer rather than an active reviewer.
This is why regulated industries increasingly require HITL rather than accepting HOTL as equivalent oversight. A human who can override but almost never does because the AI always looks right is not a meaningful control.
Regulatory Positions
FDA (SaMD): Class II and Class III Software as a Medical Device is typically cleared as decision support that a qualified clinician reviews and acts upon. Autonomous clinical AI that acts without clinician review remains the rare exception at these risk classes. This is, in practice, a HITL requirement.
Federal Reserve SR 11-7: Requires "effective challenge" — qualified humans who can independently assess AI model outputs, assumptions, and limitations. A monitoring dashboard that nobody meaningfully interrogates does not satisfy SR 11-7's effective challenge standard. The expectation is closer to HITL than HOTL for consequential financial decisions.
EU AI Act: High-risk AI systems must enable humans to "monitor, understand, and effectively override" AI outputs. The key word is "effectively" — not theoretically possible to override, but actually designed for meaningful intervention. Regulators have indicated that HOTL systems that are functionally HOOTL in practice will not satisfy this standard.
ABA Model Rules 5.1 and 5.3: Attorneys remain responsible for supervising work produced using AI tools. "I delegated it to AI" is not a defense in a bar complaint. This effectively requires HITL for any AI output used in legal representation.
When HOTL and HOOTL Are Appropriate
Not every decision warrants HITL. The architecture is expensive — in human time, in latency, in infrastructure. The right model depends on two factors: consequence severity and reversibility.
HOOTL is appropriate when: decisions are high-frequency, low-consequence, and easily reversible. Spam filtering. Product recommendations. Internal search ranking. If the AI is wrong, users see an irrelevant result or a false positive gets cleared from a spam folder. The error rate is manageable and the cost of human review far exceeds the cost of occasional errors.
HOTL is appropriate when: decisions are medium-consequence, mostly reversible, and the action window allows meaningful human review. Automated marketing emails with a 48-hour intervention window. Fraud holds that can be released by a customer service rep. Scheduled social media posts with a monitoring dashboard.
HITL is required when: the decision is high-consequence, difficult or impossible to reverse, and the error cost exceeds the overhead of human review. Clinical decisions. Financial determinations that affect people's livelihoods. Legal filings. Anything where being wrong creates regulatory, ethical, or legal liability.
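Those three criteria reduce to a small routing decision. The sketch below is one way to encode it; the tier labels and the `review_latency_ok` flag are assumptions you would calibrate to your own risk appetite, not a standard.

```python
def choose_oversight(consequence: str, reversible: bool,
                     review_latency_ok: bool) -> str:
    """Map consequence severity and reversibility to a control structure.

    consequence: "low", "medium", or "high"
    reversible: can the action be undone after the fact?
    review_latency_ok: does the action window allow meaningful human review?
    """
    if consequence == "high" or not reversible:
        return "HITL"   # block and wait: error cost exceeds review overhead
    if consequence == "medium":
        # Reversible and medium-stakes: HOTL only if a human can actually
        # review within the action window; otherwise stay at HITL.
        return "HOTL" if review_latency_ok else "HITL"
    return "HOOTL"      # low-consequence, high-frequency, reversible
```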
A Maturity Framework
Organizations deploying AI for the first time often start at HITL even for low-stakes decisions — the overhead is worth the confidence. As they accumulate track record data and validate model performance, they can migrate lower-stakes decision types toward HOTL and HOOTL.
The maturity progression looks like this:
- All decisions HITL: build the baseline, understand error rates, validate AI performance
- Segment by risk: move high-confidence, low-consequence decisions to HOTL or HOOTL
- Ongoing monitoring: maintain HITL sampling even for HOOTL decisions to catch drift (see the sketch after this list)
- Recalibrate regularly: distribution shift, model updates, and process changes can move a decision that was safe at HOOTL back into HITL territory
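The sketch referenced in the monitoring step, building on the hypothetical `choose_oversight` and `review_queue` pieces above: even decisions routed to HOOTL keep a thin HITL audit stream, so drift shows up in human review rather than in incident reports. The 2% sample rate is an arbitrary placeholder, and the `decision` attributes are assumed fields, not a real schema.

```python
import random

def route_with_audit(decision, review_queue, audit_rate=0.02):
    """Apply the chosen oversight mode, but keep sampling HOOTL decisions."""
    mode = choose_oversight(decision.consequence, decision.reversible,
                            decision.review_latency_ok)
    if mode == "HOOTL" and random.random() < audit_rate:
        # Asynchronous spot-check: humans grade a sample of autonomous
        # decisions, so rising error rates trigger recalibration.
        review_queue.enqueue_for_audit(decision)
    return mode
```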
The movement is not always toward more autonomy. Conditions change. Regulations change. Models degrade. A governance framework needs to support moving decisions back up the stack as well as down.
For the full implementation guide on designing HITL workflows, see What Is Human-in-the-Loop AI? — the hub article for this pillar covers risk tier frameworks, the three types of HITL, and regulatory requirements across industries.
The Bottom Line
HITL, HOTL, and HOOTL are not synonyms for "we have humans involved somehow." They describe where in the decision chain a human can actually affect the outcome. In high-stakes enterprise AI — healthcare, legal, financial services, defense — the distinction is the difference between governed AI and liability.
The vendors who conflate these terms are either confused or hoping you are.
Ertas Data Suite is built for teams that need genuine HITL-compatible AI pipelines: on-premise data preparation, operator-logged annotation, full audit trail, and no data egress. The architecture assumes human experts are in the loop at every stage — because that's what the regulations require and what the risk demands.
The question isn't whether to involve humans in your AI decisions. The question is whether your current architecture makes that involvement meaningful or theatrical.