On-Premise vs Self-Hosted vs Air-Gapped: Choosing the Right AI Deployment for Sensitive Data

Three terms appear constantly in vendor documentation, procurement checklists, and architecture reviews: on-premise, self-hosted, and air-gapped. They are used interchangeably. They should not be. Each describes a meaningfully different deployment model with different compliance implications, different operational costs, and different guarantees about where your data actually goes.

The imprecision matters because vendors exploit it. A tool marketed as "on-premise compatible" might mean Docker on AWS. A "self-hosted option" might still require the vendor's license server to phone home on startup. "Air-gapped support" might mean the binary runs without internet, but the configuration documentation assumes network connectivity for updates.

This article defines each term precisely, explains the compliance implications of each deployment model, and gives practical guidance for choosing the right one based on your regulatory context.

Precise Definitions

On-Premise

On-premise means the software runs on hardware that your organization owns or leases, in a physical location that you control — your building, your server room, your data center. The hardware is yours. The physical space is yours. Network traffic stays within your perimeter.

Key characteristic: the data center is yours. The servers are yours (or leased under your control). You are not renting compute from a cloud provider. Workloads running in AWS, Azure, or GCP data centers are not on-premise, regardless of how much control you have over the configuration.

This matters because data sovereignty requirements in some regulatory frameworks are specifically concerned with physical infrastructure ownership and jurisdiction, not just logical access control.

Self-Hosted

Self-hosted means your team manages the deployment, including installation, configuration, upgrades, and maintenance. The critical distinction from on-premise: self-hosted says nothing about where the infrastructure lives. Docker on AWS EC2 is self-hosted. Kubernetes on GKE is self-hosted. A VPS on Hetzner is self-hosted. None of these are on-premise.

Self-hosted implies operational responsibility, not infrastructure ownership. Your team runs the software, not the vendor. But the physical hardware may be owned by Amazon, Microsoft, Google, or another cloud provider in a data center in any jurisdiction.

This is where the vendor terminology gap is most damaging. Many tools market themselves as "self-hosted" as if it implies the privacy guarantees of on-premise deployment. It does not. A self-hosted deployment on a cloud provider means your data resides on hardware owned by that provider, subject to that provider's data agreements, potentially accessible under that jurisdiction's legal process.

Air-Gapped

Air-gapped means no network connectivity at runtime. The system is physically or logically isolated from external networks — including the internet, vendor update servers, and license validation endpoints. Data cannot enter or leave the system through a network connection because no network connection exists during operation.

Air-gapped is the most restrictive deployment model and provides the strongest guarantee against data exfiltration through network channels. It's required in some defense, intelligence, and critical infrastructure contexts, and increasingly relevant in healthcare and financial contexts where the sensitivity of the data justifies the operational complexity.

Note: air-gapped doesn't mean the system never touches a network. It means the network is absent or isolated at runtime, when the sensitive data is being processed. Updates, patches, and initial installation typically happen on separate networks or via physical media (USB, optical disc) under controlled procedures.
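The three definitions above can be written down as a small classifier, which is a useful way to check a vendor's label against the facts of a deployment. This is a minimal sketch: the `Deployment` fields and the `labels` function are illustrative names, and the rule that air-gapped also requires hardware under your control is an assumption taken from the comparison table later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Facts about one deployment. Field names are illustrative."""
    hardware_controlled_by_org: bool   # owned or leased, in a location you control
    operated_by_org: bool              # your team installs, upgrades, and maintains it
    external_network_at_runtime: bool  # any path to outside networks while data is processed


def labels(d: Deployment) -> set[str]:
    """Return every term from this article that applies to the deployment."""
    result = set()
    if d.operated_by_org:
        result.add("self-hosted")      # operational responsibility only
    if d.hardware_controlled_by_org:
        result.add("on-premise")       # physical control of the hardware
    # Assumption: an isolated network on someone else's hardware is not air-gapped.
    if d.hardware_controlled_by_org and not d.external_network_at_runtime:
        result.add("air-gapped")
    return result


docker_on_ec2 = Deployment(hardware_controlled_by_org=False,
                           operated_by_org=True,
                           external_network_at_runtime=True)
print(labels(docker_on_ec2))
```

Docker on EC2 gets only the self-hosted label. The terms are not mutually exclusive: a deployment in your own server room, run by your own team, is both self-hosted and on-premise.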

Why Vendors Blur These Distinctions

The marketing incentive is straightforward: "self-hosted" sounds like "on-premise" to buyers who haven't thought carefully about the distinction, and "self-hosted" is much easier to build and sell than truly on-premise or air-gapped software.

Genuinely on-premise software must work without any cloud dependencies — no license servers, no telemetry, no update checks, no CDN-hosted assets. It must be packageable for internal distribution, installable on networks that have no internet connectivity, and maintainable without vendor access to the infrastructure. This is more expensive to build.

Air-gapped software must go further: no assumptions about network connectivity at any point during operation, no DNS lookups, no external API calls, no hardcoded CDN URLs, no background services that try to reach the internet. Many software tools that claim air-gap support fail in practice because a dependency somewhere in the stack tries to reach an external endpoint.
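One inexpensive first check for hardcoded CDN URLs and external endpoints is a static scan of the installed files. The sketch below is only a starting point: the regular expression, the file suffix list, and the `find_external_urls` name are illustrative choices, and a static scan will miss endpoints that are assembled at runtime or compiled into binaries.

```python
import re
from pathlib import Path

# Matches http(s) URLs and captures the host. Illustrative, not exhaustive.
URL_PATTERN = re.compile(r"https?://([A-Za-z0-9.-]+)(?::\d+)?[^\s\"'<>]*")
LOCAL_HOSTS = {"localhost", "127.0.0.1"}
TEXT_SUFFIXES = (".py", ".js", ".json", ".yaml", ".yml", ".html", ".cfg")


def find_external_urls(root: Path, suffixes=TEXT_SUFFIXES):
    """Yield (path, line_number, url) for every non-local URL found under root."""
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for match in URL_PATTERN.finditer(line):
                if match.group(1).lower() not in LOCAL_HOSTS:
                    yield path, line_number, match.group(0)


if __name__ == "__main__":
    # Scans the current directory; point it at the vendor's install directory instead.
    for path, line_number, url in find_external_urls(Path(".")):
        print(f"{path}:{line_number}: {url}")
```

Every hit is a question for the vendor: is this URL contacted at runtime, and what happens when it is unreachable?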

When evaluating vendor claims, the right questions are:

Does the software make any network calls during operation? Can you verify this?
Does the license validation require connectivity to a vendor server?
Does the software transmit telemetry, error reports, or usage data?
Will the software function without any internet connectivity for 90 days without manual intervention?
What happens when it can't reach an update server — does it degrade or fail?
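The first question, whether you can verify network behavior yourself, can be partly answered in a test harness when the tool exposes a Python API. The sketch below patches `socket.socket.connect` so that any non-loopback connection attempt is refused and recorded. The `block_external_connections` name is hypothetical, and the guard only covers connections opened through Python's `socket` module in the current process. It does not see subprocesses or native code, so it complements network-level monitoring rather than replacing it.

```python
import ipaddress
import socket
from contextlib import contextmanager


def _is_loopback(host: str) -> bool:
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname is treated as external


@contextmanager
def block_external_connections():
    """Refuse and record non-loopback connect() calls made in this process."""
    attempts = []
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        # Network addresses are tuples; Unix socket paths are local and pass through.
        if isinstance(address, tuple) and not _is_loopback(str(address[0])):
            attempts.append(address)
            raise ConnectionRefusedError(f"blocked outbound connection to {address!r}")
        return original_connect(self, address)

    socket.socket.connect = guarded_connect
    try:
        yield attempts
    finally:
        socket.socket.connect = original_connect
```

Wrapping a representative workload in `with block_external_connections() as attempts:` and inspecting `attempts` afterwards shows which endpoints the code tried to reach, and whether it degraded or failed when it could not reach them.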

Compliance Implications by Deployment Model

GDPR and Data Transfer Restrictions

GDPR's Chapter V restricts transfers of personal data to third countries (countries outside the EU/EEA) that don't have an adequacy decision. Running a workload in AWS eu-west-1 (Ireland) may satisfy GDPR data residency requirements, but it's a self-hosted deployment, not on-premise. The data resides on AWS hardware, and AWS is a U.S.-headquartered company subject to U.S. legal process under laws such as the CLOUD Act.

Whether this is acceptable depends on your specific data, your legal team's interpretation, and your organization's risk tolerance. What it is not is equivalent to on-premise storage under your physical control.

For organizations subject to national data sovereignty requirements stricter than the GDPR baseline (for example, requirements tied to Germany's BSI, France's SecNumCloud qualification, or certain government agency mandates), "self-hosted on EU cloud" may not satisfy the requirement on its own. Depending on the scheme, you may need either a provider qualified under it or hardware under your own physical control.

HIPAA: Covered Entities and Business Associates

Under HIPAA, when you use a cloud service provider to store or process PHI, that provider becomes a Business Associate. A Business Associate Agreement (BAA) is required. The existence of a BAA means data flows to that provider — they have contractual obligations around it, but they receive it.

Self-hosted on a cloud that provides a BAA (AWS, Azure, and GCP all do) satisfies HIPAA's BAA requirement. But the data still lives on the cloud provider's hardware. The cloud provider can still receive legal demands for data under U.S. law. Their employees (under strict controls) can still access infrastructure your data runs on.

On-premise deployment means PHI never reaches a cloud provider's infrastructure. There's no BAA to negotiate because there's no third-party processor. For the most sensitive clinical data — particularly in research contexts where patient privacy is paramount — this is a meaningfully different posture.

Air-gapped on-premise deployment goes further: there is no network path through which PHI can be exfiltrated, even accidentally. This is the appropriate model for the highest-sensitivity clinical research environments.

EU AI Act Article 10: Training Data Governance

The EU AI Act Article 10 requirements for high-risk AI systems include data governance obligations across the entire training data lifecycle. The regulation requires providers to document data collection, preparation, labeling, and quality assurance processes.

This isn't a deployment model requirement per se — it's a documentation and governance requirement. But deployment model affects your ability to satisfy it. If your training data preparation happens on a cloud platform with shared infrastructure, reconstructing a complete audit trail of data handling decisions is harder than if the entire pipeline runs on infrastructure you control and log entirely.
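One way to make an audit trail of data handling decisions tamper-evident on infrastructure you control is a hash-chained log, where each entry's hash covers the previous entry's hash. The sketch below is an illustration of that general technique, not something Article 10 prescribes. The function names and entry fields are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(entry: dict) -> str:
    """SHA-256 over the entry's content, excluding its own hash field."""
    body = {key: value for key, value in entry.items() if key != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, step: str, details: dict) -> dict:
    """Append an entry whose hash covers its content and the previous entry's hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,          # e.g. "collection", "labeling", "quality_check"
        "details": details,    # must be JSON-serializable
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Return True only if no entry was altered, removed, or reordered."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev or entry["hash"] != _digest(entry):
            return False
        prev = entry["hash"]
    return True
```

Editing any earlier entry changes its digest, so `verify` fails from that point on. In practice the log would also need durable, access-controlled storage, which this sketch leaves out.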

For EU-based organizations building high-risk AI systems, on-premise deployment of the data preparation pipeline (not just the model training step) makes the Article 10 documentation obligation more tractable.

Financial Services: FCA, DORA, and Data Residency

Financial services regulators and regulations in the UK (FCA), the EU (DORA), and the U.S. (OCC, FINRA) have varying requirements around where financial data can reside and how operational continuity must be maintained. DORA specifically addresses ICT risk management and requires financial entities to maintain control over their critical ICT dependencies.

Cloud deployment for financial services is permitted under most of these frameworks but requires specific contractual terms, exit strategies, and operational resilience documentation. For the most sensitive data — trading algorithms, credit models, customer financial records — some institutions choose on-premise deployment to maintain unambiguous control.

Air-Gapped Requirements: Defense and Critical Infrastructure

Defense contractors, intelligence community vendors, and operators of critical infrastructure (power grids, water systems, healthcare systems) may be required to operate AI systems in classified or sensitive compartmented environments. These environments are physically isolated from unclassified networks by design.

In these contexts, "self-hosted on a government cloud" (GovCloud, C2S, etc.) is not air-gapped — it's still networked infrastructure. True air-gap means the system runs on hardware with no external network connection, and data moves via controlled physical media under documented chain-of-custody procedures.
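When data or software updates cross the air gap on physical media, a common control is to verify the contents against a manifest of SHA-256 hashes that was produced on the source side and delivered separately. The sketch below assumes a manifest in the form of a dictionary mapping relative file paths to hex digests. That format, and the `verify_bundle` name, are illustrative and not a standard.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large bundles do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle_dir: Path, manifest: dict) -> list:
    """Return a list of problems; an empty list means the bundle matches the manifest."""
    problems = []
    for name, expected in sorted(manifest.items()):
        path = bundle_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected.lower():
            problems.append(f"hash mismatch: {name}")
    # Files on the media that the manifest does not list are also suspicious.
    for path in sorted(bundle_dir.rglob("*")):
        relative = path.relative_to(bundle_dir).as_posix()
        if path.is_file() and relative not in manifest:
            problems.append(f"unlisted: {relative}")
    return problems
```

A hash check detects corruption and substitution of files in transit. It does not establish who produced the manifest, so it sits alongside signatures and chain-of-custody records rather than replacing them.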

Software that claims air-gap support for these environments must be tested, not trusted. Common failure modes include activation/licensing checks on startup, telemetry that attempts to phone home and times out, and dependency libraries that make DNS lookups.

Comparison Table

Dimension	On-Premise	Self-Hosted (Cloud)	Air-Gapped
Hardware ownership	Your organization	Cloud provider	Your organization
Data residency guarantee	Yes (physical control)	Depends on provider region	Yes (physical control)
GDPR data transfer compliance	Strong	Depends on adequacy decision	Strong
HIPAA (PHI)	No BAA needed	BAA required with cloud provider	No BAA needed
Data accessible to cloud provider	No	Yes (under their terms)	No
Setup complexity	High (hardware procurement)	Moderate (DevOps)	Very high (secure logistics)
Maintenance burden	High (your team)	Moderate-high (your team)	Very high (your team, isolated)
Internet connectivity required	No (typically)	Yes (during operation)	No (by definition)
Best suited for	Regulated industries, data sovereignty	Tech-fluent teams without hard sovereignty requirements	Defense, classified, highest-sensitivity data

Practical Guidance by Regulatory Context

HIPAA-covered entities processing PHI for AI training: Self-hosted on a HIPAA BAA-covered cloud is technically compliant but means PHI reaches the cloud provider's infrastructure. On-premise is cleaner. Air-gapped on-premise is appropriate for the most sensitive research data.

EU organizations subject to GDPR and EU AI Act: Self-hosted on EU-region cloud satisfies most requirements for non-sovereign data. For national security contexts or sectors with stricter data sovereignty mandates (healthcare research, government AI systems), on-premise is appropriate.

Financial institutions with model risk management requirements: Cloud with proper contracts and operational resilience documentation is acceptable for most regulatory frameworks. For proprietary models where competitive sensitivity is extreme, on-premise is preferred.

Defense contractors and government agencies with classified requirements: Air-gapped only. Self-hosted deployments and networked on-premise environments do not meet classified facility requirements.

Legal services (privilege and confidentiality): Self-hosted on infrastructure the firm administers, including cloud, is often sufficient. On-premise provides cleaner privilege arguments because data never leaves the firm's physical control. Air-gapped is rarely required, but it is used in national security legal contexts.

What to Ask Vendors

When a vendor tells you their tool supports your deployment model, ask:

"Does the software make any outbound network connections during operation?" Get a specific answer, not "it can be configured to run offline."
"Does license validation require connectivity to your servers?" If yes, that's a dependency on the vendor's infrastructure.
"Can you provide documentation of all external endpoints the software contacts?" Security teams can verify this with network monitoring.
"Has this been deployed in a certified air-gapped environment? Can you provide a reference?" Claimed air-gap support and verified air-gap deployments are different.
"What is your data telemetry policy? What data does the software transmit and where?" This should be documented in the privacy policy or a data processing agreement (DPA), not given only as a verbal assurance.

The answers to these questions reveal which deployment model a tool actually supports, regardless of what the marketing materials say.
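Once a vendor supplies a list of documented endpoints, comparing it with what network monitoring actually observed is a simple set difference. The sketch below assumes both inputs have already been reduced to plain `host:port` strings, one per line. That format and the `undocumented_endpoints` name are illustrative, and real monitoring output would need parsing first.

```python
def undocumented_endpoints(observed_lines, documented):
    """Return observed 'host:port' endpoints that the vendor did not document."""
    documented_set = {entry.strip().lower() for entry in documented}
    unexpected = set()
    for line in observed_lines:
        endpoint = line.strip().lower()
        # Skip blank lines and comment lines in the observation file.
        if not endpoint or endpoint.startswith("#"):
            continue
        if endpoint not in documented_set:
            unexpected.add(endpoint)
    return sorted(unexpected)


# Hypothetical example data; the hostnames are placeholders.
documented = ["license.vendor.example:443"]
observed = [
    "license.vendor.example:443",
    "telemetry.vendor.example:443",
]
print(undocumented_endpoints(observed, documented))
```

Any endpoint in the result is either missing from the vendor's documentation or evidence that the tool behaves differently from what was claimed.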

Related Reading

On-Premise AI Data Preparation for Compliance: Why deployment model choice is foundational for regulated industry AI teams
EU AI Act Article 10: Training Data Requirements: The specific data governance obligations for high-risk AI systems
HIPAA-Compliant AI Training Data Guide: What HIPAA actually requires for AI training data workflows
Air-Gapped Machine Learning Pipelines: Practical considerations for operating ML pipelines without network connectivity
Data Sovereignty in Enterprise AI (2026): How data sovereignty requirements are shaping enterprise AI deployment decisions
