
On-Premise vs Self-Hosted vs Air-Gapped: Choosing the Right AI Deployment for Sensitive Data
On-premise, self-hosted, and air-gapped are used interchangeably — but they mean different things and offer different compliance guarantees. Here's how to choose the right deployment model for sensitive AI data workloads.
Three terms appear constantly in vendor documentation, procurement checklists, and architecture reviews: on-premise, self-hosted, and air-gapped. They are used interchangeably. They should not be. Each describes a meaningfully different deployment model with different compliance implications, different operational costs, and different guarantees about where your data actually goes.
The imprecision matters because vendors exploit it. A tool marketed as "on-premise compatible" might mean Docker on AWS. A "self-hosted option" might still require the vendor's license server to phone home on startup. "Air-gapped support" might mean the binary runs without internet, but the configuration documentation assumes network connectivity for updates.
This article defines each term precisely, explains the compliance implications of each deployment model, and gives practical guidance for choosing the right one based on your regulatory context.
Precise Definitions
On-Premise
On-premise means the software runs on hardware that your organization owns or leases, in a physical location that you control — your building, your server room, your data center. The hardware is yours. The physical space is yours. Network traffic stays within your perimeter.
Key characteristic: the data center is yours. The servers are yours (or leased under your control). You are not renting compute from a cloud provider. Workloads running in AWS, Azure, or GCP regions are not on-premise, regardless of how much control you have over the configuration.
This matters because data sovereignty requirements in some regulatory frameworks are specifically concerned with physical infrastructure ownership and jurisdiction, not just logical access control.
Self-Hosted
Self-hosted means your team manages the deployment, including installation, configuration, upgrades, and maintenance. The critical distinction from on-premise: self-hosted says nothing about where the infrastructure lives. Docker on AWS EC2 is self-hosted. Kubernetes on GKE is self-hosted. A VPS on Hetzner is self-hosted. None of these are on-premise.
Self-hosted implies operational responsibility, not infrastructure ownership. Your team runs the software, not the vendor. But the physical hardware may be owned by Amazon, Microsoft, Google, or another cloud provider in a data center in any jurisdiction.
This is where the vendor terminology gap is most damaging. Many tools market themselves as "self-hosted" as if it implies the privacy guarantees of on-premise deployment. It does not. A self-hosted deployment on a cloud provider means your data resides on hardware owned by that provider, subject to that provider's data agreements, potentially accessible under that jurisdiction's legal process.
Air-Gapped
Air-gapped means no external network connectivity at runtime. The system is physically or logically isolated from outside networks, including the internet, vendor update servers, and license validation endpoints. Data cannot enter or leave the system over a network because no path to an outside network exists during operation.
Air-gapped is the most restrictive deployment model and provides the strongest guarantee against data exfiltration through network channels. It's required in some defense, intelligence, and critical infrastructure contexts, and increasingly relevant in healthcare and financial contexts where the sensitivity of the data justifies the operational complexity.
Note: air-gapped doesn't mean there's no network ever. It means the network is absent or isolated at runtime, when the sensitive data is being processed. Updates, patches, and initial installation typically happen on separate networks or via physical media (USB, optical disk) under controlled procedures.
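The three definitions vary along independent axes, which a few lines of Python can make concrete. This is a minimal sketch, not a formal taxonomy: the `Deployment` fields and the `labels` function are hypothetical names introduced here to mirror the definitions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    hardware_controlled_by_you: bool   # you own or lease the servers, in a site you control
    operated_by_your_team: bool        # your team installs, configures, and upgrades
    external_network_at_runtime: bool  # any path to outside networks while data is processed


def labels(d: Deployment) -> set[str]:
    """Return every term from the definitions above that applies to a deployment."""
    result = set()
    if d.hardware_controlled_by_you:
        result.add("on-premise")
    if d.operated_by_your_team:
        result.add("self-hosted")
    if not d.external_network_at_runtime:
        result.add("air-gapped")
    return result


docker_on_ec2 = Deployment(False, True, True)
own_server_room = Deployment(True, True, True)
isolated_enclave = Deployment(True, True, False)

print(sorted(labels(docker_on_ec2)))     # ['self-hosted']
print(sorted(labels(own_server_room)))   # ['on-premise', 'self-hosted']
print(sorted(labels(isolated_enclave)))  # ['air-gapped', 'on-premise', 'self-hosted']
```

The point of the sketch is that the labels are not a ladder: a deployment can be self-hosted without being on-premise, and being on-premise says nothing about whether the system is air-gapped.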
Why Vendors Blur These Distinctions
The marketing incentive is straightforward: "self-hosted" sounds like "on-premise" to buyers who haven't thought carefully about the distinction, and "self-hosted" is much easier to build and sell than truly on-premise or air-gapped software.
Genuinely on-premise software must work without any cloud dependencies — no license servers, no telemetry, no update checks, no CDN-hosted assets. It must be packageable for internal distribution, installable on networks that have no internet connectivity, and maintainable without vendor access to the infrastructure. This is more expensive to build.
Air-gapped software must go further: no assumptions about network connectivity at any point during operation, no DNS lookups, no external API calls, no hardcoded CDN URLs, no background services that try to reach the internet. Many software tools that claim air-gap support fail in practice because a dependency somewhere in the stack tries to reach an external endpoint.
When evaluating vendor claims, the right questions are:
- Does the software make any network calls during operation? Can you verify this?
- Does the license validation require connectivity to a vendor server?
- Does the software transmit telemetry, error reports, or usage data?
- Will the software function without any internet connectivity for 90 days without manual intervention?
- What happens when it can't reach an update server — does it degrade or fail?
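The first question above ("Can you verify this?") can be partly answered from your own network monitoring. The sketch below assumes you have already exported the destination IP addresses the software contacted during a test run (the `observed` list is made-up sample data) and flags any that fall outside private, loopback, and link-local ranges. It uses only Python's standard `ipaddress` module.

```python
import ipaddress


def external_destinations(observed_ips):
    """Return observed destination IPs outside private, loopback, and link-local ranges."""
    flagged = []
    for raw in observed_ips:
        ip = ipaddress.ip_address(raw)
        if not (ip.is_private or ip.is_loopback or ip.is_link_local):
            flagged.append(raw)
    return flagged


# Sample data for illustration only; real input would come from firewall or flow logs.
observed = ["10.0.4.17", "127.0.0.1", "192.168.1.20", "8.8.8.8", "fe80::1"]
print(external_destinations(observed))  # ['8.8.8.8']
```

An empty result from a short test run is not proof of offline operation. Some calls happen only on startup, on license renewal, or on error paths, so the observation window needs to cover those events.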
Compliance Implications by Deployment Model
GDPR and Data Transfer Restrictions
GDPR's Chapter V restricts transfers of personal data to third countries (countries outside the EU/EEA) that don't have an adequacy decision, unless appropriate safeguards are in place. Running a workload in the AWS eu-west-1 region (Ireland) keeps the data inside the EU and may address data residency concerns, but it's a self-hosted deployment, not on-premise. The data resides on AWS hardware, and AWS is a U.S.-headquartered company subject to U.S. legal process under laws such as the CLOUD Act.
Whether this is acceptable depends on your specific data, your legal team's interpretation, and your organization's risk tolerance. What it is not is equivalent to on-premise storage under your physical control.
For organizations subject to national data sovereignty requirements stricter than the GDPR baseline (German BSI requirements, French SecNumCloud requirements, certain government agency mandates), "self-hosted on EU cloud" may not satisfy the requirement. Some of these mandates can be met only with a nationally qualified provider or with hardware under your physical control.
HIPAA: Covered Entities and Business Associates
Under HIPAA, when you use a cloud service provider to store or process PHI, that provider becomes a Business Associate. A Business Associate Agreement (BAA) is required. The existence of a BAA means data flows to that provider — they have contractual obligations around it, but they receive it.
Self-hosted on a cloud that provides a BAA (AWS, Azure, and GCP all do) satisfies HIPAA's BAA requirement. But the data still lives on the cloud provider's hardware. The cloud provider can still receive legal demands for data under U.S. law. Their employees (under strict controls) can still access infrastructure your data runs on.
On-premise deployment means PHI never reaches a cloud provider's infrastructure. There's no cloud-provider BAA to negotiate because no third party hosts the data. For the most sensitive clinical data, particularly in research contexts where patient privacy is paramount, this is a meaningfully different posture.
Air-gapped on-premise deployment goes further: there is no network path through which PHI can be exfiltrated, even accidentally. This is the appropriate model for the highest-sensitivity clinical research environments.
EU AI Act Article 10: Training Data Governance
Article 10 of the EU AI Act imposes data governance obligations on providers of high-risk AI systems, covering the training, validation, and testing data those systems are built on. In practice, providers need to be able to document their data collection, preparation, labeling, and quality assurance processes.
This isn't a deployment model requirement per se; it's a documentation and governance requirement. But the deployment model affects your ability to satisfy it. If your training data preparation happens on a cloud platform with shared infrastructure, reconstructing a complete audit trail of data handling decisions is harder than if the entire pipeline runs on infrastructure you control and log end to end.
For EU-based organizations building high-risk AI systems, on-premise deployment of the data preparation pipeline (not just the model training step) makes the Article 10 documentation obligation more tractable.
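One way to make an audit trail of data handling steps tamper-evident is a hash-chained log, where each entry's hash covers the previous entry's hash. The sketch below is an illustrative technique, not something Article 10 prescribes; the function names, field names, and sample entries are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    # Canonical JSON (sorted keys) so the same content always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, step, details):
    """Append an entry whose hash covers its content and the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"step": step, "details": details, "prev_hash": prev_hash}
    log.append({**body, "hash": _digest(body)})


def verify(log):
    """Return True if no entry was edited, reordered, or dropped from mid-chain."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("step", "details", "prev_hash")}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, "collection", {"source": "hypothetical-export.csv", "rows": 1200})
append_entry(log, "labeling", {"annotator": "team-a", "schema": "v2"})
print(verify(log))  # True

log[0]["details"]["rows"] = 900  # simulate an after-the-fact edit
print(verify(log))  # False
```

A chain like this does not detect truncation of the most recent entries, so the latest hash would need to be recorded somewhere separate. It also only proves the log was not altered, not that the logged steps were complete or accurate.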
Financial Services: FCA, DORA, and Data Residency
Financial services regulators and regulations in the UK (FCA), the EU (DORA), and the U.S. (OCC, FINRA) set varying requirements around where financial data can reside and how operational continuity must be maintained. DORA, the EU's Digital Operational Resilience Act, specifically addresses ICT risk management and requires financial entities to maintain control over their critical ICT dependencies.
Cloud deployment for financial services is permitted under most of these frameworks but requires specific contractual terms, exit strategies, and operational resilience documentation. For the most sensitive data — trading algorithms, credit models, customer financial records — some institutions choose on-premise deployment to maintain unambiguous control.
Air-Gapped Requirements: Defense and Critical Infrastructure
Defense contractors, intelligence community vendors, and operators of critical infrastructure (power grids, water systems, healthcare systems) may be required to operate AI systems in classified or sensitive compartmented environments. These environments are physically isolated from unclassified networks by design.
In these contexts, "self-hosted on a government cloud" (GovCloud, C2S, etc.) is not air-gapped — it's still networked infrastructure. True air-gap means the system runs on hardware with no external network connection, and data moves via controlled physical media under documented chain-of-custody procedures.
Software that claims air-gap support for these environments must be tested, not trusted. Common failure modes include activation/licensing checks on startup, telemetry that attempts to phone home and times out, and dependency libraries that make DNS lookups.
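For Python components, one cheap first test is to run the code with outbound connections and DNS lookups disabled inside the process and record every attempt. The sketch below is a simplified take on that idea: `no_network`, `NetworkAttempt`, and `hypothetical_license_check` are names invented here, and only `socket.socket.connect` and `socket.getaddrinfo` are patched, so other paths (for example `connect_ex` or native libraries) are not covered. It complements, and does not replace, testing on a physically isolated network.

```python
import socket
from contextlib import contextmanager


class NetworkAttempt(OSError):
    """Raised in place of a real connection or DNS lookup."""


@contextmanager
def no_network():
    """Block connects and DNS lookups; yield a list that records each attempt."""
    attempts = []
    original_connect = socket.socket.connect
    original_getaddrinfo = socket.getaddrinfo

    def blocked_connect(self, address):
        attempts.append(("connect", address))
        raise NetworkAttempt(f"connect attempted to {address!r}")

    def blocked_getaddrinfo(host, *args, **kwargs):
        attempts.append(("dns", host))
        raise NetworkAttempt(f"DNS lookup attempted for {host!r}")

    socket.socket.connect = blocked_connect
    socket.getaddrinfo = blocked_getaddrinfo
    try:
        yield attempts
    finally:
        # Always restore the real functions, even if the code under test raised.
        socket.socket.connect = original_connect
        socket.getaddrinfo = original_getaddrinfo


def hypothetical_license_check():
    # Stand-in for vendor code that phones home on startup.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect(("192.0.2.1", 443))


with no_network() as attempts:
    try:
        hypothetical_license_check()
    except NetworkAttempt:
        pass

print(attempts)  # [('connect', ('192.0.2.1', 443))]
```

Recording attempts in a list, rather than relying on the exception alone, matters because telemetry code often catches and swallows connection errors. The attempt still shows up in `attempts` even when the caller never sees a failure.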
Comparison Table
| Dimension | On-Premise | Self-Hosted (Cloud) | Air-Gapped |
|---|---|---|---|
| Hardware ownership | Your organization | Cloud provider | Your organization |
| Data residency guarantee | Yes (physical control) | Depends on provider region | Yes (physical control) |
| GDPR data transfer compliance | Strong | Depends on adequacy decision | Strong |
| HIPAA (PHI) | No BAA needed | BAA required with cloud provider | No BAA needed |
| Data accessible to cloud provider | No | Yes (under their terms) | No |
| Setup complexity | High (hardware procurement) | Moderate (DevOps) | Very high (secure logistics) |
| Maintenance burden | High (your team) | Moderate-high (your team) | Very high (your team, isolated) |
| Internet connectivity required | No (typically) | Yes (during operation) | No (by definition) |
| Best suited for | Regulated industries, data sovereignty | Tech-fluent teams with no hard sovereignty requirement | Defense, classified, highest-sensitivity |
Practical Guidance by Regulatory Context
HIPAA-covered entities processing PHI for AI training: Self-hosted on a cloud covered by a BAA is technically compliant, but PHI still reaches the cloud provider's infrastructure. On-premise is cleaner. Air-gapped on-premise is appropriate for the most sensitive research data.
EU organizations subject to GDPR and EU AI Act: Self-hosted on an EU-region cloud satisfies most requirements for data that carries no national sovereignty mandate. For national security contexts or sectors with stricter data sovereignty mandates (healthcare research, government AI systems), on-premise is appropriate.
Financial institutions with model risk management requirements: Cloud with proper contracts and operational resilience documentation is acceptable for most regulatory frameworks. For proprietary models where competitive sensitivity is extreme, on-premise is preferred.
Defense contractors and government agencies with classified requirements: Air-gapped only. Self-hosted and networked on-premise environments do not meet classified facility requirements.
Legal services (privilege and confidentiality): Self-hosted deployment run by the firm's own team is often sufficient. On-premise provides cleaner privilege arguments because data never leaves the firm's physical control. Air-gapped is rarely required but does appear in national security legal contexts.
What to Ask Vendors
When a vendor tells you their tool supports your deployment model, ask:
- "Does the software make any outbound network connections during operation?" Get a specific answer, not "it can be configured to run offline."
- "Does license validation require connectivity to your servers?" If yes, that's a dependency on the vendor's infrastructure.
- "Can you provide documentation of all external endpoints the software contacts?" Security teams can verify this with network monitoring.
- "Has this been deployed in a certified air-gapped environment? Can you provide a reference?" Claimed air-gap support and verified air-gap deployments are different.
- "What is your data telemetry policy? What data does the software transmit and where?" This should be documented in privacy policy or DPA, not just verbal assurance.
The answers to these questions reveal which deployment model a tool actually supports, regardless of what the marketing materials say.
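Once a vendor has supplied its list of external endpoints, checking it against what monitoring actually saw is a simple set difference. The sketch below assumes both lists are plain hostnames; the hostnames and the `undocumented_endpoints` function are hypothetical examples, not any real vendor's endpoints.

```python
def normalize(hostname):
    """Lowercase a hostname and drop surrounding whitespace and any trailing dot."""
    return hostname.strip().lower().rstrip(".")


def undocumented_endpoints(documented, observed):
    """Return hostnames seen in monitoring that the vendor's documentation omits."""
    documented_set = {normalize(h) for h in documented}
    observed_set = {normalize(h) for h in observed}
    return sorted(observed_set - documented_set)


# Hypothetical data: the vendor's documented list versus hostnames from DNS logs.
documented = ["license.vendor.example", "updates.vendor.example"]
observed = ["License.vendor.example.", "telemetry.vendor.example"]

print(undocumented_endpoints(documented, observed))  # ['telemetry.vendor.example']
```

Any hostname in the result is a concrete follow-up question for the vendor, which is more useful in procurement than a general assurance that the tool "can run offline."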