    Running Ollama for AI-Assisted Data Prep in Air-Gapped Enterprise Environments
Tags: ollama · air-gapped · on-premise · local-llm · data-preparation · enterprise · security · offline


    Step-by-step guide to deploying Ollama for AI-assisted data labeling in air-gapped environments — model transfer, offline setup, GPU configuration, and common failure modes.

Ertas Team

    Ollama's normal workflow assumes internet access: ollama pull mistral downloads model weights from a registry. In an air-gapped environment, there is no internet. No registry access, no dependency downloads, no phone-home telemetry. Everything needed to run must be physically transferred through an approved process.

    This is the reality for data preparation projects in defense, intelligence, critical infrastructure, and high-security financial environments. The AI-assisted labeling workflow that works on your laptop with ollama run needs a different deployment path when the target machine has never seen the internet.

    This guide covers the complete air-gapped Ollama deployment workflow: from model preparation on an internet-connected machine to validated operation on the isolated target.


    Prerequisites

    Before starting the transfer process, confirm these on the air-gapped target machine:

    Operating system: Linux (Ubuntu 22.04/24.04, RHEL 8/9, Rocky Linux) or Windows 10/11. macOS is possible but less common in enterprise air-gapped environments.

    GPU drivers: NVIDIA drivers must already be installed and functional on the target. Run nvidia-smi to verify. If GPU drivers aren't installed, that's a separate (and often painful) offline installation process — NVIDIA driver packages have their own dependency chain.

CUDA toolkit: Required for GPU inference. Must match the driver version. Verify with nvcc --version or check /usr/local/cuda/version.json (version.txt on older toolkits).

    Disk space: Ollama's model storage defaults to ~/.ollama/models. A 7B Q4 model is ~4 GB. A 14B Q4 model is ~8 GB. Budget 50–100 GB for model storage if you'll maintain multiple models and quantization variants.

    Approved transfer media: USB drives, optical media, or whatever the facility's information transfer policy allows. Some environments require specific media types with write-protection or encryption.


    Step 1: Prepare Models on an Internet-Connected Machine

    On a machine with internet access (your development workstation or a designated staging machine):

    Install Ollama

    curl -fsSL https://ollama.ai/install.sh | sh
    

    Pull Target Models

    Pull every model and quantization variant you'll need. You cannot pull additional models after transfer.

    # Core labeling models
    ollama pull mistral:7b-instruct-v0.3-q4_K_M
    ollama pull mistral:7b-instruct-v0.3-q8_0
    ollama pull qwen2.5:14b-instruct-q4_K_M
    ollama pull qwen2.5:14b-instruct-q5_K_M
    
    # Smaller model for lightweight tasks
    ollama pull phi3:3.8b-mini-instruct-4k-q4_K_M
    
    # Embedding model (if needed for dedup/similarity)
    ollama pull nomic-embed-text
    

    Verify Models Work

    Run a test inference for each model to confirm they load and generate correctly:

    ollama run mistral:7b-instruct-v0.3-q4_K_M "Classify this text as positive or negative: The product arrived on time."
    

    Locate Model Files

    Ollama stores model blobs in its model directory:

    # Default locations
    # Linux: ~/.ollama/models
    # macOS: ~/.ollama/models
    # Windows: C:\Users\<user>\.ollama\models
    
    ls ~/.ollama/models/manifests/registry.ollama.ai/library/
    ls ~/.ollama/models/blobs/
    

    The manifests directory contains metadata (JSON files mapping model names to blob hashes). The blobs directory contains the actual model weights and configuration files.
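To see that mapping concretely, you can inspect one of the manifests you just pulled (this assumes jq is available on the staging machine; the tag path matches the models pulled above):

    # Show the config and layer digests one tag resolves to
    cat ~/.ollama/models/manifests/registry.ollama.ai/library/mistral/7b-instruct-v0.3-q4_K_M \
      | jq '{config: .config.digest, layers: [.layers[].digest]}'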


    Step 2: Package for Transfer

    Option A: Copy the Entire Ollama Directory

    The simplest approach — copy the complete ~/.ollama directory to the transfer media.

    # Calculate the total size
    du -sh ~/.ollama/models
    
    # Copy to transfer media (e.g., encrypted USB drive)
    cp -r ~/.ollama/models /media/transfer-drive/ollama-models/
    

Advantage: guaranteed to include everything (manifests, blobs, and metadata).

Disadvantage: includes all models, even ones you might not need on the target, and the copy can be very large if you've pulled many models.

    Option B: Selective Model Export

    For environments with strict transfer size limits, export only the models you need.

# Create the export directory (the full manifest path must exist before copying)
mkdir -p /media/transfer-drive/ollama-export/manifests/registry.ollama.ai/library

# Copy manifests for specific models
for model in mistral qwen2.5 phi3 nomic-embed-text; do
  cp -r ~/.ollama/models/manifests/registry.ollama.ai/library/$model \
    /media/transfer-drive/ollama-export/manifests/registry.ollama.ai/library/
done
    
    # Read the manifests to identify required blobs, then copy those blobs
    # (Each manifest references blob SHA256 hashes)
    

    This is more complex but produces a smaller transfer package. A scripted approach is recommended to avoid missing blobs that manifests reference.
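A minimal sketch of that script, assuming jq is available and relying on the manifest layout shown above (each manifest lists a config digest plus layer digests; on disk the sha256: prefix becomes sha256-):

    # Copy every blob referenced by the exported manifests
    SRC=~/.ollama/models
    DST=/media/transfer-drive/ollama-export
    mkdir -p "$DST/blobs"

    find "$DST/manifests" -type f | while read -r manifest; do
      jq -r '.config.digest, .layers[].digest' "$manifest" | while read -r digest; do
        # "sha256:abc..." is stored on disk as "sha256-abc..."
        cp -n "$SRC/blobs/${digest/:/-}" "$DST/blobs/"
      done
    done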

    Option C: Use GGUF Files Directly (llama.cpp Fallback)

    If the Ollama directory copy approach doesn't work cleanly, download GGUF files directly from HuggingFace and plan to use llama.cpp on the target:

    # Download GGUF files
    wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.3-GGUF/resolve/main/mistral-7b-instruct-v0.3.Q4_K_M.gguf
    

    GGUF files are self-contained — a single file has the model weights, tokenizer, and configuration. This is the most portable format for air-gapped deployment.
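If you would rather keep the Ollama front end, a GGUF file can also be imported into Ollama on the target via a Modelfile instead of switching to llama.cpp (the file path and model name below are illustrative):

    # Import a standalone GGUF into Ollama on the air-gapped machine
    echo 'FROM /opt/models/mistral-7b-instruct-v0.3.Q4_K_M.gguf' > Modelfile
    ollama create mistral-7b-local -f Modelfile
    ollama run mistral-7b-local "Respond with 'OK' if you are working."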

    Prepare the Ollama Binary

    Also transfer the Ollama binary itself:

    # Download the Ollama binary for the target OS/architecture
    # Linux x86_64:
    curl -L https://ollama.ai/download/ollama-linux-amd64 -o ollama
    chmod +x ollama
    
    # Copy to transfer media
    cp ollama /media/transfer-drive/ollama-binary/
    

    Calculate and Record Checksums

    Verify transfer integrity:

# Record checksums with relative paths so they verify on the target,
# where the media may mount at a different path
cd /media/transfer-drive
sha256sum ollama-binary/ollama > checksums.txt
find ollama-models/ -type f -exec sha256sum {} + >> checksums.txt
    

    Step 3: Transfer to Air-Gapped Machine

    This step follows your organization's information transfer policy. Common patterns:

1. USB drive: Copy files to an approved USB drive. Some environments require encrypted drives, or require the media to pass a dedicated malware-scanning kiosk before it enters the facility.
    2. Optical media: Burn to DVD or Blu-ray. Read-only media is sometimes preferred for security — it can't be written to by the receiving system.
    3. Data diode transfer: Some high-security environments use hardware data diodes that allow one-way data transfer from low-security to high-security networks.

Whatever the mechanism, verify checksums after transfer. Run the check from the media's mount point so the relative paths in checksums.txt resolve:

    cd /media/transfer && sha256sum -c checksums.txt
    

    Any checksum mismatch means the file was corrupted during transfer. Do not proceed with corrupted model files — they will either fail to load or produce incorrect outputs.


    Step 4: Install and Configure on the Target Machine

    Install Ollama Binary

    # Copy binary to system path
    sudo cp /media/transfer/ollama-binary/ollama /usr/local/bin/
    sudo chmod +x /usr/local/bin/ollama
    

    Set Up Model Directory

# If using Option A (full directory copy):
mkdir -p ~/.ollama
cp -r /media/transfer/ollama-models ~/.ollama/models

# Verify Ollama can see the models
# (ollama list queries the local server, so it must be running: ollama serve &)
ollama list
    

    Expected output should show all transferred models with their sizes.

    Configure Environment

# Bind the API to loopback only so nothing off-box can reach it
export OLLAMA_HOST=127.0.0.1:11434

# Allow local tools to call the API from any origin (tighten if policy requires)
export OLLAMA_ORIGINS=*
    
    # Optional: custom model storage location
    export OLLAMA_MODELS=/path/to/custom/model/directory
    
    # For systems with multiple GPUs, specify which to use
    export CUDA_VISIBLE_DEVICES=0
    
    # Set parallel request count
    export OLLAMA_NUM_PARALLEL=2
    
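On a systemd host, it's worth baking these variables into a service unit so they survive reboots. A sketch, not the canonical unit (the unit name, user, and values are illustrative):

    # /etc/systemd/system/ollama.service
    [Unit]
    Description=Ollama inference server (air-gapped)

    [Service]
    ExecStart=/usr/local/bin/ollama serve
    Environment="OLLAMA_HOST=127.0.0.1:11434"
    Environment="OLLAMA_NUM_PARALLEL=2"
    Environment="OLLAMA_KEEP_ALIVE=5m"
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Enable it with sudo systemctl daemon-reload && sudo systemctl enable --now ollama.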

    Start and Validate

    # Start Ollama server
    ollama serve &
    
    # Verify model loads and runs
    ollama run mistral:7b-instruct-v0.3-q4_K_M "Respond with 'OK' if you are working."
    

    If the model loads and generates a response, the deployment is functional.
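The HTTP API is a useful second check, since that is how labeling tools will actually talk to the server. Both endpoints below are part of Ollama's standard API:

    # List installed models over the API
    curl http://127.0.0.1:11434/api/tags

    # One-shot, non-streaming generation
    curl http://127.0.0.1:11434/api/generate -d '{
      "model": "mistral:7b-instruct-v0.3-q4_K_M",
      "prompt": "Respond with OK.",
      "stream": false
    }'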


    Model Selection for Data Prep Tasks

    For air-gapped environments, model selection requires more thought because you can't easily swap models after deployment. Transfer the right models upfront.

    Classification and Labeling

    Primary: Mistral 7B Instruct or Llama 3.1 8B Instruct at Q4_K_M. Fast, accurate for binary and multi-class classification. Handles most document categorization tasks.

    Fallback: Same model at Q8_0 for improved accuracy if Q4 proves insufficient. Transfer both quantizations so you have the option.

    Entity Extraction

    Primary: Qwen 2.5 14B Instruct at Q4_K_M or Q5_K_M. The larger model provides better accuracy on extracting specific entities (names, dates, amounts, legal citations) from complex documents.

    Synthetic Data Generation

Primary: Qwen 2.5 14B Instruct at Q5_K_M. Generation quality benefits from less aggressive quantization (higher bit-width) more than classification does.

    Lightweight Tasks

    Primary: Phi-3 Mini 3.8B at Q4_K_M. For simple binary classification, format detection, or language identification where a 7B model is overkill and a 3.8B model runs faster.

    For a general-purpose data preparation deployment, transfer:

Model                      Quantization  Size      Purpose
Mistral 7B Instruct v0.3   Q4_K_M        ~4 GB     Primary classification/labeling
Mistral 7B Instruct v0.3   Q8_0          ~7.5 GB   High-accuracy fallback
Qwen 2.5 14B Instruct      Q4_K_M        ~8 GB     Entity extraction, generation
Qwen 2.5 14B Instruct      Q5_K_M        ~10 GB    High-quality generation
Phi-3 Mini 3.8B            Q4_K_M        ~2.3 GB   Lightweight tasks
nomic-embed-text           Default       ~275 MB   Embeddings (dedup/similarity)
Total                                    ~32 GB

    This package fits on a single 64 GB USB drive with room for the Ollama binary and documentation.


    Operational Concerns

    Model Updates Without Internet

    When a new model version is released that improves labeling accuracy, the update cycle is:

    1. Pull the new model on the internet-connected staging machine
    2. Verify it works on representative test data
    3. Transfer via the approved process
    4. Install on the air-gapped machine
    5. Validate against a held-out test set to confirm improvement

    This cycle takes days to weeks depending on the organization's transfer approval process. Plan for it. Don't promise rapid model iteration in air-gapped environments.

    Managing Multiple Model Versions

    Keep previous model versions on the target machine until the new version is validated. The disk space cost is minimal compared to the risk of deploying a model that performs worse on your specific task.

    # Ollama handles versioning by tag
    # Both versions coexist
    ollama list
    # mistral:7b-instruct-v0.3-q4_K_M    4.1 GB
    # mistral:7b-instruct-v0.2-q4_K_M    4.0 GB
    

    GPU Memory Management

    Ollama loads the model into GPU memory on first request and keeps it there. On a machine with limited VRAM serving multiple model sizes:

# Unload idle models after this duration (accepts values like 30s or 5m;
# -1 keeps a model loaded indefinitely)
export OLLAMA_KEEP_ALIVE=5m
    

    With a 5-minute keep-alive, switching between a 7B and 14B model takes ~10–15 seconds (model load time) after the previous model is evicted.
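To see what is resident at any moment, ollama ps lists loaded models, where they are running, and when they will be evicted (the output shown is illustrative):

    ollama ps
    # NAME                               SIZE     PROCESSOR    UNTIL
    # mistral:7b-instruct-v0.3-q4_K_M    5.1 GB   100% GPU     4 minutes from now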


    Common Failure Modes

    Missing CUDA Libraries

    Symptom: Ollama starts but inference runs on CPU (extremely slow) or fails with a CUDA error.

    Cause: CUDA toolkit or cuDNN libraries not installed, or installed version doesn't match the GPU driver.

    Fix: Verify nvidia-smi shows the GPU. Verify nvcc --version shows the CUDA toolkit. Ensure the CUDA toolkit version is compatible with the installed driver version. NVIDIA maintains a compatibility matrix.

    Model Weight Corruption During Transfer

    Symptom: Model fails to load with a cryptic error about invalid GGUF header or tensor shape mismatch.

    Cause: File corrupted during the transfer process — incomplete copy, bad sector on USB media, or transfer interruption.

    Fix: Compare checksums against the original. Re-transfer the corrupted files.

    Insufficient VRAM

    Symptom: Model loads partially, then crashes with an out-of-memory error.

    Cause: The model (including KV cache for the configured context window) doesn't fit in GPU VRAM.

    Fix: Use a smaller model or lower quantization. Or reduce OLLAMA_NUM_PARALLEL to 1 (fewer concurrent context windows). Or reduce the context window size.
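A sketch of the last two mitigations (the derived model name is illustrative; num_ctx is Ollama's context-length parameter):

    # Fewer concurrent context windows
    export OLLAMA_NUM_PARALLEL=1

    # Bake a smaller context window into a derived model
    printf 'FROM mistral:7b-instruct-v0.3-q4_K_M\nPARAMETER num_ctx 2048\n' > Modelfile
    ollama create mistral-7b-ctx2k -f Modelfile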

    Ollama Can't Find Models

    Symptom: ollama list shows no models despite files being present on disk.

    Cause: Model directory structure doesn't match what Ollama expects. The manifests reference blob hashes that must exist in the blobs directory.

    Fix: Ensure the directory structure under ~/.ollama/models/ is intact: manifests/registry.ollama.ai/library/<model-name>/<tag> must point to blobs in blobs/sha256-<hash>.
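A quick way to audit this, assuming jq is available (same manifest layout as in the export script earlier):

    # Flag any manifest-referenced digest with no matching blob on disk
    find ~/.ollama/models/manifests -type f | while read -r m; do
      jq -r '.config.digest, .layers[].digest' "$m" | while read -r d; do
        [ -f ~/.ollama/models/blobs/"${d/:/-}" ] || echo "MISSING: $d (referenced by $m)"
      done
    done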

    DNS/Network Errors at Startup

    Symptom: Ollama logs show DNS resolution failures or connection timeouts.

    Cause: Ollama attempts to check for updates or resolve its registry hostname.

    Fix: Set OLLAMA_HOST=127.0.0.1:11434 and ensure no proxy environment variables are set (unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY). Ollama should function without network access, but stale proxy configurations can cause timeouts that delay startup.


    Integration with Data Preparation Tools

    Ollama exposes an OpenAI-compatible API on localhost:11434. Any tool that can call the OpenAI API can be pointed at Ollama by changing the base URL.
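For example, a chat completion against Ollama's OpenAI-compatible endpoint looks like any other OpenAI call with the base URL swapped:

    curl http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "mistral:7b-instruct-v0.3-q4_K_M",
        "messages": [{"role": "user", "content": "Classify as positive or negative: The product arrived on time."}]
      }'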

    Ertas Data Suite integrates with Ollama and llama.cpp natively. In an air-gapped deployment, the application detects Ollama running on localhost and presents available models in the labeling interface. The user selects a model, configures prompt templates for their labeling schema, and starts AI-assisted annotation — all without any network connectivity.

    The native desktop architecture means the data preparation tool itself also requires no network access. There's no license server to contact, no telemetry to send, no cloud service to authenticate against. The entire stack — application, inference backend, and data — operates on a single air-gapped machine.


    Deployment Checklist

    Before declaring the air-gapped deployment operational:

    • nvidia-smi shows the correct GPU(s)
    • ollama list shows all expected models
    • Each model runs inference successfully
• GPU utilization during inference is >0%, confirming GPU rather than CPU execution (see the quick check after this list)
• Inference speed matches expectations (~30–60 tok/s for 7B Q4 on a consumer GPU)
    • No network errors in Ollama logs
    • Checksums for all transferred files verified
    • Model outputs validated against known-good examples from the staging environment
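For the GPU-utilization item, the simplest check is to watch nvidia-smi while a test prompt runs:

    # In a second terminal while a test inference runs; utilization should be well above 0%
    watch -n 1 nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv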

    Get this right before the engagement team arrives. Debugging GPU drivers in an air-gapped SCIF is not how anyone wants to spend billable hours.
