    Running Ollama for AI-Assisted Data Prep in Air-Gapped Enterprise Environments
Tags: ollama · air-gapped · on-premise · local-llm · data-preparation · enterprise · security · offline


    Step-by-step guide to deploying Ollama for AI-assisted data labeling in air-gapped environments — model transfer, offline setup, GPU configuration, and common failure modes.

Ertas Team

    Ollama's normal workflow assumes internet access: ollama pull mistral downloads model weights from a registry. In an air-gapped environment, there is no internet. No registry access, no dependency downloads, no phone-home telemetry. Everything needed to run must be physically transferred through an approved process.

    This is the reality for data preparation projects in defense, intelligence, critical infrastructure, and high-security financial environments. The AI-assisted labeling workflow that works on your laptop with ollama run needs a different deployment path when the target machine has never seen the internet.

    This guide covers the complete air-gapped Ollama deployment workflow: from model preparation on an internet-connected machine to validated operation on the isolated target.


    Prerequisites

    Before starting the transfer process, confirm these on the air-gapped target machine:

    Operating system: Linux (Ubuntu 22.04/24.04, RHEL 8/9, Rocky Linux) or Windows 10/11. macOS is possible but less common in enterprise air-gapped environments.

    GPU drivers: NVIDIA drivers must already be installed and functional on the target. Run nvidia-smi to verify. If GPU drivers aren't installed, that's a separate (and often painful) offline installation process — NVIDIA driver packages have their own dependency chain.

CUDA toolkit: Required for GPU inference. Must match the driver version. Verify with nvcc --version or check /usr/local/cuda/version.json (version.txt on older toolkits).

    Disk space: Ollama's model storage defaults to ~/.ollama/models. A 7B Q4 model is ~4 GB. A 14B Q4 model is ~8 GB. Budget 50–100 GB for model storage if you'll maintain multiple models and quantization variants.

    Approved transfer media: USB drives, optical media, or whatever the facility's information transfer policy allows. Some environments require specific media types with write-protection or encryption.


    Step 1: Prepare Models on an Internet-Connected Machine

    On a machine with internet access (your development workstation or a designated staging machine):

    Install Ollama

    curl -fsSL https://ollama.ai/install.sh | sh
    

    Pull Target Models

    Pull every model and quantization variant you'll need. You cannot pull additional models after transfer.

    # Core labeling models
    ollama pull mistral:7b-instruct-v0.3-q4_K_M
    ollama pull mistral:7b-instruct-v0.3-q8_0
    ollama pull qwen2.5:14b-instruct-q4_K_M
    ollama pull qwen2.5:14b-instruct-q5_K_M
    
    # Smaller model for lightweight tasks
    ollama pull phi3:3.8b-mini-instruct-4k-q4_K_M
    
    # Embedding model (if needed for dedup/similarity)
    ollama pull nomic-embed-text
    

    Verify Models Work

    Run a test inference for each model to confirm they load and generate correctly:

    ollama run mistral:7b-instruct-v0.3-q4_K_M "Classify this text as positive or negative: The product arrived on time."
    

    Locate Model Files

    Ollama stores model blobs in its model directory:

    # Default locations
    # Linux: ~/.ollama/models
    # macOS: ~/.ollama/models
    # Windows: C:\Users\<user>\.ollama\models
    
    ls ~/.ollama/models/manifests/registry.ollama.ai/library/
    ls ~/.ollama/models/blobs/
    

    The manifests directory contains metadata (JSON files mapping model names to blob hashes). The blobs directory contains the actual model weights and configuration files.
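To see that mapping concretely, you can inspect one of the manifests you just pulled (this assumes jq is available on the staging machine; the tag path matches the models pulled above):

    # Show the config and layer digests one tag resolves to
    cat ~/.ollama/models/manifests/registry.ollama.ai/library/mistral/7b-instruct-v0.3-q4_K_M \
      | jq '{config: .config.digest, layers: [.layers[].digest]}'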


    Step 2: Package for Transfer

    Option A: Copy the Entire Ollama Directory

    The simplest approach — copy the complete ~/.ollama directory to the transfer media.

    # Calculate the total size
    du -sh ~/.ollama/models
    
    # Copy to transfer media (e.g., encrypted USB drive)
    cp -r ~/.ollama/models /media/transfer-drive/ollama-models/
    

Advantage: guaranteed to include everything (manifests, blobs, and metadata).

Disadvantage: includes all models, even ones you might not need on the target, and the copy can be very large if you've pulled many models.

    Option B: Selective Model Export

    For environments with strict transfer size limits, export only the models you need.

# Create the export directory (the full manifest path must exist before copying)
mkdir -p /media/transfer-drive/ollama-export/manifests/registry.ollama.ai/library

# Copy manifests for specific models
for model in mistral qwen2.5 phi3 nomic-embed-text; do
  cp -r ~/.ollama/models/manifests/registry.ollama.ai/library/$model \
    /media/transfer-drive/ollama-export/manifests/registry.ollama.ai/library/
done
    
    # Read the manifests to identify required blobs, then copy those blobs
    # (Each manifest references blob SHA256 hashes)
    

    This is more complex but produces a smaller transfer package. A scripted approach is recommended to avoid missing blobs that manifests reference.
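A minimal sketch of that script, assuming jq is available and relying on the manifest layout shown above (each manifest lists a config digest plus layer digests; on disk the sha256: prefix becomes sha256-):

    # Copy every blob referenced by the exported manifests
    SRC=~/.ollama/models
    DST=/media/transfer-drive/ollama-export
    mkdir -p "$DST/blobs"

    find "$DST/manifests" -type f | while read -r manifest; do
      jq -r '.config.digest, .layers[].digest' "$manifest" | while read -r digest; do
        # "sha256:abc..." is stored on disk as "sha256-abc..."
        cp -n "$SRC/blobs/${digest/:/-}" "$DST/blobs/"
      done
    done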

    Option C: Use GGUF Files Directly (llama.cpp Fallback)

    If the Ollama directory copy approach doesn't work cleanly, download GGUF files directly from HuggingFace and plan to use llama.cpp on the target:

    # Download GGUF files
    wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.3-GGUF/resolve/main/mistral-7b-instruct-v0.3.Q4_K_M.gguf
    

    GGUF files are self-contained — a single file has the model weights, tokenizer, and configuration. This is the most portable format for air-gapped deployment.
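If you would rather keep the Ollama front end, a GGUF file can also be imported into Ollama on the target via a Modelfile instead of switching to llama.cpp (the file path and model name below are illustrative):

    # Import a standalone GGUF into Ollama on the air-gapped machine
    echo 'FROM /opt/models/mistral-7b-instruct-v0.3.Q4_K_M.gguf' > Modelfile
    ollama create mistral-7b-local -f Modelfile
    ollama run mistral-7b-local "Respond with 'OK' if you are working."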

    Prepare the Ollama Binary

    Also transfer the Ollama binary itself:

    # Download the Ollama binary for the target OS/architecture
    # Linux x86_64:
    curl -L https://ollama.ai/download/ollama-linux-amd64 -o ollama
    chmod +x ollama
    
    # Copy to transfer media
    cp ollama /media/transfer-drive/ollama-binary/
    

    Calculate and Record Checksums

    Verify transfer integrity:

# Record checksums with relative paths so they verify on the target,
# where the media may mount at a different path
cd /media/transfer-drive
sha256sum ollama-binary/ollama > checksums.txt
find ollama-models/ -type f -exec sha256sum {} + >> checksums.txt
    

    Step 3: Transfer to Air-Gapped Machine

    This step follows your organization's information transfer policy. Common patterns:

1. USB drive: Copy files to an approved USB drive. Some environments require encrypted drives, or require the media to pass a dedicated malware-scanning kiosk before it enters the facility.
    2. Optical media: Burn to DVD or Blu-ray. Read-only media is sometimes preferred for security — it can't be written to by the receiving system.
    3. Data diode transfer: Some high-security environments use hardware data diodes that allow one-way data transfer from low-security to high-security networks.

Whatever the mechanism, verify checksums after transfer. Run the check from the media's mount point so the relative paths in checksums.txt resolve:

    cd /media/transfer && sha256sum -c checksums.txt
    

    Any checksum mismatch means the file was corrupted during transfer. Do not proceed with corrupted model files — they will either fail to load or produce incorrect outputs.


    Step 4: Install and Configure on the Target Machine

    Install Ollama Binary

    # Copy binary to system path
    sudo cp /media/transfer/ollama-binary/ollama /usr/local/bin/
    sudo chmod +x /usr/local/bin/ollama
    

    Set Up Model Directory

# If using Option A (full directory copy):
mkdir -p ~/.ollama
cp -r /media/transfer/ollama-models ~/.ollama/models

# Verify Ollama can see the models
# (ollama list queries the local server, so it must be running: ollama serve &)
ollama list
    

    Expected output should show all transferred models with their sizes.

    Configure Environment

# Bind the API to loopback only so nothing off-box can reach it
export OLLAMA_HOST=127.0.0.1:11434

# Allow local tools to call the API from any origin (tighten if policy requires)
export OLLAMA_ORIGINS=*
    
    # Optional: custom model storage location
    export OLLAMA_MODELS=/path/to/custom/model/directory
    
    # For systems with multiple GPUs, specify which to use
    export CUDA_VISIBLE_DEVICES=0
    
    # Set parallel request count
    export OLLAMA_NUM_PARALLEL=2
    
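On a systemd host, it's worth baking these variables into a service unit so they survive reboots. A sketch, not the canonical unit (the unit name, user, and values are illustrative):

    # /etc/systemd/system/ollama.service
    [Unit]
    Description=Ollama inference server (air-gapped)

    [Service]
    ExecStart=/usr/local/bin/ollama serve
    Environment="OLLAMA_HOST=127.0.0.1:11434"
    Environment="OLLAMA_NUM_PARALLEL=2"
    Environment="OLLAMA_KEEP_ALIVE=5m"
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Enable it with sudo systemctl daemon-reload && sudo systemctl enable --now ollama.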

    Start and Validate

    # Start Ollama server
    ollama serve &
    
    # Verify model loads and runs
    ollama run mistral:7b-instruct-v0.3-q4_K_M "Respond with 'OK' if you are working."
    

    If the model loads and generates a response, the deployment is functional.
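The HTTP API is a useful second check, since that is how labeling tools will actually talk to the server. Both endpoints below are part of Ollama's standard API:

    # List installed models over the API
    curl http://127.0.0.1:11434/api/tags

    # One-shot, non-streaming generation
    curl http://127.0.0.1:11434/api/generate -d '{
      "model": "mistral:7b-instruct-v0.3-q4_K_M",
      "prompt": "Respond with OK.",
      "stream": false
    }'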


    Model Selection for Data Prep Tasks

    For air-gapped environments, model selection requires more thought because you can't easily swap models after deployment. Transfer the right models upfront.

    Classification and Labeling

    Primary: Mistral 7B Instruct or Llama 3.1 8B Instruct at Q4_K_M. Fast, accurate for binary and multi-class classification. Handles most document categorization tasks.

    Fallback: Same model at Q8_0 for improved accuracy if Q4 proves insufficient. Transfer both quantizations so you have the option.

    Entity Extraction

    Primary: Qwen 2.5 14B Instruct at Q4_K_M or Q5_K_M. The larger model provides better accuracy on extracting specific entities (names, dates, amounts, legal citations) from complex documents.

    Synthetic Data Generation

Primary: Qwen 2.5 14B Instruct at Q5_K_M. Generation quality benefits from less aggressive quantization (higher bit-width) more than classification does.

    Lightweight Tasks

    Primary: Phi-3 Mini 3.8B at Q4_K_M. For simple binary classification, format detection, or language identification where a 7B model is overkill and a 3.8B model runs faster.

    For a general-purpose data preparation deployment, transfer:

Model                      Quantization  Size      Purpose
Mistral 7B Instruct v0.3   Q4_K_M        ~4 GB     Primary classification/labeling
Mistral 7B Instruct v0.3   Q8_0          ~7.5 GB   High-accuracy fallback
Qwen 2.5 14B Instruct      Q4_K_M        ~8 GB     Entity extraction, generation
Qwen 2.5 14B Instruct      Q5_K_M        ~10 GB    High-quality generation
Phi-3 Mini 3.8B            Q4_K_M        ~2.3 GB   Lightweight tasks
nomic-embed-text           Default       ~275 MB   Embeddings (dedup/similarity)
Total                                    ~32 GB

    This package fits on a single 64 GB USB drive with room for the Ollama binary and documentation.


    Operational Concerns

    Model Updates Without Internet

    When a new model version is released that improves labeling accuracy, the update cycle is:

    1. Pull the new model on the internet-connected staging machine
    2. Verify it works on representative test data
    3. Transfer via the approved process
    4. Install on the air-gapped machine
    5. Validate against a held-out test set to confirm improvement

    This cycle takes days to weeks depending on the organization's transfer approval process. Plan for it. Don't promise rapid model iteration in air-gapped environments.

    Managing Multiple Model Versions

    Keep previous model versions on the target machine until the new version is validated. The disk space cost is minimal compared to the risk of deploying a model that performs worse on your specific task.

    # Ollama handles versioning by tag
    # Both versions coexist
    ollama list
    # mistral:7b-instruct-v0.3-q4_K_M    4.1 GB
    # mistral:7b-instruct-v0.2-q4_K_M    4.0 GB
    

    GPU Memory Management

    Ollama loads the model into GPU memory on first request and keeps it there. On a machine with limited VRAM serving multiple model sizes:

# Unload idle models after this duration (accepts values like 30s or 5m;
# -1 keeps a model loaded indefinitely)
export OLLAMA_KEEP_ALIVE=5m
    

    With a 5-minute keep-alive, switching between a 7B and 14B model takes ~10–15 seconds (model load time) after the previous model is evicted.
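To see what is resident at any moment, ollama ps lists loaded models, where they are running, and when they will be evicted (the output shown is illustrative):

    ollama ps
    # NAME                               SIZE     PROCESSOR    UNTIL
    # mistral:7b-instruct-v0.3-q4_K_M    5.1 GB   100% GPU     4 minutes from now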


    Common Failure Modes

    Missing CUDA Libraries

    Symptom: Ollama starts but inference runs on CPU (extremely slow) or fails with a CUDA error.

    Cause: CUDA toolkit or cuDNN libraries not installed, or installed version doesn't match the GPU driver.

    Fix: Verify nvidia-smi shows the GPU. Verify nvcc --version shows the CUDA toolkit. Ensure the CUDA toolkit version is compatible with the installed driver version. NVIDIA maintains a compatibility matrix.

    Model Weight Corruption During Transfer

    Symptom: Model fails to load with a cryptic error about invalid GGUF header or tensor shape mismatch.

    Cause: File corrupted during the transfer process — incomplete copy, bad sector on USB media, or transfer interruption.

    Fix: Compare checksums against the original. Re-transfer the corrupted files.

    Insufficient VRAM

    Symptom: Model loads partially, then crashes with an out-of-memory error.

    Cause: The model (including KV cache for the configured context window) doesn't fit in GPU VRAM.

    Fix: Use a smaller model or lower quantization. Or reduce OLLAMA_NUM_PARALLEL to 1 (fewer concurrent context windows). Or reduce the context window size.
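A sketch of the last two mitigations (the derived model name is illustrative; num_ctx is Ollama's context-length parameter):

    # Fewer concurrent context windows
    export OLLAMA_NUM_PARALLEL=1

    # Bake a smaller context window into a derived model
    printf 'FROM mistral:7b-instruct-v0.3-q4_K_M\nPARAMETER num_ctx 2048\n' > Modelfile
    ollama create mistral-7b-ctx2k -f Modelfile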

    Ollama Can't Find Models

    Symptom: ollama list shows no models despite files being present on disk.

    Cause: Model directory structure doesn't match what Ollama expects. The manifests reference blob hashes that must exist in the blobs directory.

    Fix: Ensure the directory structure under ~/.ollama/models/ is intact: manifests/registry.ollama.ai/library/<model-name>/<tag> must point to blobs in blobs/sha256-<hash>.
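A quick way to audit this, assuming jq is available (same manifest layout as in the export script earlier):

    # Flag any manifest-referenced digest with no matching blob on disk
    find ~/.ollama/models/manifests -type f | while read -r m; do
      jq -r '.config.digest, .layers[].digest' "$m" | while read -r d; do
        [ -f ~/.ollama/models/blobs/"${d/:/-}" ] || echo "MISSING: $d (referenced by $m)"
      done
    done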

    DNS/Network Errors at Startup

    Symptom: Ollama logs show DNS resolution failures or connection timeouts.

    Cause: Ollama attempts to check for updates or resolve its registry hostname.

    Fix: Set OLLAMA_HOST=127.0.0.1:11434 and ensure no proxy environment variables are set (unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY). Ollama should function without network access, but stale proxy configurations can cause timeouts that delay startup.


    Integration with Data Preparation Tools

    Ollama exposes an OpenAI-compatible API on localhost:11434. Any tool that can call the OpenAI API can be pointed at Ollama by changing the base URL.
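For example, a chat completion against Ollama's OpenAI-compatible endpoint looks like any other OpenAI call with the base URL swapped:

    curl http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "mistral:7b-instruct-v0.3-q4_K_M",
        "messages": [{"role": "user", "content": "Classify as positive or negative: The product arrived on time."}]
      }'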

    Ertas Data Suite integrates with Ollama and llama.cpp natively. In an air-gapped deployment, the application detects Ollama running on localhost and presents available models in the labeling interface. The user selects a model, configures prompt templates for their labeling schema, and starts AI-assisted annotation — all without any network connectivity.

    The native desktop architecture means the data preparation tool itself also requires no network access. There's no license server to contact, no telemetry to send, no cloud service to authenticate against. The entire stack — application, inference backend, and data — operates on a single air-gapped machine.


    Deployment Checklist

    Before declaring the air-gapped deployment operational:

    • nvidia-smi shows the correct GPU(s)
    • ollama list shows all expected models
    • Each model runs inference successfully
• GPU utilization during inference is >0%, confirming GPU rather than CPU execution (see the quick check after this list)
• Inference speed matches expectations (~30–60 tok/s for 7B Q4 on a consumer GPU)
    • No network errors in Ollama logs
    • Checksums for all transferred files verified
    • Model outputs validated against known-good examples from the staging environment
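For the GPU-utilization item, the simplest check is to watch nvidia-smi while a test prompt runs:

    # In a second terminal while a test inference runs; utilization should be well above 0%
    watch -n 1 nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv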

    Get this right before the engagement team arrives. Debugging GPU drivers in an air-gapped SCIF is not how anyone wants to spend billable hours.
