    Multi-Client Project Isolation in On-Premise Data Prep Pipelines

    How ML service providers can manage 5–20 client projects simultaneously with proper data isolation, audit trails, and zero cross-contamination.

    Ertas Team

    When you are delivering data preparation services to one enterprise client, isolation is simple — there is only one dataset. When you are managing five, ten, or twenty client projects simultaneously, isolation becomes an operational problem that, if solved poorly, creates legal, compliance, and quality risks.

    This is a technical guide for ML service providers who run on-premise data prep pipelines for multiple enterprise clients concurrently. It covers why isolation matters, what approaches exist, and how to implement project separation without operational overhead that scales linearly with client count.


    Why Client Isolation Matters

    Every enterprise client engagement operates under a contract — an MSA, SOW, NDA, or all three. These contracts typically specify that the client's data will not be commingled with other clients' data. If Client A's training data is accidentally included in Client B's export, you have a contractual breach. In regulated industries, it may also be a regulatory violation.

    Data Confidentiality

    Enterprise data is confidential by default. A healthcare client's clinical notes, a law firm's privileged documents, a financial institution's transaction records — none of these should be visible to anyone working on a different client's project. Even within your own team, access should be scoped to the project.

    Training Data Cross-Contamination

    This is the technical risk that is easy to underestimate. If data from Client A leaks into Client B's training dataset, the resulting model is contaminated. It may contain patterns, terminology, or information from Client A's domain. This is not hypothetical — it happens when pipelines share intermediate storage, when export scripts pull from the wrong project directory, or when labeling queues are not properly filtered.
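    One inexpensive safeguard against these leaks is to fingerprint every record and check whether any fingerprint appears in more than one client's dataset. The sketch below is illustrative, not a feature of any particular tool. `fingerprint` and `find_cross_project_overlap` are hypothetical names, records are assumed to be plain text, and exact hashing only catches verbatim copies (after case and whitespace normalization), not paraphrased content.

```python
from __future__ import annotations

import hashlib
from itertools import combinations


def fingerprint(record: str) -> str:
    """Hash a record after light normalization (lowercase, collapsed whitespace)."""
    normalized = " ".join(record.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_cross_project_overlap(datasets: dict[str, list[str]]) -> dict[tuple[str, str], int]:
    """For each pair of projects, count the distinct record fingerprints they share."""
    prints = {project: {fingerprint(r) for r in records} for project, records in datasets.items()}
    overlaps: dict[tuple[str, str], int] = {}
    for a, b in combinations(sorted(prints), 2):
        shared = prints[a] & prints[b]
        if shared:
            overlaps[(a, b)] = len(shared)
    return overlaps


if __name__ == "__main__":
    datasets = {
        "client-a": ["Patient presented with chest pain.", "Follow-up in two weeks."],
        "client-b": ["Invoice 1042 overdue.", "patient presented with  chest pain."],
    }
    print(find_cross_project_overlap(datasets))  # {('client-a', 'client-b'): 1}
```

    Any non-empty result is a reason to stop and trace where the shared records came from before anything is exported.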

    Audit Trail Independence

    Each client's data lineage must be independently exportable. When Client A asks for an audit report showing every transformation applied to their data, that report must contain only their data — no references to other clients, no shared processing logs, no ambiguous provenance records.


    Approaches to Client Isolation

    Separate Installations Per Client

    The most conservative approach: install a completely separate instance of every tool for each client. Separate machines, separate storage, separate user accounts.

    Advantages: Maximum isolation. No shared state, no shared storage, no shared configuration.

    Disadvantages: Operational overhead scales linearly. Ten clients means ten installations to maintain, ten sets of updates to apply, ten environments to monitor. For a small team managing many projects, this becomes unworkable.

    Project-Level Isolation Within a Single Tool

    A single installation with built-in project separation: each client's data lives in a named, isolated project. Projects do not share data, labels, configurations, or export outputs. Users are assigned to projects with explicit permissions.

    Advantages: Operational overhead is constant regardless of client count. One installation to maintain. One set of updates. Project switching is fast.

    Disadvantages: Requires that the tool actually enforces isolation at the project level — not just in the UI, but in the storage layer and audit trail. Not all tools do this.
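    Enforcement below the UI means the backend interfaces themselves take a project scope. As an illustration, here is a labeling or processing queue that keeps one FIFO per project and offers no way to dequeue across projects. This is an in-memory stand-in under assumed names (`Task`, `ProjectScopedQueue`), not how any specific tool implements it.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    project_id: str
    item_id: str


class ProjectScopedQueue:
    """In-memory work queue with one FIFO per project and no cross-project dequeue."""

    def __init__(self) -> None:
        self._queues: dict[str, deque[Task]] = {}

    def register(self, project_id: str) -> None:
        self._queues.setdefault(project_id, deque())

    def _queue_for(self, project_id: str) -> deque[Task]:
        if project_id not in self._queues:
            raise KeyError(f"unknown project: {project_id}")
        return self._queues[project_id]

    def enqueue(self, task: Task) -> None:
        # A task is routed by its own project_id; callers cannot pick another queue.
        self._queue_for(task.project_id).append(task)

    def next_task(self, project_id: str) -> Task | None:
        # Workers must name the project they are working on and can only
        # receive tasks that were enqueued under that same project.
        queue = self._queue_for(project_id)
        return queue.popleft() if queue else None


if __name__ == "__main__":
    q = ProjectScopedQueue()
    q.register("client-a")
    q.register("client-b")
    q.enqueue(Task("client-a", "doc-001"))
    print(q.next_task("client-b"))  # None: client-a's task is not visible to client-b
    print(q.next_task("client-a"))
```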

    RBAC (Role-Based Access Control)

    Layer access controls on top of shared infrastructure. Users see only the projects they are authorized to access. Administrators see all projects.

    Advantages: Flexible. Supports team structures where some people work across multiple clients.

    Disadvantages: RBAC alone does not prevent data cross-contamination at the pipeline level. It prevents unauthorized UI access, but if the underlying pipeline shares storage or processing queues, RBAC is a UI guardrail, not a data isolation guarantee.
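    A minimal sketch of the project-scoped check described above, assuming three hypothetical roles and a per-user set of assigned project IDs:

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    ANNOTATOR = "annotator"
    PROJECT_LEAD = "project_lead"
    ADMIN = "admin"


@dataclass
class User:
    name: str
    role: Role
    projects: set[str] = field(default_factory=set)  # assigned project IDs


def can_access(user: User, project_id: str) -> bool:
    """Admins see every project; everyone else sees only assigned projects."""
    return user.role is Role.ADMIN or project_id in user.projects


def visible_projects(user: User, all_projects: list[str]) -> list[str]:
    return [p for p in all_projects if can_access(user, p)]


if __name__ == "__main__":
    projects = ["client-a", "client-b", "client-c"]
    alice = User("alice", Role.ANNOTATOR, {"client-a", "client-c"})
    ops = User("ops", Role.ADMIN)
    print(visible_projects(alice, projects))  # ['client-a', 'client-c']
    print(visible_projects(ops, projects))    # ['client-a', 'client-b', 'client-c']
```

    Note what this check covers: it decides which projects a person can open. It says nothing about which directories a pipeline job reads from or which queue it pulls work from, which is exactly the gap the paragraph above describes.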

    Filesystem Isolation

    Each client's data lives in a separate filesystem path, partition, or volume. Pipeline scripts are parameterized to operate on a specific path.

    Advantages: Simple to implement. Works with any tool.

    Disadvantages: Relies on discipline. One misconfigured path parameter, and data leaks between projects. No built-in enforcement — the isolation is only as good as the team's attention to detail.
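    Some of that discipline can be moved into code. The sketch below, using a hypothetical helper `project_path`, resolves every path parameter inside one project's root and refuses anything that escapes it, such as `../client-b/...` or an absolute path into another project. It assumes a layout of one directory per project directly under a shared data root, and it relies on `PurePath.is_relative_to` (Python 3.9+).

```python
from pathlib import Path


class ProjectPathError(ValueError):
    """Raised when a path parameter would point outside the intended project."""


def project_path(data_root: Path, project_id: str, relative: str) -> Path:
    """Resolve `relative` inside one project's root; refuse anything that escapes it."""
    base = data_root.resolve()
    project_root = (base / project_id).resolve()
    # The project id must name exactly one directory directly under the data root.
    if project_root.parent != base:
        raise ProjectPathError(f"invalid project id: {project_id!r}")
    target = (project_root / relative).resolve()
    # Catches "../other-project/..." and absolute paths pointing elsewhere.
    if not target.is_relative_to(project_root):
        raise ProjectPathError(f"{target} is outside project root {project_root}")
    return target


if __name__ == "__main__":
    root = Path("/srv/dataprep")
    print(project_path(root, "client-a", "raw/batch-01.jsonl"))
    try:
        project_path(root, "client-a", "../client-b/exports/final.jsonl")
    except ProjectPathError as err:
        print("blocked:", err)
```

    This does not make the isolation enforced by the tool, but it turns a silent leak from a misconfigured parameter into a loud failure.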


    The Operational Challenge: 5–20 Simultaneous Projects

    Most ML service providers hit the isolation problem when they scale from 2–3 concurrent projects to 5–20. At this scale, the per-client overhead of separate installations becomes expensive, but the risk of shared-infrastructure approaches becomes real.

    The practical question is: How do you manage 15 client projects without 15 separate environments, while still guaranteeing that Client A's data never touches Client B's pipeline?

    This requires tool-native isolation — not bolted-on filesystem conventions or RBAC overlays, but isolation built into the tool's data model. Each project should be a first-class entity with its own:

    • Data store (ingested files, intermediate transformations, final exports)
    • Labeling configuration (taxonomy, guidelines, annotator assignments)
    • Pipeline configuration (cleaning rules, augmentation settings, export format)
    • Audit trail (independently exportable lineage for that project only)
    • Naming (client label, project identifier, engagement reference)
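    As a rough sketch of what "first-class entity" can mean in a data model, the dataclass below bundles the five components listed above and derives every storage location from a single project root. The field names and directory layout are assumptions for illustration, not the schema of any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Project:
    # Naming
    project_id: str
    client_label: str
    engagement_ref: str
    root: Path
    # Labeling configuration
    taxonomy: tuple[str, ...] = ()
    annotators: frozenset[str] = frozenset()
    # Pipeline configuration
    cleaning_rules: tuple[str, ...] = ()
    export_format: str = "jsonl"

    # Data store: raw, intermediate, and exported data all live under root.
    @property
    def raw_dir(self) -> Path:
        return self.root / "data" / "raw"

    @property
    def intermediate_dir(self) -> Path:
        return self.root / "data" / "intermediate"

    @property
    def export_dir(self) -> Path:
        return self.root / "data" / "exports"

    # Audit trail: one log per project, never shared with other projects.
    @property
    def audit_log(self) -> Path:
        return self.root / "audit" / "events.jsonl"

    def create_layout(self) -> None:
        for d in (self.raw_dir, self.intermediate_dir, self.export_dir, self.audit_log.parent):
            d.mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    # Illustrative values only.
    p = Project("acme-claims", "Acme Insurance", "SOW-014", Path("/srv/dataprep/acme-claims"),
                taxonomy=("claim", "denial"), annotators=frozenset({"alice"}))
    print(p.export_dir)
```

    Because nothing in the model holds a path outside `root`, deleting, archiving, or handing over a project is a single-directory operation.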

    DIY Isolation vs. Tool-Native Isolation

    | Dimension | DIY (Docker + Scripts) | Tool-Native Isolation |
    |---|---|---|
    | Setup time per project | 2–4 hours (container config, volume mounts, script parameterization) | Minutes (create project, assign team) |
    | Risk of cross-contamination | Moderate (depends on script correctness) | Low (enforced by tool architecture) |
    | Audit trail per client | Custom (must build export logic) | Built-in (per-project lineage export) |
    | Maintenance at 10 projects | High (10 containers, 10 configs) | Low (one installation, 10 projects) |
    | Team context switching | Slow (switch containers, reload state) | Fast (switch projects within tool) |
    | Compliance evidence | Must assemble from logs | Single report per project |

    The DIY approach works at small scale. It breaks down when the number of concurrent projects exceeds the team's capacity to maintain the infrastructure reliably.
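    For reference, the core of the DIY pattern is a per-project container that mounts only that project's directory. A minimal sketch that builds such a `docker run` command; the image name, the `/project` mount point, and the `PROJECT_ID` variable are hypothetical, and `--network none` assumes the pipeline needs no network access, which fits an on-premise setup.

```python
import shlex
from pathlib import Path


def docker_run_args(project_id: str, data_root: Path, image: str) -> list[str]:
    """Build a `docker run` command that mounts only one project's directory.

    Inside the container, /project is the only host data visible, so a
    script there cannot read another client's files by accident.
    """
    project_dir = (data_root / project_id).resolve()
    return [
        "docker", "run", "--rm",
        "--name", f"dataprep-{project_id}",
        "--network", "none",              # no network access from the container
        "-v", f"{project_dir}:/project",  # bind-mount only this project's root
        "-e", f"PROJECT_ID={project_id}",
        image,
    ]


if __name__ == "__main__":
    args = docker_run_args("client-a", Path("/srv/dataprep"), "dataprep-pipeline:latest")
    print(shlex.join(args))
```

    Each new client means another invocation like this, plus its own volume, configuration, and monitoring, which is where the per-project maintenance cost in the table comes from.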


    Audit Trail Requirements

    For enterprise clients in regulated industries, the audit trail is not optional — it is a deliverable. Each client needs to see:

    • What data entered the pipeline — source files, formats, timestamps
    • What transformations were applied — cleaning rules, redaction steps, augmentation operations
    • Who applied them — user attribution for manual steps like labeling
    • What data was exported — output files, formats, timestamps, row counts
    • What was excluded and why — records that failed quality checks, files that could not be parsed

    This lineage must be exportable per client without any reference to other clients' data or operations. If your audit trail is a single log file that covers all projects, you need to filter and redact before handing it to a client — which introduces its own risk of error.
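    The alternative to filtering a shared log is to never write a shared log in the first place. Below is a minimal sketch of an append-only JSONL audit log stored inside each project's directory, with hypothetical event kinds mapped to the list above: `ingest` (what entered), `transform` and `label` (what was applied, and by whom), `export` (what left), and `exclude` (what was dropped and why).

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path

EVENT_KINDS = {"ingest", "transform", "label", "export", "exclude"}


@dataclass
class AuditEvent:
    kind: str      # one of EVENT_KINDS
    actor: str     # user or pipeline step responsible
    details: dict  # e.g. source file, rule name, row count, exclusion reason
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class ProjectAuditLog:
    """Append-only JSONL audit log stored inside a single project's directory."""

    def __init__(self, project_root: Path) -> None:
        self.path = project_root / "audit" / "events.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def record(self, event: AuditEvent) -> None:
        if event.kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {event.kind}")
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(event)) + "\n")

    def export(self) -> list[dict]:
        # The client report is this file as-is: nothing to filter or redact,
        # because no other project's events were ever written here.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = ProjectAuditLog(Path(tmp) / "client-a")
        log.record(AuditEvent("ingest", "pipeline", {"source": "claims_2023.csv", "format": "csv"}))
        log.record(AuditEvent("exclude", "pipeline", {"record": 118, "reason": "unparseable date"}))
        for event in log.export():
            print(event["kind"], event["details"])
```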


    Implementing Isolation in Practice

    If you are building this yourself, here is the minimum viable isolation architecture:

    1. One directory root per client project. All data — raw, intermediate, and exported — lives under that root. Nothing is shared with other project roots.
    2. Pipeline configuration per project. Cleaning rules, labeling taxonomies, and export settings are stored within the project directory, not globally.
    3. Per-project audit logs. Every operation logs to a file within the project directory. Global logs should reference the project ID but contain no data from the project itself.
    4. Access scoping. Team members are assigned to projects. Their tools and dashboards show only the projects they are assigned to.
    5. Export validation. Before delivering a dataset to a client, validate that every record in the export traces back to the correct project root and no foreign records are included.
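    Step 5 can be automated if every exported record carries its provenance. A minimal sketch, assuming (hypothetically) that records are dicts with a `project_id` field and a `source` field holding the path of the input file they came from, either absolute or relative to the project root:

```python
from __future__ import annotations

from pathlib import Path


def validate_export(records: list[dict], project_id: str, project_root: Path) -> list[str]:
    """Return the problems found; an empty list means every record traces to this project."""
    root = project_root.resolve()
    problems: list[str] = []
    for i, rec in enumerate(records):
        if rec.get("project_id") != project_id:
            problems.append(f"record {i}: project_id {rec.get('project_id')!r}, expected {project_id!r}")
        source = rec.get("source")
        if not source:
            problems.append(f"record {i}: missing source")
            continue
        src = Path(source)
        if not src.is_absolute():
            src = root / src
        # The origin file must live under this project's root (Python 3.9+).
        if not src.resolve().is_relative_to(root):
            problems.append(f"record {i}: source {source} is outside {root}")
    return problems


if __name__ == "__main__":
    root = Path("/srv/dataprep/client-a")
    export = [
        {"project_id": "client-a", "source": "data/raw/notes.jsonl", "text": "..."},
        {"project_id": "client-b", "source": "/srv/dataprep/client-b/data/raw/tx.csv", "text": "..."},
    ]
    for problem in validate_export(export, "client-a", root):
        print("blocked:", problem)
```

    Treating a missing `source` as a failure is deliberate: a record with no provenance cannot be shown to belong to the client, so it should block delivery rather than pass silently.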

    This is achievable with custom infrastructure. It is also the kind of plumbing that tools like Ertas Data Suite handle natively. Ertas supports multi-project management with client-labeled projects, per-project audit trails, and built-in data lineage — all running on-premise with no internet dependency. For service providers managing many concurrent engagements, this eliminates the isolation infrastructure that would otherwise require custom engineering.


    Where This Fits

    Client isolation is the operational foundation of a data preparation service practice. Without it, scaling from a few clients to many clients introduces unacceptable risk. With it, the number of concurrent projects is limited by team capacity, not infrastructure constraints.
