
Managing 50+ LoRA Adapters in Production: Versioning and Organization
Practical systems for managing dozens of LoRA adapters across multiple clients, tasks, and base models — covering naming conventions, metadata, registries, multi-LoRA serving, and scaling milestones from 10 to 100+ adapters.
You started with 3 adapters. One per client, all on the same base model, all doing the same task. Easy to manage. You could keep track of everything in your head.
Now you have 47 adapters across 12 clients, 4 task types, and 3 base models. Last Tuesday someone deployed the wrong adapter to production. The legal summarization client got the customer support adapter. Nobody noticed for six hours until the client called asking why their AI was offering refund instructions instead of case summaries.
This is not a tooling problem. It is an organization problem. And it hits every team that scales past 10-15 adapters without a system.
This guide covers the practical infrastructure for managing LoRA adapters at scale: naming conventions, directory structure, metadata tracking, registry design, multi-LoRA serving, and the specific pain points that emerge at 10, 25, 50, and 100+ adapters.
Naming Conventions That Scale
The first system that breaks is naming. "client_model_v2_final" tells you nothing three months later. You need a convention that encodes the critical information and stays consistent as you grow.
Recommended format:
{client}_{task}_{base}_{date}_{version}
Examples:
acmelaw_summarize_llama33-8b_20260215_v3
northmed_triage_qwen25-7b_20260201_v1
globalfin_classify_mistral-7b_20260220_v2
acmelaw_extract_llama33-8b_20260215_v1
Rules:
- Client names are lowercase, no spaces, no special characters. Use abbreviations for long names.
- Task names are single verbs: summarize, classify, extract, generate, triage, respond.
- Base model is abbreviated but unambiguous: llama33-8b, qwen25-7b, mistral-7b.
- Date is the training date in YYYYMMDD format.
- Version is incremented per client-task-base combination, not globally.
Why this works at scale: You can sort by client, filter by task, identify the base model, and know the training date from the name alone. When you have 50 adapters in a list, this structure lets you find the right one in seconds instead of opening metadata files.
What to avoid:
- Sequential numbering (adapter_001, adapter_002) — tells you nothing
- Descriptive names (better_legal_model) — subjective and ambiguous
- Dates without client or task — you will have multiple adapters trained on the same day
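If you want the convention enforced rather than remembered, a small helper can build and validate names at training time. Here is a minimal sketch, assuming the format above; the regex and the AdapterName class are illustrative, not part of any existing tool:

```python
import re
from dataclasses import dataclass
from datetime import date

# {client}_{task}_{base}_{date}_{version}
NAME_PATTERN = re.compile(
    r"^(?P<client>[a-z0-9]+)_(?P<task>[a-z]+)_(?P<base>[a-z0-9.-]+)"
    r"_(?P<date>\d{8})_v(?P<version>\d+)$"
)

@dataclass
class AdapterName:
    client: str
    task: str
    base: str
    trained: date
    version: int

    def __str__(self) -> str:
        return (f"{self.client}_{self.task}_{self.base}"
                f"_{self.trained:%Y%m%d}_v{self.version}")

def parse_adapter_name(name: str) -> AdapterName:
    """Parse a name like 'acmelaw_summarize_llama33-8b_20260215_v3'."""
    m = NAME_PATTERN.match(name)
    if m is None:
        raise ValueError(f"Adapter name does not follow the convention: {name}")
    d = m.group("date")
    return AdapterName(
        client=m.group("client"),
        task=m.group("task"),
        base=m.group("base"),
        trained=date(int(d[:4]), int(d[4:6]), int(d[6:8])),
        version=int(m.group("version")),
    )
```

Because str(parse_adapter_name(name)) round-trips, the same helper can reject malformed names in CI before a badly named adapter ever reaches storage.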
Directory Structure
File organization matters more than you think when you are managing dozens of adapters. Here is a structure that scales.
adapters/
├── acmelaw/
│ ├── summarize/
│ │ ├── llama33-8b/
│ │ │ ├── 20260115_v1/
│ │ │ │ ├── adapter_model.safetensors
│ │ │ │ ├── adapter_config.json
│ │ │ │ ├── metadata.json
│ │ │ │ └── eval_results.json
│ │ │ ├── 20260215_v2/
│ │ │ │ ├── adapter_model.safetensors
│ │ │ │ ├── adapter_config.json
│ │ │ │ ├── metadata.json
│ │ │ │ └── eval_results.json
│ │ │ └── ACTIVE → 20260215_v2/
│ │ └── qwen25-7b/
│ │ └── ...
│ └── extract/
│ └── ...
├── northmed/
│ └── ...
└── _archived/
└── ...
Key principles:
- Hierarchy is client → task → base model → version. This matches how you think about adapters: "Which client? Which task? Which base?"
- The ACTIVE symlink points to the currently deployed version. Deployment means updating a symlink, and rollback means pointing it back.
- Archived adapters move to _archived/ with the same internal structure. They are not deleted — they are moved out of the active search path.
- Every version directory is self-contained. You can copy a single version directory to another machine and it has everything needed to serve.
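Deployment and rollback then reduce to repointing the ACTIVE symlink. A sketch of that in Python, assuming the layout above and a POSIX filesystem; the function and root path are illustrative:

```python
import os
from pathlib import Path

ADAPTER_ROOT = Path("adapters")  # assumed layout: client/task/base/version/

def deploy(client: str, task: str, base: str, version_dir: str) -> None:
    """Point the ACTIVE symlink at a new version directory.
    Rollback is the same call with the previous version directory."""
    base_dir = ADAPTER_ROOT / client / task / base
    target = base_dir / version_dir
    if not (target / "adapter_model.safetensors").exists():
        raise FileNotFoundError(f"No adapter weights in {target}")

    # Create the new link beside ACTIVE, then atomically replace it so a
    # reader never sees a missing or half-updated link.
    tmp_link = base_dir / "ACTIVE.tmp"
    tmp_link.unlink(missing_ok=True)
    tmp_link.symlink_to(version_dir, target_is_directory=True)
    os.replace(tmp_link, base_dir / "ACTIVE")

# deploy("acmelaw", "summarize", "llama33-8b", "20260215_v2")
```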
Metadata Files: The Sidecar Pattern
Every adapter version gets a metadata.json sidecar file. This is the single source of truth for everything about that adapter.
{
"adapter_name": "acmelaw_summarize_llama33-8b_20260215_v2",
"client": "acmelaw",
"task": "summarize",
"base_model": "meta-llama/Llama-3.3-8B-Instruct",
"base_model_hash": "sha256:a1b2c3d4...",
"training_date": "2026-02-15",
"version": 2,
"status": "active",
"dataset": {
"name": "acmelaw_summarize_v4",
"hash": "sha256:e5f6g7h8...",
"example_count": 847,
"date_range": "2025-09-01 to 2026-02-10"
},
"training_config": {
"lora_rank": 32,
"lora_alpha": 64,
"epochs": 4,
"learning_rate": 2e-4,
"training_time_seconds": 612
},
"evaluation": {
"accuracy": 0.936,
"format_compliance": 0.978,
"hallucination_rate": 0.021,
"eval_set_size": 85,
"eval_date": "2026-02-15"
},
"deployment": {
"deployed_date": "2026-02-16",
"deployed_by": "jchen",
"quantization": "Q5_K_M",
"serving_config": "ollama"
},
"previous_version": "acmelaw_summarize_llama33-8b_20260115_v1",
"notes": "Added 120 examples from January production corrections. Accuracy improved from 91.2% to 93.6%."
}
Why every field matters:
- base_model_hash — Ensures reproducibility. Base models get updated; the hash pins the exact version.
- dataset.hash — You can verify that a retraining used the dataset you intended.
- evaluation — Quick comparison across versions without re-running eval.
- previous_version — Chain of lineage. You can trace any adapter back to v1.
- status — One of active, staging, archived, failed. Only one version per client-task-base should be active.
- notes — Human-readable context that metadata alone cannot capture.
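One way to make the sidecar non-optional is to have the training job write it as its final step. A minimal sketch under that assumption; the helper names are illustrative and the required-field list mirrors the example above:

```python
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash a file so the sidecar can pin datasets and base model snapshots."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()

def write_sidecar(version_dir: Path, fields: dict) -> None:
    """Write metadata.json next to the adapter weights at the end of the
    training job, so provenance is never reconstructed after the fact."""
    required = {"adapter_name", "client", "task", "base_model",
                "training_date", "version", "status", "dataset",
                "training_config", "evaluation"}
    missing = required - fields.keys()
    if missing:
        raise ValueError(f"metadata.json is missing fields: {sorted(missing)}")
    (version_dir / "metadata.json").write_text(json.dumps(fields, indent=2))
```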
The Adapter Registry
At 25+ adapters, browsing directory structures becomes slow. You need a searchable registry.
The registry can be as simple as a JSON file or a SQLite database. It does not need to be a distributed system.
Required fields:
- Adapter name (unique key)
- Client, task, base model
- Status (active / staging / archived)
- Accuracy score from latest evaluation
- Deployment date
- File path to adapter weights
Useful queries the registry enables:
- "Show me all active adapters for acmelaw" — instant, instead of browsing directories
- "Which adapters use llama33-8b as a base?" — critical when a base model update drops
- "Sort all adapters by accuracy score" — identifies which models need attention
- "Which adapters have not been retrained in 90+ days?" — maintenance scheduling
- "How many active adapters per client?" — capacity planning and billing
Implementation options:
For teams managing 10-50 adapters, a single registry.json file tracked in git works well. Update it as part of your deployment process. It is searchable with jq and readable by any tool.
For 50+ adapters, SQLite gives you proper querying without infrastructure overhead. A single file, no server, full SQL. Wrap it in a small CLI tool that your team uses for common operations:
# Find all active adapters for a client
ertas-registry list --client acmelaw --status active
# Show adapters that need retraining (>90 days old)
ertas-registry stale --days 90
# Mark an adapter as archived
ertas-registry archive acmelaw_summarize_llama33-8b_20260115_v1
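A sketch of the kind of SQLite schema and query such a CLI could sit on top of; the table and column names here are assumptions for illustration, not an existing tool's schema:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS adapters (
    name          TEXT PRIMARY KEY,  -- e.g. acmelaw_summarize_llama33-8b_20260215_v2
    client        TEXT NOT NULL,
    task          TEXT NOT NULL,
    base_model    TEXT NOT NULL,
    status        TEXT NOT NULL CHECK (status IN ('active','staging','archived','failed')),
    accuracy      REAL,
    deployed_date TEXT,              -- ISO 8601
    training_date TEXT NOT NULL,     -- ISO 8601
    weights_path  TEXT NOT NULL
)
"""

def stale_adapters(db_path: str, days: int = 90) -> list:
    """Return active adapters whose last training run is older than `days`."""
    con = sqlite3.connect(db_path)
    con.execute(SCHEMA)
    rows = con.execute(
        """
        SELECT name, training_date FROM adapters
        WHERE status = 'active'
          AND julianday('now') - julianday(training_date) > ?
        ORDER BY training_date
        """,
        (days,),
    ).fetchall()
    con.close()
    return rows
```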
Version Control Strategy
Not everything belongs in git, but more than you think does.
In git (text, small files):
- metadata.json for every adapter version
- adapter_config.json (LoRA configuration)
- Training scripts and configuration files
- Evaluation scripts and benchmark definitions
- Registry file or database schema
- Deployment scripts and serving configuration
In artifact storage (large binary files):
- adapter_model.safetensors (adapter weights, typically 50-500 MB each)
- Merged GGUF files (4-8 GB each)
- Training datasets (versioned separately)
Why this split matters: Git is excellent for tracking changes to configuration and metadata. It is terrible for large binary files. Adapter weights in git will bloat your repository to unusable sizes within weeks. Use a proper artifact store — S3, GCS, or even a structured NFS share with your naming convention.
The link between them: Your git-tracked metadata.json includes the hash and storage path for the corresponding weights file. Git gives you the history and diffability; artifact storage gives you the actual weights.
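A pre-deployment check along those lines, assuming the sidecar carries a weights_hash field for the safetensors file (not shown in the example metadata above):

```python
import hashlib
import json
from pathlib import Path

def verify_weights(version_dir: Path) -> Path:
    """Check that the weights on disk match the hash recorded in the
    git-tracked metadata.json before serving or deploying them."""
    meta = json.loads((version_dir / "metadata.json").read_text())
    weights = version_dir / "adapter_model.safetensors"
    actual = hashlib.sha256(weights.read_bytes()).hexdigest()
    expected = meta["weights_hash"].removeprefix("sha256:")
    if actual != expected:
        raise RuntimeError(f"{weights} does not match the hash in metadata.json")
    return weights
```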
Multi-LoRA Serving
When you have dozens of adapters, you cannot keep them all loaded in memory. You need a serving strategy.
Hot-Swapping
Load adapters on demand. When a request comes in for a specific client-task combination, load the corresponding adapter, run inference, and optionally keep it cached.
Adapter loading time: 10-50ms for a typical LoRA adapter (rank 16-64 on a 7B model). This is fast enough for most production use cases. Users will not notice 30ms of adapter loading on top of a 500ms inference time.
Implementation: Map incoming requests to adapter names via a routing layer. The router checks client ID and task type, looks up the active adapter in the registry, and passes the adapter path to the inference server.
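A sketch of that routing in Python with Hugging Face PEFT's load_adapter and set_adapter; the in-memory REGISTRY dict stands in for a real registry lookup, and the paths follow the layout above:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-3.3-8B-Instruct"

# Placeholder for a real registry query: (client, task) -> (adapter_name, path)
REGISTRY = {
    ("acmelaw", "summarize"): (
        "acmelaw_summarize_llama33-8b_20260215_v2",
        "adapters/acmelaw/summarize/llama33-8b/ACTIVE",
    ),
}

base_model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = None      # becomes a PeftModel once the first adapter is attached
loaded = set()    # adapter names currently attached to the model

def generate(client: str, task: str, prompt: str) -> str:
    global model
    name, path = REGISTRY[(client, task)]            # route request -> adapter
    if model is None:
        model = PeftModel.from_pretrained(base_model, path, adapter_name=name)
        loaded.add(name)
    elif name not in loaded:
        model.load_adapter(path, adapter_name=name)  # cold load, roughly 10-50 ms
        loaded.add(name)
    model.set_adapter(name)                          # swap the active adapter
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```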
LRU Cache for Adapters
Keep the N most recently used adapters loaded in GPU memory. Evict the least recently used adapter when a new one is requested.
Memory per adapter: A rank-32 LoRA adapter on a 7B model uses approximately 50-100 MB of GPU memory. On a 24 GB GPU, you can keep 20-30 adapters cached simultaneously while the base model occupies 4-8 GB (quantized).
When to use LRU caching: When you have clear traffic patterns — some clients send queries constantly, others send bursts. The high-traffic adapters stay cached; low-traffic adapters are loaded on demand.
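A minimal LRU wrapper over the same PEFT model, assuming your peft version provides delete_adapter (recent releases do); capacity is whatever fits your GPU alongside the base model:

```python
from collections import OrderedDict

class AdapterLRU:
    """Keep at most `capacity` adapters attached to a peft.PeftModel and
    evict the least recently used one when a new adapter is needed."""

    def __init__(self, model, capacity: int = 20):
        self.model = model            # PeftModel already holding the base weights
        self.capacity = capacity      # tune to GPU memory (~50-100 MB per adapter)
        self.cache = OrderedDict()    # adapter_name -> adapter_path

    def activate(self, name: str, path: str) -> None:
        if name in self.cache:
            self.cache.move_to_end(name)             # cache hit: just reorder
        else:
            if len(self.cache) >= self.capacity:
                victim, _ = self.cache.popitem(last=False)
                self.model.delete_adapter(victim)    # evict LRU, free its memory
            self.model.load_adapter(path, adapter_name=name)  # cold load
            self.cache[name] = path
        self.model.set_adapter(name)
```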
Pre-Merged High-Traffic Adapters
For your highest-traffic client-task combinations, merge the LoRA adapter into the base model weights and serve the merged model directly. This eliminates adapter loading entirely.
Trade-off: A merged model is a separate full model (4-8 GB quantized). You lose the memory efficiency of LoRA. Only do this for the top 3-5 adapters by traffic volume.
When it makes sense: If one adapter handles 40% of your total traffic, merging it saves adapter loading overhead on nearly half your requests. If no single adapter exceeds 10% of traffic, the memory cost is not worth it.
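With PEFT the merge itself is a single call. A sketch, using paths from the layout above; the output directory name is illustrative:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-3.3-8B-Instruct"
ADAPTER = "adapters/acmelaw/summarize/llama33-8b/ACTIVE"
OUT = "merged/acmelaw_summarize_llama33-8b_20260215_v2"

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="auto")
merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()

# The result is a plain transformers model with the LoRA deltas folded into
# the base weights; save it, then quantize or serve it like any full model.
merged.save_pretrained(OUT)
AutoTokenizer.from_pretrained(BASE).save_pretrained(OUT)
```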
Performance at Scale
Real numbers from production multi-LoRA deployments.
| Metric | 10 Adapters | 25 Adapters | 50 Adapters | 100 Adapters |
|---|---|---|---|---|
| Registry lookup | under 1ms | under 1ms | under 1ms | 1-2ms |
| Adapter load (cold) | 15ms | 15ms | 15ms | 15ms |
| Adapter swap (cached) | 2ms | 2ms | 2ms | 2ms |
| Memory (all cached) | 0.5 GB | 1.3 GB | 2.5 GB | 5 GB |
| Storage (all versions) | 2 GB | 8 GB | 20 GB | 50 GB |
The per-adapter costs are roughly linear. What changes at scale is not performance but operational complexity — knowing which adapter to load, keeping the registry accurate, and managing the retraining schedule for all of them.
Cleanup and Archival
Storage grows fast when you keep every version of every adapter. You need a cleanup policy.
When to archive:
- An adapter has been superseded by a newer version for 30+ days with no rollback
- A client engagement has ended (archive all their adapters, do not delete)
- An adapter was trained on a base model you no longer support
- Evaluation scores fell below your minimum threshold and retraining produced a replacement
When to delete:
- Almost never. Storage is cheap compared to retraining time. An archived adapter costs $0.02/month on S3. Retraining it from scratch costs 4-8 hours of team time.
- Delete only adapters marked as failed — training runs that did not produce usable results.
Storage cost reality check:
| Adapter Count | Active Versions | Archived Versions | Total Storage | Monthly Cost (S3) |
|---|---|---|---|---|
| 10 | 10 | 15 | 6 GB | $0.14 |
| 25 | 25 | 50 | 18 GB | $0.41 |
| 50 | 50 | 120 | 42 GB | $0.97 |
| 100 | 100 | 300 | 100 GB | $2.30 |
At $2.30/month for 400 adapter versions, the "should we clean up storage?" question answers itself: no, keep everything. The cost of accidentally deleting an adapter you need later far exceeds the storage savings.
Scaling Milestones
Each order of magnitude introduces new challenges. Here is what to expect.
10 Adapters: The Naming Problem
Everything still fits in your head, but you are starting to confuse adapter versions. Last week you loaded v1 when you meant v2.
What to implement now: Naming convention and metadata files. These cost almost nothing to set up and prevent the first category of errors.
25 Adapters: The Registry Problem
You cannot remember which adapters exist, which are active, and which need retraining. Browsing directories takes minutes.
What to implement now: Adapter registry (JSON or SQLite). Deployment scripts that reference the registry instead of hardcoded paths. A monthly audit checklist.
50 Adapters: The Serving Problem
Loading adapters on demand works, but cache misses are noticeable. Some clients complain about inconsistent latency.
What to implement now: LRU caching with tuned cache size. Pre-merge your top 3-5 adapters. Automated monitoring for adapter load times and cache hit rates. Routing layer that maps requests to adapters without manual configuration.
100+ Adapters: The Operations Problem
No single person can track all adapters. Retraining schedules overlap. Evaluation becomes a bottleneck.
What to implement now: Automated retraining pipeline with evaluation gates. Team-based adapter ownership (each team member owns specific clients). Quarterly full audit with automated staleness detection. Consider Ertas for centralized adapter lifecycle management.
Common Disasters and How to Prevent Them
Deploying the wrong adapter. The most common disaster. Prevention: deployment scripts pull from the registry, never from manual path entry. The ACTIVE symlink pattern means deployment is updating one pointer, not copying files.
Overwriting an active adapter during retraining. Retraining outputs to a new version directory. Never retrain in-place. The new version sits in staging until evaluation passes, then the ACTIVE symlink is updated.
Base model update breaks adapters. A new version of Llama drops and you update the base model without retraining adapters. LoRA adapters are tied to specific base model weights — they are not portable across versions. Prevention: lock the base model hash in metadata. Test all adapters against any base model update before deploying it.
Lost provenance. Six months from now, someone asks "what data was this model trained on?" Without metadata files, you do not know. Prevention: metadata sidecar files with dataset hashes, created automatically as part of the training pipeline.
Single point of failure in adapter knowledge. One team member knows which adapters serve which clients. They go on vacation. Prevention: the registry is the source of truth, not anyone's memory. Any team member should be able to answer "which adapter is serving client X for task Y?" by querying the registry.
Start Organizing Before You Need To
If you have 3 adapters today, start with naming conventions and metadata files. It takes 30 minutes and saves days of confusion later.
If you already have 20+ adapters and no system, do not try to reorganize everything at once. Start with a registry of what exists and what is active. Add metadata files for new adapters going forward. Backfill old adapters as you touch them for retraining.
The pattern is always the same: the team that organizes at 10 adapters scales smoothly to 50. The team that waits until 50 spends a week untangling the mess before they can move forward.
Your adapters are production infrastructure. Treat them accordingly.
Further Reading
- Managing Multiple Fine-Tuned Models in Production — Broader guide to multi-model operations beyond LoRA specifics
- AI Model Versioning for Agencies — Versioning strategies oriented toward client-facing delivery
- LoRA Adapters Explained for Agencies — Non-technical overview of what LoRA adapters are and why they enable multi-client serving