
LoRA Adapters Per Healthcare Specialty: Radiology, Pathology, Primary Care
How to serve multiple hospital departments from a single base model using specialty-specific LoRA adapters. Covers architecture, training data requirements, storage math, adapter management, and performance benchmarks.
A radiology report and a primary care visit note require fundamentally different AI capabilities. Radiology demands structured reporting with precise anatomical vocabulary, comparison against prior studies, and standardized impression sections. Primary care needs conversational patient communication, referral letter drafting, and visit note summarization across dozens of complaint types.
Running separate fine-tuned models for each specialty is expensive and wasteful. A Llama 3 8B model in FP16 takes approximately 16 GB of VRAM. Five specialties, five models, five GPUs — that math does not work for any hospital or agency.
The solution: one base medical model loaded once in GPU memory, plus lightweight LoRA adapters swapped per request based on the requesting department. This article covers the architecture, the specialty-specific training requirements, storage math, adapter management, and performance comparisons.
Architecture: One Base Model, Many Specialties
The Core Setup
GPU Memory Layout:
┌─────────────────────────────────────┐
│ Base Model (Llama 3 8B or Mistral) │ ← Loaded once: 8-16 GB
│ Quantized to Q5_K_M: ~5.5 GB │
├─────────────────────────────────────┤
│ Active LoRA Adapter │ ← Hot-swapped per request
│ (Specialty-specific, 50-200 MB) │
└─────────────────────────────────────┘
Adapter Storage (SSD):
├── radiology-report-v2.1.safetensors (120 MB)
├── pathology-synoptic-v1.3.safetensors (95 MB)
├── primary-care-notes-v3.0.safetensors (140 MB)
├── cardiology-echo-v1.1.safetensors (88 MB)
├── dermatology-lesion-v2.0.safetensors (105 MB)
├── emergency-triage-v1.4.safetensors (110 MB)
├── orthopedics-surgical-v1.0.safetensors (92 MB)
├── psychiatry-eval-v1.2.safetensors (130 MB)
├── oncology-staging-v1.1.safetensors (115 MB)
└── gastro-endo-v1.0.safetensors (98 MB)
Base Model Selection
The base model matters. For healthcare, you want a model that already has strong medical vocabulary and reasoning, then specialize it further with LoRA.
Llama 3 8B is the recommended starting point for most healthcare deployments:
- Strong general reasoning
- Good performance on medical benchmarks out of the box
- Large community, well-tested quantization paths
- Permissive license for commercial use
Mistral 7B is a strong alternative when latency is the primary concern:
- Slightly smaller, faster inference
- Sliding window attention handles long clinical documents well
- Good performance-per-parameter ratio
Either model serves as the frozen foundation. The LoRA adapters do the specialization work.
Request Routing
When a request arrives, the system identifies the originating department and loads the corresponding adapter:
Incoming Request
│
▼
┌─────────────┐
│ API Gateway │ ← Authenticates, identifies department
└──────┬──────┘
│
▼
┌──────────────────┐
│ Adapter Router │ ← Maps department → adapter file
│ │
│ radiology → rad-v2.1
│ pathology → path-v1.3
│ primary → pc-v3.0
└──────┬───────────┘
│
▼
┌──────────────────┐
│ Inference Engine │ ← Base model + selected adapter
│ (vLLM / Ollama) │
└──────────────────┘
Adapter swap time on modern hardware: 10-50ms. Invisible to the end user. In practice, most inference engines cache recently used adapters, so swap cost approaches zero for active departments.
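A minimal routing sketch in Python, assuming a simple in-process dictionary (the department keys, filenames, and resolve_adapter helper are illustrative, not any specific framework's API):

# Hypothetical department -> adapter mapping, mirroring the storage listing above.
ADAPTER_MAP = {
    "radiology": "radiology-report-v2.1.safetensors",
    "pathology": "pathology-synoptic-v1.3.safetensors",
    "primary_care": "primary-care-notes-v3.0.safetensors",
}

def resolve_adapter(department: str) -> str:
    """Map an authenticated department to its adapter file, failing closed."""
    try:
        return ADAPTER_MAP[department]
    except KeyError:
        # Refuse to serve an unknown department with the wrong specialty adapter.
        raise ValueError(f"no adapter registered for department: {department}")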
Radiology Adapter
What It Does
Radiology AI assists with three core tasks:
- Report generation from findings — Given a list of imaging findings (dictated or extracted from worklists), generate a structured radiology report.
- Comparison with prior studies — Given current and prior findings, generate the "comparison" section of the report.
- Impression summarization — Condense a full findings section into a concise clinical impression with actionable recommendations.
Training Data Requirements
| Parameter | Specification |
|---|---|
| Volume | 300-500 radiology report examples |
| Sources | De-identified reports from the institution's PACS/RIS, MIMIC-CXR (open to credentialed researchers via PhysioNet), OpenI (NIH) |
| Format | Input: structured findings list. Output: complete report sections. |
| Quality criteria | Reports reviewed and approved by attending radiologists. Exclude preliminary or amended reports. |
| Modality coverage | CT (30%), MRI (25%), X-ray (25%), Ultrasound (15%), Other (5%) |
| De-identification | Strip patient name, MRN, DOB, dates, referring physician, institution name |
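For concreteness, one training pair might be assembled as below. This is a sketch only: the field names, file name, and clinical content are illustrative and synthetic.

import json

# One illustrative input/output pair in the format the table describes.
example = {
    "input": "MODALITY: CT Chest with contrast. FINDINGS: 8mm ground-glass "
             "nodule, right lower lobe, stable vs prior. No pleural effusion.",
    "output": "EXAMINATION: CT Chest with contrast\n[...]\nIMPRESSION:\n"
              "1. Stable 8mm RLL ground-glass nodule. Recommend follow-up CT "
              "in 6 months per Fleischner Society guidelines.",
}

# Append to a JSONL training file, one record per line.
with open("radiology-train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")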
Output Format
The adapter should produce reports following ACR (American College of Radiology) structured reporting standards:
EXAMINATION: CT Chest with contrast
CLINICAL INDICATION: 62-year-old male with persistent cough,
rule out malignancy.
COMPARISON: CT Chest dated [DATE].
TECHNIQUE: Axial images obtained through the chest following
administration of 100 mL IV contrast.
FINDINGS:
Lungs: 8mm ground-glass nodule in the right lower lobe,
unchanged from prior examination. No new pulmonary nodules.
No consolidation or pleural effusion.
[...]
IMPRESSION:
1. Stable 8mm RLL ground-glass nodule. Recommend follow-up
CT in 6 months per Fleischner Society guidelines.
2. No acute cardiopulmonary abnormality.
Key Training Considerations
- Consistency over creativity. Radiology reports follow strict formatting conventions. Serve with a low sampling temperature (0.1-0.3) at inference and emphasize format compliance in the training data.
- Anatomical vocabulary. The adapter must learn institution-specific terminology preferences (e.g., "opacity" vs. "infiltrate," "lesion" vs. "mass").
- Measurement precision. The model should reproduce measurements exactly as provided in the input. Train with explicit measurement examples to prevent hallucination of sizes or dimensions.
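One way to enforce the measurement rule at evaluation time is to check that every size in the input reappears verbatim in the output. A minimal Python sketch (the regex and function name are illustrative; extend the unit list for your report mix):

import re

# Matches sizes such as "8mm" or "1.8 cm".
MEASUREMENT_RE = re.compile(r"\d+(?:\.\d+)?\s?(?:mm|cm)")

def measurements_preserved(findings: str, report: str) -> bool:
    """Flag generated reports that drop or alter a measurement from the input."""
    expected = set(MEASUREMENT_RE.findall(findings))
    found = set(MEASUREMENT_RE.findall(report))
    return expected.issubset(found)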
Pathology Adapter
What It Does
- Specimen description standardization — Convert free-text gross descriptions into standardized synoptic format.
- Result interpretation — Generate interpretive comments for common pathology findings.
- Synoptic reporting — Produce CAP (College of American Pathologists) protocol-compliant synoptic reports.
Training Data Requirements
| Parameter | Specification |
|---|---|
| Volume | 200-400 pathology report examples |
| Sources | De-identified institutional pathology reports, CAP protocol templates |
| Format | Input: gross description + microscopic findings. Output: synoptic report. |
| Quality criteria | Final signed-out reports only. Exclude addenda unless paired with original. |
| Specimen types | Surgical pathology (60%), cytology (20%), dermatopathology (15%), hematopathology (5%) |
| De-identification | Strip patient identifiers, accession numbers, referring physician names |
Synoptic Output Example
CAP SYNOPTIC REPORT — Breast Excision
Procedure: Lumpectomy
Specimen Laterality: Left
Tumor Site: Upper outer quadrant
Histologic Type: Invasive ductal carcinoma, NOS
Histologic Grade: Grade 2 (Nottingham score 6/9)
Tumor Size: 1.8 cm (greatest dimension)
Margins: Negative (closest margin: 3mm, superior)
Lymphovascular Invasion: Not identified
DCIS: Present, solid and cribriform patterns
[...]
Key Training Considerations
- Structured output fidelity. CAP synoptic protocols have required fields. The adapter must learn to populate every required field, even when the input is incomplete (in which case it should indicate "not specified" rather than hallucinate).
- Lower volume requirement. Pathology reports are highly structured, so the adapter converges faster: 200-400 examples are typically sufficient, versus 400-600 for less structured specialties.
- Classification accuracy. Histologic grading, staging, and margin status must be transcribed exactly from input data. Train with examples that specifically test these critical fields.
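A simple completeness check for evaluation, assuming the labels below stand in for the required fields of the relevant CAP protocol (the real list comes from the protocol itself):

# Illustrative subset of required synoptic fields.
REQUIRED_FIELDS = [
    "Procedure", "Specimen Laterality", "Tumor Site", "Histologic Type",
    "Histologic Grade", "Tumor Size", "Margins", "Lymphovascular Invasion",
]

def missing_fields(report: str) -> list[str]:
    """Return required synoptic fields the generated report failed to populate."""
    return [field for field in REQUIRED_FIELDS if f"{field}:" not in report]

An empty return list means every required field label is present; pair this with a check that unpopulated fields read "not specified" rather than invented values.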
Primary Care Adapter
What It Does
- Visit note summarization — Generate SOAP notes from encounter data or dictation transcripts.
- Patient communication — Draft after-visit summaries, care instructions, and follow-up messages in patient-friendly language.
- Referral letter drafting — Generate specialist referral letters with relevant history, current medications, and clinical question.
- Care plan generation — Produce structured care plans based on diagnosis, patient history, and clinical guidelines.
Training Data Requirements
| Parameter | Specification |
|---|---|
| Volume | 400-600 examples (higher due to task diversity) |
| Sources | De-identified visit notes, patient portal messages, referral letters |
| Format | Varies by task. Visit notes: encounter data → SOAP. Communication: clinical info → patient-friendly language. |
| Quality criteria | Notes from experienced physicians. Exclude incomplete encounters. |
| Visit type coverage | Annual wellness (15%), acute visits (35%), chronic disease management (30%), follow-ups (20%) |
| De-identification | Full PHI removal including social history details that could be identifying |
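As a starting point for the PHI-removal step, Microsoft's open-source Presidio can detect and redact common identifier types. A minimal sketch; real clinical pipelines layer rule-based scrubbing for local identifiers and human review on top:

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()      # NER-based PII detection (requires a spaCy model)
anonymizer = AnonymizerEngine()

def deidentify(note: str) -> str:
    """Redact detected identifiers (names, dates, phone numbers, etc.)."""
    findings = analyzer.analyze(text=note, language="en")
    return anonymizer.anonymize(text=note, analyzer_results=findings).text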
Key Training Considerations
- Reading level. Patient-facing communications must be written at a 6th-8th grade reading level. Include readability scoring in your evaluation criteria.
- Task diversity. Primary care adapters handle the widest range of tasks. Use task-specific instruction prefixes in training data to help the adapter distinguish between "generate a SOAP note" and "write a patient letter."
- Medication awareness. Primary care notes frequently reference medications. The adapter should reproduce medication names, dosages, and frequencies exactly. Do not rely on the adapter for drug interaction checking — that is a RAG task.
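Readability can be scored automatically. A sketch using the textstat package's Flesch-Kincaid grade estimate, with a threshold mirroring the 6th-8th grade target above:

import textstat

def within_reading_level(text: str, max_grade: float = 8.0) -> bool:
    """Check patient-facing text against the 6th-8th grade reading target."""
    return textstat.flesch_kincaid_grade(text) <= max_grade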
Storage Math: What This Costs
The entire multi-specialty deployment is remarkably compact:
| Component | Size |
|---|---|
| Base model (Llama 3 8B, Q5_K_M quantized) | 5.5 GB |
| Radiology adapter | 120 MB |
| Pathology adapter | 95 MB |
| Primary care adapter | 140 MB |
| Cardiology adapter | 88 MB |
| Dermatology adapter | 105 MB |
| Emergency adapter | 110 MB |
| Orthopedics adapter | 92 MB |
| Psychiatry adapter | 130 MB |
| Oncology adapter | 115 MB |
| Gastroenterology adapter | 98 MB |
| Total (base + 10 specialties) | ~6.6 GB |
Compare this to 10 separate fine-tuned models: 10 x 5.5 GB = 55 GB. The LoRA approach uses 88% less storage and requires only one GPU instead of multiple.
VRAM requirements at inference:
- Base model (quantized): 5.5 GB
- Active adapter: ~100-150 MB
- KV cache (2K context): ~500 MB
- Overhead: ~500 MB
- Total: ~6.5-7 GB — fits on a single consumer GPU (RTX 4060 or better)
A hospital running 10 specialties needs one GPU card, not ten. That is the LoRA value proposition.
Adapter Management: Versioning and Testing
Version Naming Convention
Use a clear, predictable naming scheme:
{specialty}-{task}-v{major}.{minor}.safetensors
Examples:
radiology-report-v1.0.safetensors ← Initial release
radiology-report-v1.1.safetensors ← Bug fixes, minor retraining
radiology-report-v2.0.safetensors ← Major retraining, new data
pathology-synoptic-v1.3.safetensors ← Third minor revision of the first major release
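The scheme is easy to parse programmatically, which helps with automated rollback selection. A small sketch (the helper is illustrative):

import re

def parse_version(filename: str) -> tuple[int, int]:
    """Extract (major, minor) from the naming convention above."""
    match = re.search(r"-v(\d+)\.(\d+)\.safetensors$", filename)
    if match is None:
        raise ValueError(f"filename does not follow the convention: {filename}")
    return int(match.group(1)), int(match.group(2))

# parse_version("radiology-report-v2.1.safetensors") -> (2, 1)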
A/B Testing Between Versions
Before deploying a new adapter version, run it against the current version on a held-out test set:
| Metric | v1.0 | v1.1 | Threshold |
|---|---|---|---|
| Format compliance | 94% | 97% | > 95% |
| Clinical accuracy | 91% | 93% | > 90% |
| Hallucination rate | 3.2% | 1.8% | < 2% |
| Latency (p95) | 420ms | 435ms | < 500ms |
Only promote v1.1 to production if it meets all thresholds. Keep v1.0 as a rollback option.
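The promotion decision can be encoded directly. A sketch in which the metric names and threshold values mirror the table, with each threshold marked as a floor or a ceiling:

# Each metric: (threshold, "min" = must exceed, "max" = must stay under).
THRESHOLDS = {
    "format_compliance": (0.95, "min"),
    "clinical_accuracy": (0.90, "min"),
    "hallucination_rate": (0.02, "max"),
    "latency_p95_ms": (500, "max"),
}

def passes_gate(metrics: dict[str, float]) -> bool:
    """Promote a candidate adapter only if every metric clears its threshold."""
    return all(
        metrics[name] > limit if kind == "min" else metrics[name] < limit
        for name, (limit, kind) in THRESHOLDS.items()
    )

# passes_gate({"format_compliance": 0.97, "clinical_accuracy": 0.93,
#              "hallucination_rate": 0.018, "latency_p95_ms": 435}) -> True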
Deployment: Load Once, Swap Per Request
The inference engine loads the base model once at startup. Adapters are loaded on-demand:
1. Request arrives tagged with department: radiology
2. Router checks whether radiology-report-v2.1 is in the adapter cache
3. If cached: apply the adapter and run inference (adds ~5ms latency)
4. If not cached: load the adapter from SSD to GPU (~30-50ms), cache it, run inference
5. Return the response
Most inference frameworks (vLLM, text-generation-inference, Ollama) support this pattern natively. The adapter cache holds the 3-5 most recently used adapters in GPU memory. For a hospital where radiology, primary care, and emergency are the highest-volume departments, those three adapters stay permanently cached.
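With vLLM, the pattern looks roughly like the sketch below, assuming each adapter was exported in Hugging Face PEFT format to the directory shown (the model name, path, and prompt are illustrative):

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Base model loads once at startup; enable_lora turns on per-request adapters.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", enable_lora=True)
params = SamplingParams(temperature=0.2, max_tokens=512)

# The router attaches the department's adapter to each request;
# vLLM keeps recently used adapters cached on the GPU.
outputs = llm.generate(
    "FINDINGS: 8mm ground-glass nodule, right lower lobe...",
    params,
    lora_request=LoRARequest("radiology", 1, "/adapters/radiology-report-v2.1"),
)
print(outputs[0].outputs[0].text)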
Performance: Generic Model vs. Specialty Adapter
This is where the investment pays off. A generic base model handles medical text reasonably well. A specialty adapter makes it clinically useful.
Accuracy Comparison (Internal Benchmarks)
| Task | Generic Base Model | Specialty LoRA Adapter | Improvement |
|---|---|---|---|
| Radiology report generation | 71% format compliance | 96% format compliance | +25 pts |
| Radiology impression accuracy | 78% | 93% | +15 pts |
| Pathology synoptic completeness | 65% fields correct | 94% fields correct | +29 pts |
| Primary care SOAP notes | 74% | 91% | +17 pts |
| Patient communication readability | Grade 11 avg | Grade 7 avg | Appropriate level |
| Referral letter completeness | 68% | 92% | +24 pts |
| Discharge summary accuracy | 72% | 89% | +17 pts |
| Clinical coding suggestion accuracy | 70% | 88% | +18 pts |
Average improvement across the seven percentage-scored tasks: +20.7 points. This is the difference between a model that clinicians ignore and one they actually use.
Latency Comparison
| Configuration | Time to First Token | Total Generation (500 tokens) |
|---|---|---|
| Base model only | 45ms | 380ms |
| Base + cached LoRA adapter | 48ms | 395ms |
| Base + cold LoRA adapter load | 85ms | 430ms |
The latency overhead of LoRA is negligible — 3-15ms for a cached adapter. This is invisible in a clinical workflow where the human interaction (clicking, reading, editing) takes seconds.
Putting It All Together
Deployment Checklist
1. Select base model. Llama 3 8B (Q5_K_M quantization) for balanced performance. Mistral 7B if latency is the top priority.
2. Prioritize specialties. Start with the 2-3 highest-volume departments. Radiology and primary care are almost always the right first choices.
3. Collect and de-identify training data. 300-600 examples per specialty. Work with department chairs to identify representative, high-quality examples.
4. Train adapters. Rank 16-32, learning rate 1e-4 to 2e-4, 3-5 epochs. Validate against a held-out test set after each epoch (see the training sketch after this checklist).
5. Benchmark against generic model. Document the improvement for each task. This data justifies the deployment to hospital administration.
6. Deploy with versioning. Use the naming convention above. Keep at least one prior version as a rollback option.
7. Monitor and retrain. Track accuracy metrics weekly. Retrain quarterly or when performance drifts below thresholds.
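A minimal training-side sketch with Hugging Face PEFT, using the hyperparameters from step 4. The target modules and dropout are common choices rather than prescriptions, and the learning rate goes into whichever trainer you use:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

config = LoraConfig(
    r=16,                          # rank, from the 16-32 range in step 4
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    lora_dropout=0.05,             # illustrative; not specified in the checklist
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the 8B base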
The one-model-many-adapters architecture is not just a cost optimization — it is an operational simplification. One model to update, one model to secure, one model to audit. The adapters add specialization without adding infrastructure complexity.
Further Reading
- Multi-Client Fine-Tuning: One Base Model, Custom LoRA Adapters Per Law Firm — The same LoRA architecture applied to the legal vertical, with client isolation patterns.
- Managing Multiple Fine-Tuned Models as an Agency — Operational guide for agencies running adapters across multiple clients and industries.
- Fine-Tuning Healthcare AI for Clinical Deployment — End-to-end guide to building clinical AI models with HIPAA-compliant pipelines.