
    LoRA Adapters Per Healthcare Specialty: Radiology, Pathology, Primary Care

    How to serve multiple hospital departments from a single base model using specialty-specific LoRA adapters. Covers architecture, training data requirements, storage math, adapter management, and performance benchmarks.

    Ertas Team

    A radiology report and a primary care visit note require fundamentally different AI capabilities. Radiology demands structured reporting with precise anatomical vocabulary, comparison against prior studies, and standardized impression sections. Primary care needs conversational patient communication, referral letter drafting, and visit note summarization across dozens of complaint types.

    Running separate fine-tuned models for each specialty is expensive and wasteful. A Llama 3 8B model in FP16 takes approximately 16 GB of VRAM. Five specialties, five models, five GPUs — that math does not work for any hospital or agency.

    The solution: one base medical model loaded once in GPU memory, plus lightweight LoRA adapters swapped per request based on the requesting department. This article covers the architecture, the specialty-specific training requirements, storage math, adapter management, and performance comparisons.

    Architecture: One Base Model, Many Specialties

    The Core Setup

    GPU Memory Layout:
    ┌──────────────────────────────────────┐
    │ Base Model (Llama 3 8B or Mistral)   │ ← Loaded once: 8-16 GB
    │ Quantized to Q5_K_M: ~5.5 GB         │
    ├──────────────────────────────────────┤
    │ Active LoRA Adapter                  │ ← Hot-swapped per request
    │ (Specialty-specific, 50-200 MB)      │
    └──────────────────────────────────────┘
    
    Adapter Storage (SSD):
    ├── radiology-report-v2.1.safetensors      (120 MB)
    ├── pathology-synoptic-v1.3.safetensors    (95 MB)
    ├── primary-care-notes-v3.0.safetensors    (140 MB)
    ├── cardiology-echo-v1.1.safetensors       (88 MB)
    ├── dermatology-lesion-v2.0.safetensors    (105 MB)
    ├── emergency-triage-v1.4.safetensors      (110 MB)
    ├── orthopedics-surgical-v1.0.safetensors  (92 MB)
    ├── psychiatry-eval-v1.2.safetensors       (130 MB)
    ├── oncology-staging-v1.1.safetensors      (115 MB)
    └── gastro-endo-v1.0.safetensors           (98 MB)
    

    Base Model Selection

    The base model matters. For healthcare, you want a model that already has strong medical vocabulary and reasoning, then specialize it further with LoRA.

    Llama 3 8B is the recommended starting point for most healthcare deployments:

    • Strong general reasoning
    • Good performance on medical benchmarks out of the box
    • Large community, well-tested quantization paths
    • Permissive license for commercial use

    Mistral 7B is a strong alternative when latency is the primary concern:

    • Slightly smaller, faster inference
    • Sliding window attention handles long clinical documents well
    • Good performance-per-parameter ratio

    Either model serves as the frozen foundation. The LoRA adapters do the specialization work.

    Request Routing

    When a request arrives, the system identifies the originating department and loads the corresponding adapter:

    Incoming Request
          │
          ▼
    ┌─────────────┐
    │ API Gateway │ ← Authenticates, identifies department
    └──────┬──────┘
           │
           ▼
    ┌────────────────────────┐
    │ Adapter Router         │ ← Maps department → adapter file
    │                        │
    │ radiology → rad-v2.1   │
    │ pathology → path-v1.3  │
    │ primary   → pc-v3.0    │
    └──────────┬─────────────┘
               │
               ▼
    ┌────────────────────────┐
    │ Inference Engine       │ ← Base model + selected adapter
    │ (vLLM / Ollama)        │
    └────────────────────────┘
    

    Adapter swap time on modern hardware: 10-50ms. Invisible to the end user. In practice, most inference engines cache recently used adapters, so swap cost approaches zero for active departments.
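    A minimal routing sketch in Python. The department names, file names, and directory layout are illustrative assumptions, not a required structure:

    from pathlib import Path

    ADAPTER_DIR = Path("/models/adapters")

    # Department -> current production adapter file (illustrative versions)
    ADAPTER_MAP = {
        "radiology":    "radiology-report-v2.1.safetensors",
        "pathology":    "pathology-synoptic-v1.3.safetensors",
        "primary_care": "primary-care-notes-v3.0.safetensors",
    }

    def resolve_adapter(department: str) -> Path:
        """Map an authenticated department tag to its adapter file."""
        try:
            return ADAPTER_DIR / ADAPTER_MAP[department]
        except KeyError:
            # Fail loudly rather than silently falling back to the base model
            raise ValueError(f"No adapter registered for department '{department}'")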

    Radiology Adapter

    What It Does

    Radiology AI assists with three core tasks:

    1. Report generation from findings — Given a list of imaging findings (dictated or extracted from worklists), generate a structured radiology report.
    2. Comparison with prior studies — Given current and prior findings, generate the "comparison" section of the report.
    3. Impression summarization — Condense a full findings section into a concise clinical impression with actionable recommendations.

    Training Data Requirements

    Parameter            Specification
    Volume               300-500 radiology report examples
    Sources              De-identified reports from the institution's PACS/RIS, MIMIC-CXR (publicly available), OpenI (NIH)
    Format               Input: structured findings list. Output: complete report sections.
    Quality criteria     Reports reviewed and approved by attending radiologists. Exclude preliminary or amended reports.
    Modality coverage    CT (30%), MRI (25%), X-ray (25%), Ultrasound (15%), Other (5%)
    De-identification    Strip patient name, MRN, DOB, dates, referring physician, institution name

    Output Format

    The adapter should produce reports following ACR (American College of Radiology) structured reporting standards:

    EXAMINATION: CT Chest with contrast
    
    CLINICAL INDICATION: 62-year-old male with persistent cough,
    rule out malignancy.
    
    COMPARISON: CT Chest dated [DATE].
    
    TECHNIQUE: Axial images obtained through the chest following
    administration of 100 mL IV contrast.
    
    FINDINGS:
    Lungs: 8mm ground-glass nodule in the right lower lobe,
    unchanged from prior examination. No new pulmonary nodules.
    No consolidation or pleural effusion.
    [...]
    
    IMPRESSION:
    1. Stable 8mm RLL ground-glass nodule. Recommend follow-up
       CT in 6 months per Fleischner Society guidelines.
    2. No acute cardiopulmonary abnormality.
    

    Key Training Considerations

    • Consistency over creativity. Radiology reports follow strict formatting conventions. Run inference at a low temperature (0.1-0.3) and emphasize format compliance in the training data.
    • Anatomical vocabulary. The adapter must learn institution-specific terminology preferences (e.g., "opacity" vs. "infiltrate," "lesion" vs. "mass").
    • Measurement precision. The model should reproduce measurements exactly as provided in the input. Train with explicit measurement examples to prevent hallucination of sizes or dimensions (see the sample training pair after this list).
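    To make the training format concrete, here is one illustrative training pair in instruction-tuning style, written as a Python dict. The field names and the findings text are assumptions for illustration, not a required schema:

    # One illustrative radiology training pair (field names are an assumption)
    example = {
        "instruction": "Generate a structured radiology report from these findings.",
        "input": (
            "Modality: CT chest with contrast. "
            "Findings: 8mm ground-glass nodule, right lower lobe, unchanged "
            "from prior; no new nodules; no consolidation or pleural effusion."
        ),
        "output": (
            "EXAMINATION: CT Chest with contrast\n\n"
            "FINDINGS:\nLungs: 8mm ground-glass nodule in the right lower "
            "lobe, unchanged from prior examination. No new pulmonary "
            "nodules. No consolidation or pleural effusion.\n\n"
            "IMPRESSION:\n1. Stable 8mm RLL ground-glass nodule. Recommend "
            "follow-up CT in 6 months per Fleischner Society guidelines."
        ),
    }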

    Pathology Adapter

    What It Does

    1. Specimen description standardization — Convert free-text gross descriptions into standardized synoptic format.
    2. Result interpretation — Generate interpretive comments for common pathology findings.
    3. Synoptic reporting — Produce CAP (College of American Pathologists) protocol-compliant synoptic reports.

    Training Data Requirements

    Parameter            Specification
    Volume               200-400 pathology report examples
    Sources              De-identified institutional pathology reports, CAP protocol templates
    Format               Input: gross description + microscopic findings. Output: synoptic report.
    Quality criteria     Final signed-out reports only. Exclude addenda unless paired with original.
    Specimen types       Surgical pathology (60%), cytology (20%), dermatopathology (15%), hematopathology (5%)
    De-identification    Strip patient identifiers, accession numbers, referring physician names

    Synoptic Output Example

    CAP SYNOPTIC REPORT — Breast Excision
    
    Procedure: Lumpectomy
    Specimen Laterality: Left
    Tumor Site: Upper outer quadrant
    Histologic Type: Invasive ductal carcinoma, NOS
    Histologic Grade: Grade 2 (Nottingham score 6/9)
    Tumor Size: 1.8 cm (greatest dimension)
    Margins: Negative (closest margin: 3mm, superior)
    Lymphovascular Invasion: Not identified
    DCIS: Present, solid and cribriform patterns
    [...]
    

    Key Training Considerations

    • Structured output fidelity. CAP synoptic protocols have required fields. The adapter must learn to populate every required field, even when the input is incomplete (in which case it should indicate "not specified" rather than hallucinate).
    • Lower volume requirement. Pathology reports are highly structured, so the adapter converges faster — 200-400 examples is typically sufficient compared to 400-600 for less structured specialties.
    • Classification accuracy. Histologic grading, staging, and margin status must be transcribed exactly from input data. Train with examples that specifically test these critical fields.

    Primary Care Adapter

    What It Does

    1. Visit note summarization — Generate SOAP notes from encounter data or dictation transcripts.
    2. Patient communication — Draft after-visit summaries, care instructions, and follow-up messages in patient-friendly language.
    3. Referral letter drafting — Generate specialist referral letters with relevant history, current medications, and clinical question.
    4. Care plan generation — Produce structured care plans based on diagnosis, patient history, and clinical guidelines.

    Training Data Requirements

    Parameter             Specification
    Volume                400-600 examples (higher due to task diversity)
    Sources               De-identified visit notes, patient portal messages, referral letters
    Format                Varies by task. Visit notes: encounter data → SOAP. Communication: clinical info → patient-friendly language.
    Quality criteria      Notes from experienced physicians. Exclude incomplete encounters.
    Visit type coverage   Annual wellness (15%), acute visits (35%), chronic disease management (30%), follow-ups (20%)
    De-identification     Full PHI removal, including social history details that could be identifying

    Key Training Considerations

    • Reading level. Patient-facing communications must be written at a 6th-8th grade reading level. Include readability scoring in your evaluation criteria (a scoring sketch follows this list).
    • Task diversity. Primary care adapters handle the widest range of tasks. Use task-specific instruction prefixes in training data to help the adapter distinguish between "generate a SOAP note" and "write a patient letter."
    • Medication awareness. Primary care notes frequently reference medications. The adapter should reproduce medication names, dosages, and frequencies exactly. Do not rely on the adapter for drug interaction checking — that is a RAG task.
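    A minimal readability gate, assuming the third-party textstat package; the threshold and function name are illustrative:

    import textstat  # third-party: pip install textstat

    TARGET_MAX_GRADE = 8.0  # upper bound of the 6th-8th grade target range

    def readable_enough(draft: str) -> bool:
        """Flag patient-facing drafts that score above an 8th-grade level."""
        return textstat.flesch_kincaid_grade(draft) <= TARGET_MAX_GRADE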

    Storage Math: What This Costs

    The entire multi-specialty deployment is remarkably compact:

    Component                                    Size
    Base model (Llama 3 8B, Q5_K_M quantized)    5.5 GB
    Radiology adapter                            120 MB
    Pathology adapter                            95 MB
    Primary care adapter                         140 MB
    Cardiology adapter                           88 MB
    Dermatology adapter                          105 MB
    Emergency adapter                            110 MB
    Orthopedics adapter                          92 MB
    Psychiatry adapter                           130 MB
    Oncology adapter                             115 MB
    Gastroenterology adapter                     98 MB
    Total (base + 10 specialties)                ~6.6 GB

    Compare this to 10 separate fine-tuned models: 10 x 5.5 GB = 55 GB. The LoRA approach uses 88% less storage and requires only one GPU instead of multiple.

    VRAM requirements at inference:

    • Base model (quantized): 5.5 GB
    • Active adapter: ~100-150 MB
    • KV cache (2K context): ~500 MB
    • Overhead: ~500 MB
    • Total: ~6.5-7 GB — fits on a single consumer GPU (RTX 4060 or better)

    A hospital running 10 specialties needs one GPU card, not ten. That is the LoRA value proposition.

    Adapter Management: Versioning and Testing

    Version Naming Convention

    Use a clear, predictable naming scheme:

    {specialty}-{task}-v{major}.{minor}.safetensors
    
    Examples:
    radiology-report-v1.0.safetensors    ← Initial release
    radiology-report-v1.1.safetensors    ← Bug fixes, minor retraining
    radiology-report-v2.0.safetensors    ← Major retraining, new data
    pathology-synoptic-v1.3.safetensors  ← Third patch of first release
    

    A/B Testing Between Versions

    Before deploying a new adapter version, run it against the current version on a held-out test set:

    Metric               v1.0     v1.1     Threshold
    Format compliance    94%      97%      > 95%
    Clinical accuracy    91%      93%      > 90%
    Hallucination rate   3.2%     1.8%     < 2%
    Latency (p95)        420ms    435ms    < 500ms

    Only promote v1.1 to production if it meets all thresholds. Keep v1.0 as a rollback option.
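    A small promotion gate makes this rule mechanical. The thresholds come from the table above; the metric names and dict format are assumptions for illustration:

    # Threshold, and whether the metric must stay above ("min") or below ("max") it
    THRESHOLDS = {
        "format_compliance":  (0.95, "min"),
        "clinical_accuracy":  (0.90, "min"),
        "hallucination_rate": (0.02, "max"),
        "latency_p95_ms":     (500,  "max"),
    }

    def can_promote(metrics: dict) -> bool:
        """Return True only if the candidate adapter clears every threshold."""
        for name, (limit, direction) in THRESHOLDS.items():
            value = metrics[name]
            if direction == "min" and value <= limit:
                return False
            if direction == "max" and value >= limit:
                return False
        return True

    # v1.1 from the table above clears all four gates
    assert can_promote({"format_compliance": 0.97, "clinical_accuracy": 0.93,
                        "hallucination_rate": 0.018, "latency_p95_ms": 435})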

    Deployment: Load Once, Swap Per Request

    The inference engine loads the base model once at startup. Adapters are loaded on-demand:

    1. Request arrives tagged with department: radiology
    2. Router checks if radiology-report-v2.1 is in adapter cache
    3. If cached: apply adapter, run inference (add ~5ms latency)
    4. If not cached: load from SSD to GPU (~30-50ms), cache, run inference
    5. Return response

    Most inference frameworks (vLLM, text-generation-inference, Ollama) support this pattern natively. The adapter cache holds the 3-5 most recently used adapters in GPU memory. For a hospital where radiology, primary care, and emergency are the highest-volume departments, those three adapters stay permanently cached.
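    A minimal vLLM sketch of the load-once, swap-per-request pattern. The model ID, adapter paths, and ID registry are illustrative, and vLLM's LoRA API has shifted across releases, so treat this as a sketch rather than a drop-in server:

    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    # Base model loaded once at startup; vLLM caches up to max_loras adapters
    llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct",
              enable_lora=True, max_loras=4)

    # Each adapter needs a stable, unique positive integer ID
    ADAPTER_IDS = {"radiology": 1, "pathology": 2, "primary_care": 3}

    def generate(prompt: str, department: str, adapter_path: str) -> str:
        # adapter_path points at a PEFT-format adapter directory on SSD
        lora = LoRARequest(department, ADAPTER_IDS[department], adapter_path)
        params = SamplingParams(temperature=0.2, max_tokens=500)
        outputs = llm.generate([prompt], params, lora_request=lora)
        return outputs[0].outputs[0].text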

    Performance: Generic Model vs. Specialty Adapter

    This is where the investment pays off. A generic base model handles medical text reasonably well. A specialty adapter makes it clinically useful.

    Accuracy Comparison (Internal Benchmarks)

    Task                                   Generic Base Model      Specialty LoRA Adapter    Improvement
    Radiology report generation            71% format compliance   96% format compliance     +25 pts
    Radiology impression accuracy          78%                     93%                       +15 pts
    Pathology synoptic completeness        65% fields correct      94% fields correct        +29 pts
    Primary care SOAP notes                74%                     91%                       +17 pts
    Patient communication readability      Grade 11 avg            Grade 7 avg               Appropriate level
    Referral letter completeness           68%                     92%                       +24 pts
    Discharge summary accuracy             72%                     89%                       +17 pts
    Clinical coding suggestion accuracy    70%                     88%                       +18 pts

    Average improvement across the seven point-scored tasks: +20.7 percentage points. This is the difference between a model that clinicians ignore and one they actually use.

    Latency Comparison

    Configuration                    Time to First Token    Total Generation (500 tokens)
    Base model only                  45ms                   380ms
    Base + cached LoRA adapter       48ms                   395ms
    Base + cold LoRA adapter load    85ms                   430ms

    The latency overhead of LoRA is negligible — 3-15ms for a cached adapter. This is invisible in a clinical workflow where the human interaction (clicking, reading, editing) takes seconds.


    Putting It All Together

    Deployment Checklist

    1. Select base model. Llama 3 8B (Q5_K_M quantization) for balanced performance. Mistral 7B if latency is the top priority.

    2. Prioritize specialties. Start with the 2-3 highest-volume departments. Radiology and primary care are almost always the right first choices.

    3. Collect and de-identify training data. 300-600 examples per specialty. Work with department chairs to identify representative, high-quality examples.

    4. Train adapters. Rank 16-32, learning rate 1e-4 to 2e-4, 3-5 epochs. Validate against a held-out test set after each epoch. A configuration sketch follows this checklist.

    5. Benchmark against generic model. Document the improvement for each task. This data justifies the deployment to hospital administration.

    6. Deploy with versioning. Use the naming convention above. Keep at least one prior version as a rollback option.

    7. Monitor and retrain. Track accuracy metrics weekly. Retrain quarterly or when performance drifts below thresholds.
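    A configuration sketch for step 4 using Hugging Face PEFT. The target modules listed are the usual attention projections for Llama-style models, and the dropout value is an assumption; verify both against your base model and training framework:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

    config = LoraConfig(
        r=16,                 # rank: 16-32 per the checklist
        lora_alpha=32,
        lora_dropout=0.05,    # an assumption; not specified in the text
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # typically well under 1% of the base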

    The one-model-many-adapters architecture is not just a cost optimization — it is an operational simplification. One model to update, one model to secure, one model to audit. The adapters add specialization without adding infrastructure complexity.
