
    LoRA Adapters Per Healthcare Specialty: Radiology, Pathology, Primary Care

    How to serve multiple hospital departments from a single base model using specialty-specific LoRA adapters. Covers architecture, training data requirements, storage math, adapter management, and performance benchmarks.

    Ertas Team

    A radiology report and a primary care visit note require fundamentally different AI capabilities. Radiology demands structured reporting with precise anatomical vocabulary, comparison against prior studies, and standardized impression sections. Primary care needs conversational patient communication, referral letter drafting, and visit note summarization across dozens of complaint types.

    Running separate fine-tuned models for each specialty is expensive and wasteful. A Llama 3 8B model in FP16 takes approximately 16 GB of VRAM. Five specialties, five models, five GPUs — that math does not work for any hospital or agency.

    The solution: one base medical model loaded once in GPU memory, plus lightweight LoRA adapters swapped per request based on the requesting department. This article covers the architecture, the specialty-specific training requirements, storage math, adapter management, and performance comparisons.

    Architecture: One Base Model, Many Specialties

    The Core Setup

    GPU Memory Layout:
    ┌──────────────────────────────────────┐
    │ Base Model (Llama 3 8B or Mistral)   │ ← Loaded once: 8-16 GB
    │ Quantized to Q5_K_M: ~5.5 GB         │
    ├──────────────────────────────────────┤
    │ Active LoRA Adapter                  │ ← Hot-swapped per request
    │ (Specialty-specific, 50-200 MB)      │
    └──────────────────────────────────────┘
    
    Adapter Storage (SSD):
    ├── radiology-report-v2.1.safetensors      (120 MB)
    ├── pathology-synoptic-v1.3.safetensors    (95 MB)
    ├── primary-care-notes-v3.0.safetensors    (140 MB)
    ├── cardiology-echo-v1.1.safetensors       (88 MB)
    ├── dermatology-lesion-v2.0.safetensors    (105 MB)
    ├── emergency-triage-v1.4.safetensors      (110 MB)
    ├── orthopedics-surgical-v1.0.safetensors  (92 MB)
    ├── psychiatry-eval-v1.2.safetensors       (130 MB)
    ├── oncology-staging-v1.1.safetensors      (115 MB)
    └── gastro-endo-v1.0.safetensors           (98 MB)
    

    Base Model Selection

    The base model matters. For healthcare, you want a model that already has strong medical vocabulary and reasoning, then specialize it further with LoRA.

    Llama 3 8B is the recommended starting point for most healthcare deployments:

    • Strong general reasoning
    • Good performance on medical benchmarks out of the box
    • Large community, well-tested quantization paths
    • Permissive license for commercial use

    Mistral 7B is a strong alternative when latency is the primary concern:

    • Slightly smaller, faster inference
    • Sliding window attention handles long clinical documents well
    • Good performance-per-parameter ratio

    Either model serves as the frozen foundation. The LoRA adapters do the specialization work.

    Request Routing

    When a request arrives, the system identifies the originating department and loads the corresponding adapter:

    Incoming Request
          │
          ▼
    ┌─────────────┐
    │ API Gateway │ ← Authenticates, identifies department
    └──────┬──────┘
           │
           ▼
    ┌────────────────────────┐
    │ Adapter Router         │ ← Maps department → adapter file
    │                        │
    │ radiology → rad-v2.1   │
    │ pathology → path-v1.3  │
    │ primary   → pc-v3.0    │
    └──────────┬─────────────┘
               │
               ▼
    ┌────────────────────────┐
    │ Inference Engine       │ ← Base model + selected adapter
    │ (vLLM / Ollama)        │
    └────────────────────────┘
    

    Adapter swap time on modern hardware: 10-50ms. Invisible to the end user. In practice, most inference engines cache recently used adapters, so swap cost approaches zero for active departments.
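    A minimal routing sketch in Python. The department names, file names, and directory layout are illustrative assumptions, not a required structure:

    from pathlib import Path

    ADAPTER_DIR = Path("/models/adapters")

    # Department -> current production adapter file (illustrative versions)
    ADAPTER_MAP = {
        "radiology":    "radiology-report-v2.1.safetensors",
        "pathology":    "pathology-synoptic-v1.3.safetensors",
        "primary_care": "primary-care-notes-v3.0.safetensors",
    }

    def resolve_adapter(department: str) -> Path:
        """Map an authenticated department tag to its adapter file."""
        try:
            return ADAPTER_DIR / ADAPTER_MAP[department]
        except KeyError:
            # Fail loudly rather than silently falling back to the base model
            raise ValueError(f"No adapter registered for department '{department}'")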

    Radiology Adapter

    What It Does

    Radiology AI assists with three core tasks:

    1. Report generation from findings — Given a list of imaging findings (dictated or extracted from worklists), generate a structured radiology report.
    2. Comparison with prior studies — Given current and prior findings, generate the "comparison" section of the report.
    3. Impression summarization — Condense a full findings section into a concise clinical impression with actionable recommendations.

    Training Data Requirements

    Parameter            Specification
    Volume               300-500 radiology report examples
    Sources              De-identified reports from the institution's PACS/RIS, MIMIC-CXR (publicly available), OpenI (NIH)
    Format               Input: structured findings list. Output: complete report sections.
    Quality criteria     Reports reviewed and approved by attending radiologists. Exclude preliminary or amended reports.
    Modality coverage    CT (30%), MRI (25%), X-ray (25%), Ultrasound (15%), Other (5%)
    De-identification    Strip patient name, MRN, DOB, dates, referring physician, institution name

    Output Format

    The adapter should produce reports following ACR (American College of Radiology) structured reporting standards:

    EXAMINATION: CT Chest with contrast
    
    CLINICAL INDICATION: 62-year-old male with persistent cough,
    rule out malignancy.
    
    COMPARISON: CT Chest dated [DATE].
    
    TECHNIQUE: Axial images obtained through the chest following
    administration of 100 mL IV contrast.
    
    FINDINGS:
    Lungs: 8mm ground-glass nodule in the right lower lobe,
    unchanged from prior examination. No new pulmonary nodules.
    No consolidation or pleural effusion.
    [...]
    
    IMPRESSION:
    1. Stable 8mm RLL ground-glass nodule. Recommend follow-up
       CT in 6 months per Fleischner Society guidelines.
    2. No acute cardiopulmonary abnormality.
    

    Key Training Considerations

    • Consistency over creativity. Radiology reports follow strict formatting conventions. Run inference at a low temperature (0.1-0.3) and emphasize format compliance in the training data.
    • Anatomical vocabulary. The adapter must learn institution-specific terminology preferences (e.g., "opacity" vs. "infiltrate," "lesion" vs. "mass").
    • Measurement precision. The model should reproduce measurements exactly as provided in the input. Train with explicit measurement examples to prevent hallucination of sizes or dimensions (see the sample training pair after this list).
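    To make the training format concrete, here is one illustrative training pair in instruction-tuning style, written as a Python dict. The field names and the findings text are assumptions for illustration, not a required schema:

    # One illustrative radiology training pair (field names are an assumption)
    example = {
        "instruction": "Generate a structured radiology report from these findings.",
        "input": (
            "Modality: CT chest with contrast. "
            "Findings: 8mm ground-glass nodule, right lower lobe, unchanged "
            "from prior; no new nodules; no consolidation or pleural effusion."
        ),
        "output": (
            "EXAMINATION: CT Chest with contrast\n\n"
            "FINDINGS:\nLungs: 8mm ground-glass nodule in the right lower "
            "lobe, unchanged from prior examination. No new pulmonary "
            "nodules. No consolidation or pleural effusion.\n\n"
            "IMPRESSION:\n1. Stable 8mm RLL ground-glass nodule. Recommend "
            "follow-up CT in 6 months per Fleischner Society guidelines."
        ),
    }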

    Pathology Adapter

    What It Does

    1. Specimen description standardization — Convert free-text gross descriptions into standardized synoptic format.
    2. Result interpretation — Generate interpretive comments for common pathology findings.
    3. Synoptic reporting — Produce CAP (College of American Pathologists) protocol-compliant synoptic reports.

    Training Data Requirements

    Parameter            Specification
    Volume               200-400 pathology report examples
    Sources              De-identified institutional pathology reports, CAP protocol templates
    Format               Input: gross description + microscopic findings. Output: synoptic report.
    Quality criteria     Final signed-out reports only. Exclude addenda unless paired with original.
    Specimen types       Surgical pathology (60%), cytology (20%), dermatopathology (15%), hematopathology (5%)
    De-identification    Strip patient identifiers, accession numbers, referring physician names

    Synoptic Output Example

    CAP SYNOPTIC REPORT — Breast Excision
    
    Procedure: Lumpectomy
    Specimen Laterality: Left
    Tumor Site: Upper outer quadrant
    Histologic Type: Invasive ductal carcinoma, NOS
    Histologic Grade: Grade 2 (Nottingham score 6/9)
    Tumor Size: 1.8 cm (greatest dimension)
    Margins: Negative (closest margin: 3mm, superior)
    Lymphovascular Invasion: Not identified
    DCIS: Present, solid and cribriform patterns
    [...]
    

    Key Training Considerations

    • Structured output fidelity. CAP synoptic protocols have required fields. The adapter must learn to populate every required field, even when the input is incomplete (in which case it should indicate "not specified" rather than hallucinate).
    • Lower volume requirement. Pathology reports are highly structured, so the adapter converges faster — 200-400 examples is typically sufficient compared to 400-600 for less structured specialties.
    • Classification accuracy. Histologic grading, staging, and margin status must be transcribed exactly from input data. Train with examples that specifically test these critical fields.

    Primary Care Adapter

    What It Does

    1. Visit note summarization — Generate SOAP notes from encounter data or dictation transcripts.
    2. Patient communication — Draft after-visit summaries, care instructions, and follow-up messages in patient-friendly language.
    3. Referral letter drafting — Generate specialist referral letters with relevant history, current medications, and clinical question.
    4. Care plan generation — Produce structured care plans based on diagnosis, patient history, and clinical guidelines.

    Training Data Requirements

    Parameter             Specification
    Volume                400-600 examples (higher due to task diversity)
    Sources               De-identified visit notes, patient portal messages, referral letters
    Format                Varies by task. Visit notes: encounter data → SOAP. Communication: clinical info → patient-friendly language.
    Quality criteria      Notes from experienced physicians. Exclude incomplete encounters.
    Visit type coverage   Annual wellness (15%), acute visits (35%), chronic disease management (30%), follow-ups (20%)
    De-identification     Full PHI removal, including social history details that could be identifying

    Key Training Considerations

    • Reading level. Patient-facing communications must be written at a 6th-8th grade reading level. Include readability scoring in your evaluation criteria (a scoring sketch follows this list).
    • Task diversity. Primary care adapters handle the widest range of tasks. Use task-specific instruction prefixes in training data to help the adapter distinguish between "generate a SOAP note" and "write a patient letter."
    • Medication awareness. Primary care notes frequently reference medications. The adapter should reproduce medication names, dosages, and frequencies exactly. Do not rely on the adapter for drug interaction checking — that is a RAG task.
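    A minimal readability gate, assuming the third-party textstat package; the threshold and function name are illustrative:

    import textstat  # third-party: pip install textstat

    TARGET_MAX_GRADE = 8.0  # upper bound of the 6th-8th grade target range

    def readable_enough(draft: str) -> bool:
        """Flag patient-facing drafts that score above an 8th-grade level."""
        return textstat.flesch_kincaid_grade(draft) <= TARGET_MAX_GRADE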

    Storage Math: What This Costs

    The entire multi-specialty deployment is remarkably compact:

    Component                                    Size
    Base model (Llama 3 8B, Q5_K_M quantized)    5.5 GB
    Radiology adapter                            120 MB
    Pathology adapter                            95 MB
    Primary care adapter                         140 MB
    Cardiology adapter                           88 MB
    Dermatology adapter                          105 MB
    Emergency adapter                            110 MB
    Orthopedics adapter                          92 MB
    Psychiatry adapter                           130 MB
    Oncology adapter                             115 MB
    Gastroenterology adapter                     98 MB
    Total (base + 10 specialties)                ~6.6 GB

    Compare this to 10 separate fine-tuned models: 10 x 5.5 GB = 55 GB. The LoRA approach uses 88% less storage and requires only one GPU instead of multiple.

    VRAM requirements at inference:

    • Base model (quantized): 5.5 GB
    • Active adapter: ~100-150 MB
    • KV cache (2K context): ~500 MB
    • Overhead: ~500 MB
    • Total: ~6.5-7 GB — fits on a single consumer GPU (RTX 4060 or better)

    A hospital running 10 specialties needs one GPU card, not ten. That is the LoRA value proposition.

    Adapter Management: Versioning and Testing

    Version Naming Convention

    Use a clear, predictable naming scheme:

    {specialty}-{task}-v{major}.{minor}.safetensors
    
    Examples:
    radiology-report-v1.0.safetensors    ← Initial release
    radiology-report-v1.1.safetensors    ← Bug fixes, minor retraining
    radiology-report-v2.0.safetensors    ← Major retraining, new data
    pathology-synoptic-v1.3.safetensors  ← Third patch of first release
    

    A/B Testing Between Versions

    Before deploying a new adapter version, run it against the current version on a held-out test set:

    Metric               v1.0     v1.1     Threshold
    Format compliance    94%      97%      > 95%
    Clinical accuracy    91%      93%      > 90%
    Hallucination rate   3.2%     1.8%     < 2%
    Latency (p95)        420ms    435ms    < 500ms

    Only promote v1.1 to production if it meets all thresholds. Keep v1.0 as a rollback option.
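    A small promotion gate makes this rule mechanical. The thresholds come from the table above; the metric names and dict format are assumptions for illustration:

    # Threshold, and whether the metric must stay above ("min") or below ("max") it
    THRESHOLDS = {
        "format_compliance":  (0.95, "min"),
        "clinical_accuracy":  (0.90, "min"),
        "hallucination_rate": (0.02, "max"),
        "latency_p95_ms":     (500,  "max"),
    }

    def can_promote(metrics: dict) -> bool:
        """Return True only if the candidate adapter clears every threshold."""
        for name, (limit, direction) in THRESHOLDS.items():
            value = metrics[name]
            if direction == "min" and value <= limit:
                return False
            if direction == "max" and value >= limit:
                return False
        return True

    # v1.1 from the table above clears all four gates
    assert can_promote({"format_compliance": 0.97, "clinical_accuracy": 0.93,
                        "hallucination_rate": 0.018, "latency_p95_ms": 435})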

    Deployment: Load Once, Swap Per Request

    The inference engine loads the base model once at startup. Adapters are loaded on-demand:

    1. Request arrives tagged with department: radiology
    2. Router checks if radiology-report-v2.1 is in adapter cache
    3. If cached: apply adapter, run inference (add ~5ms latency)
    4. If not cached: load from SSD to GPU (~30-50ms), cache, run inference
    5. Return response

    Most inference frameworks (vLLM, text-generation-inference, Ollama) support this pattern natively. The adapter cache holds the 3-5 most recently used adapters in GPU memory. For a hospital where radiology, primary care, and emergency are the highest-volume departments, those three adapters stay permanently cached.
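    A minimal vLLM sketch of the load-once, swap-per-request pattern. The model ID, adapter paths, and ID registry are illustrative, and vLLM's LoRA API has shifted across releases, so treat this as a sketch rather than a drop-in server:

    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    # Base model loaded once at startup; vLLM caches up to max_loras adapters
    llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct",
              enable_lora=True, max_loras=4)

    # Each adapter needs a stable, unique positive integer ID
    ADAPTER_IDS = {"radiology": 1, "pathology": 2, "primary_care": 3}

    def generate(prompt: str, department: str, adapter_path: str) -> str:
        # adapter_path points at a PEFT-format adapter directory on SSD
        lora = LoRARequest(department, ADAPTER_IDS[department], adapter_path)
        params = SamplingParams(temperature=0.2, max_tokens=500)
        outputs = llm.generate([prompt], params, lora_request=lora)
        return outputs[0].outputs[0].text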

    Performance: Generic Model vs. Specialty Adapter

    This is where the investment pays off. A generic base model handles medical text reasonably well. A specialty adapter makes it clinically useful.

    Accuracy Comparison (Internal Benchmarks)

    Task                                   Generic Base Model      Specialty LoRA Adapter    Improvement
    Radiology report generation            71% format compliance   96% format compliance     +25 pts
    Radiology impression accuracy          78%                     93%                       +15 pts
    Pathology synoptic completeness        65% fields correct      94% fields correct        +29 pts
    Primary care SOAP notes                74%                     91%                       +17 pts
    Patient communication readability      Grade 11 avg            Grade 7 avg               Appropriate level
    Referral letter completeness           68%                     92%                       +24 pts
    Discharge summary accuracy             72%                     89%                       +17 pts
    Clinical coding suggestion accuracy    70%                     88%                       +18 pts

    Average improvement across the seven point-scored tasks: +20.7 percentage points. This is the difference between a model that clinicians ignore and one they actually use.

    Latency Comparison

    Configuration                    Time to First Token    Total Generation (500 tokens)
    Base model only                  45ms                   380ms
    Base + cached LoRA adapter       48ms                   395ms
    Base + cold LoRA adapter load    85ms                   430ms

    The latency overhead of LoRA is negligible — 3-15ms for a cached adapter. This is invisible in a clinical workflow where the human interaction (clicking, reading, editing) takes seconds.


    Putting It All Together

    Deployment Checklist

    1. Select base model. Llama 3 8B (Q5_K_M quantization) for balanced performance. Mistral 7B if latency is the top priority.

    2. Prioritize specialties. Start with the 2-3 highest-volume departments. Radiology and primary care are almost always the right first choices.

    3. Collect and de-identify training data. 300-600 examples per specialty. Work with department chairs to identify representative, high-quality examples.

    4. Train adapters. Rank 16-32, learning rate 1e-4 to 2e-4, 3-5 epochs. Validate against a held-out test set after each epoch. A configuration sketch follows this checklist.

    5. Benchmark against generic model. Document the improvement for each task. This data justifies the deployment to hospital administration.

    6. Deploy with versioning. Use the naming convention above. Keep at least one prior version as a rollback option.

    7. Monitor and retrain. Track accuracy metrics weekly. Retrain quarterly or when performance drifts below thresholds.
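    A configuration sketch for step 4 using Hugging Face PEFT. The target modules listed are the usual attention projections for Llama-style models, and the dropout value is an assumption; verify both against your base model and training framework:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

    config = LoraConfig(
        r=16,                 # rank: 16-32 per the checklist
        lora_alpha=32,
        lora_dropout=0.05,    # an assumption; not specified in the text
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # typically well under 1% of the base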

    The one-model-many-adapters architecture is not just a cost optimization — it is an operational simplification. One model to update, one model to secure, one model to audit. The adapters add specialization without adding infrastructure complexity.
