    Pricing Your AI Agency Services: Flat-Rate vs. Per-Token When Using Self-Hosted Models

    How self-hosted AI models change agency pricing strategy. Flat-rate, per-seat, and hybrid pricing models with worked margin examples at each GPU tier.

    Ertas Team

    Most AI agencies inherited their pricing model from the API era: charge clients based on usage, pass through API costs with a markup. It works, but it caps your margins and makes revenue unpredictable.

    Self-hosted models break this dynamic. Your cost is a fixed GPU expense, not a per-token variable. This creates pricing opportunities that API-dependent agencies cannot match.

    This article extends the AI agency pricing strategy guide with specific pricing models for agencies running self-hosted fine-tuned models.

    The Step-Function Insight

    API costs are linear: more tokens, more cost. Self-hosted costs are step functions: fixed cost per GPU tier, zero marginal cost within that tier.

    This single fact changes everything about how you should price:

    | Pricing Model | API-Based Agency | Self-Hosted Agency |
    |---|---|---|
    | Cost structure | Variable (per token) | Fixed (per GPU tier) |
    | Margin on high-usage clients | Thin or negative | Excellent |
    | Revenue predictability | Low | High |
    | Pricing flexibility | Limited by COGS | Wide margin range |
    | Client preference | Unpredictable bills | Predictable budgets |

    When your costs are fixed, every pricing model that charges more than your fixed cost produces margin. The question is not "can I afford to serve this client?" but "which pricing model maximises the value I capture?"
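    The linear-versus-step contrast described above can be sketched in a few lines. This is an illustration only: the $42/month GPU figure comes from this article, while the API price and the per-GPU throughput are placeholder assumptions to replace with your own numbers.

```python
import math

GPU_MONTHLY_COST = 42.0        # from the article: operating cost of one RTX 5090 per month
API_PRICE_PER_M_TOKENS = 10.0  # ASSUMPTION: illustrative API price, USD per million tokens
GPU_CAPACITY_M_TOKENS = 500.0  # ASSUMPTION: illustrative monthly throughput of one GPU (millions of tokens)


def api_cost(m_tokens: float) -> float:
    """Linear cost: every additional token adds to the bill."""
    return m_tokens * API_PRICE_PER_M_TOKENS


def self_hosted_cost(m_tokens: float) -> float:
    """Step cost: flat within a GPU tier, jumps only when another GPU is needed."""
    gpus = max(1, math.ceil(m_tokens / GPU_CAPACITY_M_TOKENS))
    return gpus * GPU_MONTHLY_COST


# Monthly volume in millions of tokens
for volume in (10, 100, 500, 501, 1000):
    print(
        f"{volume:>5}M tokens  "
        f"API ${api_cost(volume):>8,.0f}  "
        f"self-hosted ${self_hosted_cost(volume):>4,.0f}"
    )
```

    With these placeholder values the self-hosted cost stays at $42 up to 500M tokens and steps to $84 at 501M, while the API cost rises with every token.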

    Pricing Model 1: Flat-Rate Monthly Retainer

    How it works: Client pays a fixed monthly fee for unlimited AI usage within a defined scope.

    Example:

    • Contract review AI for a law firm: $5,000/month flat
    • Includes: unlimited contract reviews, monthly model retraining, support
    • Your cost: ~$200/month allocated (share of GPU, electricity, Ertas Studio seat)
    • Gross margin: 96%

    When to use:

    • Clients with predictable, moderate-to-high usage
    • Enterprise clients who prefer budget certainty
    • Engagements where usage growth benefits you (client uses more → they get more value → they stay longer)

    Risks:

    • A single client with extreme usage could saturate your GPU capacity
    • Mitigate by defining "unlimited within reasonable use" or setting a soft cap

    Margin analysis at different client counts (1 × RTX 5090, $42/month operational):

    | Clients | Revenue (at $3,000/mo each) | GPU Cost | Gross Margin |
    |---|---|---|---|
    | 3 | $9,000 | $42 | 99.5% |
    | 5 | $15,000 | $42 | 99.7% |
    | 10 | $30,000 | $42 | 99.9% |

    Even at conservative pricing, margins are extraordinary once the GPU is paid off.
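    The margin figures in this section follow from a single formula, (revenue - compute cost) / revenue. A minimal sketch that reproduces them, using only numbers quoted above (the function name is illustrative):

```python
def gross_margin_pct(monthly_revenue: float, monthly_compute_cost: float) -> float:
    """Gross margin on compute, as a percentage of revenue."""
    return (monthly_revenue - monthly_compute_cost) / monthly_revenue * 100


GPU_COST = 42             # from the article: 1 x RTX 5090, operating cost per month
PRICE_PER_CLIENT = 3_000  # from the article: flat monthly fee per client

for clients in (3, 5, 10):
    revenue = clients * PRICE_PER_CLIENT
    print(f"{clients:>2} clients  ${revenue:>6,}  {gross_margin_pct(revenue, GPU_COST):.1f}%")

# The law-firm example: $5,000/month against roughly $200/month of allocated cost
print(f"Law firm retainer: {gross_margin_pct(5_000, 200):.0f}%")
```

    The same function works for any allocated cost, so you can rerun it with a fuller cost base (labour, software, support) to see how quickly the headline margin comes down.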

    Pricing Model 2: Per-Seat Pricing

    How it works: Client pays per user who has access to the AI tools.

    Example:

    • AI-powered legal research assistant: $200/user/month
    • Law firm with 15 associates: $3,000/month
    • Your cost: ~$200/month allocated
    • Gross margin: 93%

    When to use:

    • Products where usage scales with headcount
    • Clients who think in terms of per-employee software costs
    • When you want pricing to scale naturally as the client grows

    Advantages:

    • Familiar pricing model for enterprise buyers (like SaaS)
    • Revenue grows automatically as the client adds users
    • Easy for clients to budget and approve

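    Revenue analysis (monthly revenue by firm size):

    | Per-seat price | 10-person firm | 50-person firm | 200-person firm |
    |---|---|---|---|
    | $100/seat | $1,000/mo | $5,000/mo | $20,000/mo |
    | $200/seat | $2,000/mo | $10,000/mo | $40,000/mo |
    | $500/seat | $5,000/mo | $25,000/mo | $100,000/mo |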

    Your GPU cost is the same regardless of seat count (until you hit capacity limits). Per-seat pricing at large firms is wildly profitable.
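    The per-seat figures above are plain multiplication, which makes them easy to recompute for your own price points. A small sketch using the article's numbers (the helper name is illustrative):

```python
ALLOCATED_COST = 200  # from the article's example: approximate monthly cost allocated to one client


def per_seat_revenue(price_per_seat: int, seats: int) -> int:
    """Monthly revenue from one client under per-seat pricing."""
    return price_per_seat * seats


# Revenue grid: seat price x firm size
for price in (100, 200, 500):
    row = [per_seat_revenue(price, seats) for seats in (10, 50, 200)]
    print(f"${price}/seat: " + "  ".join(f"${r:,}/mo" for r in row))

# The 15-associate law firm example
revenue = per_seat_revenue(200, 15)
margin = (revenue - ALLOCATED_COST) / revenue * 100
print(f"15 seats at $200: ${revenue:,}/mo, gross margin {margin:.0f}%")
```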

    Pricing Model 3: Per-Project or Per-Engagement

    How it works: Client pays a fixed fee for a defined project (e.g., review a specific set of documents).

    Example:

    • Due diligence review for an M&A transaction: $15,000 per deal
    • Includes: AI-assisted review of up to 5,000 documents, summary report, risk analysis
    • Your cost: 2-3 days of agency time + negligible compute
    • Gross margin: 70-80% (lower than retainer because it includes labour)

    When to use:

    • Transaction-based work (M&A, litigation document review)
    • Clients who are not ready for a monthly commitment
    • High-value engagements where the output is clearly tied to a business outcome

    Advantages:

    • Aligns pricing with value delivered (a $50M M&A deal justifies $15K for AI review)
    • No ongoing commitment required (lower barrier to entry)
    • Can lead to retainer engagements after proving value
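    For per-project work the margin depends mostly on labour. The sketch below shows how the 70-80% range in the due diligence example could arise; the day rate is a hypothetical value chosen for illustration, not a figure from this article.

```python
PROJECT_FEE = 15_000  # from the article: fee per M&A due diligence review
DAY_RATE = 1_500      # ASSUMPTION: hypothetical fully loaded cost of one day of agency time


def project_margin_pct(fee: float, labour_days: float, day_rate: float,
                       compute_cost: float = 0.0) -> float:
    """Gross margin on a fixed-fee project, labour included."""
    cost = labour_days * day_rate + compute_cost
    return (fee - cost) / fee * 100


for days in (2, 3):
    print(f"{days} days of labour: {project_margin_pct(PROJECT_FEE, days, DAY_RATE):.0f}% margin")
```

    At that assumed day rate, 2 days of labour gives 80% and 3 days gives 70%. A higher internal day rate or a longer engagement pushes the margin below the quoted range, so it is worth plugging in your own figure before quoting.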

    Pricing Model 4: Hybrid (Base + Usage)

    How it works: Client pays a base retainer for the platform/access, plus a per-unit fee for heavy usage.

    Example:

    • Base: $2,000/month (includes platform access, model hosting, standard support)
    • Per-review: $25 per contract review beyond 100/month
    • Most clients stay within the base tier — the per-unit pricing is insurance against extreme usage

    When to use:

    • When you need to protect against outlier usage patterns
    • When clients have variable but somewhat predictable workloads
    • As a middle ground for clients hesitant to commit to flat-rate
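    The hybrid example above translates directly into a billing rule. A minimal sketch using the article's numbers (the function and constant names are illustrative):

```python
BASE_FEE = 2_000        # from the article: monthly base retainer
INCLUDED_REVIEWS = 100  # from the article: reviews covered by the base fee
OVERAGE_PRICE = 25      # from the article: fee per review beyond the included volume


def hybrid_invoice(reviews: int) -> int:
    """Monthly invoice: base fee plus a per-review charge above the included volume."""
    overage = max(0, reviews - INCLUDED_REVIEWS)
    return BASE_FEE + overage * OVERAGE_PRICE


for reviews in (80, 100, 140, 400):
    print(f"{reviews:>3} reviews -> ${hybrid_invoice(reviews):,}")
```

    A client at or below 100 reviews pays the flat $2,000, while an outlier at 400 reviews pays $9,500, which is the protection against extreme usage that this model is meant to provide.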

    Worked Margin Examples at Each GPU Tier

    Tier 1: Single RTX 5090 ($2,000 hardware, $42/month operation)

    | Scenario | Monthly Revenue | Monthly Cost | Gross Margin | Annual Profit |
    |---|---|---|---|---|
    | 3 clients × $3,000 flat | $9,000 | $42 | 99.5% | $107,496 |
    | 5 clients × $2,000 flat | $10,000 | $42 | 99.6% | $119,496 |
    | 10 clients × $1,500 flat | $15,000 | $42 | 99.7% | $179,496 |

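    Hardware payback: 1-2 months at most. On the compute figures above, even the lowest-revenue scenario ($9,000/month) covers the $2,000 card within the first month.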

    Tier 2: Dual RTX 5090 ($4,000 hardware, $84/month operation)

    | Scenario | Monthly Revenue | Monthly Cost | Gross Margin | Annual Profit |
    |---|---|---|---|---|
    | 10 clients × $3,000 flat | $30,000 | $84 | 99.7% | $358,992 |
    | 15 clients × $2,000 flat | $30,000 | $84 | 99.7% | $358,992 |
    | 20 per-seat clients × 10 seats avg × $200/seat | $40,000 | $84 | 99.8% | $478,992 |

    Tier 3: A6000 ($4,500 hardware, $22/month operation)

    Better for agencies needing 48 GB VRAM (larger models, more concurrent adapters):

    | Scenario | Monthly Revenue | Monthly Cost | Gross Margin | Annual Profit |
    |---|---|---|---|---|
    | 15 clients × $2,500 flat | $37,500 | $22 | 99.9% | $449,736 |
    | 5 enterprise clients × $10,000 flat | $50,000 | $22 | 99.96% | $599,736 |

    Note: These are gross margins on compute. Total agency margins include labour, software subscriptions, overhead, and client acquisition costs. Realistic net margins for a well-run agency: 40-60%.
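    The "Annual Profit" column in the tier tables is (monthly revenue - monthly compute cost) × 12, so it excludes hardware, labour, and overhead. The sketch below recomputes two of the rows and sets them beside the 40-60% net margin range from the note above, to show how far apart the two views are. Function names are illustrative.

```python
def annual_compute_profit(monthly_revenue: int, monthly_compute_cost: int) -> int:
    """Annual profit on compute only: excludes hardware, labour, and overhead."""
    return (monthly_revenue - monthly_compute_cost) * 12


def net_profit_range(monthly_revenue: int, low_pct: int = 40, high_pct: int = 60) -> tuple:
    """Annual net profit at the article's 40-60% net margin range."""
    annual_revenue = monthly_revenue * 12
    return annual_revenue * low_pct // 100, annual_revenue * high_pct // 100


# (label, monthly revenue, monthly compute cost), taken from the tier tables
scenarios = [
    ("Tier 1: 3 clients x $3,000", 9_000, 42),
    ("Tier 3: 15 clients x $2,500", 37_500, 22),
]

for label, revenue, cost in scenarios:
    low, high = net_profit_range(revenue)
    print(f"{label}: compute-only ${annual_compute_profit(revenue, cost):,}, "
          f"net ${low:,} to ${high:,}")
```

    For the first Tier 1 scenario this gives $107,496 on compute alone versus $43,200 to $64,800 at a 40-60% net margin, which is the number to plan around.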

    Pricing for Regulated Industries

    Legal and healthcare clients pay a compliance premium. They are not comparing your price to ChatGPT — they are comparing it to the cost of non-compliance (fines, malpractice risk, reputational damage).

    Compliance premium guidelines:

    | Industry | Standard AI Pricing | With Compliance Premium |
    |---|---|---|
    | General business | $1,500-3,000/month | |
    | Legal services | | $3,000-8,000/month |
    | Healthcare | | $4,000-10,000/month |
    | Financial services | | $5,000-12,000/month |
    | Government/defence | | $8,000-20,000/month |

    The compliance premium is justified because:

    1. On-premise deployment requires more setup and maintenance
    2. Compliance documentation and audit support add ongoing value
    3. The alternative (cloud AI with compliance risk) is not actually an option for these clients
    4. Data sovereignty guarantees have real, quantifiable value

    The Pricing Conversation

    When presenting pricing to a prospective client:

    Lead with value, not cost. "This solution saves your associates 8 hours per week" is a stronger frame than "this costs $5,000/month."

    Anchor to the alternative. "Hiring an ML team to build this in-house would cost $500K/year. Our solution delivers the same outcome for $60K/year."

    Make the ROI obvious. "At $400/hour billing rates, saving 8 associate-hours per week is worth about $166K/year in additional billable time (8 × $400 × 52 weeks). Our $60K annual fee delivers a 2.8x return."

    Offer a pilot. "Start with a 3-month pilot at $X/month. If the ROI is not clear by month 3, we will part ways." This de-risks the decision for the client.
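    The ROI figures in the pitch above are worth recomputing for each prospect before the meeting. A small sketch, assuming a 52-week billing year (the article's $166K figure implies this, but a real firm will bill fewer weeks); the function names are illustrative:

```python
def annual_value_of_time_saved(hours_per_week: float, billing_rate: float,
                               weeks_per_year: int = 52) -> float:
    """Billable value of the hours freed up over a year.

    ASSUMPTION: 52 billing weeks per year; lower this for holidays and leave.
    """
    return hours_per_week * billing_rate * weeks_per_year


def roi_multiple(annual_value: float, annual_fee: float) -> float:
    """Value delivered per dollar of fee."""
    return annual_value / annual_fee


value = annual_value_of_time_saved(hours_per_week=8, billing_rate=400)
print(f"Annual value: ${value:,.0f}")
print(f"Return on a $60,000 fee: {roi_multiple(value, 60_000):.1f}x")
```

    Rerunning it with 46 billing weeks, for example, gives a more conservative figure that is easier to defend if the client challenges the numbers.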


    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
