    Ertas vs Together AI: Fine-Tuning Costs, Local Deployment, and Data Privacy

    Comparing Ertas and Together AI for fine-tuning language models. Covers per-token vs flat-cost inference, data privacy, local deployment, and when each platform wins.

Ertas Team

    Together AI is primarily a fast cloud inference provider that also offers fine-tuning. Ertas is primarily a fine-tuning platform that outputs models for local deployment. They overlap in the fine-tuning use case but diverge significantly on everything that happens after training.

    If you are evaluating both, the right question is: where does your model need to live after training?

    Together AI: The Cloud Inference Story

    Together AI built its reputation on fast, affordable cloud inference for open-source models. They run a large GPU cluster optimized for throughput, and their API provides access to 100+ open-source models with competitive per-token pricing. Fine-tuning was added as a feature to let customers customize these models to their use case.

    The Together AI fine-tuning workflow is API-driven:

    import together
    
    # Upload training data
    response = together.Files.upload(file="training_data.jsonl")
    file_id = response["id"]
    
    # Create fine-tuning job
    response = together.FineTuning.create(
        training_file=file_id,
        model="togethercomputer/llama-3-8b",
        n_epochs=3,
        learning_rate=2e-5,
        suffix="my-custom-model"
    )
    

    The result is a fine-tuned model hosted on Together AI's infrastructure, accessible via Together AI's API with the same per-token pricing model as their standard models.

    Together AI's strength is genuine: their inference is fast (among the fastest for open-source models), their API is reliable, and their per-token pricing is competitive with OpenAI for models of similar quality.

    What Ertas Does Differently

Ertas trains in the cloud and exports the result as a GGUF file you own and run locally. Once you have the GGUF, inference runs on your infrastructure at zero per-token cost. The platform offers a visual, no-code interface with built-in dataset tools, experiment tracking, and client project management.
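As a concrete sketch of what local deployment looks like (file names and parameter values here are illustrative, not from the Ertas docs), an exported GGUF can be served locally with Ollama via a minimal Modelfile:

```
# Modelfile: point Ollama at the exported GGUF (illustrative path)
FROM ./my-finetuned-7b.gguf

# Optional runtime defaults
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

Then `ollama create my-model -f Modelfile` registers the model and `ollama run my-model` serves it — entirely on your own machine.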

    Comparison Table

| Dimension | Ertas | Together AI |
| --- | --- | --- |
| Interface | Visual web UI | API (Python/REST) |
| Fine-tuning output | GGUF (local deployment) | Model on Together AI's servers |
| Inference model | Local, zero per-token cost | Cloud API, per-token |
| Inference speed | CPU: 10-25 tok/s; GPU VPS: 40-60 tok/s | ~150-200 tok/s (A100 cluster) |
| Inference availability | Depends on your infra | 99.9%+ SLA |
| Data privacy | Trains in cloud; runs locally | Training data + inference on Together servers |
| GGUF export | Yes (one-click) | No |
| Local deployment | Yes | No |
| Pricing model | Monthly subscription | Pay-per-token (inference) + training cost |
| Cost at 1M tokens/mo | ~$0 marginal (VPS already running) | ~$150-400 depending on model |
| No-code | Yes | No (API/code required) |
| Dataset tools | Built-in validation, synthesis, eval | Basic file upload |

    The Per-Token Cost Question

    This is where the comparison becomes stark at scale.

Together AI fine-tuned model inference pricing varies by model, but for a 7B model expect approximately $0.15-0.20 per thousand tokens, the rate implied by the cost table below. Whatever the exact figure, it is still per-token: the bill scales with every request.

    Ertas exports a GGUF file. You run it on your VPS (a $26/month Hetzner box handles a 7B model at 15-25 tokens/second). Inference cost: $0 per token.
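A back-of-envelope memory check helps explain why a small VPS is enough. The ~4.5 bits-per-weight figure for a typical 4-bit GGUF quantization and the 1 GB runtime overhead are assumptions for illustration, not figures from this article:

```python
# Rough memory estimate for running a quantized 7B GGUF locally.
def gguf_ram_gb(params_billion: float, bits_per_weight: float = 4.5,
                overhead_gb: float = 1.0) -> float:
    """Approximate RAM needed: quantized weights plus runtime/KV-cache overhead.

    Assumes ~4.5 bits per weight (typical for a 4-bit GGUF quantization).
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

print(round(gguf_ram_gb(7), 1))  # ~4.9 GB: fits comfortably in an 8 GB VPS
```

Under these assumptions, a 7B model needs roughly 5 GB of RAM, which is why an inexpensive CPU-only box can host it.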

    The crossover point depends on your volume:

| Monthly Tokens | Together AI API Cost | Ertas + VPS Total Cost |
| --- | --- | --- |
| 100,000 | ~$15-20 | $14.50 (Ertas) + $26 (VPS) = $40.50 |
| 500,000 | ~$75-100 | $40.50 |
| 1,000,000 | ~$150-200 | $40.50 |
| 5,000,000 | ~$750-1,000 | $40.50 |
| 10,000,000 | ~$1,500-2,000 | $40.50-66.50 (may need larger VPS) |

Somewhere between 200,000 and 300,000 tokens per month, the two totals cross. Above that, the local model approach is significantly cheaper. Below that, Together AI may be marginally cheaper depending on training job frequency.

For a typical application with moderate usage, the setup pays for itself roughly 2-3 months in. After that, the flat local cost replaces a Together AI bill that keeps growing with every token.
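The crossover can be computed directly from the table above. The $0.175-per-thousand-tokens rate is the midpoint implied by the table's figures, and the helper names are illustrative:

```python
# Break-even sketch: per-token cloud billing vs flat local cost.
# Rates are midpoints of this article's figures; treat them as illustrative.
API_RATE_PER_1K = 0.175        # midpoint of ~$0.15-0.20 per 1K tokens
LOCAL_FLAT_MONTHLY = 40.50     # $14.50 (Ertas) + $26 (VPS)

def api_cost(tokens: int) -> float:
    """Monthly cloud inference cost at a per-token rate."""
    return tokens / 1000 * API_RATE_PER_1K

def breakeven_tokens() -> int:
    """Monthly volume above which the flat local cost is cheaper."""
    return int(LOCAL_FLAT_MONTHLY / API_RATE_PER_1K * 1000)

print(breakeven_tokens())  # 231428 — roughly 230K tokens/month
```

At these assumed rates the lines cross near a quarter-million tokens per month; past that point every additional token widens the gap in favor of the local model.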

    Data Privacy

    This is often the deciding factor for regulated or privacy-sensitive use cases.

    Together AI: Your training data is uploaded to Together AI's servers for the training job. Your fine-tuned model runs on Together AI's infrastructure. Every user query — every piece of data your application sends to the model — flows through Together AI's systems. This is similar to OpenAI's privacy model.

    For most use cases, this is fine. Together AI has standard data processing agreements. But for healthcare (HIPAA), finance (SOX, GDPR), legal (attorney-client privilege), or any enterprise client who has asked "where does our data go?" — the answer with Together AI is "Together AI's cloud."

Ertas: Training data is processed on Ertas's cloud training infrastructure during the training job, but the resulting GGUF model runs on your infrastructure. User queries at inference time never leave your network. This architecture is inherently compatible with privacy-sensitive deployments because the sensitive data (the inference queries) never touches an external server.

    Speed Comparison

    Together AI's inference advantage is real: their A100 cluster serves tokens at ~150-200 tokens/second for 7B models, with very low latency. Their infrastructure is built for high concurrency.

    Local Ollama inference on a $26/month VPS delivers 15-25 tokens/second for 7B models. For many applications (asynchronous processing, moderate concurrency, non-real-time workflows), this is sufficient. For latency-sensitive production applications serving many concurrent users, Together AI's cloud is meaningfully faster.

    This trade-off is application-specific. A batch document processing workflow is fine at 20 tokens/second. A real-time customer-facing chatbot with 500 concurrent users needs better performance — either a larger VPS, a GPU VPS (~$100-200/month), or a cloud API.
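A quick sizing sketch makes the trade-off concrete. The reply length and decode speeds below are illustrative midpoints of the figures above, not measurements:

```python
# Rough per-response latency at a given decode speed (illustrative numbers).
def response_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate one reply, ignoring prompt processing and queueing."""
    return output_tokens / tokens_per_sec

# A 300-token reply at mid-range local-VPS speed vs Together AI cloud speed
local = response_seconds(300, 20)    # ~20 tok/s on a small CPU VPS
cloud = response_seconds(300, 175)   # ~175 tok/s on an A100 cluster
print(round(local, 1), round(cloud, 1))  # 15.0 1.7
```

Fifteen seconds per reply is fine for batch jobs and invisible in asynchronous pipelines, but it is a long wait in an interactive chat, which is exactly where the cloud's speed advantage matters.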

| Use Case | Local VPS (7B) | Together AI | Recommendation |
| --- | --- | --- | --- |
| Batch processing | 15-25 tok/s | 150-200 tok/s | Local fine-tuned (cost wins) |
| Low-concurrency chatbot | 15-25 tok/s | 150-200 tok/s | Local fine-tuned (cost wins) |
| High-concurrency production (500+ users) | May struggle | Excellent | Together AI or GPU VPS |
| Privacy-sensitive | No external API | External API | Local fine-tuned |

    When Together AI Wins

    • You need high-concurrency cloud inference with an SLA
    • Your application has bursting traffic that would require significant local GPU investment
    • You want very low inference latency for real-time user-facing features
    • You do not have privacy-sensitive data
    • You need a quick path to fine-tuned cloud inference without managing infrastructure

    When Ertas Wins

    • You need to run models on your own infrastructure
    • Inference data is privacy-sensitive
    • Your traffic is moderate and predictable
    • You want zero per-token costs after the initial setup
    • You want to actually own the model file, not depend on Together AI's API indefinitely
    • You need the model to work when your internet connection is unreliable
    • You are building for clients who require on-premise deployment

    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
