    Ertas vs Together AI: Fine-Tuning Costs, Local Deployment, and Data Privacy

    Comparing Ertas and Together AI for fine-tuning language models. Covers per-token vs flat-cost inference, data privacy, local deployment, and when each platform wins.

Ertas Team

    Together AI is primarily a fast cloud inference provider that also offers fine-tuning. Ertas is primarily a fine-tuning platform that outputs models for local deployment. They overlap in the fine-tuning use case but diverge significantly on everything that happens after training.

    If you are evaluating both, the right question is: where does your model need to live after training?

    Together AI: The Cloud Inference Story

    Together AI built its reputation on fast, affordable cloud inference for open-source models. They run a large GPU cluster optimized for throughput, and their API provides access to 100+ open-source models with competitive per-token pricing. Fine-tuning was added as a feature to let customers customize these models to their use case.

    The Together AI fine-tuning workflow is API-driven:

    import together
    
    # Upload training data
    response = together.Files.upload(file="training_data.jsonl")
    file_id = response["id"]
    
    # Create fine-tuning job
    response = together.FineTuning.create(
        training_file=file_id,
        model="togethercomputer/llama-3-8b",
        n_epochs=3,
        learning_rate=2e-5,
        suffix="my-custom-model"
    )
    

    The result is a fine-tuned model hosted on Together AI's infrastructure, accessible via Together AI's API with the same per-token pricing model as their standard models.

    Together AI's strength is genuine: their inference is fast (among the fastest for open-source models), their API is reliable, and their per-token pricing is competitive with OpenAI for models of similar quality.

    What Ertas Does Differently

Ertas trains in the cloud and exports the result as a GGUF file you own and run locally. Once you have the GGUF, inference runs on your infrastructure at zero per-token cost. The platform offers a visual, no-code interface with built-in dataset tools, experiment tracking, and client project management.
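As a concrete sketch of what local deployment looks like (file names and parameter values here are illustrative, not from the Ertas docs), an exported GGUF can be served locally with Ollama via a minimal Modelfile:

```
# Modelfile: point Ollama at the exported GGUF (illustrative path)
FROM ./my-finetuned-7b.gguf

# Optional runtime defaults
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```

Then `ollama create my-model -f Modelfile` registers the model and `ollama run my-model` serves it — entirely on your own machine.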

    Comparison Table

| Dimension | Ertas | Together AI |
| --- | --- | --- |
| Interface | Visual web UI | API (Python/REST) |
| Fine-tuning output | GGUF (local deployment) | Model on Together AI's servers |
| Inference model | Local, zero per-token cost | Cloud API, per-token |
| Inference speed | CPU: 10-25 tok/s; GPU VPS: 40-60 tok/s | ~150-200 tok/s (A100 cluster) |
| Inference availability | Depends on your infra | 99.9%+ SLA |
| Data privacy | Trains in cloud; runs locally | Training data + inference on Together servers |
| GGUF export | Yes (one-click) | No |
| Local deployment | Yes | No |
| Pricing model | Monthly subscription | Pay-per-token (inference) + training cost |
| Cost at 1M tokens/mo | ~$0 marginal (VPS already running) | ~$150-400 depending on model |
| No-code | Yes | No (API/code required) |
| Dataset tools | Built-in validation, synthesis, eval | Basic file upload |

    The Per-Token Cost Question

    This is where the comparison becomes stark at scale.

Together AI fine-tuned model inference pricing varies by model, but for a 7B model expect approximately $0.15-0.20 per thousand tokens, the rate implied by the cost table below. Whatever the exact figure, it is still per-token: the bill scales with every request.

    Ertas exports a GGUF file. You run it on your VPS (a $26/month Hetzner box handles a 7B model at 15-25 tokens/second). Inference cost: $0 per token.
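A back-of-envelope memory check helps explain why a small VPS is enough. The ~4.5 bits-per-weight figure for a typical 4-bit GGUF quantization and the 1 GB runtime overhead are assumptions for illustration, not figures from this article:

```python
# Rough memory estimate for running a quantized 7B GGUF locally.
def gguf_ram_gb(params_billion: float, bits_per_weight: float = 4.5,
                overhead_gb: float = 1.0) -> float:
    """Approximate RAM needed: quantized weights plus runtime/KV-cache overhead.

    Assumes ~4.5 bits per weight (typical for a 4-bit GGUF quantization).
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

print(round(gguf_ram_gb(7), 1))  # ~4.9 GB: fits comfortably in an 8 GB VPS
```

Under these assumptions, a 7B model needs roughly 5 GB of RAM, which is why an inexpensive CPU-only box can host it.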

    The crossover point depends on your volume:

| Monthly Tokens | Together AI API Cost | Ertas + VPS Total Cost |
| --- | --- | --- |
| 100,000 | ~$15-20 | $14.50 (Ertas) + $26 (VPS) = $40.50 |
| 500,000 | ~$75-100 | $40.50 |
| 1,000,000 | ~$150-200 | $40.50 |
| 5,000,000 | ~$750-1,000 | $40.50 |
| 10,000,000 | ~$1,500-2,000 | $40.50-66.50 (may need larger VPS) |

Somewhere between 200,000 and 300,000 tokens per month, the two totals cross. Above that, the local model approach is significantly cheaper. Below that, Together AI may be marginally cheaper depending on training job frequency.

For a typical application with moderate usage, the setup pays for itself roughly 2-3 months in. After that, the flat local cost replaces a Together AI bill that keeps growing with every token.
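The crossover can be computed directly from the table above. The $0.175-per-thousand-tokens rate is the midpoint implied by the table's figures, and the helper names are illustrative:

```python
# Break-even sketch: per-token cloud billing vs flat local cost.
# Rates are midpoints of this article's figures; treat them as illustrative.
API_RATE_PER_1K = 0.175        # midpoint of ~$0.15-0.20 per 1K tokens
LOCAL_FLAT_MONTHLY = 40.50     # $14.50 (Ertas) + $26 (VPS)

def api_cost(tokens: int) -> float:
    """Monthly cloud inference cost at a per-token rate."""
    return tokens / 1000 * API_RATE_PER_1K

def breakeven_tokens() -> int:
    """Monthly volume above which the flat local cost is cheaper."""
    return int(LOCAL_FLAT_MONTHLY / API_RATE_PER_1K * 1000)

print(breakeven_tokens())  # 231428 — roughly 230K tokens/month
```

At these assumed rates the lines cross near a quarter-million tokens per month; past that point every additional token widens the gap in favor of the local model.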

    Data Privacy

    This is often the deciding factor for regulated or privacy-sensitive use cases.

    Together AI: Your training data is uploaded to Together AI's servers for the training job. Your fine-tuned model runs on Together AI's infrastructure. Every user query — every piece of data your application sends to the model — flows through Together AI's systems. This is similar to OpenAI's privacy model.

    For most use cases, this is fine. Together AI has standard data processing agreements. But for healthcare (HIPAA), finance (SOX, GDPR), legal (attorney-client privilege), or any enterprise client who has asked "where does our data go?" — the answer with Together AI is "Together AI's cloud."

Ertas: Training data is processed on Ertas's cloud training infrastructure during the training job, but the resulting GGUF model runs on your infrastructure. User queries at inference time never leave your network. This architecture is inherently compatible with privacy-sensitive deployments because the sensitive data (the inference queries) never touches an external server.

    Speed Comparison

    Together AI's inference advantage is real: their A100 cluster serves tokens at ~150-200 tokens/second for 7B models, with very low latency. Their infrastructure is built for high concurrency.

    Local Ollama inference on a $26/month VPS delivers 15-25 tokens/second for 7B models. For many applications (asynchronous processing, moderate concurrency, non-real-time workflows), this is sufficient. For latency-sensitive production applications serving many concurrent users, Together AI's cloud is meaningfully faster.

    This trade-off is application-specific. A batch document processing workflow is fine at 20 tokens/second. A real-time customer-facing chatbot with 500 concurrent users needs better performance — either a larger VPS, a GPU VPS (~$100-200/month), or a cloud API.
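A quick sizing sketch makes the trade-off concrete. The reply length and decode speeds below are illustrative midpoints of the figures above, not measurements:

```python
# Rough per-response latency at a given decode speed (illustrative numbers).
def response_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate one reply, ignoring prompt processing and queueing."""
    return output_tokens / tokens_per_sec

# A 300-token reply at mid-range local-VPS speed vs Together AI cloud speed
local = response_seconds(300, 20)    # ~20 tok/s on a small CPU VPS
cloud = response_seconds(300, 175)   # ~175 tok/s on an A100 cluster
print(round(local, 1), round(cloud, 1))  # 15.0 1.7
```

Fifteen seconds per reply is fine for batch jobs and invisible in asynchronous pipelines, but it is a long wait in an interactive chat, which is exactly where the cloud's speed advantage matters.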

| Use Case | Local VPS (7B) | Together AI | Recommendation |
| --- | --- | --- | --- |
| Batch processing | 15-25 tok/s | 150-200 tok/s | Local fine-tuned (cost wins) |
| Low-concurrency chatbot | 15-25 tok/s | 150-200 tok/s | Local fine-tuned (cost wins) |
| High-concurrency production (500+ users) | May struggle | Excellent | Together AI or GPU VPS |
| Privacy-sensitive | No external API | External API | Local fine-tuned |

    When Together AI Wins

    • You need high-concurrency cloud inference with an SLA
    • Your application has bursting traffic that would require significant local GPU investment
    • You want very low inference latency for real-time user-facing features
    • You do not have privacy-sensitive data
    • You need a quick path to fine-tuned cloud inference without managing infrastructure

    When Ertas Wins

    • You need to run models on your own infrastructure
    • Inference data is privacy-sensitive
    • Your traffic is moderate and predictable
    • You want zero per-token costs after the initial setup
    • You want to actually own the model file, not depend on Together AI's API indefinitely
    • You need the model to work when your internet connection is unreliable
    • You are building for clients who require on-premise deployment

    Ship AI that runs on your users' devices.

    Ertas early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.
