Fine-Tune Apertus with Ertas
Switzerland's open-weight foundation model family — fully open weights, training data, and recipes, released under Apache 2.0 with first-class support for 1,000+ languages and explicit alignment to EU AI Act and Swiss data-protection requirements.
Overview
Apertus is the open-weight foundation model from the Swiss AI Initiative — a collaboration between ETH Zurich, EPFL, and the Swiss National Supercomputing Centre (CSCS). It launched in late 2025 and was substantially upgraded through 2026 as the European answer to the Llama, Qwen, and Mistral families. The model name means 'open' in Latin, and the project's defining commitment is total openness: weights, training data, training recipes, evaluation data, and model cards are all public, and everything is released under Apache 2.0.
This matters for two distinct audiences. For European enterprises and regulated-industry teams, Apertus is the cleanest path to a model whose entire provenance is auditable — a meaningful advantage under the EU AI Act's transparency requirements and under Swiss data-protection rules. For multilingual applications, Apertus is unusual: training data covers 1,000+ languages (including substantial coverage of low-resource European languages, Swiss German dialects, and African languages that mainstream open models underweight), and the multilingual evaluations are competitive with Qwen 3 and Llama 4 on the languages where they overlap.
The family ships in 8B and 70B dense variants. Both share the same training corpus, the same tokenizer, and the same alignment recipe, which makes the 8B a useful lab-scale stand-in for development before scaling to the 70B for deployment. CSCS provides public inference endpoints, and both models are available on Hugging Face under `swiss-ai/Apertus-8B` and `swiss-ai/Apertus-70B`.
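For teams who want to sanity-check the base model before committing to a fine-tune, a minimal loading snippet looks like the sketch below. It assumes the published checkpoints work with the standard transformers stack; adjust the repo ID if your checkpoint uses a versioned name.

```python
# Minimal sketch: load the 8B base model from Hugging Face and generate a few tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "swiss-ai/Apertus-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place layers on the available GPU(s)
)

prompt = "Apertus ist ein offenes Sprachmodell, das"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```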
Key Features
Apache 2.0 licensing across the entire stack — weights, code, and training data — is the headline feature. This is more permissive than Llama's custom community license and matches the Apache 2.0 posture of Mistral's and Qwen's open releases. For commercial users, redistribution and derivative works are explicitly permitted without separate negotiation.
Full training-data transparency is the Apertus distinctive. Most open-weight models — including most that are 'open source' in name — release weights without releasing training data. Apertus's training corpus is published, documented, and filterable; data lineage from raw source to final checkpoint is reconstructible. For EU AI Act compliance and for organizations whose own data-governance policies require auditable model provenance, this transforms what was previously a structural blocker into a solvable due-diligence problem.
The multilingual coverage is unusually broad. Where most open-weight models concentrate on English plus a curated set of 20–100 languages, Apertus's tokenizer and training corpus span 1,000+ languages with intentional emphasis on European multilingualism (including German, French, Italian, Romansh — the four Swiss national languages — and minority European languages like Catalan, Basque, and Welsh). For European builders shipping multilingual products, this is often the deciding factor.
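A quick way to gauge that coverage for your own target languages is to compare how many tokens the tokenizer spends on equivalent sentences. The sketch below uses illustrative German, French, and Italian samples; swap in your own text for Romansh, Catalan, or any other language you care about.

```python
# Rough sketch: inspect how the Apertus tokenizer segments different languages.
# Fewer tokens for comparable content generally indicates better vocabulary coverage.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("swiss-ai/Apertus-8B")

samples = {
    "de": "Das Wetter in Zürich ist heute sonnig.",
    "fr": "Le temps à Zurich est ensoleillé aujourd'hui.",
    "it": "Oggi a Zurigo il tempo è soleggiato.",
}

for lang, text in samples.items():
    n_tokens = len(tokenizer.encode(text, add_special_tokens=False))
    print(f"{lang}: {n_tokens} tokens for {len(text)} characters")
```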
Fine-Tuning with Ertas
Apertus 8B is well-suited to Ertas Studio fine-tuning for multilingual and regulated-industry use cases. QLoRA fine-tuning runs comfortably on a single 16–24GB consumer GPU at typical 2048-token sequence lengths. The Apache 2.0 licensing means fine-tuned derivatives can be redistributed without licensing complexity, which simplifies the agency and reseller paths in Studio's Pro and Business tiers.
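As a rough illustration of what a single-GPU QLoRA setup looks like with the standard transformers, peft, and bitsandbytes stack, see the sketch below. The hyperparameters and target module names are illustrative starting points, not Studio's defaults.

```python
# Hedged sketch: 4-bit QLoRA preparation for Apertus 8B on a 16-24GB GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "swiss-ai/Apertus-8B"

# Load the base model in 4-bit NF4 to fit a consumer GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters; adjust target_modules to the model's actual layer names.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, train with your preferred trainer (e.g. trl's SFTTrainer)
# at the ~2048-token sequence length described above.
```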
The full-data transparency is a meaningful asset in fine-tuning workflows. Studio's fine-tuning runs produce model cards that link back to the base-model lineage, and with Apertus that lineage is itself fully auditable. For teams selling fine-tuned models to regulated-industry clients (legal, healthcare, financial services in EU jurisdictions), the ability to deliver a complete provenance chain — base-model training data → fine-tuning data → final adapter — is a procurement advantage.
For multilingual fine-tuning specifically, Apertus is often the right base over Llama 3 or Qwen 3 when the target language set includes European minority languages or low-resource languages where the other bases underperform. Studio's multilingual evaluation suite supports custom language configurations and can be pointed at the Apertus evaluation set for direct comparison against the published baseline.
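If you want a quick, framework-agnostic comparison outside Studio, a per-language perplexity check on your own held-out texts is a reasonable proxy. The sketch below uses placeholder evaluation data and is not Studio's evaluation suite.

```python
# Minimal sketch: per-language perplexity on held-out texts, useful for comparing
# a fine-tuned Apertus against its base (or another model) on the same data.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "swiss-ai/Apertus-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
model.eval()

# Placeholder: replace with your own held-out sentences per target language.
eval_texts = {
    "ca": ["..."],  # Catalan
    "eu": ["..."],  # Basque
}

for lang, texts in eval_texts.items():
    losses = []
    for text in texts:
        enc = tokenizer(text, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
    ppl = math.exp(sum(losses) / len(losses))
    print(f"{lang}: perplexity {ppl:.2f}")
```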
Use Cases
The strongest use case for Apertus is European regulated-industry deployment: legal AI for EU jurisdictions, healthcare AI under GDPR, financial services AI subject to MiCA and other EU regulations, and public-sector deployments under EU AI Act Article 50 transparency obligations. The combination of full-data transparency and Apache 2.0 licensing is unique among credible open-weight models and meaningfully shortens the procurement cycle.
Multilingual product teams targeting European markets are a second strong fit. Apertus's coverage of Swiss German, Romansh, Catalan, Basque, and other underweighted European languages produces meaningfully better outputs than Llama or Qwen on these languages — both for direct generation and as a base for translation fine-tuning. For consumer apps with a multilingual user base in Europe, Apertus is increasingly the right starting point.
Research and academic uses are a third natural fit. Because the entire training pipeline is reproducible from public artifacts, Apertus is one of the few credible open-weight bases for ML research that needs full reproducibility (e.g., papers studying training-data influence, scaling laws, multilingual transfer). Several 2026 papers on data-contamination measurement and on multilingual fairness use Apertus as the reference base.
Hardware Requirements
Apertus 8B at Q4_K_M is approximately 4.5GB. Single-GPU consumer hardware (RTX 3060 12GB and above) handles inference and QLoRA fine-tuning. Throughput on consumer GPUs is typically 50–80 tokens per second at standard context lengths.
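For local testing of a Q4_K_M build, something like the following llama-cpp-python sketch works. The GGUF filename is hypothetical; point it at your own quantized export or download.

```python
# Sketch: local inference against a Q4_K_M GGUF build via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="apertus-8b-q4_k_m.gguf",  # hypothetical local path
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to the GPU (RTX 3060 12GB or better)
)

out = llm("Summarise the EU AI Act transparency obligations in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```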
Apertus 70B at Q4_K_M is approximately 38GB. A single 48GB GPU (RTX 6000 Ada, A6000) handles inference; multi-GPU is required for fine-tuning at full sequence length. Most production deployments of Apertus 70B run on data-center hardware (H100, MI300X) or via the CSCS-provided endpoints.
For mobile deployment via Ertas Deployment CLI, Apertus 8B at Q4_K_M is too large for most phones today (4.5GB exceeds the working-memory budget of mid-tier devices), but Apertus distillation runs in Studio can produce smaller derivatives suitable for on-device shipping. The Apache 2.0 license makes such distillation derivatives freely redistributable.
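For reference, the core of most distillation setups is a temperature-scaled KL term between student and teacher outputs. The sketch below illustrates that objective in general terms; it is not Studio's distillation pipeline.

```python
# Generic sketch of a knowledge-distillation loss: the smaller student model is
# trained to match the teacher's softened output distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then penalise divergence from the teacher.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (temperature ** 2)
```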
Supported Quantizations
Related Resources