Fine-Tune Ant Group Ling / Ring with Ertas

    Ant Group's trillion-parameter open-weight family: Ling-2.5-1T (non-thinking, 1M-token context) and Ring-2.5-1T (the world's first hybrid-linear-architecture thinking model, gold-tier on both IMO 2025 at 35/42 and CMO 2025), plus the April 2026 Ling-2.6-1T update.

    1T (Ling/Ring 2.5) · 1T (Ling 2.6) · Ant Group (inclusionAI)

    Overview

    Ant Group's Ling and Ring lines, released through their inclusionAI organization, are among the most architecturally distinctive 2026 open-weight releases. Both are 1-trillion-parameter models, but they target fundamentally different use cases through different architectural choices. Ling-2.5-1T (released February 16, 2026) is a non-thinking model with a 1-million-token context window, designed for long-context workflows where extensive context matters more than extended deliberation. Ring-2.5-1T (released the same day) is the world's first hybrid-linear-architecture thinking model, designed specifically for reasoning-heavy workloads where extended chain-of-thought outweighs context length.

    The headline result for Ring-2.5-1T is gold-tier performance on IMO 2025 (International Mathematical Olympiad) at 35/42 — a score competitive with strong human competitors and substantially exceeding most open-weight reasoning models. Ring also achieves gold-tier performance on CMO 2025 (Chinese Mathematical Olympiad). For mathematical reasoning specifically, Ring-2.5-1T is among the most capable open-weight options available, with the hybrid-linear architecture providing reasoning-mode efficiency that pure-transformer alternatives can't match at the same scale.

    The Ling line was extended on April 23, 2026 with Ling-2.6-1T — an update to the non-thinking variant that adds capability improvements while maintaining the 1M context positioning. The Ling and Ring lines are positioned as complementary rather than competing — teams can deploy both for different workloads, with Ling handling long-context tasks and Ring handling reasoning-heavy tasks.

    Ant Group's emergence as a serious open-weight provider is a notable 2026 industry development. While Ant has been involved in AI research for years (primarily through Alibaba ecosystem connections), the Ling/Ring releases represent the company's first frontier-scale open-weight contributions. The hybrid-linear architectural innovation in particular establishes Ant as a research lab worth watching, not just a deployment-engineering organization. Weights are available on Hugging Face under the inclusionAI organization.

    Key Features

    Hybrid-linear architecture in Ring-2.5-1T is the technical headline. Standard transformer attention has quadratic compute complexity in sequence length, making extended reasoning expensive. Linear-complexity alternatives (Mamba, RWKV, Hyena) avoid the quadratic cost but have historically delivered worse quality. Hybrid-linear architectures combine the two, interleaving full attention layers with linear attention layers to capture the quality benefits of attention while substantially reducing compute cost on long reasoning traces. Ring-2.5-1T is the first frontier-scale implementation of this pattern in a thinking model, and the IMO 2025 gold-tier result demonstrates that the hybrid approach doesn't sacrifice reasoning quality.
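    Ring's exact layer schedule and linear-attention variant aren't public, so the following is a minimal PyTorch sketch of the general interleaving pattern only: hypothetical block classes, a kernelized (non-causal) linear attention, and an assumed 1-in-4 full-attention ratio.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FullAttentionBlock(nn.Module):
        """Standard softmax attention: O(n^2) compute in sequence length n."""
        def __init__(self, dim: int, heads: int = 8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = self.norm(x)
            out, _ = self.attn(h, h, h, need_weights=False)
            return x + out

    class LinearAttentionBlock(nn.Module):
        """Kernelized linear attention: O(n) by computing phi(Q) (phi(K)^T V)
        instead of (phi(Q) phi(K)^T) V. Non-causal here for brevity."""
        def __init__(self, dim: int):
            super().__init__()
            self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
            self.norm = nn.LayerNorm(dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = self.norm(x)
            q = F.elu(self.q(h)) + 1          # positive feature map phi
            k = F.elu(self.k(h)) + 1
            v = self.v(h)
            kv = torch.einsum("bnd,bne->bde", k, v)   # O(n d^2), not O(n^2 d)
            z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
            return x + torch.einsum("bnd,bde,bn->bne", q, kv, z)

    def hybrid_stack(dim: int, depth: int, full_every: int = 4) -> nn.Sequential:
        """Interleave: one full-attention layer every `full_every` layers
        (assumed ratio), linear-attention layers everywhere else."""
        return nn.Sequential(*[
            FullAttentionBlock(dim) if i % full_every == 0 else LinearAttentionBlock(dim)
            for i in range(depth)
        ])
    ```

    The design intuition: the sparse full-attention layers preserve precise token-to-token retrieval, while the cheap linear layers carry the bulk of the depth, so compute on long reasoning traces grows far slower than in a pure transformer.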

    IMO 2025 gold-tier score of 35/42 places Ring-2.5-1T among the most capable mathematical reasoning models — open-weight or proprietary. IMO problems require sustained multi-step reasoning, careful arithmetic, and strategic problem-solving that simple pattern-matching can't achieve. Ring's score is competitive with strong human competitors and substantially exceeds most prior open-weight reasoning models. CMO (Chinese Mathematical Olympiad) gold-tier performance further validates the result across a different problem distribution.

    Ling-2.5-1T's 1M context combined with non-thinking architecture targets a different use case profile. Where Ring optimizes for reasoning depth, Ling optimizes for context breadth — long-document analysis, multi-document synthesis, full-codebase reasoning at the trillion-parameter scale. The non-thinking design means Ling responds directly without extended deliberation, producing fast responses for context-heavy queries that don't benefit from reasoning mode.
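    In practice this means Ling is queried like any direct-response chat model. A minimal sketch, assuming Ling is served behind an OpenAI-compatible endpoint (such as vLLM); the base URL, model ID, and document path below are placeholders:

    ```python
    from openai import OpenAI

    # Placeholder endpoint and model ID, not official values.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    # Hypothetical long document; the 1M-token window accommodates inputs
    # far beyond what most models can ingest in one request.
    with open("contract_bundle.txt") as f:
        document = f.read()

    resp = client.chat.completions.create(
        model="inclusionAI/Ling-2.5-1T",
        messages=[
            {"role": "system", "content": "You are a contract analyst."},
            {"role": "user", "content": f"{document}\n\nList every termination clause and its notice period."},
        ],
    )
    # Direct answer: no reasoning trace to stream, so latency stays low.
    print(resp.choices[0].message.content)
    ```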

    The Ling-2.6-1T April update extends the non-thinking line with capability improvements while preserving the 1M context positioning. For teams running production workflows on Ling-2.5-1T, the 2.6 update offers measurable gains without operational migration costs since the deployment surface and prompt patterns remain compatible.

    Fine-Tuning with Ertas

    Ling and Ring fine-tuning in Ertas Studio requires multi-GPU server configurations at the 1T-parameter scale. QLoRA training needs approximately 600-700GB of total VRAM at typical sequence lengths; the low end of that range fits on an 8x A100 80GB or 8x H100 80GB server (640GB total), while longer sequences can exceed it.
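    Ertas Studio manages this configuration internally; for intuition about where the VRAM goes, here is a minimal sketch of an equivalent QLoRA setup with Hugging Face transformers and peft. The repo ID and LoRA target module names are assumptions, not confirmed values for Ring:

    ```python
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    # NF4 4-bit base weights + bf16 compute: the standard QLoRA memory recipe.
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    # device_map="auto" shards the ~600GB+ footprint across all visible GPUs.
    model = AutoModelForCausalLM.from_pretrained(
        "inclusionAI/Ring-2.5-1T",   # assumed repo ID
        quantization_config=bnb,
        device_map="auto",
        torch_dtype=torch.bfloat16,
    )

    # Target module names are generic attention projections; the real names
    # depend on Ring's hybrid-linear layer implementation.
    lora = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()   # adapters are a tiny fraction of 1T
    ```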

    For most teams without 8-GPU server access, the recommended pattern is teacher-student distillation. Ring-2.5-1T as a reasoning teacher is particularly effective — its IMO-tier mathematical reasoning capability translates to high-quality synthetic reasoning trace data, which can then be used to fine-tune smaller bases (Qwen 32B, Llama 70B, DeepSeek-R1 distilled variants) for domain-specific reasoning capability at single-GPU deployment cost.
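    A minimal sketch of the trace-generation half of that pipeline, assuming Ring is served behind an OpenAI-compatible endpoint (the URL and model ID are placeholders):

    ```python
    import json
    from openai import OpenAI

    # Placeholder endpoint for a self-hosted Ring teacher.
    teacher = OpenAI(base_url="http://ring-server:8000/v1", api_key="unused")

    problems = [
        "Prove that for all integers n, n^3 - n is divisible by 6.",
        # ... domain problems matched to the student's target task
    ]

    with open("distill_traces.jsonl", "w") as out:
        for problem in problems:
            resp = teacher.chat.completions.create(
                model="inclusionAI/Ring-2.5-1T",
                messages=[{"role": "user", "content": problem}],
                temperature=0.6,   # some sampling diversity helps distillation sets
            )
            # Store prompt plus the full teacher output (including its
            # reasoning trace) as one training pair for the student model.
            out.write(json.dumps({
                "messages": [
                    {"role": "user", "content": problem},
                    {"role": "assistant", "content": resp.choices[0].message.content},
                ]
            }) + "\n")
    ```

    In practice, traces are usually filtered for final-answer correctness before being used as student training data; a wrong trace teaches the student wrong reasoning.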

    The hybrid-linear architecture in Ring requires Ertas Studio's MoE-aware training pipeline plus specific handling for the linear attention layers — complexity that the platform handles automatically without user configuration. After training, Ertas Studio exports to GGUF format with full Ring or Ling chat template preservation, including the architectural specifications that downstream inference frameworks need.
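    Ertas Studio performs this export automatically. For reference, a manual equivalent using llama.cpp's conversion and quantization tools would look roughly like the sketch below; the paths are placeholders, and llama.cpp support for the Ling/Ring architectures is an assumption here, not a confirmed fact:

    ```python
    import subprocess

    # Convert merged HF weights (LoRA already folded into the base) to GGUF.
    subprocess.run([
        "python", "llama.cpp/convert_hf_to_gguf.py",
        "merged-ring-finetune/",              # placeholder model directory
        "--outfile", "ring-finetune-f16.gguf",
        "--outtype", "f16",
    ], check=True)

    # Quantize to Q4_K_M; the chat template and architecture metadata
    # travel inside the GGUF header for downstream inference frameworks.
    subprocess.run([
        "llama.cpp/build/bin/llama-quantize",
        "ring-finetune-f16.gguf",
        "ring-finetune-Q4_K_M.gguf",
        "Q4_K_M",
    ], check=True)
    ```

    Note that at 1T parameters the intermediate f16 file is roughly 2TB, so the manual path also needs substantial disk and host RAM.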

    For mathematical reasoning fine-tuning specifically, Ring-2.5-1T is the strongest base in the open-weight ecosystem. Combined with Ertas Studio's support for explicit reasoning trace training data formats, this enables building specialized mathematical reasoning models for education, research, or technical domains where IMO-tier capability matters.
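    The exact schema Ertas Studio expects isn't reproduced here; the record below is illustrative only, with the messages layout and the <think> delimiters as assumptions about how a reasoning trace might be marked up:

    ```python
    import json

    # Illustrative training record; field names and <think> tags are assumed,
    # not a documented Ertas Studio schema.
    record = {
        "messages": [
            {"role": "user", "content": "Find all real x with x^4 - 5x^2 + 4 = 0."},
            {"role": "assistant", "content": (
                "<think>Substitute y = x^2: y^2 - 5y + 4 = 0, so (y - 1)(y - 4) = 0, "
                "giving y = 1 or y = 4, hence x^2 = 1 or x^2 = 4.</think>\n"
                "x = -2, -1, 1, or 2."
            )},
        ]
    }

    with open("reasoning_traces.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    ```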

    Use Cases

    Ring-2.5-1T targets mathematical reasoning, scientific analysis, and structured problem-solving applications where IMO-tier reasoning capability genuinely matters. Educational platforms (advanced mathematics tutoring, competitive math training), research assistance (mathematical literature analysis, theorem verification), and technical analysis (engineering calculations, scientific computing) all benefit from Ring's combination of strong reasoning capability and hybrid-linear architectural efficiency.

    Ling-2.5-1T and Ling-2.6-1T target long-context, non-reasoning workloads. Long-document analysis (legal contracts, regulatory filings, multi-volume technical documentation), multi-document synthesis (literature reviews, competitive intelligence), and full-codebase reasoning all benefit from Ling's 1M context combined with the trillion-parameter capacity. The non-thinking architecture means responses are fast — appropriate for production serving where latency matters.

    For teams that previously deployed separate reasoning and chat models, Ling + Ring provides a complementary pairing under unified Ant Group infrastructure. Teams can route reasoning-heavy queries to Ring and context-heavy queries to Ling, both deployed through compatible inference infrastructure. This is structurally similar to how teams previously deployed DeepSeek-R1 + DeepSeek-V3, but with Ant Group's specific architectural strengths.
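    A minimal dispatch sketch, assuming both models sit behind OpenAI-compatible endpoints; the URLs, model IDs, and the routing heuristic are all placeholders:

    ```python
    from openai import OpenAI

    # Placeholder endpoints for the two deployments.
    ring = OpenAI(base_url="http://ring-server:8000/v1", api_key="unused")
    ling = OpenAI(base_url="http://ling-server:8000/v1", api_key="unused")

    REASONING_HINTS = ("prove", "derive", "solve", "step by step", "optimize")

    def route(prompt: str) -> str:
        """Crude heuristic router: reasoning-shaped queries go to Ring,
        long or context-heavy queries go to Ling."""
        reasoning = any(h in prompt.lower() for h in REASONING_HINTS)
        long_context = len(prompt) > 50_000   # rough long-document threshold
        client, model = (
            (ring, "inclusionAI/Ring-2.5-1T") if reasoning and not long_context
            else (ling, "inclusionAI/Ling-2.5-1T")
        )
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content
    ```

    A production router would replace the keyword heuristic with a trained classifier, but the dispatch structure stays the same.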

    For teams interested in alternative-architecture research and deployment, Ring-2.5-1T is a particularly interesting production-deployable artifact of hybrid-linear research. Most hybrid-linear research models are smaller proof-of-concept releases; Ring at 1T scale demonstrates that the architecture works at frontier scale, opening up production deployment options that weren't previously accessible.

    Hardware Requirements

    Ant Group Ling-2.5-1T or Ring-2.5-1T at Q4_K_M quantization requires approximately 540GB of memory, fitting on an 8x A100 80GB or 8x H100 80GB server. CPU inference is feasible on hosts with 768GB+ RAM but at substantially lower throughput than GPU deployment.

    For smaller deployments, Q3_K_M quantization (approximately 405GB) trades modest quality for reduced memory. Note that 405GB still exceeds the 320GB total of a 4x H100 80GB server; a 4x H200 141GB node (564GB total) or a 6x H100 80GB configuration (480GB) fits it with margin. Below Q3 is not recommended for production deployment: the reasoning capability that distinguishes Ring depends on consistent quality across long reasoning chains, and aggressive quantization compounds error in ways that degrade reasoning more than it degrades direct-response generation.
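    The fit arithmetic above is easy to sanity-check. A quick back-of-envelope script using the weights-only sizes quoted on this page; runtime KV cache and activations need additional headroom on top:

    ```python
    # Weights-only sizes (GB) as quoted above for the 1T models.
    QUANT_SIZE_GB = {"Q3_K_M": 405, "Q4_K_M": 540}

    SERVERS_GB = {
        "8x A100/H100 80GB": 8 * 80,   # 640 GB
        "4x H200 141GB": 4 * 141,      # 564 GB
        "4x H100 80GB": 4 * 80,        # 320 GB
    }

    for quant, size in QUANT_SIZE_GB.items():
        for server, vram in SERVERS_GB.items():
            headroom = vram - size
            verdict = "fits" if headroom > 0 else "does NOT fit"
            print(f"{quant} ({size} GB) on {server} ({vram} GB): "
                  f"{verdict}, {headroom:+d} GB headroom")
    ```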

    For fine-tuning in Ertas Studio: Ling/Ring QLoRA needs approximately 600-700GB total VRAM (multi-GPU server). For teams without that scale, Ring as a teacher for mathematical reasoning distillation onto smaller bases (Qwen 32B at 40GB GPU, Llama 70B at 48GB GPU) is the most practical path to capturing Ring's reasoning patterns at deployable infrastructure scale.

    Supported Quantizations

    Q3_K_M · Q4_0 · Q4_K_M · Q5_K_M · Q6_K · Q8_0
