Fine-Tune Tencent Hy3 (Hunyuan 3) Preview with Ertas

    Tencent's April 23, 2026 comeback release — a 295-billion-parameter mixture-of-experts with 21B active parameters plus a 3.8B Multi-Token Prediction module, built in 90 days under former OpenAI researcher Shunyu Yao after a complete Hunyuan infrastructure rebuild. 256K context with strong math, code, and multilingual performance.

    295B-A21B + 3.8B MTP · Tencent

    Overview

    Tencent Hy3 (Hunyuan 3) Preview, released April 23, 2026, is Tencent's most significant open-weight release in over a year and represents the company's strategic comeback in the open-weight ecosystem. The model is a 295-billion-parameter mixture-of-experts with 21B active parameters per token, plus an additional 3.8-billion-parameter Multi-Token Prediction (MTP) module that improves generation efficiency for streaming and structured outputs.

    The story behind the model is as notable as the model itself. After a period where Tencent's Hunyuan series fell behind the rapid release cadence of DeepSeek, Qwen, and Kimi, Tencent rebuilt their AI infrastructure from scratch starting in February 2026 under former OpenAI researcher Shunyu Yao. The rebuild took 90 days from infrastructure decisions to a deployable Hy3 Preview model — an unusually compressed timeline reflecting both the urgency Tencent felt and the maturity of the underlying training stack the team rebuilt on.

    Hy3 Preview's benchmark results validate the rebuild effort. The model outperforms DeepSeek-V3 on math, code, and multilingual benchmarks, placing it in the top tier of late-2025 open-weight releases (though not at the absolute frontier of the 2026 leaderboard dominated by DeepSeek V4, Kimi K2.6, and similar). The 'Preview' designation indicates Tencent expects continued refinement before the full Hy3 release — likely targeting a Q3 2026 timeframe based on Tencent's historical release patterns.

    The 3.8B MTP module is an architectural detail worth understanding. Multi-Token Prediction lets the model generate multiple tokens per forward pass on predictable spans (structured outputs, common code patterns, repeated formatting), substantially improving end-to-end generation throughput on those workloads. MTP doesn't help with creative or unpredictable text, but it provides meaningful speedups for the structured-output work that dominates production agent deployments; the sketch below illustrates the general idea.
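    As a concrete illustration, here is the general draft-and-verify pattern an MTP head enables at inference time. This is a minimal sketch of the technique, not the Hy3 or Ertas implementation; `main_model_logits` and `mtp_draft_tokens` are hypothetical placeholder callables standing in for the real model.

```python
# Minimal sketch of draft-and-verify decoding with an MTP head.
# The two callables are hypothetical placeholders, not the Hy3 API.
from typing import Callable, List

def generate_with_mtp(
    prompt_ids: List[int],
    main_model_logits: Callable[[List[int]], List[List[float]]],  # logits for every position
    mtp_draft_tokens: Callable[[List[int], int], List[int]],       # cheap k-token draft
    max_new_tokens: int = 64,
    draft_len: int = 4,
) -> List[int]:
    ids = list(prompt_ids)
    produced = 0
    while produced < max_new_tokens:
        # 1. The small MTP module proposes `draft_len` tokens in one cheap pass.
        draft = mtp_draft_tokens(ids, draft_len)
        if not draft:
            break
        # 2. The main model scores prompt + draft in a single forward pass.
        logits = main_model_logits(ids + draft)
        # 3. Accept the longest prefix of the draft that matches the main model's
        #    own greedy choice. Predictable text (JSON keys, code boilerplate)
        #    accepts most drafts, so several tokens land per forward pass.
        accepted = []
        for i, tok in enumerate(draft):
            pos = len(ids) + i - 1  # logits at this position predict draft token i
            greedy = max(range(len(logits[pos])), key=logits[pos].__getitem__)
            if greedy != tok:
                accepted.append(greedy)  # keep the corrected token, stop accepting
                break
            accepted.append(tok)
        ids.extend(accepted)
        produced += len(accepted)
    return ids
```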

    Weights are available on Hugging Face under `tencent/Hy3-preview`. The license is open-weight but worth reviewing for specific deployment scenarios. The 256K context window is competitive with the broader 2026 cohort and supports most production long-context use cases.
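    Pulling the weights for local evaluation or fine-tuning works like any other Hugging Face checkout. The snippet below assumes the repository id quoted above and an illustrative destination directory.

```python
# Download the released weights locally; expect several hundred GB of
# safetensors shards, so point local_dir at fast, roomy storage.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="tencent/Hy3-preview",  # repository id as listed above
    local_dir="./hy3-preview",      # illustrative destination path
)
```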

    Key Features

    The 295B-A21B MoE architecture with the additional 3.8B MTP module is operationally distinctive. The MTP module improves throughput substantially on structured-output and pattern-rich workloads — function calls, JSON output, code generation, formatted content — which represent the bulk of production agent traffic. Combined with the 21B active parameter count for the main model, Hy3 Preview delivers production-friendly inference economics.

    The 90-day infrastructure rebuild is a genuinely interesting industry data point. Most frontier model training pipelines accumulate over years of organizational investment, making it difficult to evaluate how much of a given lab's capability is reproducible vs. dependent on accumulated tacit knowledge. Tencent's Hy3 demonstrates that a well-resourced team with clear leadership can rebuild a competitive training stack in a quarter — not from zero, but from organizational ground state to deployable model. The implications for industry training-cost dynamics are substantial.

    Math, code, and multilingual outperformance against DeepSeek-V3 (the prior generation of one of the strongest open-weight families) places Hy3 Preview in a credible competitive position. While not at the absolute frontier of the 2026 leaderboard, Hy3 Preview is a meaningful re-entry of Tencent's Hunyuan series into the competitive open-weight conversation after a period of being viewed as a distant follower.

    Under Shunyu Yao's leadership, the broader Hy3 development trajectory targets continued refinement — the 'Preview' designation indicates ongoing work on the post-training pipeline, additional specialized variants (likely coding and multimodal), and a full Hy3 release later in 2026. For teams evaluating Tencent's open-weight options, the trajectory is more interesting than the current snapshot — Hy3 Preview is a credible starting point for a series likely to continue improving rapidly.

    Fine-Tuning with Ertas

    Tencent Hy3 Preview fine-tuning in Ertas Studio requires multi-GPU server configurations for QLoRA at the full model scale. Approximately 200-260GB of total VRAM is needed at typical sequence lengths, fitting on a 4x A100 80GB or equivalent server.
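    The 200-260GB figure can be sanity-checked with a rough back-of-envelope. The per-component constants below are illustrative assumptions (LoRA rank, optimizer choice, and sequence length all move them), not measured Ertas numbers.

```python
# Rough back-of-envelope for the QLoRA VRAM range quoted above.
GB = 1024**3

total_params   = 295e9   # full MoE parameter count (all experts resident)
mtp_params     = 3.8e9   # MTP module, also loaded
nf4_bytes      = 0.5     # 4-bit base weights
quant_overhead = 0.0625  # ~0.5 extra bits/param for quantization constants

base_weights = (total_params + mtp_params) * (nf4_bytes + quant_overhead) / GB

lora_and_optimizer = 6   # GB: adapters + optimizer states (rank/target dependent)
activations_kv     = 40  # GB: activations + KV cache; grows with sequence length

print(f"base weights ~{base_weights:.0f} GB")
print(f"total        ~{base_weights + lora_and_optimizer + activations_kv:.0f} GB")
```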

    For most teams without that infrastructure, the recommended pattern is teacher-student distillation: use Hy3 Preview as a teacher to generate synthetic training data, then fine-tune a smaller base model (Qwen 32B, Llama 70B, or DeepSeek-R1 distilled variants) on that data. This produces a domain-specialized model at single-GPU deployment cost while inheriting Hy3 Preview's behavioral patterns.
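    A minimal sketch of the teacher side of that workflow, assuming Hy3 Preview is served behind an OpenAI-compatible endpoint (vLLM, llama.cpp server, or similar); the endpoint URL, registered model name, and prompts are placeholders to adapt to your setup.

```python
# Query a Hy3 Preview endpoint for responses to domain prompts and store
# them as JSONL suitable for fine-tuning a smaller student model.
import json
from openai import OpenAI  # works against any OpenAI-compatible server

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

domain_prompts = [
    "Summarize the refund policy below as a JSON object ...",
    "Write a Python function that validates an IBAN ...",
]

with open("hy3_distill.jsonl", "w", encoding="utf-8") as f:
    for prompt in domain_prompts:
        resp = client.chat.completions.create(
            model="tencent/Hy3-preview",  # whatever name the server registers
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
        )
        record = {
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": resp.choices[0].message.content},
            ]
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```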

    For fine-tuning datasets, Hy3 Preview benefits from training data that includes structured outputs, function calls, and multi-language content. The MTP module's throughput advantages translate to substantially faster training on these patterns — an unexpected benefit beyond just inference economics. Ertas Studio handles MTP-aware training automatically, preserving the throughput advantages in fine-tuned variants.
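    For reference, a single training record of the kind described above might look like the following. The chat-messages-with-tool-calls schema shown is one common convention, not a format Ertas specifically requires; adapt the field names to whatever your project expects.

```python
# Illustrative record combining a function call, structured JSON output,
# and multilingual input — the patterns the paragraph above recommends.
example_record = {
    "messages": [
        {"role": "user", "content": "预订明天上午十点飞往新加坡的航班"},  # Chinese user input
        {
            "role": "assistant",
            "tool_calls": [{
                "type": "function",
                "function": {
                    "name": "search_flights",
                    "arguments": "{\"destination\": \"SIN\", \"date\": \"tomorrow\", \"time\": \"10:00\"}",
                },
            }],
        },
        {"role": "tool", "name": "search_flights",
         "content": "{\"flights\": [{\"id\": \"CX734\", \"depart\": \"09:55\"}]}"},
        {"role": "assistant",
         "content": "{\"booking_suggestion\": \"CX734\", \"departs\": \"09:55\"}"},
    ]
}
```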

    After training, Ertas Studio exports to GGUF format with full Hy3 Preview chat template preservation. The MTP module is preserved in the export, maintaining the inference throughput advantages in deployed fine-tunes.
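    Assuming the export produces a standard GGUF that llama.cpp can load, a quick smoke test with llama-cpp-python looks like the sketch below. The file name is a hypothetical example, and this path exercises the main model; it does not demonstrate the MTP decoding path.

```python
# Smoke-test an exported GGUF; the chat template embedded in the GGUF
# metadata is applied automatically by create_chat_completion.
from llama_cpp import Llama

llm = Llama(
    model_path="hy3-preview-finetune-Q4_K_M.gguf",  # hypothetical export file name
    n_ctx=8192,
    n_gpu_layers=-1,  # offload as many layers as fit to GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Return today's tasks as a JSON array."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```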

    Use Cases

    Hy3 Preview's primary use cases align with Tencent's broader product positioning — gaming, social applications, and Chinese-market consumer software. For teams in these adjacent markets or with existing Tencent product integrations, Hy3 Preview is a natural starting point that aligns with broader Tencent infrastructure choices.

    Beyond Tencent-specific positioning, Hy3 Preview is a credible general-purpose option for teams wanting Chinese-lab open-weight quality with different organizational backing than the DeepSeek/Qwen/Kimi triad that dominates current discussion. For supply-chain or strategic-positioning reasons, including Tencent in your model portfolio reduces dependence on any single Chinese AI lab's continued release cadence and quality trajectory.

    Structured-output and agent-execution workloads benefit specifically from the MTP architectural choice. Production agent systems that generate substantial structured output — function calls, JSON responses, formatted reports, code generation — see meaningful throughput improvements over alternative open-weight models at equivalent benchmark quality. For high-volume agent deployments where token-cost and latency matter equally, Hy3 Preview is worth evaluating against the established options.

    Multilingual applications benefit from Hy3 Preview's strong multilingual benchmark performance. While Qwen 3.6 has broader language coverage (119 languages vs. Hy3 Preview's smaller but high-quality language set), Hy3 Preview is competitive on the major commercial languages and has particularly strong Chinese-language performance for teams targeting Chinese-market deployments.

    Hardware Requirements

    Tencent Hy3 Preview at Q4_K_M quantization requires approximately 165GB of memory, fitting on a 3x A100/H100 80GB server or a CPU inference host with 256GB+ RAM. The 21B active parameter count (plus the 3.8B MTP module for structured outputs) determines token generation throughput once loaded.

    For smaller deployments, Q3_K_M quantization (approximately 125GB) trades modest quality for reduced memory, fitting on a 2x 80GB GPU server with headroom or across 2x 64GB Apple Silicon Mac Studios (tight once KV cache is included). Below Q3 is not recommended for production deployments — quality degradation on multi-step reasoning becomes noticeable.
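    The quantized-memory figures above can be sanity-checked from the parameter count. The bits-per-weight values below are approximate scheme averages, and actual GGUF sizes depend on which tensors receive which sub-quantization, so treat the results as rough estimates rather than exact file sizes.

```python
# Back-of-envelope memory estimate per quantization scheme.
GB = 1024**3
params = 295e9 + 3.8e9  # main MoE + MTP module

for name, bits in [("Q3_K_M", 3.9), ("Q4_K_M", 4.8), ("Q8_0", 8.5)]:
    print(f"{name}: ~{params * bits / 8 / GB:.0f} GB (+ KV cache and runtime buffers)")
```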

    For fine-tuning in Ertas Studio: Hy3 Preview QLoRA needs approximately 200-260GB total VRAM (multi-GPU server). For teams without that scale, distillation onto smaller bases via teacher-generated synthetic data uses standard 20-48GB VRAM and delivers Hy3 Preview's behavioral patterns at substantially lower fine-tuning cost.

    Supported Quantizations

    Q3_K_M, Q4_0, Q4_K_M, Q5_K_M, Q6_K, Q8_0

