Fine-Tune Qwen3-Coder with Ertas
Alibaba's specialized coding model line — including the 480B-A35B Qwen3-Coder flagship with 256K-1M context and the 80B-A3B Qwen3-Coder-Next, both designed natively for Claude Code, Cline, and Qwen Code-style agentic coding CLIs. Apache 2.0.
Overview
Qwen3-Coder is Alibaba's dedicated coding model line within the Qwen 3 family, designed specifically for agentic coding workloads rather than general chat or reasoning. The flagship Qwen3-Coder-480B-A35B-Instruct combines a large mixture-of-experts architecture (480B total / 35B active) with a 256K native context window, extendable to 1M tokens via extrapolation, targeting full-codebase reasoning and long-horizon coding tasks. Smaller variants, Qwen3-Coder-30B-A3B and the 80B-A3B Qwen3-Coder-Next, extend the coding-focused training to mid-tier deployment scales.
What distinguishes Qwen3-Coder from general-purpose models that happen to code well is targeted post-training: the models were trained explicitly on agentic coding traces, including planning, multi-file edits, test execution, and iteration based on observed outcomes. This is exactly the interaction loop that tools like Claude Code, Cline, Aider, and Qwen Code drive, so the training distribution matches the deployment pattern. As a result, Qwen3-Coder produces more reliable agentic coding behavior than non-specialized Qwen 3 variants of equivalent size.
Qwen3-Coder-Next (80B-A3B) is particularly notable for its inference economics. With only ~3B active parameters per token, it serves at speeds comparable to a 3B dense model while delivering coding-specific quality competitive with much larger models. SWE-Bench Verified scores around 70.6% put it among the strongest open-weight coding models — and the inference speed makes it practical for high-throughput agentic deployment where larger models would be prohibitively expensive.
All Qwen3-Coder variants are released under Apache 2.0 with weights on Hugging Face under `Qwen/Qwen3-Coder-480B-A35B-Instruct`, `Qwen/Qwen3-Coder-30B-A3B-Instruct`, and `Qwen/Qwen3-Coder-Next`.
Key Features
Targeted agentic-coding training is Qwen3-Coder's core differentiator. The models were post-trained on traces from real coding workflows: task descriptions, planning steps, multi-file edits, test runs, and iterative correction. Because the training distribution mirrors how the models are actually used, they handle agentic coding deployments more reliably than general-purpose models, even when the general model scores higher on synthetic benchmarks.
The 480B-A35B flagship's 256K-1M context window enables full-codebase reasoning that smaller-context models can't achieve. With effective context engineering (relevant files at start and end of context, summarized middle sections), the model can reason holistically across an entire repository in a single prompt.
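As a rough illustration of that packing strategy (the helper below is a sketch, not an Ertas or Qwen API; `summarize` stands in for whatever cheap summarizer you already use):

```python
# Illustrative context-packing sketch: the most relevant files go at the
# start and end of the prompt, everything else is compressed in the middle.
from pathlib import Path
from typing import Callable

def pack_repo_context(relevant: list[str], peripheral: list[str],
                      summarize: Callable[[str], str]) -> str:
    split = max(1, len(relevant) // 2)
    head = "\n\n".join(f"### {p}\n{Path(p).read_text()}" for p in relevant[:split])
    tail = "\n\n".join(f"### {p}\n{Path(p).read_text()}" for p in relevant[split:])
    middle = "\n\n".join(f"### {p} (summary)\n{summarize(Path(p).read_text())}"
                         for p in peripheral)
    return "\n\n".join(part for part in (head, middle, tail) if part)
```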
Qwen3-Coder-Next at 80B-A3B is the practical sweet-spot variant for production agentic coding. The 3B active parameter count gives it inference economics suitable for high-throughput serving, while the SWE-Bench Verified score of ~70.6% is competitive with much larger general-purpose models. For self-hosted deployments where Claude Code or Cursor backend pricing is prohibitive, Qwen3-Coder-Next is the strongest open-weight alternative for many workloads.
All variants integrate natively with Qwen-Agent and external coding CLIs (Claude Code, Cline, Qwen Code) via standard MCP and function-calling interfaces. This means deployment requires minimal integration glue compared to bolting agentic capability onto a non-specialized base.
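For example, once a Qwen3-Coder variant is served behind an OpenAI-compatible endpoint (vLLM and Ollama both expose one), a standard function-calling request works unchanged. The base URL and the `run_tests` tool below are illustrative assumptions, not Ertas- or Qwen-specific APIs:

```python
# Calling a self-hosted Qwen3-Coder through an OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # illustrative tool, defined by your agent harness
        "description": "Run the project's test suite and return the output.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Test file or directory"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct",
    messages=[{"role": "user", "content": "Fix the failing test in tests/test_parser.py"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```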
Fine-Tuning with Ertas
Qwen3-Coder variants are well-supported in Ertas Studio's fine-tuning pipeline. The 30B-A3B variant fine-tunes with QLoRA on a single 24GB GPU: the 4-bit base weights occupy roughly 17-18GB, leaving room for LoRA adapters, optimizer state, and activations, while the 3B active parameter count keeps training throughput high. Qwen3-Coder-Next at 80B-A3B fits on a 48-80GB GPU at typical sequence lengths.
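Ertas Studio configures this automatically, but for orientation, a typical QLoRA setup for the 30B-A3B variant with Hugging Face transformers, peft, and bitsandbytes looks roughly like the sketch below; the hyperparameters are illustrative defaults, not Ertas Studio's exact settings:

```python
# Rough QLoRA setup for Qwen3-Coder-30B-A3B; values are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen3-Coder-30B-A3B-Instruct"

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                        # 4-bit base weights (~17-18GB)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Targeting attention projections only keeps the MoE expert FFNs frozen.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```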
The 480B-A35B flagship requires multi-GPU server fine-tuning. For most teams, the recommended pattern is to use the 480B as a teacher for generating synthetic coding-trace data, then fine-tune Qwen3-Coder-Next or Qwen3-Coder-30B-A3B on that data plus your own codebase examples. This produces a model specialized to your team's specific patterns at single-GPU deployment cost.
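A minimal sketch of that teacher-student loop, assuming the 480B flagship is already served behind an OpenAI-compatible endpoint and that your task descriptions live in a `tasks.jsonl` file (the endpoint, prompt wording, and file names are all illustrative):

```python
# Query the 480B teacher for coding traces and keep them as fine-tuning data.
import json
from openai import OpenAI

teacher = OpenAI(base_url="http://teacher-host:8000/v1", api_key="not-needed")

with open("tasks.jsonl") as f, open("synthetic_traces.jsonl", "w") as out:
    for line in f:
        task = json.loads(line)["task"]
        resp = teacher.chat.completions.create(
            model="Qwen/Qwen3-Coder-480B-A35B-Instruct",
            messages=[
                {"role": "system", "content": "Plan, edit, and explain step by step."},
                {"role": "user", "content": task},
            ],
        )
        out.write(json.dumps({
            "messages": [
                {"role": "user", "content": task},
                {"role": "assistant", "content": resp.choices[0].message.content},
            ]
        }) + "\n")
```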
For fine-tuning datasets, Qwen3-Coder benefits substantially from training data that includes complete agentic-coding traces: task description, planning, code edits, test outputs, and iterations. Ertas Studio supports these multi-step formats natively, including tool-use traces from CLI agent runs. After training, Ertas Studio exports to GGUF with the full Qwen3-Coder chat template preserved; the exported model deploys cleanly via Ollama, llama.cpp, or vLLM, with single-click integration into Claude Code, Cline, or Aider through their custom-model configuration.
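One plausible shape for a single agentic-coding training record, following the common chat-messages convention; Ertas Studio's exact schema may differ, so treat this as an illustration of what a complete trace (task, plan, edits, test output, iteration) contains:

```python
# Hypothetical agentic-coding trace record; field names and content are
# illustrative, not Ertas Studio's exact dataset schema.
trace = {
    "messages": [
        {"role": "user", "content": "The date parser fails on ISO week numbers. Fix it."},
        {"role": "assistant", "content": "Plan: reproduce with the failing test, patch parse_week(), re-run tests."},
        {"role": "assistant", "content": "Edit src/dates.py: strip the 'W' prefix before splitting the week field."},
        {"role": "tool", "content": "pytest: 1 failed -> 0 failed, 42 passed"},
        {"role": "assistant", "content": "Tests pass. The fix normalises 'W' prefixes before parsing the week number."},
    ]
}
```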
Use Cases
Qwen3-Coder is the strongest open-weight choice for self-hosted agentic coding agents. Production deployment patterns include AI pair-programming for enterprise codebases (where data sovereignty rules out Claude Code or GitHub Copilot), autonomous PR generation for repetitive change patterns, large-scale refactoring assistance, and codebase-wide code review.
The 480B-A35B with 256K-1M context excels at full-codebase reasoning tasks: architectural review of large systems, security audits across an entire codebase, dependency upgrade impact analysis, and large refactoring planning. These tasks benefit from the model considering the whole codebase simultaneously rather than retrieving and summarizing.
Qwen3-Coder-Next is the practical pick for high-throughput production deployments. Customer-facing coding tools, internal developer assistants, and CI-integrated code review agents all benefit from the 3B-class inference speed combined with strong coding quality. For teams considering self-hosted alternatives to Claude Code or Cursor, Qwen3-Coder-Next is among the most compelling options.
Hardware Requirements
Qwen3-Coder-30B-A3B at Q4_K_M requires approximately 17-18GB of memory, fitting on a 24GB GPU with margin for context. Inference speed is dominated by the 3B active parameter count.
Qwen3-Coder-Next at 80B-A3B at Q4_K_M needs approximately 45GB, fitting on a single 48GB GPU or split across two 24GB GPUs. Despite the 80B total parameter count, inference runs at approximately 3B-class speed.
The 480B-A35B flagship at Q4_K_M requires approximately 270GB of memory, demanding multi-GPU server setups (4x A100 80GB minimum). The 35B active parameter count determines token-generation throughput once the model is loaded.
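As a back-of-envelope check on these figures, Q4_K_M stores roughly 4.5-4.9 bits per weight, so quantized weight size is approximately total parameters times bits-per-weight divided by 8, before KV cache and runtime overhead. The estimator below is a rule of thumb under that assumption, not an exact accounting:

```python
# Rule-of-thumb weight size at Q4_K_M (bits-per-weight is an approximation).
def q4_k_m_size_gb(total_params_billion: float, bits_per_weight: float = 4.8) -> float:
    # 1B params at 1 byte/param is ~1 GB, so scale by bits/8.
    return total_params_billion * bits_per_weight / 8

for name, params in [("30B-A3B", 30), ("Qwen3-Coder-Next 80B-A3B", 80), ("480B-A35B", 480)]:
    print(f"{name}: ~{q4_k_m_size_gb(params):.0f} GB weights at Q4_K_M")
# ~18 GB, ~48 GB, ~288 GB -- in line with the 17-18 GB / ~45 GB / ~270 GB figures
# above, allowing for per-model quantization mixes and runtime overhead.
```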
For fine-tuning in Ertas Studio: 30B-A3B with QLoRA needs 22-28GB VRAM, Qwen3-Coder-Next needs 50-70GB VRAM, and 480B-A35B requires multi-GPU server fine-tuning (200-280GB total VRAM with QLoRA).