The strongest open-weight models with minimal refusal training — well-suited to legitimate use cases like security research, red-team evaluation, mature creative writing, and educational discussion of sensitive topics where mainstream models' over-refusal is an obstacle.
By TraitUpdated 2026-04-305 picks
Introduction
Mainstream open-weight instruction-tuned models — Llama Instruct, Qwen Instruct, Phi Instruct — apply safety alignment training during their post-training pipeline. This is appropriate for general-purpose consumer applications, but it creates real obstacles for legitimate use cases that the alignment training doesn't anticipate: security research and red-team evaluation, CTF training environments, fiction with mature themes, historical and educational content involving sensitive topics, and legitimate analytical work that crosses into ambiguous territory.
This ranking covers open-weight models that have either explicitly minimal refusal training (Hermes 4) or are widely used as bases for community fine-tunes that strip the alignment layer (Llama 3 + Dolphin and similar). The goal is not to enable harmful content — production deployments still need product-level safety controls — but to identify models where the legitimate use cases blocked by aggressive refusal training are practically accessible.
Hermes 4 from Nous Research is the clearest pick for legitimate use cases blocked by mainstream safety training. The model is explicitly 'neutrally aligned' — Nous deliberately avoided heavy-handed RLHF refusal training, producing a fine-tune that follows instructions without the over-refusal patterns common in other contemporary releases. Built on Llama 3.1 base with Atropos RL post-training using ~1,000 task-specific verifiers, Hermes 4 also delivers strong reasoning capability beyond its alignment posture. For security research, red-team evaluation, mature creative writing, and educational content involving sensitive topics, Hermes 4 is the standout choice.
Strengths
Explicitly neutrally-aligned — no heavy-handed refusal training
OpenChat is a community-aligned fine-tune that deliberately avoids the over-refusal patterns of base instruction-tuned models. While not as recently maintained as Hermes 4, OpenChat remains widely deployed for use cases where standard Llama, Mistral, or Qwen Instruct variants refuse legitimate requests. The fine-tuning methodology emphasizes following instructions without imposing additional alignment constraints beyond basic safety.
Strengths
Community-aligned fine-tune with reduced refusal patterns
Apache 2.0 license — fully commercial
Mature deployment ecosystem and stable production behavior
Lower hardware requirements than Hermes 4 (7B variants available)
Trade-offs
Less actively maintained vs. Hermes 4
Behind 2026 frontier on reasoning benchmarks
Fewer alignment tools for production safety integration
Cooperation on edge requests: Better than Llama Instruct
Mistral has historically used lighter-weight alignment training than US-based labs, producing models that engage with content others reject more readily. Mistral Small 4 continues this pattern — its instruction-tuned behavior is more cooperative on edge-case requests than Llama 3 Instruct or comparable. Combined with Apache 2.0 licensing, EU sovereignty positioning, and the 6B active parameter MoE architecture, Mistral Small 4 is a strong choice for use cases where European deployment matters and over-refusal is an obstacle.
Strengths
Lighter-weight alignment training than US-based models
Apache 2.0 license — no commercial restrictions
EU-headquartered developer with data sovereignty positioning
6B active parameter inference economics
Trade-offs
Not as explicitly neutrally-aligned as Hermes 4
Some refusal patterns remain for high-risk requests
Chinese-lab models, including the Qwen family, generally use lighter-weight refusal training than US-based alternatives. Qwen 3.6 follows instructions more readily on edge-case requests while maintaining strong overall capability. Apache 2.0 licensing combined with the dense 27B variant's single-GPU deployment makes Qwen 3.6 particularly accessible. For most use cases requiring less aggressive refusal training, Qwen 3.6 is a credible default choice that doesn't require committing to specialized fine-tunes.
Strengths
Lighter-weight refusal training than US-based labs
Apache 2.0 license — fully commercial
Dense 27B variant deploys on a single 24GB GPU
Native multilingual capability across 119 languages
Trade-offs
Some content filtering for politically-sensitive topics specific to Chinese context
Base alignment: Standard (use community fine-tunes)
Llama 3 itself uses standard safety alignment, but it serves as the base for many community uncensored fine-tunes — most notably the Dolphin series from Eric Hartford / cognitivecomputations. These fine-tunes remove the safety alignment layer while preserving Llama 3's underlying capability. For teams who specifically want a Dolphin-style or similarly-aligned model, Llama 3 is the relevant base to start from. Hermes 4 is generally the better choice for new deployments, but Llama 3 + community fine-tunes remains a credible path for teams already invested in the Llama ecosystem.
Strengths
Wide ecosystem of community uncensored fine-tunes (Dolphin, etc.)
Massive deployment ecosystem and tooling support
Multiple parameter scales (8B, 70B, 405B) for different deployment targets
Trade-offs
Base Llama 3 Instruct has standard refusal training
Requires choosing and validating a community fine-tune for actual uncensoring
Llama Community License usage caps and attribution requirements
How We Chose
We evaluated models on three factors: how the model handles edge-case requests in red-team evaluation (does it follow instructions or refuse?), how strong the underlying capability is (an uncensored but weak model is rarely useful), and how deployable the model is for legitimate commercial use cases. We weighted models with explicit neutral-alignment positioning (like Hermes 4) above community fine-tunes that strip alignment from base models, since the former are typically more thoroughly engineered.
Bottom Line
Hermes 4 is the standout choice — explicitly engineered for legitimate use cases blocked by aggressive refusal training, with strong reasoning capability beyond its alignment posture. For teams investing in long-term deployments where neutral alignment matters, Hermes 4 is the recommended default. Mistral Small 4 and Qwen 3.6 are credible alternatives with lighter-weight base alignment that may be sufficient for many use cases. Community fine-tunes of Llama 3 (Dolphin family) remain valid for teams already in the Llama ecosystem. As always, the right choice depends on your specific use case and deployment context — consider whether a product-level safety layer plus a less-aligned model is a better fit than an aligned model that refuses legitimate requests.