Open-source models you can fine-tune with Ertas.

Code Llama
Meta
Meta's specialized code generation model family built on Llama 2, available in 7B, 13B, 34B, and 70B sizes with variants optimized for code completion, instruction following, and Python development.

Command R
Cohere
Cohere's enterprise-focused model family in 35B and 104B sizes, purpose-built for retrieval-augmented generation (RAG) with native citation support, tool use, and multilingual capability across 10+ languages.
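
To show what native citation support looks like in practice, here is a minimal sketch using the `cohere` Python SDK's chat endpoint (parameter and field names follow the v1 SDK and may differ in newer versions; the API key and documents are placeholders):

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # v1 SDK client; assumes a valid API key

# Pass source documents alongside the query; Command R grounds its
# answer in them and returns span-level citations.
response = co.chat(
    model="command-r",
    message="What mechanism produces the aurora borealis?",
    documents=[
        {
            "title": "Aurora primer",
            "snippet": "Auroras occur when charged solar particles "
                       "collide with gases in the upper atmosphere.",
        },
    ],
)

print(response.text)
for citation in response.citations:
    # Each citation maps a span of the answer back to its source document.
    print(citation.start, citation.end, citation.document_ids)
```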

DeepSeek-R1
DeepSeek
DeepSeek's dedicated reasoning model, trained with reinforcement learning to produce extended chains of thought, available as distilled dense checkpoints from 1.5B to 70B and as the full 671B mixture-of-experts model.
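
DeepSeek-R1 emits its chain of thought inside `<think>` tags before the final answer, so downstream code usually splits the two; a minimal parsing sketch (the tag format follows DeepSeek's published chat template):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split an R1-style completion into (reasoning, final answer)."""
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    return match.group(1).strip(), output[match.end():].strip()

reasoning, answer = split_reasoning(
    "<think>17 and 19 are both prime, so there are two.</think>"
    "There are two primes between 16 and 20."
)
print(answer)  # -> There are two primes between 16 and 20.
```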

DeepSeek-V3
DeepSeek
DeepSeek's flagship 671-billion parameter mixture-of-experts model with 37B active parameters per token, delivering frontier-level general performance at remarkably efficient inference costs.

Falcon
TII Abu Dhabi
The Technology Innovation Institute's open-weight model family in 7B, 40B, and 180B sizes, trained on the massive RefinedWeb dataset and pioneering the use of high-quality filtered web data for LLM training.

Gemma 3
Google
Google's latest open-weight model family built on Gemini technology, available in 1B, 4B, 12B, and 27B sizes; the 4B and larger models add native vision-language capability and a 128K token context window.

InternLM2
Shanghai AI Lab
Shanghai AI Laboratory's multilingual model series in 7B and 20B sizes, featuring strong Chinese-English capabilities, long-context support, and excellent performance on reasoning and tool-use benchmarks.

Llama 3.1
Meta
Meta's third-generation open-weight large language model family, delivering state-of-the-art performance across reasoning, code generation, and multilingual tasks in 8B, 70B, and 405B parameter configurations.

Llama 4
Meta
Meta's fourth-generation open-weight model family featuring a mixture-of-experts architecture, with Scout (109B total, 17B active) for efficient deployment and Maverick (400B total, 17B active) for high-capability tasks.

Mistral 7B
Mistral AI
Mistral AI's foundational 7-billion parameter model that punches well above its weight class, featuring sliding window attention and grouped-query attention for efficient long-context inference.
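
To make sliding window attention concrete, here is a toy mask construction in PyTorch (an illustrative sketch, not Mistral's implementation): each token attends only to itself and the previous window - 1 positions, so attention cost scales linearly with sequence length instead of quadratically.

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: True where a query position may attend to a key.

    Token i sees tokens in [max(0, i - window + 1), i]: causal, but
    capped at `window` positions (Mistral 7B uses window = 4096).
    """
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (j > i - window)

mask = sliding_window_causal_mask(seq_len=6, window=3)
print(mask.int())  # row 5 attends only to positions 3, 4, 5
```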

Mixtral
Mistral AI
Mistral AI's mixture-of-experts models that route each token through 2 of 8 expert networks, with the 8x7B variant delivering 70B-class performance at roughly the inference cost of a 13B dense model.
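
A toy top-2-of-8 router shows why only a fraction of the parameters is active for any given token (a simplified PyTorch sketch of MoE gating, not Mixtral's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Toy mixture-of-experts layer: 8 experts, 2 active per token."""

    def __init__(self, dim: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        gate_logits = self.router(x)                      # (tokens, n_experts)
        weights, chosen = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # renormalize over the top 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                hit = chosen[:, k] == e                   # tokens routed to expert e
                if hit.any():
                    out[hit] += weights[hit, k].unsqueeze(-1) * expert(x[hit])
        return out

moe = Top2MoE()
y = moe(torch.randn(4, 64))  # each token passes through only 2 of 8 experts
```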

Neural Chat
Intel
Intel's 7-billion parameter conversational model fine-tuned from Mistral 7B, optimized for Intel hardware and demonstrating strong chat performance with particular focus on CPU inference efficiency.

OLMo
Allen AI
Allen Institute for AI's fully open language model family in 1B, 7B, and 13B sizes, with completely open training data, code, weights, and evaluation — setting the standard for reproducible AI research.

OpenChat 3.5
OpenChat
A 7-billion parameter model fine-tuned from Mistral 7B using Conditioned Reinforcement Learning Fine-Tuning (C-RLFT), achieving GPT-3.5-level performance through a novel mixed-quality data training approach.
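
The core idea of C-RLFT is to treat coarse data-quality labels as a conditioning signal: each training example is prefixed with its source class and weighted accordingly, and inference always conditions on the high-quality class. A schematic sketch (the prefix strings and weights here are illustrative, not OpenChat's exact template):

```python
# Schematic view of C-RLFT data conditioning: each example carries a
# coarse quality label, which becomes a prompt prefix plus a loss weight.
QUALITY_PREFIX = {
    "expert": "GPT4 Correct User:",  # illustrative; not OpenChat's exact template
    "generic": "GPT3 User:",
}
CLASS_WEIGHT = {"expert": 1.0, "generic": 0.3}  # illustrative reward weights

def build_example(source_class: str, prompt: str, answer: str) -> tuple[str, float]:
    """Return the conditioned training text and its loss weight."""
    text = (f"{QUALITY_PREFIX[source_class]} {prompt}<|end_of_turn|>"
            f"Assistant: {answer}")
    return text, CLASS_WEIGHT[source_class]

text, weight = build_example("expert", "Summarize MoE routing.", "...")
# At inference time the model is always conditioned on the high-quality prefix.
```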

Phi-3
Microsoft
Microsoft's family of compact yet capable language models available in 3.8B, 7B, and 14B sizes, designed for on-device and edge deployment with surprisingly strong performance on reasoning and instruction-following tasks.

Phi-4
Microsoft
Microsoft's 14-billion parameter small language model that emphasizes reasoning quality through synthetic data training, achieving performance competitive with models several times its size on math and logic benchmarks.

Qwen2.5
Alibaba
Alibaba's comprehensive open-weight model family spanning seven sizes from 0.5B to 72B parameters, with particularly strong multilingual and coding capabilities across 29+ languages.

Qwen3
Alibaba
Alibaba's latest-generation model family featuring both dense and mixture-of-experts architectures, with sizes from 0.6B to 235B and built-in hybrid thinking modes for adaptive reasoning depth.
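
Hybrid thinking is toggled per request through the chat template; a minimal sketch assuming the Hugging Face transformers tokenizer for Qwen3, whose template accepts an enable_thinking flag:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
messages = [{"role": "user", "content": "How many primes are below 20?"}]

# enable_thinking=True lets the model reason in a <think> block first;
# False switches it to direct answers for latency-sensitive requests.
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
```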

SmolLM
HuggingFace
HuggingFace's family of ultra-compact language models in 135M, 360M, and 1.7B sizes, trained on the high-quality Cosmopedia synthetic dataset and designed for on-device AI applications with minimal resource requirements.

SOLAR
Upstage
Upstage's 10.7-billion parameter model created through depth up-scaling, a novel technique that stacks trimmed copies of a pretrained model's layers to achieve larger-model quality at efficient inference cost.
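
Depth up-scaling is easy to sketch: duplicate a pretrained stack of layers, trim the seam, and continually pretrain the deeper result. Per the SOLAR paper, two 32-layer copies each lose 8 layers at the join to yield 48 layers; an illustrative sketch over a plain list standing in for transformer blocks:

```python
import copy

def depth_up_scale(layers: list, drop: int = 8) -> list:
    """Stack two copies of a layer stack, trimming `drop` layers at the seam.

    With 32 input layers and drop=8 this yields 24 + 24 = 48 layers,
    matching SOLAR 10.7B; the result is then continually pretrained.
    """
    a, b = copy.deepcopy(layers), copy.deepcopy(layers)
    return a[: len(a) - drop] + b[drop:]  # first 24 of copy A + last 24 of copy B

base = [f"block_{i}" for i in range(32)]  # stand-ins for transformer blocks
print(len(depth_up_scale(base)))          # -> 48
```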

StarCoder2
BigCode / HuggingFace
An open-access code generation model trained on permissively licensed source code, available in 3B, 7B, and 15B sizes with transparent training data governance and strong multi-language programming support.

TinyLlama
TinyLlama Team
A compact 1.1-billion parameter model trained on 3 trillion tokens — far more data than typical for its size — delivering surprisingly capable performance for edge deployment, mobile applications, and resource-constrained environments.

Vicuna
LMSYS
LMSYS's instruction-tuned model family in 7B, 13B, and 33B sizes, fine-tuned from Llama on ShareGPT conversations and widely recognized for pioneering open-source chatbot evaluation methodology.

Yi
01.AI
01.AI's bilingual Chinese-English model family available in 6B, 9B, and 34B sizes, known for strong performance on both Chinese and English benchmarks with excellent instruction-following capabilities.

Zephyr
HuggingFace
HuggingFace's 7-billion parameter model fine-tuned from Mistral 7B using distilled direct preference optimization (dDPO), demonstrating that alignment techniques can produce highly capable chat models without human preference data.
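
The objective behind dDPO is the standard DPO loss; the "distilled" part is that preference pairs are ranked by an AI judge rather than human annotators. A minimal per-pair sketch in PyTorch:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss for one preference pair.

    Inputs are summed log-probs of the chosen/rejected responses under
    the policy being trained and a frozen reference model.
    """
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (policy_margin - ref_margin))

loss = dpo_loss(torch.tensor(-12.0), torch.tensor(-15.0),
                torch.tensor(-13.0), torch.tensor(-14.0))  # ~0.598
```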