AI Models

    Open-source models you can fine-tune with Ertas.
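
    Ertas's own fine-tuning interface isn't shown on this page. As a generic illustration of what parameter-efficient fine-tuning of any model below involves, here is a minimal sketch using the Hugging Face transformers and peft libraries; the model name and LoRA hyperparameters are placeholder examples, not Ertas defaults.

        # Minimal LoRA setup sketch (generic; not Ertas's actual API).
        from transformers import AutoModelForCausalLM, AutoTokenizer
        from peft import LoraConfig, get_peft_model

        base = "mistralai/Mistral-7B-v0.1"  # any open-weight model from this catalog
        tokenizer = AutoTokenizer.from_pretrained(base)
        model = AutoModelForCausalLM.from_pretrained(base)

        # LoRA trains small low-rank adapter matrices instead of all base weights.
        config = LoraConfig(
            r=16,
            lora_alpha=32,
            lora_dropout=0.05,
            target_modules=["q_proj", "v_proj"],
            task_type="CAUSAL_LM",
        )
        model = get_peft_model(model, config)
        model.print_trainable_parameters()  # typically well under 1% trainable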

    Code Llama

    Code

    Meta

    Meta's specialized code generation model family built on Llama 2, available in 7B, 13B, 34B, and 70B sizes with variants optimized for code completion, instruction following, and Python development.

    7B · 13B · 34B · 70B

    Command R

    General

    Cohere

    Cohere's enterprise-focused model family in 35B and 104B sizes, purpose-built for retrieval-augmented generation (RAG) with native citation support, tool use, and multilingual capability across 10+ languages.

    35B · 104B
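
    "Native citation support" means the model is trained to ground its answers in supplied documents and reference them explicitly. A generic sketch of assembling such a prompt; the format is illustrative, not Cohere's actual grounded-generation template.

        # Illustrative RAG prompt assembly (not Cohere's real template).
        def build_rag_prompt(question: str, documents: list[str]) -> str:
            snippets = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, 1))
            return (
                "Answer using only the documents below, citing them as [n].\n\n"
                f"{snippets}\n\nQuestion: {question}\nAnswer:"
            )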

    DeepSeek-R1

    Reasoning

    DeepSeek

    DeepSeek's dedicated reasoning model trained with reinforcement learning to perform extended chain-of-thought reasoning, available in distilled sizes from 1.5B to 70B and the full 671B mixture-of-experts architecture.

    1.5B · 7B · 8B · 14B · 32B · 70B · 671B
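
    The released R1 models emit their chain of thought between <think> tags before the final answer, which matters when preparing fine-tuning data or evaluations. A minimal way to separate the reasoning from the answer:

        import re

        # Split a DeepSeek-R1 completion into (reasoning, final answer).
        def split_reasoning(text: str) -> tuple[str, str]:
            m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, flags=re.DOTALL)
            return (m.group(1).strip(), m.group(2)) if m else ("", text)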

    DeepSeek-V3

    General

    DeepSeek

    DeepSeek's flagship 671-billion parameter mixture-of-experts model with 37B active parameters per token, delivering frontier-level general performance at remarkably efficient inference costs.

    671B (37B active)
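
    The "37B active parameters per token" figure is a property of mixture-of-experts routing: a learned router sends each token to a small subset of expert feed-forward networks, so only that subset's weights run in the forward pass. A minimal generic top-k MoE layer, purely illustrative (DeepSeek-V3's real design adds shared experts, finer-grained experts, and load balancing):

        import torch
        import torch.nn as nn

        class TopKMoE(nn.Module):
            """Generic top-k mixture-of-experts layer (illustrative sizes)."""
            def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
                super().__init__()
                self.k = k
                self.router = nn.Linear(d_model, n_experts)
                self.experts = nn.ModuleList(
                    nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                  nn.Linear(d_ff, d_model))
                    for _ in range(n_experts)
                )

            def forward(self, x):                      # x: (tokens, d_model)
                scores = self.router(x)                # (tokens, n_experts)
                weights, idx = scores.topk(self.k, dim=-1)
                weights = weights.softmax(dim=-1)      # mix over the chosen k
                out = torch.zeros_like(x)
                # Only the k selected experts run per token: the "active" weights.
                for slot in range(self.k):
                    for e, expert in enumerate(self.experts):
                        hit = idx[:, slot] == e
                        if hit.any():
                            out[hit] += weights[hit, slot].unsqueeze(-1) * expert(x[hit])
                return out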

    Falcon

    General

    TII Abu Dhabi

    The Technology Innovation Institute's open-weight model family in 7B, 40B, and 180B sizes, trained on the massive RefinedWeb dataset and pioneering the use of high-quality filtered web data for LLM training.

    7B · 40B · 180B

    Gemma 3

    General

    Google

    Google's latest open-weight model family built on Gemini technology, available in 1B, 4B, 12B, and 27B sizes with native multimodal vision-language capabilities and a 128K token context window.

    1B · 4B · 12B · 27B

    InternLM

    Multilingual

    Shanghai AI Lab

    Shanghai AI Laboratory's multilingual model series in 7B and 20B sizes, featuring strong Chinese-English capabilities, long-context support, and excellent performance on reasoning and tool-use benchmarks.

    7B · 20B

    Llama 3

    General

    Meta

    Meta's third-generation open-weight large language model family, delivering state-of-the-art performance across reasoning, code generation, and multilingual tasks in 8B, 70B, and 405B parameter configurations (the 405B variant shipped with the Llama 3.1 release).

    8B · 70B · 405B

    Llama 4

    General

    Meta

    Meta's fourth-generation open-weight model family featuring a mixture-of-experts architecture, with Scout (109B total, 17B active) for efficient deployment and Maverick (400B total, 17B active) for high-capability tasks.

    Scout 109B (17B active) · Maverick 400B (17B active)

    Mistral 7B

    General

    Mistral AI

    Mistral AI's foundational 7-billion parameter model that punches well above its weight class, featuring sliding window attention and grouped-query attention for efficient long-context inference.

    7B
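
    Sliding window attention is the mechanism behind the efficient long-context claim: each position attends only to the previous W tokens rather than the whole prefix, so attention cost grows with the window rather than the sequence length. A sketch of that mask, assuming Mistral's published window of 4096:

        import torch

        def sliding_window_causal_mask(seq_len: int, window: int = 4096) -> torch.Tensor:
            # True where attention is allowed: position i sees positions j with
            # i - window < j <= i, i.e. causal but capped at the last `window` tokens.
            i = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
            j = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector
            return (j <= i) & (j > i - window)

    Because every layer applies the window again, information can still propagate across roughly layers × window tokens, which is how the usable context extends past a single window.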

    Mixtral

    General

    Mistral AI

    Mistral AI's mixture-of-experts models that route each token through 2 of 8 expert networks, delivering 70B-class performance at the cost of a 13B dense model in the 8x7B variant.

    8x7B · 8x22B
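
    The "13B-dense cost" claim can be sanity-checked by counting which weights actually run per token. A back-of-the-envelope tally from Mixtral 8x7B's published configuration (hidden size 4096, FFN size 14336, 32 layers, 8 KV heads of dimension 128; embeddings and norms ignored, so numbers are approximate):

        # Rough parameter tally for Mixtral 8x7B (top-2 of 8 experts).
        d, ffn, layers, experts, k = 4096, 14336, 32, 8, 2

        expert_params = 3 * d * ffn              # gate, up and down projections
        attn = 2 * d * d + 2 * d * 1024          # q/o full-width; k/v grouped (8 x 128)

        total = layers * (experts * expert_params + attn)   # ~46.4e9
        active = layers * (k * expert_params + attn)        # ~12.6e9
        print(f"total = {total/1e9:.1f}B, active per token = {active/1e9:.1f}B")

    Adding the roughly 0.3B of embedding parameters gives the commonly quoted ~46.7B total and ~12.9B active per token.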

    Neural Chat

    General

    Intel

    Intel's 7-billion parameter conversational model fine-tuned from Mistral 7B, optimized for Intel hardware and demonstrating strong chat performance with particular focus on CPU inference efficiency.

    7B

    OLMo

    General

    Allen AI

    Allen Institute for AI's fully open language model family in 1B, 7B, and 13B sizes, with completely open training data, code, weights, and evaluation — setting the standard for reproducible AI research.

    1B · 7B · 13B

    OpenChat

    General

    OpenChat

    A 7-billion parameter model fine-tuned from Mistral 7B using Conditioned Reinforcement Learning Fine-Tuning (C-RLFT), achieving GPT-3.5-level performance through a novel mixed-quality data training approach.

    7B
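
    At a sketch level, C-RLFT conditions the model on each example's data source and weights the training signal by a coarse per-source quality reward, so higher-quality conversations (e.g., GPT-4-generated) count for more than weaker ones. The tags and weights below are illustrative, not OpenChat's exact values:

        # Illustrative C-RLFT-style example preparation (values are made up).
        SOURCE_REWARD = {"gpt4": 1.0, "gpt35": 0.1}   # coarse quality classes

        def weighted_example(source: str, prompt: str, response: str):
            conditioned = f"<{source}> {prompt}"      # class-conditioned prompt
            weight = SOURCE_REWARD[source]            # coarse reward as loss weight
            return conditioned, response, weight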

    Phi-3

    Small

    Microsoft

    Microsoft's family of compact yet capable language models available in 3.8B, 7B, and 14B sizes, designed for on-device and edge deployment with surprisingly strong performance on reasoning and instruction-following tasks.

    3.8B · 7B · 14B

    Phi-4

    Small

    Microsoft

    Microsoft's 14-billion parameter small language model that emphasizes reasoning quality through synthetic data training, achieving performance competitive with models several times its size on math and logic benchmarks.

    14B

    Qwen 2.5

    Multilingual

    Alibaba

    Alibaba's comprehensive open-weight model family spanning seven sizes from 0.5B to 72B parameters, with particularly strong multilingual and coding capabilities across 29+ languages.

    0.5B · 1.5B · 3B · 7B · 14B · 32B · 72B

    Qwen 3

    Multilingual

    Alibaba

    Alibaba's latest-generation model family featuring both dense and mixture-of-experts architectures, with sizes from 0.6B to 235B and built-in hybrid thinking modes for adaptive reasoning depth.

    0.6B · 1.7B · 4B · 8B · 14B · 32B · 30B (3B active) · 235B (22B active)

    SmolLM

    Small

    HuggingFace

    HuggingFace's family of ultra-compact language models in 135M, 360M, and 1.7B sizes, trained on the SmolLM-Corpus, a high-quality mix built around the synthetic Cosmopedia dataset, and designed for on-device AI applications with minimal resource requirements.

    135M · 360M · 1.7B

    SOLAR

    General

    Upstage

    Upstage's 10.7-billion parameter model created through depth up-scaling, a novel technique that merges and extends a pretrained model's layers to achieve larger-model quality at efficient inference cost.

    10.7B
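
    Depth up-scaling is structurally simple: duplicate a pretrained model, trim the overlapping layers from each copy, stack the two copies, then continue pretraining. A sketch of the layer surgery as described in the SOLAR paper; the 32-layer base and m=8 match its Mistral 7B starting point, and the function itself is illustrative.

        def depth_up_scale(layers, m=8):
            """Stack two copies of a pretrained layer stack, dropping m overlap layers.

            With a 32-layer Mistral 7B base and m=8 this yields 48 layers,
            roughly SOLAR's 10.7B size before continued pretraining.
            """
            n = len(layers)           # e.g. 32
            top = layers[: n - m]     # first copy minus its last m layers
            bottom = layers[m:]       # second copy minus its first m layers
            return top + bottom       # 2 * (n - m) layers, e.g. 48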

    StarCoder

    Code

    BigCode / HuggingFace

    An open-access code generation model trained on permissively licensed source code, available in 3B, 7B, and 15B sizes with transparent training data governance and strong multi-language programming support.

    3B · 7B · 15B

    TinyLlama

    Small

    TinyLlama Team

    A compact 1.1-billion parameter model trained on 3 trillion tokens — far more data than typical for its size — delivering surprisingly capable performance for edge deployment, mobile applications, and resource-constrained environments.

    1.1B
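
    "Far more data than typical" is easy to quantify against the often-cited Chinchilla guideline of roughly 20 training tokens per parameter:

        # Rough check of TinyLlama's over-training ratio (Chinchilla ~20 tok/param).
        params = 1.1e9
        compute_optimal = 20 * params        # ~2.2e10 tokens
        actual = 3e12                        # 3 trillion tokens
        print(actual / compute_optimal)      # ~136x the compute-optimal budget

    Training far past the compute-optimal point trades extra training compute for a smaller, cheaper model at inference time.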

    Vicuna

    General

    LMSYS

    LMSYS's instruction-tuned model family in 7B, 13B, and 33B sizes, fine-tuned from Llama on ShareGPT conversations and widely recognized for pioneering open-source chatbot evaluation methodology.

    7B · 13B · 33B

    Yi

    Multilingual

    01.AI

    01.AI's bilingual Chinese-English model family available in 6B, 9B, and 34B sizes, known for strong performance on both Chinese and English benchmarks with excellent instruction-following capabilities.

    6B · 9B · 34B

    Zephyr

    General

    HuggingFace

    HuggingFace's 7-billion parameter model fine-tuned from Mistral 7B using distilled direct preference optimization (dDPO), demonstrating that alignment techniques can produce highly capable chat models without human preference data.

    7B
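
    dDPO is direct preference optimization run on AI-ranked ("distilled") preference pairs instead of human labels. A minimal sketch of the underlying DPO objective, assuming summed per-sequence log-probabilities from the trainable policy and a frozen reference model (beta is a typical value, not Zephyr's exact setting):

        import torch.nn.functional as F

        # DPO loss for one batch of (chosen, rejected) completion pairs.
        def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
            chosen_margin = beta * (pi_chosen - ref_chosen)       # policy vs reference
            rejected_margin = beta * (pi_rejected - ref_rejected)
            # Widen the gap between preferred and rejected completions.
            return -F.logsigmoid(chosen_margin - rejected_margin).mean()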