Fine-Tune OLMo with Ertas
Allen Institute for AI's fully open language model family in 1B, 7B, and 13B sizes, with completely open training data, code, weights, and evaluation — setting the standard for reproducible AI research.
Overview
OLMo (Open Language Model) is a family of language models developed by the Allen Institute for AI (AI2) with a mission of full openness. Unlike most open-weight models that release only the final model weights, OLMo provides everything: the complete training data (Dolma dataset), training code, intermediate checkpoints saved throughout training, evaluation code, and detailed training logs. This level of transparency is unprecedented and makes OLMo uniquely valuable for AI research.
The OLMo 2 family includes models at 1B, 7B, and 13B parameters. The 7B and 13B models were trained on up to approximately 5 trillion tokens from the Dolma dataset, a carefully curated collection of web text, academic papers, code, books, and encyclopedic content. Despite their moderate sizes, OLMo 2 models achieve performance competitive with other models in their size classes, demonstrating that full transparency need not compromise model quality.
Architecturally, OLMo 2 uses a standard dense transformer decoder with improvements including RoPE positional embeddings, SwiGLU activations, and grouped-query attention. The models support context windows up to 4K tokens in the base configuration, extendable through fine-tuning with RoPE scaling.
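These architectural details can be inspected directly from the model's published configuration with the Hugging Face transformers library. A minimal sketch, assuming the hub id allenai/OLMo-2-1124-7B and a transformers release recent enough to include OLMo 2 support:

```python
# Inspect OLMo 2's architecture via its Hugging Face config.
# Assumes the hub id "allenai/OLMo-2-1124-7B" and a transformers version
# with OLMo 2 support; field availability may vary by version.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("allenai/OLMo-2-1124-7B")

print(config.model_type)                            # architecture family
print(config.hidden_size)                           # model width
print(config.num_hidden_layers)                     # transformer depth
print(config.num_attention_heads)                   # attention heads
print(config.max_position_embeddings)               # base context window (~4K tokens)
print(getattr(config, "rope_theta", "not exposed")) # RoPE base frequency, if exposed
```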
All OLMo artifacts are released under the Apache 2.0 license. AI2's commitment to openness extends beyond the license — they provide detailed technical reports, training recipe documentation, and active community support to help researchers reproduce and build upon their work.
Key Features
Full training transparency is OLMo's defining feature. The release includes not just final model weights but also the complete Dolma training dataset (approximately 3 trillion tokens of deduplicated, filtered text), the full training codebase, hundreds of intermediate checkpoints saved during training, comprehensive evaluation suites, and detailed training logs including loss curves and hardware utilization data. This enables researchers to study training dynamics, reproduce results, and conduct experiments that are impossible with weights-only releases.
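The intermediate checkpoints are published as revisions of the model repository, which makes studies of training dynamics straightforward to set up. A minimal sketch, assuming the hub id allenai/OLMo-2-1124-7B; the revision string shown is illustrative, not a verified branch name:

```python
# Compare an intermediate OLMo 2 checkpoint against the final weights.
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

repo_id = "allenai/OLMo-2-1124-7B"

# List the checkpoint branches actually published in the repository.
refs = list_repo_refs(repo_id)
print([branch.name for branch in refs.branches])

# Load one intermediate checkpoint alongside the final weights.
# The revision below is a hypothetical example; pick a real branch from the list above.
mid_training = AutoModelForCausalLM.from_pretrained(
    repo_id, revision="stage1-step100000-tokens420B"
)
final = AutoModelForCausalLM.from_pretrained(repo_id)  # default branch = final weights
```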
The Dolma dataset itself is a significant contribution. AI2 documented every step of their data pipeline: data sources, filtering criteria, deduplication methods, quality scoring approaches, and content-type classification. This transparency allows researchers to understand exactly what the model learned from and to create improved versions of the dataset.
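As a toy illustration of the kinds of steps the Dolma documentation describes (rule-based filtering and deduplication), the sketch below applies a length filter and exact-hash dedup to a handful of documents. It is a simplified stand-in, not AI2's actual pipeline code:

```python
# Toy illustration of two documented Dolma pipeline steps:
# rule-based quality filtering and exact deduplication.
# This is a simplified stand-in, not AI2's production pipeline.
import hashlib

docs = [
    "A short fragment.",
    "A longer document about transformer language models and their training data.",
    "A longer document about transformer language models and their training data.",  # duplicate
]

def passes_quality_filter(text: str, min_words: int = 5) -> bool:
    """Keep only documents above a minimal length threshold."""
    return len(text.split()) >= min_words

def dedup_key(text: str) -> str:
    """Exact-match dedup key: hash of normalized text."""
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()

seen: set[str] = set()
kept = []
for doc in docs:
    if not passes_quality_filter(doc):
        continue
    key = dedup_key(doc)
    if key in seen:
        continue
    seen.add(key)
    kept.append(doc)

print(f"kept {len(kept)} of {len(docs)} documents")  # kept 1 of 3 documents
```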
OLMo 2 demonstrates competitive performance despite its fully open approach. The 13B model, in particular, performs competitively with Llama 2 13B and other models in its size class on standard benchmarks, showing that transparency and quality are not mutually exclusive. The OLMo Instruct variants, post-trained with AI2's Tülu recipe, provide capable instruction-following behavior.
Fine-Tuning with Ertas
OLMo models are excellent fine-tuning targets in Ertas Studio, combining accessible model sizes with a fully transparent training pedigree. The 1B model requires only 3-5GB VRAM with QLoRA, the 7B needs 8-12GB, and the 13B needs 10-14GB — all within consumer GPU capabilities. The small sizes enable rapid experimentation and iteration.
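Those VRAM figures come from the standard QLoRA approach: the frozen base weights are loaded in 4-bit precision and only small low-rank adapters are trained. A minimal sketch of an equivalent setup outside Ertas Studio, using transformers, peft, and bitsandbytes; the hub id, target module names, and hyperparameters are illustrative:

```python
# Minimal QLoRA setup for OLMo 2 7B: 4-bit base weights plus LoRA adapters.
# Hub id and hyperparameters are illustrative; Ertas Studio manages an
# equivalent configuration for you.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-7B",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a small fraction of the 7B total
```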
OLMo's full openness provides a unique advantage for fine-tuning: because you know exactly what the base model was trained on, you can design your fine-tuning dataset to complement the base training rather than conflicting with it. If Dolma underrepresents your specific domain, you can fill that gap precisely with targeted fine-tuning data.
After fine-tuning in Ertas Studio, export to GGUF format for local deployment. OLMo models work well with all standard quantization formats. A Q4_K_M quantized OLMo 7B is approximately 4.3GB — small enough to distribute as part of research tools, educational software, or domain-specific applications. Deploy through Ollama or llama.cpp for standard inference.
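Once the quantized GGUF has been registered with Ollama, it can be queried over Ollama's local HTTP API. A short sketch, assuming Ollama's default port and a hypothetical local model tag of olmo2-finetuned:

```python
# Query a fine-tuned OLMo GGUF served by a local Ollama instance.
# The model tag "olmo2-finetuned" is hypothetical; use whatever name you
# registered the model under in Ollama.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "olmo2-finetuned",
        "prompt": "Summarize the goals of the OLMo project in two sentences.",
        "stream": False,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```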
Use Cases
OLMo is the model of choice for AI research that requires understanding of training dynamics, data influence, and model behavior at a fundamental level. Researchers studying topics like memorization, data attribution, emergent capabilities, scaling laws, and training instability benefit immensely from OLMo's complete training artifacts.
For organizations with strict requirements around training data provenance, OLMo offers unmatched transparency. Every document in the training set is documented and traceable, and the data pipeline is fully auditable. This makes OLMo suitable for regulated industries where model explainability and data governance are critical requirements.
OLMo also serves well as a teaching tool for AI and machine learning education. Students and practitioners can study the complete lifecycle of a modern LLM — from data curation through training to evaluation — using real production-quality artifacts rather than simplified toy examples. Universities and research labs use OLMo as a platform for hands-on LLM coursework.
Hardware Requirements
OLMo 1B at Q4_K_M requires approximately 700MB of RAM, running on virtually any computing device. The 7B model at Q4_K_M needs about 4.3GB, and the 13B needs about 7.8GB. These modest requirements make OLMo accessible on consumer laptops, desktop GPUs, and even some mobile devices at the smallest size.
At Q8_0, the requirements are approximately 1.2GB (1B), 7.5GB (7B), and 14GB (13B). Full FP16 inference requires approximately 2.2GB (1B), 14.5GB (7B), and 26GB (13B). With quantization, the 7B and 13B models run comfortably on consumer GPUs such as the RTX 4070 Ti (12GB) and RTX 4090 (24GB) respectively.
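These footprints follow roughly from bits per weight. The sketch below is a rule-of-thumb estimate, not an exact formula: the bit widths are approximations of each quantization scheme, and real files add some overhead for metadata and the tokenizer.

```python
# Rough memory estimate: parameters x bits-per-weight / 8, plus a small
# overhead factor. Bit widths are approximations of each scheme.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q8_0": 8.5, "FP16": 16.0}

def approx_size_gb(params_billions: float, quant: str, overhead: float = 1.05) -> float:
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * 1e9 * bits / 8 / 1e9 * overhead

for params in (1.0, 7.0, 13.0):
    line = ", ".join(f"{q}: {approx_size_gb(params, q):.1f}GB" for q in BITS_PER_WEIGHT)
    print(f"{params:.0f}B -> {line}")
# e.g. 7B -> Q4_K_M: ~4.4GB, Q8_0: ~7.8GB, FP16: ~14.7GB, close to the figures above
```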
For fine-tuning in Ertas Studio, the 1B model needs 3-5GB VRAM with QLoRA, the 7B needs 8-12GB, and the 13B needs 10-14GB. The small sizes make OLMo ideal for researchers and students who need to run experiments on limited hardware budgets. Multiple experiments can be run on a single consumer GPU within a day.
Supported Quantizations
Related Resources