File sizes and formats

Estimate the on-disk size of an Ertas export before you train, and see what you actually download for each catalog model.

The two artifacts Ertas produces from a successful run have very different storage profiles. The GGUF bundle is hundreds of MB to a few GB, depending on the base model, and is what you ship into an app. The LoRA adapter is tens of MB regardless of base size, and is what you keep as your source of truth.

This page lays out the size math, gives concrete estimates for the most common catalog models at Q4_K_M, and explains the storage-quota implications. For the upstream story of why GGUF is the shipped format, see GGUF overview. For the size-vs-quality trade-offs the quantization step controls, see Quantization.

What you download

A completed run produces two downloadable bundles in Hub. Both arrive as ZIP files via signed, time-limited download URLs.

Bundle	Typical size	Contents	When you use it
GGUF	~0.5 to 9 GB	`model.gguf`, `Modelfile`, `install.bat` / `install.sh`, `README.txt`	Ship into an iOS, Android, desktop, or web app.
LoRA	25 to 50 MB	`adapter_model.safetensors`, `adapter_config.json`, `chat_template.jinja`, tokenizer files, `README.md` model card	Re-merge into a different base in code; re-quantize locally to a non-Q4_K_M level; keep as source of truth.

The two bundles live in Hub independently. Deleting a run record from the Runs tab does not delete the Hub entry; you delete artifacts from Hub directly. See Storage for the full life-cycle picture.
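
As a quick check after downloading, the GGUF bundle's contents from the table above can be verified with a short script. This is a minimal sketch, not an official Ertas tool: the helper name `check_gguf_bundle` is hypothetical, it assumes the files may sit in a subfolder inside the ZIP, and it treats either install script as sufficient. The only format fact it relies on is that GGUF files begin with the 4-byte ASCII magic `GGUF`.

```python
import zipfile
from pathlib import PurePosixPath

# File names taken from the GGUF row of the bundle table.
REQUIRED_FILES = {"model.gguf", "Modelfile", "README.txt"}
INSTALL_SCRIPTS = {"install.bat", "install.sh"}


def check_gguf_bundle(zip_source):
    """Return a list of problems found in a GGUF bundle ZIP (empty if none).

    zip_source is a filesystem path or a binary file-like object.
    """
    problems = []
    with zipfile.ZipFile(zip_source) as bundle:
        # Assumption: files may live in a subfolder, so match on base names.
        by_name = {
            PurePosixPath(info.filename).name: info
            for info in bundle.infolist()
            if not info.is_dir()
        }
        present = set(by_name)

        for name in sorted(REQUIRED_FILES - present):
            problems.append(f"missing {name}")

        # Assumption: one install script (Windows or Unix) is enough.
        if not (INSTALL_SCRIPTS & present):
            problems.append("missing install.bat / install.sh")

        gguf_info = by_name.get("model.gguf")
        if gguf_info is not None:
            with bundle.open(gguf_info) as fh:
                # GGUF files start with the ASCII magic bytes "GGUF".
                if fh.read(4) != b"GGUF":
                    problems.append("model.gguf does not start with the GGUF magic")
    return problems
```

Calling `check_gguf_bundle("my-export.zip")` (a placeholder path) returns an empty list when every expected file is present and the model file carries the GGUF magic.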

The Q4_K_M sizing rule of thumb

For models in the 3B-and-up range, Q4_K_M lands at roughly 0.6 bytes per parameter, including the K-quant family's per-block scale overhead. A 7B model therefore comes to about 4.2 GB:

gguf_size_bytes ≈ 0.6 * parameter_count   (models 3B and up)

Below 3B parameters, the embedding and unembedding tables (which scale with vocabulary size, not compute parameters) become a larger share of the total file. A 1B-class model with a large vocabulary like Gemma 3 or Llama 3.2 lands closer to 0.8 bytes per parameter; a sub-1B Qwen 2.5 lands closer to 1.0. The "0.6 rule" is the right anchor for the 3B-to-14B models that dominate Ertas fine-tuning, but it underestimates small models.

Other variables that shift the file size by a few percent:

Which precision-sensitive matrices Q4_K_M keeps at higher precision (architecture-dependent).
Bundled tokenizer size (Llama and Mistral are compact; Qwen's multilingual and Gemma's 256K-token vocab are larger).
A small fixed GGUF metadata header.

The ZIP wrapper around model.gguf adds only a few MB for the Modelfile, install scripts, and README.txt. The bundle ZIP is essentially the same size as the model.gguf it contains.
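
The rule of thumb and its small-model adjustments can be sketched as a small estimator. The 0.6, 0.8, and 1.0 bytes-per-parameter figures come from this page; the hard cut-offs at 1B and 3B parameters, the function names, and the use of decimal GB (10^9 bytes) are simplifying assumptions for illustration. Real files drift by a few percent for the reasons listed above.

```python
def q4_k_m_bytes_per_param(parameter_count):
    """Approximate Q4_K_M bytes per parameter for a dense model.

    Assumption: the thresholds are hard cut-offs; in practice the
    ratio changes gradually with vocabulary size and architecture.
    """
    if parameter_count >= 3e9:
        return 0.6  # the "0.6 rule" for 3B-and-up models
    if parameter_count >= 1e9:
        return 0.8  # 1B-class models with large vocabularies
    return 1.0      # sub-1B models, where embeddings dominate


def estimate_gguf_gb(parameter_count):
    """Estimated model.gguf size in decimal GB (1 GB = 1e9 bytes)."""
    return q4_k_m_bytes_per_param(parameter_count) * parameter_count / 1e9


if __name__ == "__main__":
    for label, params in [("0.5B", 0.5e9), ("1B", 1.0e9), ("3B", 3.0e9),
                          ("7B", 7.0e9), ("14B", 14.0e9)]:
        print(f"{label}: ~{estimate_gguf_gb(params):.1f} GB")
```

Treat the result as a planning number. Because the ZIP wrapper adds only a few MB, the same estimate works for the bundle download.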

Sizes for common catalog models

The table below lists Q4_K_M model.gguf sizes for commonly used catalog models. The figures are sourced from public Q4_K_M GGUFs of the corresponding base on Hugging Face (Unsloth, the official model uploader, or bartowski as a fallback), and from Ertas's own export of Gemma 4 E2B. Your fine-tuned exports land within a few MB of these numbers.

Base model	Parameters	`model.gguf` size	GPU tier to train
Qwen 2.5 0.5B Instruct	0.5B	0.49 GB	T4
TinyLlama 1.1B Chat	1.1B	0.67 GB	T4
Gemma 3 1B IT	1.0B	0.81 GB	T4
Llama 3.2 1B Instruct	1.0B	0.81 GB	T4
Qwen 2.5 1.5B Instruct	1.5B	1.12 GB	T4
Llama 3.2 3B Instruct	3.0B	2.02 GB	T4
Qwen 2.5 3B Instruct	3.0B	2.10 GB	T4
Phi-3 mini 4k Instruct	3.8B	2.39 GB	T4
Gemma 3 4B IT	4.0B	2.49 GB	T4
Gemma 4 E2B	5.1B total	3.19 GB	A10G
Mistral 7B Instruct	7.0B	4.37 GB	A10G
Qwen 2.5 7B Instruct	7.0B	4.68 GB	A10G
Llama 3.1 8B Instruct	8.0B	4.92 GB	A10G
Qwen 2.5 14B Instruct	14.0B	8.99 GB	A10G

A few notes on the table:

The "GPU tier to train" column is the gating constraint, not the file-size constraint. Free-plan accounts can train any T4 model on the list (training-time VRAM is a separate question from inference-time file size). Models of 5B parameters and larger require A10G and a paid plan. That includes Gemma 4 E2B, whose 5.1B total parameter count comes from Per-Layer Embedding lookup tables in addition to its 2.3B effective compute parameters.
The Quickstart bundle for Phi-3 mini lands at roughly 2 to 3 GB end-to-end including the Ollama scripts, the README, and ZIP overhead. The 2.39 GB above is the raw model.gguf; the ZIP adds only a few MB on top.
Small models do not get smaller proportionally. Notice that the 0.5B and 1B rows hover around 0.5 to 0.8 GB, where 0.6 * parameter_count would suggest 0.3 to 0.6 GB. The embeddings dominate at this scale. If you are choosing a base for file-size reasons, picking a 1B over a 3B usually saves less space than you expect.
For models not in this table, the models index carries the full catalog with licenses, notes, and (where measured) GGUF sizes.
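
To see how the effective bytes-per-parameter ratio moves across the table, the sketch below divides each listed model.gguf size by the nominal parameter count in the same row. The figures are copied from the table; the decimal-GB interpretation and the use of nominal (rounded) parameter counts are assumptions, so the ratios are approximate.

```python
# (base model, nominal parameters in billions, Q4_K_M model.gguf size in GB)
CATALOG_Q4_K_M = [
    ("Qwen 2.5 0.5B Instruct", 0.5, 0.49),
    ("TinyLlama 1.1B Chat", 1.1, 0.67),
    ("Gemma 3 1B IT", 1.0, 0.81),
    ("Llama 3.2 1B Instruct", 1.0, 0.81),
    ("Qwen 2.5 1.5B Instruct", 1.5, 1.12),
    ("Llama 3.2 3B Instruct", 3.0, 2.02),
    ("Qwen 2.5 3B Instruct", 3.0, 2.10),
    ("Phi-3 mini 4k Instruct", 3.8, 2.39),
    ("Gemma 3 4B IT", 4.0, 2.49),
    ("Gemma 4 E2B", 5.1, 3.19),
    ("Mistral 7B Instruct", 7.0, 4.37),
    ("Qwen 2.5 7B Instruct", 7.0, 4.68),
    ("Llama 3.1 8B Instruct", 8.0, 4.92),
    ("Qwen 2.5 14B Instruct", 14.0, 8.99),
]


def bytes_per_param(params_billion, size_gb):
    """GB per billion parameters equals bytes per parameter (decimal units)."""
    return size_gb / params_billion


if __name__ == "__main__":
    for name, params_b, size_gb in CATALOG_Q4_K_M:
        ratio = bytes_per_param(params_b, size_gb)
        print(f"{name:<26} {ratio:.2f} bytes/param")
```

The smallest rows sit well above the ratio of the 7B to 14B rows, which is the pattern the note on small models describes.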

Coming soon: mixture-of-experts (MoE) base models. Mixtral 8x7B and similar MoE architectures are not yet trainable in Ertas; the catalog covers dense transformers today. When MoE support ships, the Q4_K_M GGUFs will be in the 25 to 30 GB range for Mixtral 8x7B (47B total parameters, 13B active per token), so plan for paid-plan A10G training and Hub storage well above the Free tier.

LoRA adapter sizes

The LoRA adapter ZIP is dominated by two things: the adapter weights and the bundled tokenizer.

Adapter weights (adapter_model.safetensors) are typically 10 to 30 MB at the Ertas default of rank 16, scaling roughly linearly with rank and the number of target modules. A rank 64 adapter with the full default target-module set is closer to 60 to 100 MB.
Tokenizer files (tokenizer.json, tokenizer_config.json) are 1 to 15 MB depending on family.
Chat template and config files together are under 100 KB.

The 25 to 50 MB total range covers nearly every default-config run. Bumping LoRA rank above 32 or expanding target modules can push the total over 100 MB, but that is the exception.
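
The linear scaling with rank and target modules follows from how LoRA works: each adapted weight matrix of shape (d_out, d_in) gains two low-rank factors with rank * (d_in + d_out) parameters in total. The sketch below illustrates that arithmetic. The layer shapes describe a hypothetical toy model, not any catalog model, and the 2 bytes per weight (16-bit storage) is an assumption about how the adapter is saved.

```python
def lora_param_count(matrix_shapes, rank):
    """Total LoRA parameters for a list of (d_out, d_in) target matrices.

    Each target gains A (rank x d_in) and B (d_out x rank).
    """
    return sum(rank * (d_in + d_out) for d_out, d_in in matrix_shapes)


def lora_size_mb(matrix_shapes, rank, bytes_per_weight=2):
    """Approximate adapter weight size in decimal MB.

    Assumption: weights are stored at 16-bit precision (2 bytes each).
    """
    return lora_param_count(matrix_shapes, rank) * bytes_per_weight / 1e6


# Hypothetical toy model: 16 layers, four square 2048x2048 attention
# projections adapted per layer. Not a real catalog architecture.
TOY_SHAPES = [(2048, 2048)] * 4 * 16

if __name__ == "__main__":
    for rank in (16, 32, 64):
        print(f"rank {rank}: {lora_size_mb(TOY_SHAPES, rank):.1f} MB")
```

Quadrupling the rank quadruples the weight size, and adding target modules lengthens the shape list in the same linear way. The tokenizer and config files add a fixed amount on top.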

LoRA adapters are small enough that keeping every one you train indefinitely is a sensible default. The storage cost is negligible compared to the GGUFs, and the LoRA is the artifact you re-merge from when you want to try a different quantization level, swap in a newer base, or audit what changed between training runs.

Storage-quota math

The Free plan gives you 5 GB of model-artifacts storage in Hub. Builder, Pro, and Business plans grant more (see the pricing page for the current per-plan numbers).

Some quick math to make the 5 GB figure intuitive:

What you keep	Storage per run	Approximately fits in 5 GB
1B-class GGUF + LoRA	~0.8 GB	5 to 6 complete runs
3B-class GGUF + LoRA	~2.1 GB	2 complete runs
Phi-3 mini GGUF + LoRA	~2.4 GB	2 complete runs
7B-class GGUF + LoRA	~4.5 GB	1 complete run, with room for several LoRAs
14B-class GGUF + LoRA	~9.0 GB	exceeds Free; needs a paid plan
LoRA-only retention	~0.04 GB	over 100 LoRAs
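
The "fits in 5 GB" column is floor division of the quota by the per-run footprint. The sketch below reproduces it with the table's per-run figures expressed in MB, so the arithmetic stays in integers. Treating 5 GB as 5000 MB is an assumption; Hub's exact accounting and rounding may differ.

```python
FREE_QUOTA_MB = 5000  # 5 GB Free-plan quota, assuming decimal units

# Per-run storage from the table above, converted to MB.
PER_RUN_MB = {
    "1B-class GGUF + LoRA": 800,
    "3B-class GGUF + LoRA": 2100,
    "Phi-3 mini GGUF + LoRA": 2400,
    "7B-class GGUF + LoRA": 4500,
    "14B-class GGUF + LoRA": 9000,
    "LoRA-only retention": 40,
}


def runs_that_fit(per_run_mb, quota_mb=FREE_QUOTA_MB):
    """Number of complete runs that fit in the quota."""
    return quota_mb // per_run_mb


if __name__ == "__main__":
    for label, size_mb in PER_RUN_MB.items():
        print(f"{label:<24} {runs_that_fit(size_mb)} runs")
```

A result of 0 for the 14B-class row means a single run already exceeds the Free quota. For the 1B-class row the division gives the upper end of the table's "5 to 6" range; the lower figure leaves headroom for variation between bases.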

If you only need the LoRA for archival and want room for the next GGUF, delete the GGUF from Hub after downloading. You can rerun the same config with Convert to GGUF enabled later if you want the GGUF back.

For iteration-heavy work, turn Convert to GGUF off in the Training Config picker while you are still sweeping hyperparameters. Each disabled conversion saves both the conversion credits and the 0.5 to 9 GB of Hub storage. Once you settle on a winning config, run one final pass with conversion on. See Credits and usage for the broader cost-saving guide.

What's next

Verifying exports

Sanity-check a GGUF before shipping.

Quantization

Size-vs-quality trade-offs across levels.

Storage

Managing Hub storage, quotas, and deletion.

Ship

Embed the GGUF into iOS, Android, desktop, or web.