Export
What comes out of a successful run, what's inside the GGUF bundle, and how to verify it before you ship.
A successful Ertas run produces two artifacts in Hub: a LoRA adapter and (by default) a Q4_K_M quantised GGUF. The GGUF is the file you ship into an app; the LoRA is the source of truth you can re-merge into a different base later. This section covers what's inside each bundle, how quantization shapes the size and quality, and how to sanity-check the export before it lands in front of users.
GGUF overview
Read the GGUF overview guide.
Quantization
Read the quantization guide.
File sizes and formats
Read the file sizes and formats guide.
Verifying exports
Read the verifying exports guide.
The shortest possible summary
If you read nothing else in this section, read this:
- The GGUF download is a single ZIP containing model.gguf, an Ollama Modelfile with the right chat template and sampling defaults, install.bat/install.sh scripts, and a README.txt.
- Ertas quantises every export to Q4_K_M today. It is the practical sweet spot between size and quality. Additional levels (Q5_K_M, Q8_0, Q3_K_M, Q2_K) are on the roadmap.
- The GGUF is ready for Ollama, llama.cpp, LM Studio, and any other llama.cpp-compatible runner. No additional conversion needed.
- A Q4_K_M GGUF is roughly a quarter of the original fp16 model size. A Phi-3 mini fine-tune lands around 2 to 3 GB; larger bases scale roughly linearly. See File sizes and formats for the exact numbers.
- Verify before shipping. The five-minute smoke test in Verifying exports catches the failure modes that survive Ertas's internal checks.
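As a rough illustration of the bundle layout described above, the Python sketch below checks a downloaded ZIP for the listed files and confirms that model.gguf starts with the four-byte GGUF magic. It is not the smoke test from Verifying exports and not an Ertas tool. The function name check_bundle is hypothetical, and the sketch assumes the files may sit either at the ZIP root or inside a single folder, and that at least one of the two install scripts is present.

```python
import zipfile

# File names taken from the bundle description above.
REQUIRED = ("model.gguf", "Modelfile", "README.txt")
INSTALLERS = ("install.bat", "install.sh")  # assumption: at least one is present


def check_bundle(zip_path):
    """Return a list of problems found in the export ZIP (empty list means OK)."""
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        # Index members by base name so a top-level folder in the ZIP is tolerated.
        members = {
            name.rsplit("/", 1)[-1]: name
            for name in zf.namelist()
            if not name.endswith("/")
        }
        for required in REQUIRED:
            if required not in members:
                problems.append(f"missing {required}")
        if not any(script in members for script in INSTALLERS):
            problems.append("missing install script")
        if "model.gguf" in members:
            # Every GGUF file begins with the ASCII bytes "GGUF".
            with zf.open(members["model.gguf"]) as f:
                magic = f.read(4)
            if magic != b"GGUF":
                problems.append(f"model.gguf has unexpected magic bytes: {magic!r}")
    return problems
```

Calling check_bundle("export.zip") (a placeholder path) and getting back an empty list only shows that the bundle is structurally plausible. It says nothing about output quality, which is what the smoke test in Verifying exports is for.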
Start with GGUF overview if the format is new to you, or jump to Quantization if you already know GGUF and want the per-level trade-offs.
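The "roughly a quarter of fp16" rule in the summary above can be checked with back-of-the-envelope arithmetic. The sketch below is an estimate under stated assumptions, not Ertas's own sizing logic: the helper name is hypothetical, 3.8 billion is Phi-3 mini's published parameter count, and 4.85 bits per weight is an assumed average for Q4_K_M (it sits a little above 4 bits because some tensors are stored at higher precision). Metadata and tokenizer overhead are ignored.

```python
def estimate_size_gb(n_params, bits_per_weight):
    """Approximate file size in GB (10**9 bytes) for n_params weights."""
    return n_params * bits_per_weight / 8 / 1e9


PHI3_MINI_PARAMS = 3.8e9  # published parameter count for Phi-3 mini
FP16_BITS = 16.0
Q4_K_M_BITS = 4.85        # assumed average bits per weight, not an exact figure

fp16_gb = estimate_size_gb(PHI3_MINI_PARAMS, FP16_BITS)
q4_gb = estimate_size_gb(PHI3_MINI_PARAMS, Q4_K_M_BITS)

print(f"fp16:   {fp16_gb:.1f} GB")
print(f"Q4_K_M: {q4_gb:.1f} GB ({q4_gb / fp16_gb:.0%} of fp16)")
```

Under these assumptions the estimate lands inside the 2 to 3 GB range quoted above. For real numbers, rely on File sizes and formats instead of this approximation.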