Export

    What comes out of a successful run, what's inside the GGUF bundle, and how to verify it before you ship.

    A successful Ertas run produces two artifacts in Hub: a LoRA adapter and (by default) a GGUF quantized to Q4_K_M. The GGUF is the file you ship into an app; the LoRA is the source of truth, which you can re-merge into a different base later. This section covers what's inside each bundle, how quantization affects size and quality, and how to sanity-check the export before it reaches users.

    The shortest possible summary

    If you read nothing else in this section, read this:

    • The GGUF download is a single ZIP containing model.gguf, an Ollama Modelfile with the right chat template and sampling defaults, install.bat / install.sh scripts, and a README.txt.
    • Ertas quantizes every export to Q4_K_M today, the practical sweet spot between size and quality. Additional levels (Q5_K_M, Q8_0, Q3_K_M, Q2_K) are on the roadmap.
    • The GGUF is ready for Ollama, llama.cpp, LM Studio, and any other llama.cpp-compatible runner. No additional conversion needed.
    • A Q4_K_M GGUF is roughly a quarter of the original fp16 model size. A Phi-3 mini fine-tune lands around 2 to 3 GB; larger bases scale roughly linearly with parameter count. See File sizes and formats for the exact numbers.
    • Verify before shipping. The five-minute smoke test in Verifying exports catches the failure modes that survive Ertas's internal checks.
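
    As a rough illustration of the points above, the sketch below checks that a downloaded bundle contains the listed files, that model.gguf starts with the GGUF magic bytes, and what size to expect for a given parameter count. It is not the smoke test from Verifying exports. The function names are hypothetical, and three things are assumptions rather than documented Ertas behaviour: that the Modelfile is literally named Modelfile, that the ZIP layout is flat or has a single top-level folder, and the 4.8 bits-per-weight figure used for Q4_K_M (K-quants store scales alongside the 4-bit weights, so the effective rate sits somewhat above 4).

```python
import zipfile

# File names from the bundle description above. "Modelfile" as the exact
# name of the Ollama Modelfile is an assumption, not a documented fact.
EXPECTED_FILES = {"model.gguf", "Modelfile", "install.bat", "install.sh", "README.txt"}
GGUF_MAGIC = b"GGUF"  # the first four bytes of any GGUF file


def check_bundle(zip_path):
    """Return a list of problems; an empty list means the bundle looks sane."""
    problems = []
    with zipfile.ZipFile(zip_path) as bundle:
        # Index members by base name so a top-level folder does not matter.
        by_basename = {name.rsplit("/", 1)[-1]: name for name in bundle.namelist()}
        for missing in sorted(EXPECTED_FILES - set(by_basename)):
            problems.append(f"missing file: {missing}")
        if "model.gguf" in by_basename:
            # Read only the first four bytes; the model itself is not loaded.
            with bundle.open(by_basename["model.gguf"]) as fh:
                magic = fh.read(4)
            if magic != GGUF_MAGIC:
                problems.append("model.gguf does not start with the GGUF magic bytes")
    return problems


def approx_gguf_gb(n_params, bits_per_weight=4.8):
    """Estimate GGUF size in GB. The 4.8 bits/weight default is an assumed
    typical value for Q4_K_M, not an Ertas-published number."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    import sys

    for problem in check_bundle(sys.argv[1]):
        print(problem)
```

    Under that assumed rate, approx_gguf_gb(3.8e9) for a 3.8B-parameter Phi-3 mini comes out a little under 2.3 GB, which is consistent with the 2 to 3 GB range quoted above. A passing structural check says nothing about output quality; it only rules out a truncated or mislabelled download before you spend time on real prompts.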

    Start with GGUF overview if the format is new to you, or jump to Quantization if you already know GGUF and want the per-level trade-offs.