Verifying exports

A five-minute smoke test for a downloaded GGUF before you ship it into an app.

A successful Ertas run produces a GGUF that has already passed a small in-pipeline smoke test: the 3 inference samples the Run panel shows after training. That tells you the model trained without breaking the chat template and produces coherent text. It does not tell you the file you downloaded is intact, that the bundled Modelfile works on your target runtime, or that the model still behaves correctly under prompts you actually care about.

This page is the five-minute check between downloading and shipping. Run it once per model before pushing into a release build.

The smoke test

The check is the same regardless of platform: install, load, run five to ten probe prompts, look at the outputs.

Extract the ZIP

Unzip the GGUF bundle. The folder should contain model.gguf, a Modelfile, install.bat, install.sh, and a README.txt. If any are missing, the download was truncated; retry from the Run panel.
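
If you unpack bundles in a script rather than by hand, the contents check can be automated. The following is a minimal sketch, not an Ertas tool: the file names come from the list above, and the function name missing_bundle_files is a hypothetical helper.

```python
from pathlib import Path

# File names as listed in the bundle description above.
EXPECTED_FILES = ("model.gguf", "Modelfile", "install.bat", "install.sh", "README.txt")


def missing_bundle_files(folder):
    """Return the expected bundle files that are absent from `folder`."""
    root = Path(folder)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every expected file is present. It says nothing about whether model.gguf itself is complete.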

Install into Ollama

Make sure Ollama is running. Double-click install.bat on Windows or run bash install.sh on macOS or Linux. The script registers the model with the bundled Modelfile and exits when the model is ready to query, usually within 30 seconds on a modern machine.

Run a chat probe

From a terminal, run ollama run <model-name> (the model name is in README.txt and matches the extracted folder name, lowercased with non-alphanumerics converted to hyphens). Send three or four prompts that match the task you fine-tuned for. Read every response in full.
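
The naming rule can be sketched in a few lines of Python, which is useful if a build script needs to derive the model name from the folder. This is an illustration under assumptions: it treats only ASCII letters and digits as alphanumeric and replaces each other character with exactly one hyphen. Whether the real installer collapses runs of hyphens is not stated here, so treat README.txt as the authority.

```python
import re


def expected_model_name(folder_name):
    """Lowercase the folder name and replace each non-alphanumeric character with a hyphen.

    Assumption: "alphanumeric" means ASCII a-z and 0-9, and characters are
    replaced one for one (runs are not collapsed).
    """
    return re.sub(r"[^a-z0-9]", "-", folder_name.lower())
```

For example, a folder named Support_Bot v2 maps to support-bot-v2 under these assumptions.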

Run a stress probe

Send one prompt that should be straightforward, one prompt that is slightly off-distribution from the training data, and one prompt that has nothing to do with the training data at all. The model should answer the first two well and, on the third, either refuse gracefully or fall back to a generic response.

If all four steps pass, the export is ready to ship. If any fail, the next section is the triage guide.

What "looks right" vs "looks wrong"

Signal	Verdict	What to do
Coherent prose, correct format, on-task answers	Right	Ship it.
Coherent prose but wrong format (missing closing tags, JSON not validating, code blocks not closed)	Dataset or template issue	See Datasets troubleshooting.
Raw template tokens leaking into output (see examples below)	Template mismatch at training or inference	The training data probably had templates embedded inline; rebuild the dataset without them. See JSONL format.
Mostly correct but with stylistic drift (refusals where you trained for answers, or vice versa)	Insufficient training	Increase steps or dataset size. See Iterating.
Gibberish or repetitive tokens (`the the the the...`)	Loss did not converge, or the LoRA was undertrained	Check the loss curve in the Run panel. If it never dropped, the learning rate was too low or the dataset was too small.
Refuses everything	Over-fit on refusal-shaped data, or chat-template misalignment	Inspect the training data for accidental refusal rows.
Ollama errors with "unknown model format" or refuses to load	Ollama version too old, or GGUF file corrupted	Check the installed version with `ollama --version`, update Ollama, then retry. If it still fails, see "Verifying the file itself" below.

Common raw template tokens to watch for: <|im_start|>, <|user|>, [INST], <start_of_turn>, or their family-specific equivalents appearing literally in the model's response.
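
When you save probe responses to a file, a literal substring search catches leaks that are easy to miss by eye. A minimal sketch, assuming the four markers named above are the ones relevant to your base model; extend the tuple with your model family's own tokens. The function name is hypothetical.

```python
# Markers taken from the list above; add family-specific tokens as needed.
TEMPLATE_MARKERS = ("<|im_start|>", "<|user|>", "[INST]", "<start_of_turn>")


def leaked_template_tokens(response, markers=TEMPLATE_MARKERS):
    """Return the template markers that appear literally in a model response."""
    return [marker for marker in markers if marker in response]
```

Any non-empty result points at the template-mismatch row in the table.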

Verifying the file itself

If Ollama or another runtime cannot load the GGUF, the file may have been truncated or corrupted in transit. Two quick checks:

Check the file size

Compare the downloaded model.gguf against the expected size shown in the Run panel. If the difference is more than a few KB, the download was incomplete; re-download from Hub. Signed download URLs are time-limited but regenerate on every click of the GGUF button.
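
The size comparison can be scripted as below. This is a sketch under assumptions: expected_bytes is a value you copy from the Run panel and convert to bytes yourself, and the 4096-byte default tolerance is an arbitrary stand-in for "a few KB". If the panel shows a rounded figure, widen the tolerance to match the rounding.

```python
import os


def size_matches(path, expected_bytes, tolerance_bytes=4096):
    """Return True if the file size is within `tolerance_bytes` of the expected size."""
    actual_bytes = os.path.getsize(path)
    return abs(actual_bytes - expected_bytes) <= tolerance_bytes
```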

Try llama.cpp directly

Build llama.cpp and run ./build/bin/llama-cli -m /path/to/model.gguf -p "Hello." -n 32. If llama-cli loads the model and produces output, the file is intact and the problem is with the runtime (typically: Ollama needs an update). If llama-cli also fails to load, the file is corrupted; re-download.
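
Before building llama.cpp, a cheaper pre-check is to read the first bytes of the file. GGUF files start with the four ASCII bytes GGUF followed by a 32-bit format version. The sketch below assumes a little-endian file, which is the common case, and uses a hypothetical function name. It catches a download that is not a GGUF at all (an HTML error page saved under the wrong name, for example), but a valid header does not prove the rest of the file is complete.

```python
import struct


def read_gguf_version(path):
    """Return the GGUF format version, or None if the file lacks a GGUF header.

    Assumption: the file is little-endian.
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if len(head) < 8 or head[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", head[4:8])
    return version
```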

There is no Ertas-published SHA-256 for the bundled GGUF today; this is on the roadmap. Until it ships, if you need a content hash for change-tracking or supply-chain reasons, compute one yourself after download (sha256sum model.gguf or platform equivalent) and store it alongside your build pipeline.
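
If your build pipeline is already in Python, the same hash can be computed without shelling out. A minimal sketch using the standard library; the function name and the 1 MiB chunk size are illustrative choices. Reading in chunks keeps memory use flat for multi-gigabyte files.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result should match the output of sha256sum for the same file.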

When to suspect the GGUF vs the dataset vs the base

If the smoke test fails, the order of suspicion is:

The dataset, almost always. Bad format, leaked template tokens, insufficient row count, low diversity. Most "model is broken" cases resolve here. See Dataset quality.
The training config, sometimes. Too few steps, learning rate too low or too high, LoRA rank too small for the task. See Training tips.
The base model, occasionally. The pretraining did not cover what you are asking the fine-tune to do. See Picking a base model.
The GGUF or the quantization, rarely. Q4_K_M's quality loss is typically small, on the order of 1% on standard benchmarks, though it varies by model and task. If a fine-tune that looked coherent in the Run panel's 3-sample inference looks broken after export, the dataset is far more often the cause than the quantization. Most cases that do trace to the GGUF turn out to be runtime or template mismatches rather than the file itself.

Loading the same model in LM Studio (which renders chat templates differently from Ollama) is a quick sanity-check that isolates GGUF correctness from runtime quirks. If the model behaves correctly in LM Studio but not in Ollama, the issue is the bundled Modelfile or the Ollama version, not the export.

The biggest single source of post-export surprises is template mismatch between training and inference. Ertas applies the base model's chat template at training time and bundles the matching template into the Modelfile, but if you fine-tuned with templates already embedded in your dataset rows, the trained model has effectively learned to produce raw template tokens, and the bundled Modelfile will wrap them in another template at inference. Rebuild the dataset without the inline templates if you see this pattern. See JSONL format for the rule.
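
To find the offending rows before rebuilding, you can scan the dataset file for literal template markers. The sketch below searches the raw text of each line, so it makes no assumption about the JSONL schema. The marker list is illustrative rather than exhaustive, and the function name is hypothetical. One limitation: a serializer that escapes angle brackets (writing \u003c for <) would hide markers from a raw-text search.

```python
# Illustrative markers; extend with the tokens used by your base model's family.
TEMPLATE_MARKERS = ("<|im_start|>", "<|user|>", "[INST]", "<start_of_turn>")


def rows_with_inline_templates(jsonl_path, markers=TEMPLATE_MARKERS):
    """Return (line_number, markers_found) for each row containing a template marker."""
    flagged = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            hits = [marker for marker in markers if marker in line]
            if hits:
                flagged.append((line_number, hits))
    return flagged
```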

A reproducible probe set

If you fine-tune the same task more than once (which most teams do), the highest-leverage habit is to maintain a small probe set: 10 to 30 hand-written prompts that exercise the corners of the task, kept in a .jsonl or .txt alongside your training data and re-run against every new GGUF. A probe round takes about two minutes and makes regressions visible immediately.

A probe set is the working engineer's substitute for the built-in bulk evaluation suite (coming soon). It pays for itself the third time you iterate on the same model.
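
A probe round can be scripted against a locally running Ollama. The sketch below rests on several assumptions: the probe file holds one JSON object per line with a "prompt" key (a hypothetical schema, not an Ertas format), Ollama is listening on its default local port, and the model name is whatever README.txt gives you. It uses Ollama's /api/generate endpoint with streaming turned off. The generate function is passed in as a parameter so the runner can be exercised without a live server.

```python
import json
import urllib.request


def load_probes(path):
    """Read prompts from a .jsonl file; the "prompt" key is an assumed schema."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)["prompt"] for line in f if line.strip()]


def ollama_generate(model, prompt, host="http://localhost:11434"):
    """Send one non-streaming request to Ollama and return the response text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        host + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def run_probes(prompts, generate):
    """Run each prompt through `generate` and pair it with its response."""
    return [{"prompt": prompt, "response": generate(prompt)} for prompt in prompts]
```

A typical call would be run_probes(load_probes("probes.jsonl"), lambda p: ollama_generate("my-model", p)), where both the file name and the model name are placeholders. Writing the results to a dated file lets you diff rounds by hand.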

What's next

Ship

Deploy the verified GGUF into iOS, Android, desktop, or web.

Iterating

What to change if the smoke test failed.

Handling failures

Triage runs that failed before producing a GGUF.

Evaluating a model

Build a probe set and grade outputs methodically.