Glossary
Vocabulary used across the docs, with one-line definitions and links to the canonical detailed page for each term.
If a term is missing, the FAQ and Concepts pages cover most of what is not listed here.
A
Action Module. The central node on the Studio canvas. A Fine-Tune Action Module has four legs (base model, dataset, training config, LoRA config); a Train Action Module has three (no LoRA). See Concepts.
Adapter. Short for LoRA adapter; the small file of additional weights produced by a fine-tune that rides alongside the frozen base model. See Concepts: LoRA and File sizes and formats.
Alpha. A LoRA scaling factor: the adapter's output is multiplied by alpha / rank before it is added to the base layer's output. A common convention is alpha = 2 * rank. See Concepts: LoRA.
ASR. Automatic Speech Recognition. The upstream step that produces a text transcript from audio, before the LLM cleanup pass. See Cookbook: voice transcript cleanup.
B
Base model. The pretrained transformer you fine-tune from. Ertas's catalogue is in Supported models.
Build mode. The canvas mode where you can drag, connect, and configure nodes. Contrasts with Run mode. See Concepts.
C
Canvas. Studio's main visual surface. Nodes are dropped and connected like a modular synth patch. See Concepts.
ChatML. A multi-turn chat format where messages are tagged with <|im_start|>role and <|im_end|> markers. Used by Qwen and others; the messages JSONL schema in Ertas maps to it. See JSONL format.
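A minimal sketch of how a messages list renders into ChatML, assuming the common layout Qwen models use (a newline after the role name and after each <|im_end|> marker). Exact whitespace and special tokens vary by model family, and Ertas writes the real template for you, so treat the hypothetical `render_chatml` helper as an illustration, not Ertas's code.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts as a ChatML string
    (assumed Qwen-style layout)."""
    parts = []
    for msg in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete at inference time.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
]
print(render_chatml(example, add_generation_prompt=True))
```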
Chat template. A model-family-specific format for separating system / user / assistant turns at inference time. Ertas writes the right template into the Modelfile automatically. See GGUF overview.
Context window. The maximum number of tokens the model can attend to in one forward pass. Catalogue models range from 4k (Phi-3) to 128k (Llama 3.2, Qwen 2.5) tokens. See Supported models.
Conversations format. The ShareGPT-style multi-turn JSONL schema with from and value keys. See JSONL format.
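A sketch of converting a conversations row into a messages row. It assumes the turns live under a top-level "conversations" key and that speakers use the usual ShareGPT names ("system", "human", "gpt"); both are assumptions, so check JSONL format for the exact keys Ertas expects. `conversations_to_messages` is a hypothetical helper.

```python
import json

# Assumed ShareGPT speaker names; some datasets already use "user"/"assistant".
ROLE_MAP = {"system": "system", "human": "user", "user": "user",
            "gpt": "assistant", "assistant": "assistant"}

def conversations_to_messages(row):
    """Convert {"conversations": [{"from", "value"}, ...]} into
    {"messages": [{"role", "content"}, ...]}."""
    messages = []
    for turn in row["conversations"]:
        role = ROLE_MAP.get(turn["from"])
        if role is None:
            raise ValueError(f"unknown speaker: {turn['from']!r}")
        messages.append({"role": role, "content": turn["value"]})
    return {"messages": messages}

line = '{"conversations": [{"from": "human", "value": "2+2?"}, {"from": "gpt", "value": "4"}]}'
print(json.dumps(conversations_to_messages(json.loads(line))))
```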
Credits. Ertas's billing unit, consumed per GPU-minute of training. A T4 minute costs fewer credits than an A10G minute; see Credits and usage for the exact rates.
D
Data Craft. The Studio tab where datasets live. Upload, validate, preview, and attach to runs from here. See Concepts: Datasets.
Dataset. The body of examples a model learns from. Ertas accepts five JSONL shapes. See JSONL format.
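To make the five shapes concrete, here is a sketch that guesses a parsed row's shape from its keys. The top-level key names ("messages", "conversations", "instruction", "input", "output", "text") are assumptions drawn from the entries in this glossary, and `detect_shape` is hypothetical; the JSONL format page is the authority.

```python
def detect_shape(row: dict) -> str:
    """Guess which of the five dataset shapes a parsed JSONL row uses.
    Key names are illustrative assumptions; extra keys such as metadata are ignored."""
    if "messages" in row:
        return "messages"
    if "conversations" in row:
        return "conversations"
    # Check instruction before input, so rows carrying both land here.
    if "instruction" in row and "output" in row:
        return "instruction/output"
    if "input" in row and "output" in row:
        return "input/output"
    if "text" in row:
        return "text-only"
    raise ValueError(f"unrecognised row keys: {sorted(row)}")

print(detect_shape({"input": "Invoice #12 ...", "output": "{...}", "metadata": {}}))
```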
DPO. Direct Preference Optimisation. A training method that uses (prompt, chosen, rejected) preference triples to push past SFT's quality ceiling. Ertas currently supports SFT; DPO is on the roadmap. See SFT vs DPO and Known limitations.
E
Effective parameters. The compute-time parameter count, distinct from total parameters. Relevant for Gemma 4 E2B, which has 2.3B effective compute parameters and 5.1B total when Per-Layer Embedding tables are counted. See Concepts: Base models.
Eval. Short for evaluation; the process of measuring whether a fine-tuned model meets your quality bar. Today this is mostly post-export (run your own probes); the in-app eval suite is on the roadmap. See Evaluating a model and Known limitations.
F
Fine-tuning. Training a pre-existing model on new examples to change its behaviour, as opposed to training from scratch. Ertas uses LoRA-based fine-tuning by default. See Concepts.
FIM. Fill-in-the-Middle. The task shape for code completion: the model sees a prefix and a suffix and predicts the middle. See Cookbook: code completion.
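A sketch of building a FIM training pair in the prefix-suffix-middle (PSM) layout. The sentinel strings are the ones Qwen2.5-Coder uses; other model families use different tokens, so treat them as an assumption and check your base model's tokenizer. Both helpers are hypothetical.

```python
# Sentinels as used by Qwen2.5-Coder (assumption: other families differ).
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """PSM layout: the model generates the missing middle after FIM_MIDDLE."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

def build_fim_example(code: str, start: int, end: int) -> tuple[str, str]:
    """Split source code into a (prompt, target) pair by hiding code[start:end]."""
    return build_fim_prompt(code[:start], code[end:]), code[start:end]

src = "def add(a, b):\n    return a + b\n"
start = src.index("a + b")
prompt, target = build_fim_example(src, start, start + len("a + b"))
print(repr(prompt))
print(repr(target))
```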
G
GGUF. GPT-Generated Unified Format. The file format used by llama.cpp, Ollama, and most consumer-grade local model runners. Ertas's default export. See GGUF overview.
Grad accumulation. A training trick that simulates a larger batch size by accumulating gradients across multiple smaller batches before applying the update. Ertas uses it to fit large effective batches into limited GPU memory. See Training tips.
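A toy check of why gradient accumulation works: for a one-parameter linear model with mean-squared-error loss, averaging the gradients of four equal-sized micro-batches of 2 gives the same gradient as one batch of 8. The model and data are made up for illustration; real trainers do this with tensors, but the arithmetic is the same.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean squared error for the 1-parameter model y_hat = w * x."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch, accum_steps):
    """Sum gradients over `accum_steps` equal-sized micro-batches, then average,
    so a single optimiser step sees the gradient of a micro_batch * accum_steps batch."""
    total = 0.0
    for step in range(accum_steps):
        lo = step * micro_batch
        total += mse_grad(w, xs[lo:lo + micro_batch], ys[lo:lo + micro_batch])
    return total / accum_steps

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2]
w = 0.5
full = mse_grad(w, xs, ys)                                          # one batch of 8
accum = accumulated_grad(w, xs, ys, micro_batch=2, accum_steps=4)   # 4 x 2
print(full, accum)
```

The equality relies on the micro-batches being the same size; with a ragged last micro-batch, a plain average of micro-batch gradients weights its examples slightly differently.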
H
Hub. The Studio tab where trained artifacts (LoRA and GGUF) live, with download links and storage usage. See Storage.
I
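Imatrix. Importance-matrix calibration. A llama.cpp feature that runs a representative calibration dataset through the model, records activation statistics showing which weights matter most, and uses those importance scores to weight the quantisation error so the most important weights are rounded most accurately. This improves Q4_K_M quality. Ertas uses imatrix server-side. See Quantization.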
Inference. Running a trained model to produce outputs (as opposed to training). Ertas does not host inference; the GGUF runs wherever you choose to ship it. See Ship.
Input / output format. The JSONL schema with input, output, and optional metadata keys. Cleanest fit for single-task supervised tuning. See JSONL format.
Instruction tuning. Fine-tuning with examples that take an instruction (and optionally an input) and produce a target response. The most common fine-tuning task. See Instruction tuning.
Instruction / output format. The simplest supervised JSONL schema, with one instruction key and one output key per row. See JSONL format.
J
JSONL. JSON Lines. One JSON object per line, no surrounding array. Ertas's canonical dataset format. See JSONL format.
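A minimal sketch of writing and reading JSONL with Python's standard `json` module. The row keys here are only an example; the point is one compact object per line with no enclosing array.

```python
import json

rows = [{"instruction": "Say hi", "output": "Hi!"},
        {"instruction": "Say bye", "output": "Bye!"}]

def to_jsonl(rows):
    # One JSON object per line; no enclosing [ ] and no commas between rows.
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)

def from_jsonl(text):
    # Parse line by line, skipping blank lines such as a trailing newline.
    return [json.loads(line) for line in text.splitlines() if line.strip()]

text = to_jsonl(rows)
print(text, end="")
```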
K
K-quant. A family of quantisation methods in llama.cpp (Q2_K, Q3_K, Q4_K, Q5_K, Q6_K). Most come in size variants such as _S, _M, and _L, though not every combination exists (Q6_K has none, for example). Q4_K_M is Ertas's default export. See Quantization.
KV cache. Key-Value cache. The runtime memory that stores attention state for past tokens so they do not need to be recomputed on each new token. Scales with sequence length and model size. See Performance tips.
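A back-of-the-envelope estimate of KV-cache memory: one key and one value vector per layer, per KV head, per token. The layer count, KV-head count, and head size below are illustrative assumptions, not values read from a specific model card. Models using grouped-query attention have fewer KV heads than attention heads, which is what keeps this number manageable.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2, batch=1):
    """Estimate KV-cache size. The leading 2 counts K and V;
    bytes_per_elem=2 assumes an fp16 cache."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch

# Illustrative dimensions (assumed for this example).
size = kv_cache_bytes(n_layers=28, n_kv_heads=8, head_dim=128, seq_len=4096)
print(size / 2**20, "MiB")  # 448.0 MiB
```

Doubling `seq_len` doubles the cache, which is why long contexts cost memory even when the model weights fit comfortably.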
L
llama.cpp. The open-source C++ inference engine that defines the GGUF format and underpins Ollama, LM Studio, and most local runtimes. See llama.cpp versions.
LoRA. Low-Rank Adaptation. A parameter-efficient fine-tuning method that trains small adapter matrices alongside a frozen base model. Ertas's default. See Concepts: LoRA.
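A dependency-free sketch of a LoRA-adapted linear layer: the frozen weight W is left untouched, and the trainable low-rank pair B·A is added on top, scaled by alpha / rank. The tiny matrices are made up for illustration. Because B starts at zero in the standard LoRA initialisation, the adapted layer initially reproduces the base model exactly.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha, rank):
    """h = W x + (alpha / rank) * B (A x).
    W: d_out x d_in (frozen); A: rank x d_in and B: d_out x rank (trainable)."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 base weight
A = [[1.0, 1.0]]              # rank-1 down-projection
B = [[0.0], [0.0]]            # zero-initialised up-projection
x = [3.0, 4.0]
print(lora_forward(W, A, B, x, alpha=2, rank=1))  # [3.0, 4.0]
```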
M
Messages format. The ChatML / OpenAI-style multi-turn JSONL schema with role and content keys. Loss is computed on assistant turns only. See JSONL format.
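A sketch of assistant-only loss masking at the text level: the row is split into ChatML segments, and only assistant content is flagged for loss. Including the assistant's <|im_end|> marker in the loss (so the model learns when to stop) is a common choice but an assumption here, and `loss_segments` is hypothetical. A real trainer would tokenise each segment and turn the flags into a per-token label mask.

```python
def loss_segments(messages):
    """Split a messages row into (text, train_on) segments in ChatML layout.
    Role headers and all system/user turns are context only (train_on=False)."""
    segments = []
    for msg in messages:
        segments.append((f"<|im_start|>{msg['role']}\n", False))
        segments.append((msg["content"] + "<|im_end|>", msg["role"] == "assistant"))
        segments.append(("\n", False))
    return segments

row = [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]
trained = "".join(text for text, train in loss_segments(row) if train)
print(trained)  # Hello!<|im_end|>
```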
Modelfile. Ollama's configuration file. Ertas's GGUF export bundle ships a Modelfile with the base reference, chat template, stop tokens, and sampling defaults already configured. See GGUF overview.
O
OCR. Optical Character Recognition. The upstream step that produces text from an image, before the LLM extraction pass. See Cookbook: structured data extraction.
Ollama. The most popular open-source local-LLM runner. Ertas's GGUF bundle is Ollama-ready out of the box; the install script registers the model with ollama create. See GGUF overview and Ship: desktop.
P
Per-Layer Embedding. The lookup tables Gemma 4 uses in addition to its standard transformer parameters. They contribute to total parameter count but not to compute-time parameters. See Concepts: Base models.
Probe set. A small handpicked collection of prompts (typically 5 to 10) used to spot-check a fine-tune's behaviour before wiring up a full eval. See Verifying exports and every recipe in Cookbook.
Q
Q4_K_M. The 4-bit K-quant Medium quantisation level. Ertas's only export today. Roughly 0.6 bytes per parameter for models 3B and up. See Quantization.
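The 0.6 bytes-per-parameter rule of thumb (about 4.8 bits per weight) turns directly into a size estimate. This is only the approximation stated above, not an exact formula; actual GGUF sizes also depend on the architecture and on metadata.

```python
def q4_k_m_size_gb(n_params, bytes_per_param=0.6):
    """Rough Q4_K_M GGUF size in decimal GB, using the ~0.6 bytes/parameter
    rule of thumb for models 3B and up."""
    return n_params * bytes_per_param / 1e9

for billions in (3, 7, 8):
    print(f"{billions}B params -> ~{q4_k_m_size_gb(billions * 1e9):.1f} GB")
```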
Quantization. Compressing model weights from fp16 / bf16 to a lower precision (4-bit, 5-bit, 8-bit) to reduce file size and memory at inference time. See Quantization.
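A simplified sketch of block quantisation: each small block of weights shares one scale, and every weight is stored as a small signed integer. This is plain symmetric absmax 4-bit quantisation, swapped in for illustration; llama.cpp's K-quants are considerably more elaborate.

```python
def quantize_block(values, bits=4):
    """Symmetric absmax quantisation of one block to signed `bits`-bit integers.
    Returns (scale, ints); dequantise with ints[i] * scale."""
    qmax = 2 ** (bits - 1) - 1           # 7 for 4-bit
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax else 1.0
    ints = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, ints

def dequantize_block(scale, ints):
    return [q * scale for q in ints]

weights = [0.12, -0.50, 0.33, 0.07, -0.21, 0.44, -0.05, 0.29]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_err, 4))
```

Each restored weight is off by at most half a quantisation step (scale / 2), which is the precision traded for a file roughly a quarter the size of fp16.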
R
Rank. The dimensionality of a LoRA adapter. Higher rank means more capacity, more VRAM, and a larger adapter. 16 is Ertas's default. See Concepts: LoRA.
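The adapter-size side of the rank trade-off is simple arithmetic: a LoRA pair on a d_out x d_in weight adds rank * (d_in + d_out) trainable parameters. The 4096 x 4096 projection below is an illustrative assumption.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters one LoRA adapter adds to a d_out x d_in weight:
    A is rank x d_in and B is d_out x rank."""
    return rank * (d_in + d_out)

# Illustrative 4096 x 4096 projection (assumed dimensions).
full = 4096 * 4096
for r in (8, 16, 32):
    p = lora_params(4096, 4096, r)
    print(f"rank {r}: {p:,} params ({100 * p / full:.2f}% of the full matrix)")
```

Parameter count grows linearly with rank, so going from 16 to 32 doubles adapter size for every targeted module.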
Recipe. A saved canvas configuration. In the docs this term is also used informally for a Cookbook entry. See Cookbook.
Run. One execution of an Action Module. Carries a config snapshot, a status, live metrics, logs, and (on success) downloadable artifacts. See Concepts: Runs.
Run panel. The right-side slide-in that opens when you enter Run mode. Shows live run status, logs, and artifacts. See Training.
S
SFT. Supervised Fine-Tuning. Training on (input, target output) pairs, with loss computed on the output. The default training method in Ertas. See SFT vs DPO.
ShareGPT. A widely-used multi-turn conversation format with from and value keys. Ertas's conversations JSONL schema maps to it. See JSONL format.
Sticky note. A canvas annotation. Used today as a placeholder for collaboration features; see Managing projects.
System prompt. A directive given to the model before the user's first turn, typically describing the assistant's persona and constraints. Ertas supports system prompts in both single-turn and multi-turn JSONL schemas. See Instruction tuning.
T
Target modules. The transformer layers a LoRA adapter attaches to. Ertas's default set (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) covers attention and MLP projections. See Concepts: LoRA.
Temperature. A sampling parameter controlling output randomness. 0.0 is deterministic / greedy; 0.7 to 1.0 is conversational; above 1.0 is creative-but-erratic. See Performance tips.
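A minimal sketch of temperature sampling over raw logits: dividing by a small temperature sharpens the distribution, a large one flattens it, and 0 is treated as greedy argmax (dividing by zero is undefined, so runtimes special-case it). The logits are made-up numbers.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng=None):
    """temperature == 0 means greedy argmax; otherwise sample from the softmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5]
for t in (0.5, 1.0, 1.5):
    print(t, [round(p, 3) for p in softmax(logits, t)])
```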
Text-only format. The simplest JSONL schema, one text string per row. Used for corpus-style pretraining or stylistic absorption. See JSONL format.
Token. The unit of text the model reads and writes. Roughly 4 English characters or 0.75 English words per token. Context windows, sampling caps, and pricing are all in tokens.
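The two rules of thumb above give quick estimates when no tokenizer is at hand. This is an approximation for English text only; the model's own tokenizer is the source of truth, and code or non-English text usually tokenises differently.

```python
def estimate_tokens(text: str) -> dict:
    """Rough English token counts from ~4 characters/token and ~0.75 words/token."""
    return {
        "by_chars": round(len(text) / 4),
        "by_words": round(len(text.split()) / 0.75),
    }

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))
```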
Tokenizer. The component that converts text to tokens (and back). Each model family has its own; Ertas carries the right one through to the GGUF and Modelfile automatically.
top_p. A sampling parameter (nucleus sampling) that restricts next-token candidates to the smallest set of most-likely tokens whose cumulative probability reaches p. Common defaults are 0.85 to 0.95. See Performance tips.
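A sketch of the nucleus filter: sort tokens by probability, keep adding them until the running total reaches p, then renormalise over the survivors. Note that the most likely token is always kept, even when it alone exceeds p. The probabilities are illustrative.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches p, then renormalise. Returns {token_index: prob}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

print(top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9))
```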
W
WebGPU. The browser graphics API that wllama uses for accelerated in-browser inference. Available in Chrome 113+, Edge 113+, Safari 18+, Firefox 141+. See System requirements and Ship: web.
wllama. A WebAssembly + WebGPU library for running GGUF models in a browser. The standard browser inference path. See Ship: web.