Core concepts
The vocabulary you need to use Ertas confidently: base models, LoRA, quantization, GGUF, and the canvas.
If you have done a fine-tune before, skim the headings and skip what you know. If you have not, this page is the one to read before touching the canvas. Every other doc assumes the vocabulary defined here.
The canvas
Studio's main surface is a visual canvas. You compose a fine-tune (or train-from-scratch) job by dropping nodes and connecting them, the same way a sound designer wires synth modules.
The central node is an Action Module. It has two flavours:
- Fine-Tune: starts from an existing base model and teaches it new behaviour with LoRA. Four legs: base model, dataset, training config, LoRA config.
- Train: starts from scratch (no LoRA). Three legs: base model, dataset, training config.
Each leg requires a child node before the module can run. Studio surfaces required-but-missing legs with a red asterisk on the label. Once every leg is connected, the play button becomes active.
The canvas has two modes:
- Build mode: you can drag, connect, and configure nodes. This is where you spend setup time.
- Run mode: the canvas is read-only and the Run panel slides in from the right. You drop back to Build mode by closing the panel or pressing the mode switch in the top toolbar.
You can have many Action Modules on a single canvas. Studio treats each one as an independent recipe and runs them in parallel up to your plan's concurrency limit. See Parallel runs.
Base models
A base model is the pretrained transformer you fine-tune from. Ertas's base-model catalog covers the common families:
- Llama 3 / Llama 4 (Meta) for general instruction-following
- Mistral 7B / Mixtral for high-quality general-purpose
- Phi-3 / Phi-4 (Microsoft) for very small, capable models
- Gemma 3 / Gemma 4 (Google) for high-quality small models
- Qwen 2.5 / Qwen 3 (Alibaba) for multilingual and long-context
- SmolLM, TinyLlama for sub-3B experiments
Every model in the catalog has a minimum_gpu_tier annotation. T4 (16 GB VRAM) handles models under 5B total parameters in 4-bit. Any model 5B or larger requires A10G (24 GB VRAM), including Gemma 4 E2B. The "E" in E2B stands for Effective: the model has 2.3B effective compute parameters but 5.1B total parameters including its Per-Layer Embedding lookup tables, which still need to fit in VRAM during training. That total pushes Gemma 4 E2B past the T4 limit even though its compute footprint at inference time is small. The model picker dims and locks any model your plan cannot run, so you cannot accidentally queue a job that will not start.
Beyond the catalog, you can pull any compatible model directly from a Hugging Face URL. Ertas validates the architecture and reports whether the model is known to fine-tune cleanly. If validation is uncertain, you can still run it, but you take responsibility for the result and credits are not refunded on training failures for unverified architectures.
See the full model catalog for licenses, sizes, and notes per family.
LoRA
LoRA (Low-Rank Adaptation) is the fine-tuning method Ertas uses by default. Instead of updating every parameter in the base model (which would require tens of gigabytes of VRAM and produce a multi-billion-parameter checkpoint), LoRA trains tiny adapter matrices that ride alongside the frozen base. The result is:
- Training fits on a T4 or A10G GPU for models up to 14B.
- Adapters are small (tens of megabytes), so iteration is cheap.
- You can merge an adapter into the base for export, or keep it separate for hot-swapping.
The two parameters you will see most are rank and alpha:
- Rank is the dimensionality of the adapter. Higher rank means more capacity to learn, more VRAM, and a larger adapter file. 16 is a strong default for instruction tuning.
- Alpha is a scaling factor applied to the adapter output. The convention is
alpha = 2 * rank, which is also Studio's default.
You also pick target modules, the layers inside each transformer block that LoRA attaches to. The default set (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) covers attention and MLP projections and works for nearly every architecture. Trimming this list saves a little memory at the cost of expressiveness.
LoRA is always enabled for Fine-Tune jobs. If you want to update every parameter, use the Train action module instead, but be aware that this is slower, more expensive, and almost never necessary for small models.
Datasets
A dataset is the body of examples your model learns from. Ertas accepts five JSONL shapes:
- Text-only (
textonly): a single training string per row. Good for corpus-style pretraining or stylistic absorption. - Instruction / output (
instruction+output): single-turn directive with a target answer. Easy to author by hand. - Input / output with metadata (
input+output+ optionalmetadata): same single-turn shape as above, naming convention used in many public datasets, with a bookkeeping field for source or tags. - Conversations (
conversationsarray of{from, value}): ShareGPT-style multi-turn chat. Loss is ongptturns only. - Messages (
messagesarray of{role, content}): ChatML / OpenAI-style multi-turn chat. Loss is onassistantturns only.
Datasets live in the Data Craft tab and are validated on upload. You can attach the same dataset to many runs without re-uploading. See JSONL format for the exact schemas and Dataset quality for what makes a dataset train well.
When you import from Hugging Face, Ertas validates that the dataset has training-friendly columns and prompts you to attest you have the right to use it. The first time you attach an HF dataset, Ertas mirrors it into Data Craft so subsequent runs do not re-download.
Quantization and GGUF
A freshly fine-tuned model lives in full precision (fp16 or bf16). For on-device deployment, that is too big and too slow. Ertas's last training step converts your model into a quantised GGUF file by default.
- GGUF is the file format used by llama.cpp, Ollama, LM Studio, and most consumer-grade local model runners.
- Q4_K_M is the only quantization Ertas exports today. It uses 4-bit weights with mixed-precision blocks for the key matrices, giving a small quality loss (typically under 1% on standard benchmarks) and roughly 4x smaller files than fp16. We chose Q4_K_M as the practical sweet spot between size and quality for on-device deployment. Additional levels (Q5_K_M for slightly higher fidelity, Q8_0 for near-fp16 quality, Q3_K_M and Q2_K for the smallest possible files) are on the roadmap.
GGUF conversion adds roughly 12 minutes to the end of a run. You can disable it on the Training Config picker if you only want the raw LoRA adapter (for example, to merge into a different base in code).
See GGUF overview for the export pipeline in detail.
Runs
A run is one execution of an Action Module. Each run carries:
- The full config snapshot (base model, dataset, training config, LoRA config) at the time you pressed play.
- A status:
queued,provisioning,training,completed,failed,retrying, orcancelled. - Live metrics: current epoch, total epochs, loss, throughput, progress percent.
- Logs streamed from the GPU node.
- A credit ledger showing how much the run actually cost.
- Downloadable artifacts on success: the LoRA adapter and (if enabled) the GGUF export.
Runs are the unit of history. You can rerun, cancel, or delete a run, but the configuration that produced it is preserved so you can reproduce or diff against later runs. See Iterating.
Credits
Ertas bills by GPU-minute. The Training Config picker shows your current credit balance and an estimated cost before you press play. You spend credits only while the GPU is attached and training, never while the job is queued or provisioning.
If a run fails on a verified base model from the catalog, credits are refunded automatically. If you bring an unverified Hugging Face model and acknowledge the warning, you take the credit risk yourself. See Credits and usage.