Code completion

Fine-tune a 3B-class code model on a niche language or in-house framework so completions inside a creator app or IDE plugin understand syntax that base models botch.

Base code models know Python, JavaScript, TypeScript, Java, C++, Rust, and Go very well. They know everything else somewhere between "passably" and "wrong." A fine-tune for code completion is most useful when your users write in a language or framework that sits in the second bucket: an internal DSL, an in-house framework with strict conventions, or a niche language for which base models default to the wrong dialect.

This recipe uses a worked example: imagine Roblox wants to add AI completions for Luau inside Roblox Studio. Luau is Roblox's typed dialect of Lua 5.1 with a different standard library (task.wait instead of the legacy wait, no io or os.execute, type-annotated functions, an opt-in strict mode) and 70 million-plus creators editing it monthly. Out of the box, a 3B-class code model will give them Lua 5.1 completions, often with the wrong API surface; a fine-tune teaches it Luau-the-language and Studio-the-environment. Roblox is the anchor because the scale and the language-fit gap make the trade-offs concrete; substitute any vertical with its own scripting layer.

Recipe also fits

The Roblox / Luau example is one of many. The same shape applies to:

Niche-language IDE plugins for Lean, Solidity, Forth, Idris, or other languages base models barely know.
In-house DSL completions for any team with a custom scripting layer used by hundreds of internal developers.
Vertical-language plugins for Salesforce Apex, HashiCorp HCL, SAP ABAP, or PL/SQL, where strong house conventions meet weak base-model coverage.
Domain-specific scripting in scientific tools (Mathematica, R, MATLAB) or shader languages (GLSL, HLSL).
Creator-platform scripting environments like Unity C# patterns, Unreal Blueprint scripts, or Godot GDScript.

If your users edit code in an environment where the base model defaults to the wrong dialect or misses your house conventions, this is the recipe to start with.

When this is the right fit

This recipe is the right fit when:

The target language or framework is underserved by base models. Niche languages (Luau, Solidity dialects, GLSL variants, Lean), in-house DSLs, and legacy or vendor-specific languages (PL/SQL, COBOL, VB.NET) all qualify. Mainstream-current code (Python 3.12, TypeScript 5.6) does not; there the base model is already better than anything you can fine-tune in this size class.
You have access to a non-trivial corpus of well-written code in the target. A million lines is comfortable; a hundred thousand is workable; ten thousand is hard. Open-source repos, internal codebases, generated examples all count.
You want completions to respect house conventions, not just be syntactically valid. Variable naming, function shape, error-handling idioms, comment style, type-annotation density: the fine-tune learns all of these.
The user surface is an editor pane with a cursor, and completions trigger on idle or explicit invocation. If the surface is chat, this recipe is the wrong shape; see Customer support bot for that pattern.

It is not the right fit when:

You want agentic code editing (multi-file refactors, "fix this test"). A 3B fine-tune is too small for the planning step. Use this recipe for the completion side and a frontier API for the agent side, if at all.
You want completions in a dozen languages. Fine-tunes are most effective when scoped tightly. Train one model per language family.
The target language is well-served by frontier APIs and latency is not a concern. The fine-tune wins on offline use, privacy, and per-character cost; if none of those matter for your users, do not bother.

The dataset

Code completion is a fill-in-the-middle (FIM) task. The model sees the code before the cursor (the prefix) and after the cursor (the suffix) and predicts what should sit in between (the middle). The training data has to mirror that shape.

For the Luau scenario, 20,000 rows is a reasonable starting point. Tune up if quality lags or down if authoring at scale is hard. A typical mix of sources:

Source	Share	Notes
Open-source Roblox games (GitHub, GitHub Search)	50% (~10,000)	Well-starred repos filtered for active maintenance and idiomatic style
Roblox Creator Documentation code samples	20% (~4,000)	Authoritative examples for the Studio APIs and patterns the team wants to promote
In-house style-guide examples	15% (~3,000)	Curated examples for conventions the team wants reinforced
Synthetic FIM pairs from existing code	15% (~3,000)	Take canonical scripts and machine-generate FIM cuts at multiple cursor positions

A "row" is one (prefix, suffix, middle) triple. From a single 500-line script you can derive 20 to 50 high-quality FIM rows by choosing different cursor positions.

Generating FIM pairs

The naive approach (pick a random character offset, cut there) produces a lot of low-signal rows because most cursor positions are inside identifiers or in whitespace. Cut at boundary points instead:

Start of a statement
After an opening brace or function keyword (model completes the body)
After a local x = (model completes the expression)
Inside a string literal where the literal is a known canonical value (model completes the string)
After a comment that describes intent (the comment is part of the prefix)

A heuristic that works well: parse the file with the target language's parser, walk the AST, and emit FIM rows at every statement boundary, every function-body opener, and every right-hand side of an assignment. The Luau open-source parser is straightforward to run from a script.
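
The boundary-cut idea can be sketched in a few lines of Python, since dataset preparation usually runs outside Studio. This is a minimal sketch under a loud assumption: it uses a line-based heuristic instead of the Luau parser recommended above, and only covers two of the boundary types (start of a statement, and the right-hand side of `local x =`). The name `fim_cuts` and the regex are illustrative, not part of any library.

```python
import re

# Assumption: a line-based stand-in for a real AST walk with the Luau parser.
ASSIGN_RHS = re.compile(r"^\s*local\s+[A-Za-z_]\w*\s*=\s*")

def fim_cuts(source: str):
    """Yield (prefix, suffix, middle) triples cut at line-level boundaries."""
    lines = source.splitlines(keepends=True)
    offsets = [0]
    for line in lines:
        offsets.append(offsets[-1] + len(line))

    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or stripped.startswith("--"):
            continue  # blank lines and comments are context, not targets

        # Cut 1: the whole statement line (minus indentation) is the middle.
        start = offsets[i] + (len(line) - len(line.lstrip()))
        end = offsets[i] + len(line.rstrip())
        yield source[:start], source[end:], source[start:end]

        # Cut 2: only the right-hand side of `local x = ...` is the middle.
        match = ASSIGN_RHS.match(line)
        if match:
            rhs_start = offsets[i] + match.end()
            if rhs_start < end:
                yield source[:rhs_start], source[end:], source[rhs_start:end]
```

Every triple satisfies `prefix + middle + suffix == source`, which is a cheap invariant to assert over the whole dataset before training.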

Dataset format

Use the instruction/output schema. Encode the FIM shape into the instruction:

{
  "instruction": "Complete the Luau code. Output only the middle, between the prefix and suffix.\n\nPrefix:\nlocal function teleportPlayer(player, position)\n    if not player or not player.Character then return end\n    local humanoidRoot = player.Character:FindFirstChild('HumanoidRootPart')\n    if not humanoidRoot then return end\n    \n\nSuffix:\nend\n\nMiddle:",
  "output": "humanoidRoot.CFrame = CFrame.new(position)"
}

Three details that matter:

The instruction template is the same in every row; only the prefix and suffix change. The fine-tune is teaching the model that "Prefix / Suffix / Middle" means FIM; if the surrounding phrasing varies, the model treats it as content. See Instruction tuning.
Strip the leak. If the prefix ends with humanoidRoot.CFrame = the answer is obvious from the prefix alone. Cut the prefix earlier so the model has to think. Drop rows where the middle is trivially deducible.
Keep the suffix. Without a suffix the task becomes "continue this code" rather than "fill this in." That is a different task and produces noticeably worse mid-line completions because the model does not anticipate how the surrounding code constrains the middle.
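
A minimal Python sketch of turning (prefix, suffix, middle) triples into rows with a fixed template. Two things here are assumptions rather than anything the recipe specifies: the rows are serialised as JSON Lines, and `is_trivial` is a hypothetical stand-in for the "strip the leak" rule (it drops tiny middles and middles copied verbatim from the nearby prefix). Tune or replace that check for your own data.

```python
import json

# Same wording as the example row above; keep it byte-identical across rows.
TEMPLATE = (
    "Complete the Luau code. Output only the middle, between the prefix and suffix."
    "\n\nPrefix:\n{prefix}\n\nSuffix:\n{suffix}\n\nMiddle:"
)

def make_row(prefix: str, suffix: str, middle: str) -> dict:
    """Build one instruction/output row from a FIM triple."""
    return {
        "instruction": TEMPLATE.format(prefix=prefix, suffix=suffix),
        "output": middle,
    }

def is_trivial(prefix: str, middle: str, min_chars: int = 4) -> bool:
    """Hypothetical leak check: middle is tiny, or already sits in the nearby prefix."""
    text = middle.strip()
    return len(text) < min_chars or text in prefix[-400:]

def to_jsonl(triples) -> str:
    """Serialise non-trivial triples as JSON Lines (format is an assumption)."""
    rows = [make_row(p, s, m) for p, s, m in triples if not is_trivial(p, m)]
    return "\n".join(json.dumps(row) for row in rows)
```

Building every row through one `make_row` function is the simplest way to guarantee the template never drifts between data sources.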

Multi-line vs single-line middles

About 60% of the dataset should have single-line middles (the cursor is partway through a line). These are the highest-frequency completion in a real editor.

The remaining 40% should be 2-to-15-line middles (whole function bodies, conditional blocks, loops). These teach the model to scope multi-line completions sensibly. Anything longer than 15 lines is too speculative for a completion model and tends to drift; route those to a chat-style task instead.

Do not include rows where the middle is longer than 15 lines. If the dataset contains 30-line middles, the model will start producing 30-line completions in production, which is overwhelming in an editor. Cap middles at 15 lines in the dataset and the model self-limits.
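
One way to enforce the mix is to filter and subsample after generating rows. The sketch below is illustrative: `balance_by_length` is a hypothetical helper, it assumes rows are dicts with an "output" key as in the schema above, and it treats "single-line" as an output with exactly one line.

```python
import random

def middle_lines(middle: str) -> int:
    """Number of lines in a middle, ignoring leading and trailing newlines."""
    return len(middle.strip("\n").splitlines()) or 1

def balance_by_length(rows, single_share=0.6, max_lines=15, seed=0):
    """Drop over-long middles, then subsample to roughly the target single-line share."""
    kept = [r for r in rows if middle_lines(r["output"]) <= max_lines]
    single = [r for r in kept if middle_lines(r["output"]) == 1]
    multi = [r for r in kept if middle_lines(r["output"]) > 1]
    if not single or not multi:
        return kept  # nothing to balance against

    # Largest total for which both buckets can supply their share.
    total = int(min(len(single) / single_share, len(multi) / (1 - single_share)))
    n_single = min(len(single), round(total * single_share))
    n_multi = min(len(multi), total - n_single)

    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    out = rng.sample(single, n_single) + rng.sample(multi, n_multi)
    rng.shuffle(out)
    return out
```

Subsampling discards rows from whichever bucket is over-represented; if that loses too much data, generate more cuts for the scarce bucket instead.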

The base model

Pick Qwen 2.5 Coder 3B from the Ertas catalogue. The reasoning:

It is code-specialised. The base ships with FIM training already baked in, so the fine-tune is reinforcing a known capability rather than teaching a new one.
At Q4_K_M, the GGUF is about 2.1 GB. That fits a Roblox Studio install footprint comfortably.
The 32k context window is generous for repository-aware completion (you can stuff multiple related files into the prefix, see Integration).

GPU tier: Qwen 2.5 Coder 3B trains on a T4, fitting the Free plan. The paid-plan upgrade is documented two paragraphs down (Qwen 2.5 Coder 7B on A10G).

If you need to go smaller (older creator machines, web-based code editor target), Qwen 2.5 Coder 1.5B at Q4_K_M (~1 GB) is the next step down. Expect noticeably worse multi-line completions; single-line completions stay usable.

If you need to go larger (you found that 3B misses too many house-convention details), Qwen 2.5 Coder 7B at Q4_K_M (~4.5 GB) is the next step up. The quality gain is real; the latency cost is also real (1.5 to 3 seconds versus 0.5 to 1.5 for the 3B). For an editor completion that fires every keystroke pause, that latency difference is the difference between "useful" and "annoying."

Training config

For 20,000 FIM rows, start with:

Setting	Value	Why
Schema	`instruction/output`	FIM encoded into the instruction
Epochs	2	Code overfits quickly; two epochs is usually enough
Learning rate	1e-4	Code-specialised bases want a gentler LR than generic instruction tuning
LoRA rank	16	Captures the language idioms and the API conventions
LoRA alpha	32	2x rank
Batch size	4	Code sequences are shorter on average than prose
Grad accumulation	4	Effective batch 16
Warmup	10% of steps	Slightly longer warmup helps stabilise on code
Max sequence length	4096	Long enough for whole-function FIM with surrounding context
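
For orientation, the step counts implied by these settings can be worked out directly. This is plain arithmetic, assuming one optimizer step per effective batch and a final partial batch that is kept rather than dropped; the actual trainer may count slightly differently. The function name `schedule` is illustrative.

```python
import math

def schedule(rows: int, epochs: int, batch_size: int, grad_accum: int, warmup_percent: int) -> dict:
    """Derive step counts from the training config (assumes one step per effective batch)."""
    effective_batch = batch_size * grad_accum
    steps_per_epoch = math.ceil(rows / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = total_steps * warmup_percent // 100
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }

# 20,000 rows, 2 epochs, batch 4, accumulation 4, 10% warmup:
# effective batch 16, 1,250 steps per epoch, 2,500 steps total, 250 warmup steps.
print(schedule(20_000, 2, 4, 4, 10))
```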

Wall-clock time and credit cost depend on the GPU tier and dataset size. Ertas's Training Config picker shows an estimate before you press play; see Credits and usage for the current rates.

A common mistake: using a higher learning rate (3e-4) because "more is better." Code models punish high LRs more than instruction-tuned chat models because code demands exact syntax; slightly overcooked weights are enough for the model to start producing slightly-wrong identifiers. Stay conservative.

Integration: Studio plugin via local Ollama

Roblox Studio supports plugins written in Luau, including HTTP-out plugins. The integration calls a local Ollama instance over HTTP. The user installs the Ertas-shipped Ollama bundle once; the plugin talks to http://localhost:11434 thereafter.

-- LuauCompletePlugin.lua (excerpt)
local HttpService = game:GetService("HttpService")

local function complete(prefix: string, suffix: string): string?
    local instruction = string.format(
        "Complete the Luau code. Output only the middle, between the prefix and suffix.\n\nPrefix:\n%s\n\nSuffix:\n%s\n\nMiddle:",
        prefix, suffix
    )

    local body = HttpService:JSONEncode({
        model = "roblox-luau-completer",
        prompt = instruction,
        stream = false,
        options = {
            temperature = 0.1,
            top_p = 0.95,
            num_predict = 120,
            stop = {"\n\nPrefix:", "\n\nSuffix:"},
        },
    })

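    -- pcall covers both the HTTP request and the JSON decode, so a failed
    -- request or a malformed response returns nil instead of raising.
    local ok, data = pcall(function()
        local response = HttpService:PostAsync(
            "http://localhost:11434/api/generate",
            body,
            Enum.HttpContentType.ApplicationJson
        )
        return HttpService:JSONDecode(response)
    end)

    if not ok or type(data) ~= "table" then return nil end
    return data.response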
end

Key choices:

Temperature 0.1: code completion wants near-deterministic output. The same prefix and suffix should produce essentially the same middle.
num_predict: 120: caps the completion at roughly 6 to 12 lines of typical Luau. The dataset allows middles up to 15 lines, so the longest completions may be truncated; raise the cap if multi-line completions get cut off mid-block.
stop sequences: protect against the model continuing into a synthetic next FIM row. If the model emits a "Prefix:" or "Suffix:" marker, generation stops there.
pcall wrapper: an HTTP failure (for example, Ollama not running) should not interrupt the user's edit. The function returns nil and the plugin fails silently.

Picking what to send

The naive approach sends the file the user is editing as the prefix and an empty suffix. A better approach sends roughly 2,000 characters before the cursor as the prefix and 500 characters after as the suffix, and prepends the top-of-file local declarations to the prefix so the model can see imported modules.

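local function buildContext(container: LuaSourceContainer, cursorPos: number): (string, string)
    local source = container.Source
    local suffix = source:sub(cursorPos + 1, cursorPos + 500)

    -- Short prefixes go out whole.
    if cursorPos <= 2000 then
        return source:sub(1, cursorPos), suffix
    end

    -- Header: the leading run of blank lines, comments, and single-line
    -- top-level `local x = ...` declarations (service and module imports).
    local headerEnd = 0
    for line in source:gmatch("[^\n]*\n") do
        local isHeader = line:match("^%s*$") or line:match("^%-%-")
            or line:match("^local%s+[%w_]+[^=\n]*=")
        if not isHeader then break end
        headerEnd += #line
    end
    headerEnd = math.min(headerEnd, cursorPos)

    -- Tail: the last 1,800 characters before the cursor, never overlapping the header.
    local tailStart = math.max(cursorPos - 1799, headerEnd + 1)
    local header = source:sub(1, headerEnd)
    local tail = source:sub(tailStart, cursorPos)
    return header .. "\n-- ...\n\n" .. tail, suffix
end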

The header trick is the easiest single quality improvement: most Luau completions need to know which Roblox services and modules are imported, and those declarations conventionally sit at the top of the file. Without the trick, completions occasionally invent service names; with it, they use the right ones.

Probe set

Ten prompts that exercise the language coverage. Run them by pasting the prefix and suffix into your harness; the column on the right is what good looks like.

#	Scenario	Pass criteria
1	Mid-statement: `humanoid.Health = humanoid.MaxHealth` (suffix: `\nend`)	Completes to `- takeDamage` or similar arithmetic; not a new statement
2	Function body: `local function isAlive(humanoid)\n` (suffix: `end`)	One-line body returning `humanoid.Health > 0`; not multi-line ceremony
3	Studio API call: `local part = workspace:FindFirstChild(`	Closes with a name argument and recovery code; uses `workspace`, not `game.Workspace`
4	Wait pattern: `task.wait(`	Completes with a numeric argument; does NOT suggest `wait()` (the legacy Roblox global)
5	Type annotation: `local function move(player: Player, position:`	Completes with `Vector3)` or `CFrame)`, not generic types
6	Module require: `local Module = require(script.Parent.`	Completes with a plausible module name from the surrounding context
7	Event hookup: `player.CharacterAdded:Connect(function(character)\n`	Connects a one-shot handler; uses `character`, not `Character`
8	Error-handling: `local ok, result = pcall(function()\n`	Single-line wrapped call inside; `pcall` not `xpcall` for a basic case
9	Strict-mode header: `--!strict\n\n` (cursor on a new line)	Starts with `local function ...` not `function ...` (strict prefers local)
10	API-version drift: `local players = game:GetService(`	Closes with `"Players")`, not the legacy `game.Players` global

Most of these should pass cleanly on a properly trained model. Probe 4 (the legacy wait() versus Luau's task.wait()) is the canonical pass/fail: a base model typically gets it wrong; a properly fine-tuned model gets it right. Probe 10 (fetching services through game:GetService) is the second-most-useful tell.
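
A probe harness can be as small as a table of regexes and a loop. The sketch below encodes probes 4 and 10 only, and the regex encodings of the pass criteria are one interpretation, not a specification. `complete` is a placeholder for whatever function calls your model (for example, a wrapper around the local Ollama endpoint); it is injected so the harness can be exercised without a running model.

```python
import re
from typing import Callable, Optional

# Illustrative encodings of two pass criteria from the table above.
PROBES = [
    {
        "id": 4,
        "prefix": "task.wait(",
        "suffix": "",
        "must": r"^\s*\d+(\.\d+)?\s*\)",      # numeric argument, then close paren
        "must_not": r"(?<![\w.])wait\(",       # bare wait(, not task.wait(
    },
    {
        "id": 10,
        "prefix": "local players = game:GetService(",
        "suffix": "",
        "must": r"^\s*[\"']Players[\"']\)",
        "must_not": r"game\.Players",
    },
]

def run_probes(complete: Callable[[str, str], Optional[str]], probes=PROBES) -> dict:
    """Return {probe id: passed} for each probe."""
    results = {}
    for probe in probes:
        output = complete(probe["prefix"], probe["suffix"]) or ""
        passed = (
            re.search(probe["must"], output) is not None
            and re.search(probe["must_not"], output) is None
        )
        results[probe["id"]] = passed
    return results
```

Running the same harness against the base model and the fine-tune gives a before/after comparison on the probes that matter most.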

Limits

No project-wide awareness. The model sees the current file and the small header you stitch in. It does not know about cross-file types, calls into other scripts, or what is in ServerScriptService. For multi-file refactors, escalate to a frontier model with proper retrieval.
No chat memory. Completions are stateless. If the user accepts a completion and edits it, the next completion does not know what the user edited. That is fine for completion; it is wrong if you try to use this model for chat-style help.
Style drift on rare patterns. Patterns that appear fewer than 100 times in the dataset will not be reliably learned. If a house convention is rare in real code, you may need to seed the dataset with extra synthetic examples of it.
Latency spikes on long contexts. The 2,000-character context is a good default; bumping it to 8,000 to fit more of the file roughly triples first-token latency and starts to feel slow. Tune for the user's hardware floor.
No symbol resolution. The model does not know what types exist in your project. It will sometimes fabricate type names that look plausible. Treat completions as suggestions, not authoritative; the user's eyes are the final filter.
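
The rare-pattern limit is easy to audit before training. The sketch below counts how many rows contain each convention and flags those under the 100-occurrence threshold mentioned above. The convention names and regexes are hypothetical examples, and counting only the "output" field (the middle the model is trained to produce) is an assumption; swap in your own house patterns.

```python
import re
from collections import Counter

# Hypothetical conventions to audit; replace with your own.
CONVENTIONS = {
    "task.wait": r"\btask\.wait\(",
    "GetService": r":GetService\(",
    "pcall": r"\bpcall\(",
}

def rare_conventions(rows, conventions=CONVENTIONS, min_rows=100) -> dict:
    """Return {name: row count} for conventions seen in fewer than min_rows outputs."""
    counts = Counter()
    for row in rows:
        for name, pattern in conventions.items():
            if re.search(pattern, row["output"]):
                counts[name] += 1
    return {name: counts[name] for name in conventions if counts[name] < min_rows}
```

Anything this returns is a candidate for extra synthetic examples.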

What's next

Cookbook

Back to the index. This is the last recipe.

Picking a base model

When to step up to 7B or down to 1.5B.

Ship: desktop

Desktop integration patterns; the same Ollama pattern applies to most code-editor plugins.

Verifying exports

Smoke test the model on a handful of FIM prompts before shipping.