Fine-Tune Code Llama with Ertas
Meta's specialized code generation model family built on Llama 2, available in 7B, 13B, 34B, and 70B sizes with variants optimized for code completion, instruction following, and Python development.
Overview
Code Llama is Meta's family of code-specialized large language models, first released in August 2023 (the 70B variant followed in January 2024). Built by further training Llama 2 on code-heavy datasets, Code Llama comes in four sizes (7B, 13B, 34B, and 70B) and three variants per size: the base Code Llama for code completion, Code Llama Instruct for instruction-following code tasks, and Code Llama Python for Python-specific development.
The models were trained on approximately 500 billion tokens of predominantly code data (about 1 trillion for the 70B), including public code repositories, code-related discussions, and documentation. This extensive code-focused training produces models with a deep grasp of programming concepts: syntax and semantics, design patterns, algorithmic complexity, and best practices across dozens of programming languages.
Code Llama supports infilling (fill-in-the-middle) in its 7B and 13B models (base and Instruct), where the model generates code to fill a gap between a prefix and a suffix; the 34B, 70B, and Python variants were not trained with this objective. Infilling is essential for IDE integration, where developers need completions that fit naturally within existing code context. All sizes are fine-tuned on 16K-token sequences with RoPE frequency scaling, and remain stable on inputs of up to roughly 100K tokens.
All Code Llama models are released under the Llama 2 Community License, permitting commercial use. While newer models like Qwen 2.5 Coder and DeepSeek Coder have since matched or exceeded Code Llama on some benchmarks, Code Llama remains widely deployed and benefits from a mature ecosystem of tools and integrations.
Key Features
Fill-in-the-middle (FIM) capability is Code Llama's most distinctive feature for practical code development. Unlike standard left-to-right generation, FIM allows the model to generate code that connects a prefix to a suffix, producing completions that are contextually appropriate on both sides. This is crucial for IDE integrations like code completion, automated refactoring, and hole-filling tasks where the surrounding code context constrains the solution.
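A minimal infilling sketch using the Hugging Face transformers library, which recognizes a <FILL_ME> marker in the prompt and expands it into the model's FIM special tokens (the checkpoint is the public codellama/CodeLlama-7b-hf release; the function being completed is illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# <FILL_ME> marks the gap; the tokenizer rewrites the prompt into the
# prefix/suffix format the model was trained on.
prompt = '''def remove_non_ascii(s: str) -> str:
    """Return s with all non-ASCII characters removed."""
    <FILL_ME>
    return result
'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, then splice them into the gap.
filling = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(prompt.replace("<FILL_ME>", filling))
```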
The Python-specialized variant (Code Llama Python) was further trained on an additional 100 billion tokens of Python-specific data, making it particularly strong for Python development tasks. It outperforms the base Code Llama model on Python benchmarks by a significant margin while maintaining reasonable performance on other languages.
Code Llama demonstrates strong long-context capabilities, with the 34B and 70B sizes making the best use of very long inputs. An effective context of roughly 100K tokens enables processing of entire codebases, large files, and multi-file contexts, which is essential for real-world code tasks that require understanding of project structure, import hierarchies, and cross-file dependencies.
Fine-Tuning with Ertas
Code Llama is one of the most rewarding models to fine-tune in Ertas Studio for code-related applications. The 7B and 13B variants are ideal starting points, requiring 8-12GB and 10-16GB VRAM respectively with QLoRA. Fine-tuning on your organization's internal code repositories, coding standards, and API documentation creates a model that generates code aligned with your team's conventions and frameworks.
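Ertas Studio manages the training configuration itself, but for orientation, a comparable QLoRA setup with the open-source Hugging Face stack looks roughly like the sketch below; the rank, target modules, and dtype are illustrative defaults, not the platform's actual settings:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "codellama/CodeLlama-7b-hf"

# Load the base model with 4-bit NF4 weights so training fits in the
# 8-12GB VRAM range quoted above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters to the attention projections; only these
# small matrices are updated during fine-tuning.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of weights
```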
For code-focused fine-tuning, Ertas Studio supports specialized dataset formats: instruction-response pairs for code generation tasks, prefix-suffix-middle triples for fill-in-the-middle training, and code-review pairs for automated review applications. Upload your dataset in JSONL format with the appropriate fields, and the platform handles tokenization and chat template formatting.
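As a concrete illustration, the snippet below writes records for the first two styles; the field names are placeholder assumptions, so match them to the schema your Ertas Studio project specifies:

```python
import json

# Hypothetical records -- the key names here are assumptions, not the
# platform's documented schema.
records = [
    # Instruction-response pair for code generation
    {
        "instruction": "Write a function that deduplicates a list while preserving order.",
        "response": "def dedupe(items):\n    seen = set()\n    return [x for x in items if not (x in seen or seen.add(x))]",
    },
    # Prefix-suffix-middle triple for fill-in-the-middle training
    {
        "prefix": "def circle_area(radius):\n    ",
        "suffix": "\n    return area",
        "middle": "area = 3.141592653589793 * radius ** 2",
    },
]

with open("train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")  # one JSON object per line
```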
The 34B model offers an excellent quality step-up for organizations that need higher code quality, requiring approximately 20-24GB VRAM with QLoRA. After fine-tuning, export to GGUF and integrate with your development workflow via Ollama (for API-based access) or llama.cpp (for direct embedding in development tools). Many teams use fine-tuned Code Llama models as the backend for custom VS Code extensions and JetBrains plugins.
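To sketch the Ollama path: once the exported GGUF is registered (for example with `ollama create my-codellama -f Modelfile`, where the Modelfile contains a `FROM ./model.gguf` line), any internal tool can call Ollama's local REST API. The model name below is a placeholder:

```python
import requests

# Query a locally registered model through Ollama's generate endpoint.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "my-codellama",  # placeholder name from `ollama create`
        "prompt": "Write a pytest unit test for a parse_config() function.",
        "stream": False,  # return one JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```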
Use Cases
Code Llama's primary use case is as an intelligent code assistant: generating code from natural language descriptions, completing partial implementations, explaining existing code, and suggesting improvements. The fill-in-the-middle capability makes it particularly effective for IDE integration, where it can provide contextually aware completions that account for both preceding and following code.
Fine-tuned Code Llama models excel as internal development assistants for organizations. By training on proprietary codebases, internal libraries, and coding guidelines, the model can generate code that follows team conventions, uses internal APIs correctly, and adheres to organizational coding standards. This is especially valuable for large teams where code consistency is important.
Code Llama is also well-suited for code review automation, documentation generation, test case creation, and legacy code modernization. The instruction-tuned variants can explain complex code to junior developers, identify potential bugs, and suggest optimizations. The Python variant is particularly popular in data science teams for generating data processing pipelines, visualization code, and ML training scripts.
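For review-style prompting, the 7B through 34B Instruct models follow the Llama 2 chat format; a minimal sketch is shown below (the system prompt and code snippet are illustrative, and the 70B Instruct model uses a different template, so apply the tokenizer's chat template there):

```python
# Build a Llama 2-style instruction prompt for a code review task.
system = "You are a senior engineer. Review code for bugs, edge cases, and style."
code = "def divide(a, b):\n    return a / b"

prompt = (
    f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
    f"Review this function and point out any issues:\n\n{code} [/INST]"
)
```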
Hardware Requirements
Code Llama 7B at Q4_K_M requires approximately 4.4GB of RAM and runs on virtually any modern development machine. The 13B needs about 7.8GB, the 34B about 20GB, and the 70B about 40GB at Q4_K_M. For development use, the 7B and 13B models deliver low-latency completions on consumer hardware, making them ideal for IDE integration.
Full FP16 inference requires approximately 14GB (7B), 26GB (13B), 68GB (34B), and 140GB (70B). For development team deployments, the 34B model on an A6000 48GB provides an excellent balance of quality and speed, typically generating 25-35 tokens per second.
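These figures follow from simple bytes-per-parameter arithmetic. A quick sanity check, assuming ~2 bytes per weight for FP16 and roughly 4.5 bits (~0.56 bytes) for Q4_K_M, and ignoring the KV-cache and activation overhead that grows with context length:

```python
# Estimate weight-only memory for each model size.
def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param

for size in (7, 13, 34, 70):
    print(
        f"{size}B: ~{weight_gb(size, 2.0):.0f} GB FP16, "
        f"~{weight_gb(size, 0.56):.1f} GB Q4_K_M (weights only)"
    )
```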
For fine-tuning in Ertas Studio, the 7B model needs 8-12GB VRAM, the 13B needs 10-16GB, the 34B needs 20-24GB, and the 70B needs 40-48GB with QLoRA. The 7B and 13B models are the most popular for fine-tuning due to their fast training times and low resource requirements, enabling rapid iteration on code-specific datasets.
Supported Quantizations
Fine-tuned Code Llama models export to GGUF for use with llama.cpp and Ollama; the hardware figures above assume Q4_K_M for quantized inference and FP16 for full precision.