Fine-Tune SOLAR with Ertas
Upstage's 10.7-billion parameter model created through depth up-scaling, a novel technique that duplicates and splices a pretrained model's layers to achieve larger-model quality at modest inference cost.
Overview
SOLAR 10.7B is a large language model developed by Upstage, a South Korean AI company. Released in December 2023, SOLAR introduced an innovative model creation technique called depth up-scaling (DUS), which produces a larger, more capable model by duplicating and splicing layers from a smaller pretrained model. Starting from a 32-layer Llama 2-style architecture initialized with Mistral 7B weights, Upstage used DUS to create a 10.7B parameter model that outperformed many existing 13B models and competed with some 30B+ models on key benchmarks.
The depth up-scaling approach works by taking a pretrained base model, making two copies of it, trimming the final layers from one copy and the initial layers from the other, concatenating the remaining stacks into a deeper model, and then performing continued pretraining on the expanded model. This allows the new model to inherit the knowledge from the original pretrained weights while gaining additional capacity from the extra layers. The result is a model that trains faster and achieves higher quality than training a 10.7B model from scratch.
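For a concrete picture of the layer surgery, the sketch below shows how a DUS-style 48-layer model could be assembled from a 32-layer base with Hugging Face transformers. The 8-layer trim on each copy follows the SOLAR paper's description, but the base model ID and splicing details here are illustrative only, not Upstage's actual training code, and the expanded model still needs continued pretraining afterward.

```python
# Illustrative depth up-scaling (DUS) sketch: duplicate a 32-layer base,
# trim 8 layers from each copy, and splice the stacks into a 48-layer model.
# Not Upstage's training code; continued pretraining is still required.
import copy
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

n = base.config.num_hidden_layers        # 32
trim = 8                                 # layers dropped from each copy

layers_a = base.model.layers[: n - trim]             # keeps layers 0..23
layers_b = copy.deepcopy(base.model.layers[trim:])   # keeps layers 8..31

# Build an empty model with 2 * (n - trim) = 48 layers and splice everything in.
scaled_cfg = copy.deepcopy(base.config)
scaled_cfg.num_hidden_layers = 2 * (n - trim)
scaled = AutoModelForCausalLM.from_config(scaled_cfg)

scaled.model.embed_tokens = base.model.embed_tokens
scaled.model.norm = base.model.norm
scaled.lm_head = base.lm_head
for i, layer in enumerate(list(layers_a) + list(layers_b)):
    scaled.model.layers[i] = layer

scaled.save_pretrained("dus-48-layer-init")   # starting point for continued pretraining
```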
SOLAR 10.7B uses a dense transformer architecture with 48 layers, a hidden dimension of 4096, and 32 attention heads. It uses grouped-query attention for efficient inference and a 4K-token context window, extendable through RoPE scaling. The model uses the Llama tokenizer with a 32K vocabulary.
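These figures can be confirmed directly from the published checkpoint's config; the short sketch below assumes the Hugging Face model ID upstage/SOLAR-10.7B-v1.0.

```python
# Read the architecture details from the published checkpoint's config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("upstage/SOLAR-10.7B-v1.0")
print(cfg.num_hidden_layers)      # 48 transformer layers
print(cfg.hidden_size)            # 4096 hidden dimension
print(cfg.num_attention_heads)    # 32 attention heads
print(cfg.num_key_value_heads)    # KV heads used by grouped-query attention
print(cfg.max_position_embeddings, cfg.vocab_size)   # 4096-token context, 32K vocab
```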
The instruction-tuned variant (SOLAR 10.7B Instruct) was trained using a combination of supervised fine-tuning and direct preference optimization (DPO), demonstrating strong instruction-following, conversational ability, and reasoning skills. SOLAR is released under the Apache 2.0 license for full commercial use.
Key Features
Depth up-scaling (DUS) is SOLAR's pioneering contribution to the model development community. The technique demonstrates that new, larger models can be created efficiently from existing pretrained models by duplicating layers and continuing training, rather than training from scratch. This approach significantly reduces the compute cost and time required to produce a capable model at a target size, and the technique has since influenced other model scaling strategies.
SOLAR 10.7B occupies an interesting niche in the model size landscape — it sits between the popular 7B and 13B tiers. This 10.7B size provides a meaningful quality improvement over 7B models while remaining more efficient than 13B models in terms of memory and inference speed. For applications where 7B quality is insufficient but 13B resources are a stretch, SOLAR offers an attractive middle ground.
The DPO-trained instruction variant demonstrates particularly strong performance on Korean language tasks in addition to English, reflecting Upstage's focus on the Korean market. This makes SOLAR a notable option for Korean-English bilingual applications, though it is fundamentally a general-purpose model with broad language support.
Fine-Tuning with Ertas
SOLAR 10.7B is a convenient model to fine-tune in Ertas Studio due to its moderate size. QLoRA fine-tuning requires approximately 8-12GB VRAM, well within the capacity of consumer GPUs like the RTX 4070 Ti 12GB, RTX 4080 16GB, or RTX 4090 24GB. The model's depth up-scaled architecture gives it more layers than the 7B class it was scaled from (48 vs. 32), providing more potential LoRA insertion points for fine-grained adaptation.
In Ertas Studio, select SOLAR 10.7B as your base model, upload your dataset in JSONL or CSV format, and configure your LoRA parameters. The model responds well to LoRA ranks of 16-64 and learning rates around 1e-4 to 3e-4. Training on 10,000 examples typically completes in 1-3 hours on a single GPU, making it practical for iterative development.
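Ertas Studio handles these settings through its interface, but for readers who want to see what comparable hyperparameters look like in code, here is a rough QLoRA sketch using bitsandbytes, peft, and trl. It is not Ertas's internal implementation: the dataset path is a placeholder, the JSONL data is assumed to have a "text" column, and some argument names (e.g. processing_class vs. tokenizer) vary between trl versions.

```python
# Rough stand-alone QLoRA sketch with hyperparameters in the ranges above.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

model_id = "upstage/SOLAR-10.7B-v1.0"

# 4-bit NF4 quantization keeps the base weights within ~8-12GB of VRAM.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora = LoraConfig(
    r=32, lora_alpha=64, lora_dropout=0.05,               # rank within the 16-64 range
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder path

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    peft_config=lora,
    args=SFTConfig(
        output_dir="solar-qlora",
        learning_rate=2e-4,                               # within the 1e-4 to 3e-4 range
        num_train_epochs=1,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        bf16=True,
    ),
)
trainer.train()
```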
After fine-tuning, Ertas Studio exports to GGUF format. The 10.7B model at Q4_K_M produces a file of approximately 6.5GB — very manageable for local deployment. Deploy through Ollama, llama.cpp, or LM Studio for immediate use. Sitting just above the 7B tier, fine-tuned SOLAR models offer noticeably better quality than 7B alternatives while remaining highly portable.
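As a rough illustration of local deployment, the snippet below loads an exported GGUF file with llama-cpp-python; the filename is a placeholder for whatever your export is called.

```python
# Minimal local inference against an exported GGUF file with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="solar-10.7b-finetuned.Q4_K_M.gguf",  # placeholder export filename
    n_ctx=4096,          # SOLAR's native context window
    n_gpu_layers=-1,     # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this support ticket in one sentence: ..."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```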
Use Cases
SOLAR 10.7B is well-positioned for applications where 7B models fall slightly short but 13B+ models are too resource-intensive. Conversational AI, content generation, customer support automation, and document summarization all benefit from the quality uplift that SOLAR's additional parameters provide. The model is particularly effective for Korean language applications, making it a strong choice for businesses operating in South Korea.
The model's strong instruction-following capabilities make it suitable for structured output generation: JSON extraction, form filling, data classification, and template-based content creation. Fine-tuned SOLAR models can serve as reliable data processing engines in automated workflows.
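A hedged sketch of that pattern: prompt the fine-tuned model for JSON only, then parse the response so downstream code receives a plain dictionary. The model path, field names, and prompt wording are all illustrative.

```python
# Using a fine-tuned SOLAR export as a JSON extraction engine (illustrative).
import json
from llama_cpp import Llama

llm = Llama(model_path="solar-10.7b-finetuned.Q4_K_M.gguf", n_ctx=4096)

prompt = (
    "Extract the customer name, order ID, and issue type from the message below. "
    "Respond with JSON only, using the keys name, order_id, and issue.\n\n"
    "Message: Hi, this is Dana Kim. Order #88231 arrived with a cracked screen."
)

raw = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=128,
    temperature=0.0,   # deterministic decoding helps keep the output parseable
)["choices"][0]["message"]["content"]

record = json.loads(raw)   # fails loudly if the model drifts from pure JSON
print(record["order_id"], record["issue"])
```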
SOLAR is also valuable for educational and research contexts exploring model scaling. The depth up-scaling technique opens possibilities for creating custom model sizes optimized for specific deployment constraints. Researchers can study the effects of layer duplication and continued training on model behavior, knowledge retention, and capability scaling.
Hardware Requirements
SOLAR 10.7B at Q4_K_M quantization requires approximately 6.5GB of RAM, comfortable on most systems with 8-16GB RAM and GPUs with 8GB+ VRAM. At Q8_0, the requirement is approximately 11.5GB, fitting on 16GB GPUs and 16GB+ RAM systems. Full FP16 inference requires approximately 21.5GB VRAM, achievable on RTX 4090 24GB or A5000 24GB.
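These figures follow from simple back-of-envelope math on weight storage; the bits-per-weight averages below are approximate for each scheme, and real-world usage adds KV cache and runtime overhead on top.

```python
# Rough weight-memory estimates behind the numbers above.
params = 10.7e9   # SOLAR 10.7B parameter count

for name, bits_per_weight in [("Q4_K_M", 4.85), ("Q8_0", 8.5), ("FP16", 16.0)]:
    gigabytes = params * bits_per_weight / 8 / 1e9
    print(f"{name}: ~{gigabytes:.1f} GB of weights (plus KV cache and overhead)")
```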
Inference speed on consumer hardware is excellent. On an RTX 4090 with Q4_K_M quantization, expect 45-60 tokens per second for generation. On Apple M2 Pro with 16GB unified memory, expect 12-18 tokens per second. CPU inference on modern hardware with Q4_K_M typically yields 5-10 tokens per second.
For fine-tuning with QLoRA in Ertas Studio, 8-12GB VRAM is recommended. Full LoRA (without quantization) requires approximately 16-18GB VRAM. The model's moderate size allows for reasonable batch sizes even on consumer GPUs, enabling efficient training.