
Gemma 3 for Mobile: Fine-Tuning and On-Device Deployment
How to use Google's Gemma 3 models for on-device mobile AI. Model selection, fine-tuning with LoRA, GGUF export, and deployment via llama.cpp on iOS and Android.
Google's Gemma 3 represents a significant step forward from Gemma 2. The 1B model is practical for mobile classification tasks, and the 4B model offers reasoning capability that competes with larger models from other families.
For mobile developers already in the Google ecosystem (Android, Firebase, Google Cloud), Gemma is a natural choice with good tooling support.
Gemma 3 Model Lineup for Mobile
| Model | Parameters | GGUF Q4 Size | RAM Needed | Mobile Viability |
|---|---|---|---|---|
| Gemma 3 1B | 1B | ~600MB | ~800MB | Excellent (4GB+ devices) |
| Gemma 3 4B | 4B | ~2.3GB | ~3GB | Good (8GB+ devices) |
| Gemma 3 12B | 12B | ~7GB | ~9GB | Not viable for mobile |
| Gemma 3 27B | 27B | ~15GB | ~18GB | Not viable for mobile |
The 1B and 4B models are the mobile-relevant sizes. The 4B is slightly larger than the typical 3B target but runs within budget on 8GB devices.
Gemma 3 vs Gemma 2
| Improvement | Gemma 2 | Gemma 3 |
|---|---|---|
| Instruction following (IFEval) | 51.2 (2B) | 54.2 (1B) |
| General knowledge (MMLU) | 51.3 (2B) | 46.8 (1B), 67.2 (4B) |
| Multilingual support | 20 languages | 35+ languages |
| Context window (1B) | 8K | 32K |
| Context window (4B) | 8K | 128K |
Gemma 3's 4B model is a standout. It approaches the capability of 8B-class models such as Llama 3.1 8B (which are not mobile-viable) while fitting on flagship mobile devices.
When Gemma 3 Is the Right Choice
Google ecosystem integration: If you already use Firebase, Android Studio, and Google Cloud, Gemma has the smoothest tooling path. Google provides Keras integration, Vertex AI fine-tuning, and Android-specific documentation.
4B quality on flagships: If your app targets flagship devices and you need stronger reasoning than a 3B model provides, Gemma 3 4B fills a gap. It sits between the typical 3B and 7B categories.
Multilingual requirements: Gemma 3's 35+ language support is broader than Llama 3.2 (though narrower than Qwen). For European and South Asian language apps, Gemma is a strong choice.
Fine-Tuning Gemma 3
Training Data Format
Gemma uses a specific chat template with `<start_of_turn>` and `<end_of_turn>` tokens:

    <start_of_turn>user
    What's the return policy for electronics?<end_of_turn>
    <start_of_turn>model
    Electronics purchased within the last 30 days can be returned with receipt for a full refund. Items must be in original packaging.<end_of_turn>
For fine-tuning, structure your data as conversations following this template. Most training frameworks (Hugging Face, Axolotl, Unsloth) handle the template automatically when you specify Gemma as the model type.
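To make the template concrete, here is a minimal formatter sketch. The function name and structure are this article's own illustration, not a library API; in a real pipeline you would let the framework apply the template through the model's tokenizer (for example, `tokenizer.apply_chat_template` in Hugging Face Transformers).

```python
# Minimal sketch of the Gemma 3 chat template. In practice, let your
# training framework apply the template via the model's tokenizer.

def format_gemma_turns(messages):
    """Render a list of {"role", "content"} dicts as Gemma turn markup.

    Gemma's template uses the roles "user" and "model" (not "assistant"),
    so assistant messages are remapped here.
    """
    parts = []
    for msg in messages:
        role = "model" if msg["role"] == "assistant" else msg["role"]
        parts.append(f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n")
    return "".join(parts)

example = format_gemma_turns([
    {"role": "user", "content": "What's the return policy for electronics?"},
    {"role": "assistant", "content": "Electronics purchased within the last "
     "30 days can be returned with receipt for a full refund."},
])
print(example)
```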
LoRA Configuration
| Parameter | 1B | 4B |
|---|---|---|
| LoRA rank (r) | 16-32 | 16-64 |
| LoRA alpha | 32-64 | 32-128 |
| Learning rate | 2e-4 | 1e-4 |
| Epochs | 3-5 | 2-4 |
| Target modules | q_proj, v_proj, k_proj, o_proj | Same |
| Adapter size | 30-80MB | 50-150MB |
Training Data Requirements
The same guidelines apply as for other model families:
| Task | Minimum Examples | Recommended |
|---|---|---|
| Classification | 200 | 500-1,000 |
| Q&A | 300 | 1,000-2,000 |
| Chat | 500 | 2,000-5,000 |
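As an illustration of what the training data itself looks like, here is a sketch that writes conversations in the widely used JSONL "messages" schema, one conversation per line. The field names follow the common convention; check your framework's documentation for its exact expected schema.

```python
import json

# Write training conversations as JSONL, one conversation per line, in the
# "messages" schema most fine-tuning frameworks accept. Field names follow
# the common convention; verify against your framework's expected schema.

conversations = [
    [
        {"role": "user", "content": "What's the return policy for electronics?"},
        {"role": "assistant", "content": "Electronics purchased within the "
         "last 30 days can be returned with receipt for a full refund."},
    ],
    # ...hundreds more examples, per the table above
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for convo in conversations:
        f.write(json.dumps({"messages": convo}) + "\n")
```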
Quality After Fine-Tuning
Gemma 3 responds well to fine-tuning. The 1B model jumps from general-purpose mediocrity to domain-specific competence with as few as 500 examples. The 4B model fine-tunes to quality levels that rival prompted GPT-4o on narrow tasks.
Expected accuracy ranges (domain-specific classification):
- 1B base: 65-72%
- 1B fine-tuned (500 examples): 88-92%
- 4B base: 75-80%
- 4B fine-tuned (500 examples): 92-96%
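Ranges like these should be verified on your own held-out data rather than taken on faith. A minimal accuracy check might look like this; `predict` is a placeholder for whatever inference call you use (a llama.cpp run, a server endpoint, etc.), stubbed here purely for illustration.

```python
# Minimal held-out accuracy check for a classification fine-tune.
# `predict` is a placeholder for your real inference call; the stub
# below exists only so the sketch is self-contained.

def accuracy(predict, eval_set):
    """Fraction of held-out (text, label) pairs the model classifies correctly."""
    correct = sum(1 for text, label in eval_set if predict(text) == label)
    return correct / len(eval_set)

eval_set = [
    ("What's the return policy?", "returns"),
    ("I need to reset my password", "account"),
]
stub_predict = lambda text: "returns" if "return" in text else "account"
print(f"held-out accuracy: {accuracy(stub_predict, eval_set):.0%}")
```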
GGUF Export
Gemma 3 models convert to GGUF format using the standard llama.cpp conversion tools. The process:
- Fine-tune with LoRA
- Merge the LoRA adapter into the base weights
- Convert to GGUF using convert_hf_to_gguf.py
- Quantize to Q4_K_M with llama-quantize
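The conversion and quantization commands can be sketched as follows. Paths and output filenames are illustrative; `convert_hf_to_gguf.py` and `llama-quantize` ship with llama.cpp, and the LoRA merge must already have produced full-precision weights in `merged_dir`.

```python
from pathlib import Path

# Assemble the two llama.cpp commands for GGUF conversion and quantization.
# Paths and filenames here are illustrative, not fixed by any tool.

def gguf_export_commands(merged_dir: str, out_dir: str, quant: str = "Q4_K_M"):
    f16 = Path(out_dir) / "gemma3-f16.gguf"
    quantized = Path(out_dir) / f"gemma3-{quant.lower()}.gguf"
    # Step 1: convert merged HF weights to an f16 GGUF.
    convert = ["python", "convert_hf_to_gguf.py", merged_dir,
               "--outfile", str(f16), "--outtype", "f16"]
    # Step 2: quantize the f16 GGUF down to the target level.
    quantize = ["./llama-quantize", str(f16), str(quantized), quant]
    return convert, quantize

convert_cmd, quantize_cmd = gguf_export_commands("./gemma3-merged", "./out")
print(" ".join(convert_cmd))
print(" ".join(quantize_cmd))
```

With llama.cpp cloned and built, each command can be run via `subprocess.run(cmd, check=True)` from the repository root.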
Platforms like Ertas automate this pipeline: select Gemma 3 as the base model, upload training data, train, and export directly to GGUF at your desired quantization level.
Deployment on iOS and Android
Gemma 3 GGUF models run on llama.cpp identically to Llama or any other GGUF model. The deployment process is the same:
iOS: Load the GGUF via llama.cpp with Metal acceleration. No Gemma-specific configuration needed.
Android: Load via llama.android with Vulkan GPU acceleration. Same API as any other GGUF model.
The advantage of GGUF as a universal format is that your deployment infrastructure works with any model family. Switching from Llama to Gemma (or vice versa) requires only swapping the model file.
Performance on Mobile Devices
Gemma 3 1B (Q4_K_M, ~600MB)
| Device | Tokens/sec | Memory |
|---|---|---|
| iPhone 16 Pro | 38-48 | ~800MB |
| iPhone 15 | 26-34 | ~800MB |
| Galaxy S24 (Vulkan) | 38-48 | ~800MB |
| Mid-range Android | 18-25 | ~800MB |
Gemma 3 4B (Q4_K_M, ~2.3GB)
| Device | Tokens/sec | Memory |
|---|---|---|
| iPhone 16 Pro | 16-22 | ~3.0GB |
| iPhone 15 Pro | 14-20 | ~3.0GB |
| Galaxy S24 (Vulkan) | 18-24 | ~3.0GB |
| Galaxy S25 (Vulkan) | 20-28 | ~3.0GB |
The 4B model is somewhat slower than a 3B model, but on flagship devices it remains well above the 10 tok/s usability threshold.
Gemma vs Gemini Nano
Google offers both Gemma (open model for self-deployment) and Gemini Nano (on-device via Android AICore). They serve different purposes:
| Factor | Gemma 3 (GGUF) | Gemini Nano |
|---|---|---|
| Custom fine-tuning | Yes | No |
| Device coverage | Any 4GB+ device | Pixel 8+, Galaxy S24+ only |
| Model control | Full | None |
| Tasks | Any text generation | Limited pre-defined tasks |
| Platform | iOS and Android | Android only |
| Cost | Free (on-device) | Free (on-device) |
If you need custom AI behavior, domain-specific knowledge, or cross-platform deployment, Gemma via GGUF is the right path. Gemini Nano is only appropriate for pre-defined tasks on a narrow device set.
Licensing
Gemma 3 uses the Gemma Terms of Use:
- Commercial use: Allowed
- Fine-tuning and modification: Allowed
- Distribution: Allowed
- No MAU threshold (unlike Llama's 700M limit)
- Cannot use outputs to train models that compete with Gemini
The license is practical for most mobile app use cases. The restriction on competitive model training is unlikely to affect mobile developers.
Ship AI that runs on your users' devices.
Early bird pricing starts at $14.50/mo — locked in for life. Plans for builders and agencies.