What is a Hyperparameter?

    A configuration value set before training begins that controls the learning process itself, as opposed to model parameters, which are learned during training.

    Definition

    A hyperparameter is any configuration variable that governs the training process but is not learned from the data. Unlike model parameters (weights and biases that are updated through backpropagation), hyperparameters are set by the practitioner before training starts and remain fixed throughout the training run. They control how the model learns rather than what it learns.

    Common hyperparameters in LLM fine-tuning include learning rate (how aggressively weights are updated), batch size (how many examples are processed before each update), number of epochs (how many times the model sees the full dataset), weight decay (regularization to prevent overfitting), warmup steps (gradual learning rate increase at the start of training), and LoRA-specific settings like rank, alpha, and target modules. Each hyperparameter affects training dynamics and final model quality.
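    The hyperparameters listed above are typically collected into a single training configuration. The sketch below is illustrative only: the key names and values are plausible defaults, not the API of any specific training library.

```python
# A sketch of a typical fine-tuning configuration. The names and values
# are illustrative defaults, not tied to any particular framework.
config = {
    # Optimization hyperparameters
    "learning_rate": 2e-5,   # how aggressively weights are updated
    "batch_size": 8,         # examples processed before each update
    "num_epochs": 3,         # full passes over the dataset
    "weight_decay": 0.01,    # regularization to prevent overfitting
    "warmup_steps": 100,     # gradual learning-rate ramp-up at the start
    # LoRA-specific hyperparameters
    "lora_rank": 16,         # rank of the low-rank adapter matrices
    "lora_alpha": 32,        # scaling factor applied to adapter updates
    "target_modules": ["q_proj", "v_proj"],  # layers that get adapters
}
```

    Note that every value here falls inside the heuristic ranges discussed below, which is where most practitioners start before adjusting.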

    Hyperparameter selection is both a science and an art. While principled approaches like grid search, random search, and Bayesian optimization exist, practical LLM fine-tuning often relies on established heuristics. For example, learning rates between 1e-5 and 5e-5 work well for most fine-tuning tasks, LoRA ranks of 8-64 cover most use cases, and training for 1-3 epochs prevents overfitting on typical dataset sizes. These heuristics save enormous amounts of compute compared to exhaustive search.

    Why It Matters

    Hyperparameter choices can make the difference between a model that converges to excellent performance and one that fails to learn, overfits, or produces incoherent outputs. A learning rate that is too high causes training instability and divergence; too low, and the model barely changes from the base model. A batch size that is too small produces noisy gradients; too large, and the model converges to sharp minima that generalize poorly.
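    The learning-rate failure modes above can be seen even on a toy problem. The sketch below runs plain gradient descent on f(w) = w², where the gradient is 2w, so the behavior is fully predictable: too high a learning rate diverges, a well-chosen one converges, and too low a one barely moves.

```python
def sgd_minimize(lr, steps=50, w0=10.0):
    """Gradient descent on f(w) = w^2 (gradient = 2w) with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2 * w  # weight update: learning rate times gradient
    return w

# Too high (lr = 1.5): each step multiplies w by (1 - 2*lr) = -2,
# so |w| doubles every step and training diverges.
diverged = abs(sgd_minimize(lr=1.5)) > 1e6

# Well-chosen (lr = 0.1): each step multiplies w by 0.8, so w
# converges quickly toward the minimum at w = 0.
converged = abs(sgd_minimize(lr=0.1)) < 1e-3

# Too low (lr = 1e-5): after 50 steps w has barely left its start at 10.
barely_moved = abs(sgd_minimize(lr=1e-5) - 10.0) < 0.1
```

    Real loss landscapes are far messier than a quadratic, but the same three regimes — divergence, convergence, and stagnation — show up in real loss curves.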

    For teams without deep ML expertise, hyperparameter selection is often the biggest obstacle to successful fine-tuning. The interaction effects between hyperparameters — learning rate and batch size are coupled, LoRA rank and alpha must be balanced, warmup steps depend on dataset size — make manual tuning difficult without experience or automated tools.
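    One concrete example of the learning-rate/batch-size coupling is the linear scaling rule: when the batch size grows by a factor k, scale the learning rate by k to keep the per-example update magnitude roughly constant. It is a heuristic, not a guarantee, and the function name below is illustrative.

```python
def scale_learning_rate(base_lr, base_batch_size, new_batch_size):
    # Linear scaling rule: learning rate grows proportionally with
    # batch size. A heuristic starting point, not a guarantee.
    return base_lr * (new_batch_size / base_batch_size)

# Doubling the batch size from 8 to 16 suggests doubling the learning rate.
scaled = scale_learning_rate(2e-5, 8, 16)  # 4e-5
```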

    How It Works

    Hyperparameters are specified in a training configuration before the training loop begins. During training, they modulate the optimization process at each step. The learning rate multiplies the gradient to determine the magnitude of each weight update. The batch size determines how many training examples contribute to each gradient estimate. Regularization hyperparameters like weight decay add penalty terms to the loss function.
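    These roles can be made concrete in a stripped-down update step. The sketch below (assuming plain SGD with L2-style weight decay; real optimizers like AdamW differ in detail) shows exactly where each hyperparameter enters.

```python
def batch_gradient(per_example_grads):
    # Batch size is len(per_example_grads): averaging over more
    # examples gives a less noisy gradient estimate.
    n = len(per_example_grads)
    return [sum(col) / n for col in zip(*per_example_grads)]

def sgd_step(weights, grads, lr=2e-5, weight_decay=0.01):
    # lr scales the gradient (step size); weight_decay additionally
    # pulls each weight toward zero as regularization.
    return [w - lr * g - lr * weight_decay * w
            for w, g in zip(weights, grads)]
```

    In a real training loop these two lines are buried inside an optimizer, but the hyperparameters still enter the arithmetic in exactly these places.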

    Hyperparameter tuning evaluates multiple configurations to find the best combination. Grid search evaluates all combinations of a predefined set of values — thorough but exponentially expensive. Random search samples random combinations and is often more efficient. Bayesian optimization uses a probabilistic model of the hyperparameter-performance landscape to intelligently select the next configuration to try. Population-based training evolves hyperparameter schedules during training, adapting them as training progresses.
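    Random search, the simplest of these methods to implement, can be sketched in a few lines. The objective here is a toy stand-in for a full training-plus-evaluation run; in practice each call would be a complete fine-tuning job.

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    # Sample random configurations from the search space and keep the
    # one with the highest score (e.g. validation accuracy).
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

space = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "lora_rank": [8, 16, 32, 64],
}

# Toy objective standing in for a real training run; it pretends
# 2e-5 with rank 16 is optimal.
def toy_objective(cfg):
    return -abs(cfg["learning_rate"] - 2e-5) * 1e5 - abs(cfg["lora_rank"] - 16)

best, score = random_search(toy_objective, space)
```

    Grid search would instead enumerate all 16 combinations in `space`; random search trades exhaustiveness for efficiency, which matters when each trial is a multi-hour training run.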

    Example Use Case

    A team fine-tunes a 7B model and initially uses a learning rate of 2e-4 (too high), resulting in a loss curve that diverges after 100 steps. They reduce it to 5e-5 and see stable convergence but poor final performance. After testing learning rates of 1e-5, 2e-5, and 3e-5 with warmup ratios of 0.03 and 0.1, they find that 2e-5 with 0.03 warmup produces the best validation metrics — a process that took 6 training runs but yielded a 15% improvement over their initial attempt.
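    The team's final sweep — three learning rates crossed with two warmup ratios — is a small grid search. The sketch below reproduces its shape, with a hypothetical `train_and_eval` callable standing in for a full fine-tuning run plus validation.

```python
import itertools

def sweep(train_and_eval):
    # 3 learning rates x 2 warmup ratios = the 6 training runs above.
    grid = itertools.product([1e-5, 2e-5, 3e-5], [0.03, 0.1])
    results = {(lr, warmup): train_and_eval(lr, warmup)
               for lr, warmup in grid}
    # Return the configuration with the best validation score.
    return max(results, key=results.get), results

# Toy evaluator that peaks at (2e-5, 0.03), matching the team's finding;
# a real evaluator would run training and return a validation metric.
best, results = sweep(lambda lr, w: -abs(lr - 2e-5) * 1e5 - abs(w - 0.03))
```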

    Key Takeaways

    • Hyperparameters control the training process and are set before training, unlike learned model parameters.
    • Key LLM fine-tuning hyperparameters include learning rate, batch size, epochs, and LoRA rank.
    • Incorrect hyperparameters can cause training failure, overfitting, or poor model quality.
    • Established heuristics for LLM fine-tuning reduce the need for exhaustive hyperparameter search.
    • Interaction effects between hyperparameters make tuning complex without experience or automated tools.

    How Ertas Helps

    Ertas Studio provides sensible hyperparameter defaults for each base model and training configuration, while exposing advanced controls for experienced users. The visual interface makes it easy to adjust and compare hyperparameter settings across training runs.
