What is an Embedding?
A dense vector representation of a token, word, or passage in a continuous mathematical space where semantic similarity corresponds to geometric proximity.
Definition
An embedding is a learned mapping that converts a discrete symbol — such as a word, subword token, or entire text passage — into a fixed-length vector of real numbers (typically 768 to 4,096 dimensions for modern LLMs). These vectors live in a continuous space where geometric relationships encode semantic meaning: words with similar meanings cluster together, and analogies manifest as consistent vector offsets (e.g., "king" - "man" + "woman" ≈ "queen").
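A minimal sketch of the geometric idea, using hand-made 4-dimensional toy vectors (real embeddings have hundreds or thousands of dimensions, and their values are learned, not chosen):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar direction, near 0.0 = unrelated."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy vectors, hand-picked so that related words point in similar directions.
cat    = np.array([0.90, 0.80, 0.10, 0.00])
kitten = np.array([0.85, 0.75, 0.20, 0.05])
car    = np.array([0.10, 0.00, 0.90, 0.80])

print(cosine_similarity(cat, kitten))  # high (~0.99): semantically close
print(cosine_similarity(cat, car))     # low (~0.12): semantically distant
```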
In transformer-based language models, the embedding layer is the very first component: it takes each token ID from the tokenizer and looks up its corresponding vector in a learned embedding table. These initial embeddings are then refined by successive transformer layers that incorporate context from surrounding tokens. The output of the final layer is a contextualized embedding — a vector that represents not just the token in isolation but its meaning within the specific sentence or passage.
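The distinction between the initial lookup and the contextualized output can be seen directly with the Hugging Face transformers library; this is a sketch, and the model name below is just one common example:

```python
# Sketch: compare static (lookup-table) and contextualized embeddings from a transformer.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The bank raised interest rates", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Static embeddings: raw lookup-table vectors, identical for a token in any sentence.
static = model.get_input_embeddings()(inputs["input_ids"])   # shape (1, seq_len, 768)

# Contextualized embeddings: final-layer vectors, shaped by the surrounding words.
contextual = outputs.last_hidden_state                        # shape (1, seq_len, 768)
```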
Beyond their role inside language models, embeddings are widely used as standalone tools for semantic search, retrieval-augmented generation (RAG), clustering, and classification. Dedicated embedding models (like those from OpenAI, Cohere, or open-source alternatives like BGE and E5) are optimized to produce embeddings where cosine similarity reliably measures semantic relatedness. Organizations use embedding-based vector databases to find relevant documents, compare user queries to knowledge bases, and power recommendation systems.
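As an illustration, a dedicated embedding model can score query-document relatedness in a few lines with the sentence-transformers library; the model name here is one common open-source choice, not a recommendation:

```python
# Sketch of a standalone embedding model used for semantic similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "my payment didn't go through"
docs = [
    "Troubleshooting failed credit card transactions",
    "How to change your profile picture",
]

query_emb = model.encode(query, convert_to_tensor=True)
doc_embs = model.encode(docs, convert_to_tensor=True)

scores = util.cos_sim(query_emb, doc_embs)  # cosine similarities, shape (1, 2)
print(scores)  # the payment-related article should score much higher
```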
Why It Matters
Embeddings are the mathematical bridge between human language and machine computation. Without them, language models would have no way to represent or reason about meaning. For practitioners, understanding embeddings is key to building effective RAG pipelines, search systems, and classification workflows. The quality of embeddings also determines how well a model can generalize: better embeddings capture more nuanced semantic relationships, leading to more accurate and contextually appropriate outputs across a wider range of inputs.
How It Works
The embedding layer is essentially a lookup table with V rows (one per vocabulary token) and D columns (the embedding dimension). When a token with ID 42 enters the model, the layer returns row 42 — a D-dimensional vector. During pre-training, these vectors are initialized randomly and then updated through backpropagation so that tokens appearing in similar contexts develop similar vectors. In fine-tuning with methods like LoRA, the embedding table is typically frozen (not updated), since the pre-trained embeddings already capture rich semantic information. For standalone embedding models, the entire model is trained (or fine-tuned) with a contrastive loss that explicitly pushes similar texts closer together and dissimilar texts apart in the embedding space.
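The pieces described above can be sketched in PyTorch: the lookup table itself, freezing it for fine-tuning, and an in-batch contrastive loss of the kind embedding models commonly use (the exact loss varies by model; this is an illustrative version):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V, D = 32_000, 768                      # vocabulary size, embedding dimension
embedding = nn.Embedding(V, D)          # the lookup table: V rows, D columns

token_ids = torch.tensor([42, 7, 311])
vectors = embedding(token_ids)          # returns rows 42, 7, 311 -> shape (3, 768)

# Freezing the table for LoRA-style fine-tuning: no gradients flow into it.
embedding.weight.requires_grad_(False)

# In-batch contrastive loss: each query should be closest to its own (positive)
# passage and far from every other passage in the batch.
def contrastive_loss(query_emb, passage_emb, temperature=0.05):
    query_emb = F.normalize(query_emb, dim=-1)
    passage_emb = F.normalize(passage_emb, dim=-1)
    logits = query_emb @ passage_emb.T / temperature  # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0))            # positive pairs sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = contrastive_loss(torch.randn(8, D), torch.randn(8, D))
```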
Example Use Case
A customer support platform uses a fine-tuned embedding model to power semantic search over 50,000 help articles. When a customer types "my payment didn't go through," the system embeds the query into a 768-dimensional vector and performs a nearest-neighbor search in a vector database. The top 5 results are passed as context to a fine-tuned LLM, which synthesizes a personalized answer — even though none of the retrieved articles contain the exact phrase the customer used.
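The retrieval step of such a pipeline might look like the sketch below: a brute-force cosine search over precomputed article embeddings. The random vectors are stand-ins for real embeddings, and a production system would use a vector database, but the underlying math is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
article_embs = rng.normal(size=(50_000, 768)).astype("float32")   # stand-in for real article vectors
article_embs /= np.linalg.norm(article_embs, axis=1, keepdims=True)

query_emb = rng.normal(size=768).astype("float32")                # stand-in for the embedded query
query_emb /= np.linalg.norm(query_emb)

scores = article_embs @ query_emb          # cosine similarity (vectors are unit-normalized)
top5 = np.argsort(scores)[-5:][::-1]       # indices of the 5 most similar articles
print(top5, scores[top5])                  # these articles become the LLM's context
```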
Key Takeaways
- Embeddings convert discrete tokens into continuous vectors where semantic similarity is geometric proximity.
- The embedding layer is the first component of any transformer-based language model.
- Contextualized embeddings (from transformer outputs) capture word meaning within a specific context.
- Standalone embedding models power semantic search, RAG, clustering, and classification workflows.
- Fine-tuning typically freezes the embedding layer, relying on pre-trained semantic representations.
How Ertas Helps
While Ertas Studio focuses primarily on generative fine-tuning, the embeddings learned by Ertas-tuned models are integral to their domain-specific performance. Models fine-tuned in Ertas develop richer internal representations for domain vocabulary, improving both generation quality and the model's ability to serve as a backbone for downstream embedding-based workflows like semantic search and document classification.
Related Resources
Attention
Context Window
Inference
Tokenizer
Transformer
Getting Started with Ertas: Fine-Tune and Deploy Custom AI Models
Privacy-Conscious AI Development: Fine-Tune in the Cloud, Run on Your Terms
Hugging Face
Ollama
Ertas for Healthcare
Ertas for SaaS Product Teams
Ertas for Customer Support