Ertas for Named Entity Recognition

Train NER models that identify and extract custom entity types — people, organizations, products, medical terms, legal references — specific to your industry and data.

The Challenge

Named entity recognition is the foundation of structured information extraction from unstructured text. Generic NER models can identify common entities like person names, locations, and organizations, but they fall apart on domain-specific entity types. A medical NER system needs to recognize drug names, dosage units, anatomical terms, and ICD codes. A legal system needs to identify case citations, statute references, party names, and jurisdictions. A financial system needs to extract ticker symbols, monetary amounts with currency, regulatory body names, and specific financial instruments.

Building custom NER models has traditionally required deep NLP expertise, complex annotation tooling, and significant engineering effort to deploy and maintain. Teams spend months on annotation guidelines, inter-annotator agreement measurement, and model architecture selection before they even begin training. The result is often a fragile pipeline that breaks when encountering entity formats it was not explicitly trained on — a new drug naming convention, an unusual citation format, or a foreign organization name that does not match expected patterns.

The Solution

Ertas simplifies custom NER by leveraging the broad language understanding of large language models and focusing fine-tuning on entity extraction patterns specific to your domain. Rather than training a traditional NER model from scratch, teams fine-tune a generative model in Ertas Studio on examples of text with annotated entities in a structured output format. The model learns to identify and extract entities by understanding the semantic context around them, not just pattern matching on surface forms.

This approach tends to be more robust than traditional NER when entity formats vary. Because the base model has already learned general language patterns during pretraining, the fine-tuned model can generalize to entity formats it has not seen explicitly. It can recognize a new drug name from its syntactic context, or identify an unusual organization name because it appears in a role that organizations typically fill.

Ertas Studio accepts training data as JSONL, with each record pairing input text with its structured entity output, which keeps annotation straightforward. The trained model can be deployed locally via Ollama for batch processing or through Ertas Cloud for real-time extraction APIs.
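As a rough illustration of the JSONL training format described above, the sketch below builds records from raw text plus (entity text, entity type) pairs and computes character offsets automatically. The field names `text`, `entities`, `type`, `start`, and `end` are assumptions for illustration, not the documented schema Ertas Studio expects, and `build_record` and `to_jsonl` are hypothetical helpers.

```python
import json
from typing import Iterable


def build_record(text: str, annotations: Iterable[tuple[str, str]]) -> dict:
    """Turn (surface string, entity type) pairs into a record with character offsets.

    Hypothetical schema: {"text": ..., "entities": [{"text", "type", "start", "end"}]}.
    Each surface string is matched at its first occurrence after the previous
    match, so annotations must be listed in reading order. Nested or
    overlapping entities would need explicit offsets instead.
    """
    entities = []
    cursor = 0
    for surface, label in annotations:
        start = text.find(surface, cursor)
        if start == -1:
            raise ValueError(f"{surface!r} not found in text after offset {cursor}")
        end = start + len(surface)  # half-open span: text[start:end] == surface
        entities.append({"text": surface, "type": label, "start": start, "end": end})
        cursor = end
    return {"text": text, "entities": entities}


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


if __name__ == "__main__":
    record = build_record(
        "Patient received 50 mg of atorvastatin and reported nausea.",
        [("50 mg", "DOSAGE"), ("atorvastatin", "DRUG"), ("nausea", "ADVERSE_EVENT")],
    )
    print(to_jsonl([record]), end="")
```

Computing offsets from surface strings, rather than typing them by hand, avoids the off-by-one errors that commonly creep into span annotations. The resulting string can be written to a `.jsonl` file for upload.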

Key Features

Studio

Custom Entity Type Training

Define arbitrary entity types and train the model to extract them from text. Studio supports nested entities, overlapping spans, and relational extraction in a single fine-tuning run.
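Nested and overlapping spans are easier to reason about with concrete offsets. The sketch below uses a hypothetical `Span` type with half-open character offsets to classify how two annotated spans relate. It illustrates the annotation shapes involved and is not Studio's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical entity span with half-open character offsets [start, end)."""
    start: int
    end: int
    label: str


def relation(a: Span, b: Span) -> str:
    """Classify two spans as 'disjoint', 'identical', 'nested', or 'overlapping'."""
    if a.end <= b.start or b.end <= a.start:
        return "disjoint"  # no shared characters
    if (a.start, a.end) == (b.start, b.end):
        return "identical"  # same text, possibly different labels
    if (a.start <= b.start and b.end <= a.end) or (b.start <= a.start and a.end <= b.end):
        return "nested"  # one span lies entirely inside the other
    return "overlapping"  # partial overlap; neither contains the other


if __name__ == "__main__":
    text = "University of Chicago Medical Center"
    org = Span(0, 36, "ORGANIZATION")
    loc = Span(14, 21, "LOCATION")  # "Chicago" inside the organization name
    print(text[loc.start:loc.end], relation(org, loc))
```

Flat sequence-labeling schemes such as BIO tagging can only represent disjoint spans, which is one reason a structured output format suits nested and overlapping entities.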

Hub

Pre-Trained Language Models

Start from Hub models whose broad language understanding generalizes to unseen entity formats, reducing the annotation volume needed for high-accuracy extraction.

Cloud

Extraction API Endpoints

Deploy your NER model through Cloud as a REST API that accepts text and returns structured entity annotations with confidence scores, spans, and entity types.
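A client for an extraction endpoint like this typically parses the returned annotations and keeps only those above a confidence threshold. The response shape below (an `entities` list whose items carry `type`, `start`, `end`, and `confidence`) is an assumption for illustration, not the documented Ertas Cloud schema, and the HTTP request itself is omitted.

```python
import json
from dataclasses import dataclass


@dataclass
class Entity:
    type: str
    start: int
    end: int
    confidence: float


def parse_entities(body: str, min_confidence: float = 0.5) -> list[Entity]:
    """Parse a hypothetical JSON response body and drop low-confidence entities."""
    payload = json.loads(body)
    entities = [
        Entity(e["type"], int(e["start"]), int(e["end"]), float(e["confidence"]))
        for e in payload.get("entities", [])
    ]
    kept = [e for e in entities if e.confidence >= min_confidence]
    return sorted(kept, key=lambda e: (e.start, e.end))  # reading order


def with_surface(text: str, entities: list[Entity]) -> list[tuple[str, str, float]]:
    """Attach the covered text to each entity using its character span."""
    return [(text[e.start:e.end], e.type, e.confidence) for e in entities]


if __name__ == "__main__":
    text = "Dose reduced after elevated ALT on day 14."
    body = json.dumps({"entities": [
        {"type": "LAB_TEST", "start": 28, "end": 31, "confidence": 0.97},
        {"type": "ADVERSE_EVENT", "start": 19, "end": 31, "confidence": 0.41},
    ]})
    print(with_surface(text, parse_entities(body)))
```

The threshold is a tuning knob: lowering it favors recall (useful when analysts review every flag), while raising it favors precision for fully automated downstream loads.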

Vault

Sensitive Entity Protection

Vault ensures that training data containing sensitive entities (patient names, financial accounts, personal identifiers) is encrypted and access-controlled throughout the pipeline.

Example Workflow

A pharmaceutical company needs to extract drug names, dosage information, adverse events, and patient demographics from clinical trial reports for pharmacovigilance monitoring. The NLP team annotates 15,000 report excerpts with its custom entity schema (12 entity types) and uploads the JSONL dataset to Ertas Vault. In Ertas Studio, the team fine-tunes a 7B model that takes report text as input and outputs structured JSON listing every identified entity with its type and text span.

The model is deployed as a batch processing endpoint that runs nightly over newly received trial reports. Extracted entities are loaded into the pharmacovigilance database, where safety analysts review flagged adverse events. The fine-tuned model achieves 94% F1 on entity extraction, compared with 62% for a generic NER model. The largest gains are on domain-specific entities such as drug compound names and medical device identifiers, which the generic model missed entirely.
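F1 figures like these depend on how matches are counted. One common convention, assumed here, is exact match: a predicted entity counts as correct only if its type, start, and end all equal a gold annotation. The sketch below computes micro-averaged precision, recall, and F1 under that convention; `entity_prf` is a hypothetical helper, and the toy data does not reproduce the numbers in the scenario.

```python
from collections import Counter

Entity = tuple[str, int, int]  # (type, start, end) with half-open character offsets


def entity_prf(gold: list[list[Entity]], pred: list[list[Entity]]) -> tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall, and F1 across documents."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same documents")
    tp = fp = fn = 0
    for g_doc, p_doc in zip(gold, pred):
        g, p = Counter(g_doc), Counter(p_doc)
        matched = sum((g & p).values())  # multiset intersection handles duplicates
        tp += matched
        fp += sum(p.values()) - matched
        fn += sum(g.values()) - matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [[("DRUG", 26, 38), ("DOSAGE", 17, 22), ("ADVERSE_EVENT", 52, 58)]]
    pred = [[("DRUG", 26, 38), ("DOSAGE", 17, 20), ("ADVERSE_EVENT", 52, 58), ("DRUG", 0, 7)]]
    p, r, f = entity_prf(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.667 F1=0.571
```

Exact match is strict: the `DOSAGE` prediction that stops two characters short counts as both a false positive and a false negative. Partial-overlap scoring schemes report higher numbers, so a comparison between two models is only meaningful when both are scored the same way.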