Sentiment Analysis Dataset Template
Template for building datasets that train AI models to classify text sentiment across customer reviews, social media, and survey responses.
Overview
Sentiment analysis datasets train AI models to identify the emotional tone and opinion expressed in text — whether a customer review is positive, negative, or neutral, whether a social media post expresses satisfaction or frustration, and what specific aspects of a product or service are praised or criticized. This is one of the most mature and widely deployed NLP tasks, with applications spanning customer feedback analysis, brand monitoring, market research, and product development prioritization.
Modern sentiment analysis goes beyond simple positive/negative classification. Aspect-based sentiment analysis (ABSA) identifies sentiment toward specific features or aspects of a product — a restaurant review might express positive sentiment about food quality but negative sentiment about service speed. Multi-dimensional sentiment captures not just polarity but also intensity (slightly positive vs. extremely positive) and emotional categories (anger, joy, disappointment, surprise). Training data should reflect the level of granularity your application requires.
The domain specificity of sentiment expression is a critical consideration. The word "sick" means very different things in a medical context versus casual social media. "Aggressive" might be negative when describing customer service but positive when describing a sports car's styling. Training data must be drawn from the same domain where the model will be deployed, or must include enough cross-domain examples to teach the model domain-specific sentiment patterns.
Dataset Schema
interface SentimentExample {
  text: string;
  sentiment: "positive" | "negative" | "neutral" | "mixed";
  confidence: number; // 0.0 - 1.0
  aspects?: {
    aspect: string; // e.g., "battery_life", "customer_service"
    sentiment: "positive" | "negative" | "neutral";
    snippet: string; // Text span supporting the label
  }[];
  metadata: {
    source: string;
    domain: string;
    language: string;
    word_count: number;
  };
}

Sample Data
[
  {
    "text": "Absolutely love this laptop. The battery easily lasts 12 hours of real work, and the keyboard feel is the best I've used on any ultrabook. My only gripe is the webcam quality — it's noticeably grainy in video calls. For the price point though, this is an incredible value.",
    "sentiment": "positive",
    "confidence": 0.88,
    "aspects": [
      {"aspect": "battery_life", "sentiment": "positive", "snippet": "battery easily lasts 12 hours of real work"},
      {"aspect": "keyboard", "sentiment": "positive", "snippet": "keyboard feel is the best I've used"},
      {"aspect": "webcam", "sentiment": "negative", "snippet": "webcam quality — it's noticeably grainy"},
      {"aspect": "value", "sentiment": "positive", "snippet": "incredible value"}
    ],
    "metadata": {"source": "product_review", "domain": "electronics", "language": "en", "word_count": 52}
  },
  {
    "text": "The hotel location was perfect, right on the beach with ocean views from our room. However, the check-in process took over 45 minutes due to understaffing, and our room wasn't ready until 5pm despite a 3pm check-in time. The pool area was nice but very crowded. Mixed feelings overall.",
    "sentiment": "mixed",
    "confidence": 0.82,
    "aspects": [
      {"aspect": "location", "sentiment": "positive", "snippet": "location was perfect, right on the beach"},
      {"aspect": "check_in", "sentiment": "negative", "snippet": "check-in process took over 45 minutes"},
      {"aspect": "room_readiness", "sentiment": "negative", "snippet": "room wasn't ready until 5pm"},
      {"aspect": "amenities", "sentiment": "neutral", "snippet": "pool area was nice but very crowded"}
    ],
    "metadata": {"source": "travel_review", "domain": "hospitality", "language": "en", "word_count": 58}
  },
  {
    "text": "Ordered the medium roast blend. Arrived on time, packaging intact. Tastes like coffee.",
    "sentiment": "neutral",
    "confidence": 0.75,
    "aspects": [
      {"aspect": "delivery", "sentiment": "neutral", "snippet": "Arrived on time"},
      {"aspect": "taste", "sentiment": "neutral", "snippet": "Tastes like coffee"}
    ],
    "metadata": {"source": "product_review", "domain": "food_beverage", "language": "en", "word_count": 16}
  }
]

Data Collection Guide
Source text from the platforms and domains where your model will operate. For product review sentiment, export reviews from your e-commerce platform. For social media monitoring, collect posts mentioning your brand or industry keywords. For customer feedback analysis, export survey responses and support ticket comments. Each domain has its own linguistic patterns for expressing sentiment, and training on in-domain data is essential for accurate classification.
Label quality depends heavily on clear annotation guidelines. Define exactly what constitutes positive, negative, neutral, and mixed sentiment with domain-specific examples. Address edge cases in your guidelines: sarcasm, comparative statements ("better than X but worse than Y"), conditional sentiment ("would be great if..."), and intensity variations. Provide annotators with 20-30 calibration examples before they begin labeling to establish consistent standards.
For aspect-based sentiment, define your aspect taxonomy before annotation begins. List all relevant aspects for your domain (for a restaurant: food_quality, service, ambiance, price, cleanliness, wait_time) and provide clear definitions and examples for each. Annotators should identify the text span that supports each aspect-level sentiment label, creating evidence that can be verified during quality review.
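A taxonomy of this kind can be captured directly in code so that annotation tools and quality checks share one source of truth. The sketch below uses the restaurant aspects listed above; the definitions and example phrases are illustrative placeholders, not a prescribed standard.

```typescript
// A starter aspect taxonomy for the restaurant domain described above.
// Definitions and example phrases are illustrative placeholders — replace
// them with the wording agreed on in your annotation guidelines.
const RESTAURANT_ASPECTS: Record<string, { definition: string; example: string }> = {
  food_quality: { definition: "Taste, freshness, and presentation of dishes", example: "the risotto was perfectly cooked" },
  service:      { definition: "Staff attentiveness, friendliness, and accuracy", example: "our server forgot two orders" },
  ambiance:     { definition: "Atmosphere, noise level, decor, and comfort", example: "cozy lighting but very loud" },
  price:        { definition: "Perceived value relative to cost", example: "overpriced for the portion size" },
  cleanliness:  { definition: "Hygiene of tables, restrooms, and dishware", example: "sticky tables and dirty glasses" },
  wait_time:    { definition: "Time to be seated and served", example: "waited 40 minutes for entrees" },
};

// Reject aspect labels that fall outside the agreed taxonomy.
function isKnownAspect(aspect: string): boolean {
  return Object.prototype.hasOwnProperty.call(RESTAURANT_ASPECTS, aspect);
}
```

Running `isKnownAspect` over every annotated aspect during import catches typos and ad-hoc aspect names before they fragment the label space.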
Quality Criteria
Measure inter-annotator agreement using Cohen's kappa or Krippendorff's alpha. For document-level sentiment, aim for kappa greater than 0.80. For aspect-based sentiment, which is more subjective, kappa greater than 0.70 is a reasonable target. Low agreement suggests ambiguous guidelines or genuinely ambiguous text that should be reviewed.
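For two annotators, Cohen's kappa is straightforward to compute from their label lists: observed agreement corrected for the agreement expected by chance given each annotator's label distribution. A minimal sketch:

```typescript
type Label = "positive" | "negative" | "neutral" | "mixed";

// Cohen's kappa for two annotators labeling the same examples:
// kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
// rate and p_e is the chance agreement implied by each annotator's
// marginal label distribution.
function cohensKappa(a: Label[], b: Label[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("label lists must be non-empty and equal length");
  }
  const n = a.length;
  let agreements = 0;
  const countA = new Map<Label, number>();
  const countB = new Map<Label, number>();
  for (let i = 0; i < n; i++) {
    if (a[i] === b[i]) agreements++;
    countA.set(a[i], (countA.get(a[i]) ?? 0) + 1);
    countB.set(b[i], (countB.get(b[i]) ?? 0) + 1);
  }
  const pO = agreements / n;
  let pE = 0;
  for (const label of new Set([...a, ...b])) {
    pE += ((countA.get(label) ?? 0) / n) * ((countB.get(label) ?? 0) / n);
  }
  return pE === 1 ? 1 : (pO - pE) / (1 - pE);
}
```

For more than two annotators, or when annotators label overlapping but unequal subsets, Krippendorff's alpha is the better fit; libraries such as scikit-learn (`cohen_kappa_score`) also cover the two-annotator case.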
Balance the dataset across sentiment categories. Natural distributions are typically skewed — most reviews are either very positive or very negative, with fewer neutral examples. Imbalanced datasets produce models biased toward the majority class. Either collect additional examples for underrepresented classes or use stratified sampling to create a balanced training set. Aim for a minimum of 500 examples per sentiment category for reliable classification.
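One simple way to build the balanced training split is to downsample every class to the size of the smallest one. The sketch below assumes a minimal example shape with a `sentiment` field, matching the schema above:

```typescript
// Downsample every sentiment class to the size of the smallest class,
// yielding a class-balanced training set. Shuffle each bucket before
// slicing in practice so the retained subset is a random sample.
function balanceByClass<T extends { sentiment: string }>(examples: T[]): T[] {
  const byClass = new Map<string, T[]>();
  for (const ex of examples) {
    const bucket = byClass.get(ex.sentiment) ?? [];
    bucket.push(ex);
    byClass.set(ex.sentiment, bucket);
  }
  const minSize = Math.min(...Array.from(byClass.values(), b => b.length));
  return Array.from(byClass.values()).flatMap(b => b.slice(0, minSize));
}
```

If downsampling would drop you below roughly 500 examples in the majority classes, collect more minority-class data instead of discarding what you have.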
Validate that aspect labels align with the text spans marked as evidence. If an annotator labels an aspect as "negative" but the supporting snippet does not clearly convey negative sentiment, the example should be corrected or removed. Aspect-text alignment is a strong indicator of annotation quality and directly impacts model performance on aspect-based tasks.
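The cheapest alignment check is mechanical: verify that each evidence snippet occurs verbatim in the source text. A snippet that cannot be located at all usually means the annotator paraphrased or mislabeled, and the example needs human review. A minimal sketch against the schema above:

```typescript
// Flag aspect labels whose evidence snippet does not occur verbatim
// in the example text. These should be routed to human review.
interface AspectLabel {
  aspect: string;
  sentiment: "positive" | "negative" | "neutral";
  snippet: string;
}

function misalignedAspects(text: string, aspects: AspectLabel[]): AspectLabel[] {
  return aspects.filter(a => !text.includes(a.snippet));
}
```

Exact substring matching is deliberately strict; if your annotation tool normalizes whitespace or quotes, relax the comparison accordingly. Checking that the snippet's sentiment actually supports the label still requires a human (or model-assisted) pass.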
Using This Template with Ertas
Import raw text data (reviews, survey responses, social posts) into Ertas Data Suite for PII redaction. Customer reviews often contain names, email addresses, order numbers, and location details that must be masked before use in training. After redaction, export the cleaned text for annotation, then re-import annotated data for final dataset preparation and format conversion.
Export in JSONL format for LLM fine-tuning or CSV format for encoder-based classification model training. Ertas Studio supports both approaches — fine-tuning a generative model for sentiment with explanations, or training a smaller classification model for high-throughput sentiment scoring.
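For the LLM fine-tuning path, each labeled example becomes one JSON object per line. The chat-message shape and prompt wording below are common conventions, not an Ertas-specific format; match them to whatever your training framework expects.

```typescript
// Serialize labeled examples as JSONL chat records for fine-tuning.
// The message schema and prompt wording here are assumptions — adapt
// them to the format your training framework requires.
interface LabeledExample {
  text: string;
  sentiment: string;
}

function toFineTuningJsonl(examples: LabeledExample[]): string {
  return examples
    .map(ex =>
      JSON.stringify({
        messages: [
          {
            role: "user",
            content: `Classify the sentiment of the following text as positive, negative, neutral, or mixed:\n\n${ex.text}`,
          },
          { role: "assistant", content: ex.sentiment },
        ],
      })
    )
    .join("\n");
}
```

For the encoder-classifier path, a two-column CSV of `text,sentiment` is usually all the training script needs.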
Recommended Model
For high-throughput sentiment classification without explanations, fine-tune an encoder model (BERT, DeBERTa, or RoBERTa), which offers faster inference and lower resource requirements than generative models. For sentiment analysis with natural-language explanations (why the sentiment is what it is), fine-tune a 7B generative model.
Aspect-based sentiment analysis benefits from generative models that can identify aspects, extract relevant spans, and assign sentiment in a single pass. A 7B-8B model fine-tuned on aspect-level training data handles this well. Export to GGUF at Q4_K_M for efficient local inference in production.