How to Build an Emotion Recognition Tool with PyTorch and FastAPI

Emotion recognition from facial expressions remains one of the most challenging yet commercially valuable problems in computer vision. While basic classifiers can distinguish between happy and sad faces with reasonable accuracy, production-grade systems must handle real-world variability in lighting, head pose, occlusions, and cultural differences in emotional expression. In this tutorial, we'll build a complete emotion recognition pipeline that achieves competitive accuracy while remaining deployable on modest hardware.

We'll use a ResNet-18 backbone fine-tuned on the FER2013 dataset, wrapped in a FastAPI service with proper input validation, batching, and monitoring. The system will recognize seven basic emotions: anger, disgust, fear, happiness, sadness, surprise, and neutral. By the end, you'll have a production-ready API that can process both static images and video streams.

Real-World Use Case and Architecture

Emotion recognition systems are deployed across multiple industries. According to a 2025 MarketsandMarkets report, the facial recognition market (including emotion detection) is projected to reach $12.67 billion by 2027, with healthcare and automotive sectors driving adoption. Common use cases include:

Automotive safety: Detecting driver drowsiness or road rage [2]
Healthcare monitoring: Assessing patient pain levels or depression severity
Retail analytics: Measuring customer satisfaction in physical stores
Human-computer interaction: Adaptive interfaces that respond to user frustration

Our architecture follows a three-tier design:

Preprocessing layer: Face detection using MTCNN, alignment, and normalization
Inference engine: PyTorch [7] model with ONNX Runtime for optimized serving
API layer: FastAPI with async endpoints, rate limiting, and health checks

This separation allows independent scaling of each component. The preprocessing layer can be offloaded to GPU if needed, while the inference engine benefits from ONNX's cross-platform optimization.

Prerequisites and Environment Setup

Before writing code, ensure your environment has the following dependencies. We'll use Python 3.10+ and CUDA 11.8 if available.

# Create a virtual environment
python -m venv emotion_env
source emotion_env/bin/activate # On Windows: emotion_env\Scripts\activate

# Core dependencies
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu118
pip install fastapi==0.104.1 uvicorn[standard]==0.24.0
pip install opencv-python==4.8.1.78 pillow==10.1.0
pip install numpy==1.26.2 scikit-learn==1.3.2 pandas==2.1.4 # pandas is used by train_emotion.py to read the CSV
pip install onnxruntime-gpu==1.16.3 # Use onnxruntime for CPU-only systems
pip install python-multipart==0.0.6 # For file uploads
pip install pydantic==2.5.2 pydantic-settings==2.1.0
pip install prometheus-client==0.19.0 # For monitoring

For face detection, we'll use MTCNN from the facenet-pytorch library, which provides a pre-trained model:

pip install facenet-pytorch==2.5.3

Hardware considerations: The model requires approximately 2GB of GPU memory for batch inference of 32 images. On CPU, expect 50-100ms per image with MTCNN detection. For production, consider using a smaller face detector like RetinaFace or MediaPipe if latency is critical.

Core Implementation: Training the Emotion Classifier

We'll start by training a ResNet-18 model on the FER2013 dataset. This dataset contains 35,887 grayscale 48x48 pixel faces labeled with seven emotions. The dataset is imbalanced, with "happy" and "neutral" being overrepresented compared to "disgust" and "fear".
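
Before the full training script, it may help to see what one CSV row looks like once decoded. The following is a minimal sketch using only the standard library; the helper name `parse_fer_pixels` is hypothetical, and the sample row is synthetic rather than taken from the dataset.

```python
def parse_fer_pixels(pixel_string, side=48):
    """Decode a space-separated pixel string into a side x side grid in [0, 1].

    Hypothetical helper for illustration; not part of the training script.
    """
    values = [int(p) for p in pixel_string.split()]
    if len(values) != side * side:
        raise ValueError(f"expected {side * side} pixels, got {len(values)}")
    if any(v < 0 or v > 255 for v in values):
        raise ValueError("pixel values must be in 0..255")
    # Row-major layout: the first `side` values form the top row.
    return [
        [v / 255.0 for v in values[r * side:(r + 1) * side]]
        for r in range(side)
    ]


# Synthetic row (a horizontal gradient), not real dataset content.
sample = " ".join(str((i % 48) * 5) for i in range(48 * 48))
grid = parse_fer_pixels(sample)
print(len(grid), len(grid[0]))
```

The training script performs the same decoding with NumPy and then stacks the grid into three identical channels so it matches the input shape ResNet expects.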

# train_emotion.py
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms, models
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import os
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Constants
EMOTIONS = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
BATCH_SIZE = 64
EPOCHS = 50
LEARNING_RATE = 1e-4
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class FER2013Dataset(Dataset):
    """Custom dataset for FER2013 CSV format."""

    def __init__(self, dataframe, transform=None):
        self.dataframe = dataframe
        self.transform = transform

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        row = self.dataframe.iloc[idx]
        # Parse pixel values from space-separated string
        pixels = np.array([int(p) for p in row['pixels'].split()], dtype=np.uint8)
        image = pixels.reshape(48, 48).astype(np.float32)

        # Normalize to [0, 1] and convert to 3-channel
        image = image / 255.0
        image = np.stack([image] * 3, axis=0)  # Shape: (3, 48, 48)

        label = int(row['emotion'])

        # Convert to tensor and resize to 224x224 for ResNet
        image_tensor = torch.from_numpy(image)
        image_tensor = torch.nn.functional.interpolate(
            image_tensor.unsqueeze(0), size=(224, 224), mode='bilinear'
        ).squeeze(0)

        # Apply augmentation/normalization transforms if provided
        if self.transform:
            image_tensor = self.transform(image_tensor)

        return image_tensor, label

def load_data(csv_path='fer2013.csv'):
    """Load and split FER2013 dataset."""
    df = pd.read_csv(csv_path)

    # The dataset has a 'Usage' column for train/test split
    train_df = df[df['Usage'] == 'Training']
    val_df = df[df['Usage'] == 'PublicTest']
    test_df = df[df['Usage'] == 'PrivateTest']

    logger.info(f"Train: {len(train_df)}, Val: {len(val_df)}, Test: {len(test_df)}")

    # Data augmentation for training
    train_transform = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(degrees=10),
        transforms.ColorJitter(brightness=0.1, contrast=0.1),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    val_transform = transforms.Compose([
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    train_dataset = FER2013Dataset(train_df, transform=train_transform)
    val_dataset = FER2013Dataset(val_df, transform=val_transform)
    test_dataset = FER2013Dataset(test_df, transform=val_transform)

    return train_dataset, val_dataset, test_dataset

def create_model():
    """Create ResNet-18 with custom classifier head."""
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

    # Freeze early layers to prevent overfitting on small dataset.
    # In torchvision's ResNet-18, layer4 holds the last two residual blocks,
    # so everything outside layer4 is frozen.
    for name, param in model.named_parameters():
        if not name.startswith('layer4.'):
            param.requires_grad = False

    # Replace classifier head (the new layers are trainable by default)
    num_features = model.fc.in_features
    model.fc = nn.Sequential(
        nn.Dropout(0.5),
        nn.Linear(num_features, 256),
        nn.ReLU(),
        nn.BatchNorm1d(256),
        nn.Dropout(0.3),
        nn.Linear(256, 7)  # 7 emotion classes
    )

    return model

def train_model(model, train_loader, val_loader, criterion, optimizer, scheduler):
    """Training loop with validation and early stopping."""
    best_val_acc = 0.0
    patience = 10
    patience_counter = 0

    for epoch in range(EPOCHS):
        # Training phase
        model.train()
        train_loss = 0.0
        train_correct = 0
        train_total = 0

        for images, labels in train_loader:
            images, labels = images.to(DEVICE), labels.to(DEVICE)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()

            # Gradient clipping to prevent exploding gradients
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

            optimizer.step()

            train_loss += loss.item() * images.size(0)
            _, predicted = torch.max(outputs, 1)
            train_total += labels.size(0)
            train_correct += (predicted == labels).sum().item()

        train_acc = 100 * train_correct / train_total
        train_loss = train_loss / train_total

        # Validation phase
        model.eval()
        val_loss = 0.0
        val_correct = 0
        val_total = 0

        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(DEVICE), labels.to(DEVICE)
                outputs = model(images)
                loss = criterion(outputs, labels)

                val_loss += loss.item() * images.size(0)
                _, predicted = torch.max(outputs, 1)
                val_total += labels.size(0)
                val_correct += (predicted == labels).sum().item()

        val_acc = 100 * val_correct / val_total
        val_loss = val_loss / val_total

        scheduler.step(val_loss)

        logger.info(f"Epoch {epoch+1}/{EPOCHS} | "
                    f"Train Loss: {train_loss:.4f} Acc: {train_acc:.2f}% | "
                    f"Val Loss: {val_loss:.4f} Acc: {val_acc:.2f}%")

        # Early stopping and model checkpoint
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            patience_counter = 0
            torch.save(model.state_dict(), 'best_emotion_model.pth')
            logger.info(f"Saved new best model with val_acc: {val_acc:.2f}%")
        else:
            patience_counter += 1
            if patience_counter >= patience:
                logger.info(f"Early stopping triggered after {epoch+1} epochs")
                break

    return model

def main():
    logger.info(f"Using device: {DEVICE}")

    # Load data
    train_dataset, val_dataset, test_dataset = load_data()

    train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)
    val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=4)
    test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=4)

    # Create model
    model = create_model().to(DEVICE)

    # Loss and optimizer
    # Use class weights to handle imbalance. Count labels straight from the
    # dataframe so we don't decode and resize every image just to read its label.
    class_counts = np.bincount(
        train_dataset.dataframe['emotion'].to_numpy(), minlength=len(EMOTIONS)
    )

    class_weights = 1.0 / torch.tensor(class_counts, dtype=torch.float)
    class_weights = class_weights / class_weights.sum()
    class_weights = class_weights.to(DEVICE)

    criterion = nn.CrossEntropyLoss(weight=class_weights)
    optimizer = optim.AdamW(model.parameters(), lr=LEARNING_RATE, weight_decay=1e-4)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=5)

    # Train
    model = train_model(model, train_loader, val_loader, criterion, optimizer, scheduler)

    # Load best model for testing
    model.load_state_dict(torch.load('best_emotion_model.pth', map_location=DEVICE))
    model.eval()

    # Evaluate on test set
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(DEVICE), labels.to(DEVICE)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            all_preds.extend(predicted.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    logger.info("\nTest Set Classification Report:")
    logger.info(classification_report(all_labels, all_preds, target_names=EMOTIONS))

    # Export to ONNX for production
    dummy_input = torch.randn(1, 3, 224, 224).to(DEVICE)
    torch.onnx.export(
        model, dummy_input, 'emotion_model.onnx',
        input_names=['input'], output_names=['output'],
        dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}},
        opset_version=17
    )
    logger.info("Model exported to ONNX format")

if __name__ == '__main__':
    main()

Key design decisions in the training code:

Transfer learning with selective freezing: We freeze all but the last two residual blocks of ResNet-18. This prevents overfitting on the relatively small FER2013 dataset (35K images) while allowing the model to adapt high-level features to emotion recognition.
Class weighting: The FER2013 dataset has severe class imbalance. Across the full dataset, "Disgust" has only 547 samples compared to "Happy" with 8,989. Using inverse frequency weights in the loss function helps the model learn minority classes without oversampling.
Gradient clipping: With a learning rate of 1e-4 and AdamW, gradient norms can spike during early training. Clipping at 1.0 stabilizes training.
ONNX export: Converting to ONNX allows deployment on edge devices (Raspberry Pi, Jetson) and enables optimizations like INT8 quantization. The dynamic axes parameter allows variable batch sizes during inference.
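
To make the class-weighting step concrete, here is a minimal sketch of the same inverse-frequency calculation in plain Python. The function name `inverse_frequency_weights` and the counts are illustrative assumptions; the counts are not the real FER2013 distribution.

```python
def inverse_frequency_weights(class_counts):
    """Return weights proportional to 1/count, normalized to sum to 1."""
    if any(c <= 0 for c in class_counts):
        raise ValueError("every class needs at least one sample")
    inverse = [1.0 / c for c in class_counts]
    total = sum(inverse)
    return [w / total for w in inverse]


# Illustrative counts only, one entry per emotion class.
counts = [400, 50, 400, 800, 500, 300, 500]
weights = inverse_frequency_weights(counts)
for count, weight in zip(counts, weights):
    print(f"count={count:4d} weight={weight:.4f}")
```

The rarest class receives the largest weight, and the ratio between any two weights is the inverse of the ratio between their counts. Normalizing the weights to sum to 1 changes only their scale, not their relative sizes.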

Building the Production API with FastAPI

Now we'll create the inference server. This API handles image uploads, runs face detection, performs emotion classification, and returns structured results with confidence scores.

# api.py
import io
import time
from typing import List, Optional
import numpy as np
import cv2
from PIL import Image
import torch
import onnxruntime as ort
from fastapi import FastAPI, File, UploadFile, HTTPException, Depends
from fastapi.responses import JSONResponse
from pydantic import BaseModel, Field
import uvicorn
from facenet_pytorch import MTCNN
import logging
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST
from starlette.responses import Response

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Prometheus metrics
PREDICTION_COUNTER = Counter('emotion_predictions_total', 'Total predictions made')
PREDICTION_LATENCY = Histogram('emotion_prediction_latency_seconds', 'Prediction latency')
DETECTION_FAILURES = Counter('face_detection_failures_total', 'Failed face detections')

app = FastAPI(
    title="Emotion Recognition API",
    description="Production-grade emotion recognition from facial expressions",
    version="1.0.0"
)

# Global model instances
class EmotionModel:
    """Wrapper for ONNX emotion classifier with face detection."""

    def __init__(self, onnx_path: str = 'emotion_model.onnx'):
        self.emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']

        # Initialize face detector
        self.face_detector = MTCNN(
            image_size=160,  # MTCNN default
            margin=20,
            min_face_size=20,
            thresholds=[0.6, 0.7, 0.7],
            factor=0.709,
            # Keep raw 0-255 pixel values; preprocess_face applies its own
            # normalization. post_process=True would standardize the crop first.
            post_process=False,
            device='cuda' if torch.cuda.is_available() else 'cpu'
        )

        # Initialize ONNX runtime
        providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if torch.cuda.is_available() else ['CPUExecutionProvider']
        self.session = ort.InferenceSession(onnx_path, providers=providers)

        # Input/output details
        self.input_name = self.session.get_inputs()[0].name
        self.output_name = self.session.get_outputs()[0].name

        logger.info(f"Model loaded with providers: {providers}")

    def preprocess_face(self, face_tensor: torch.Tensor) -> np.ndarray:
        """Preprocess detected face for emotion model."""
        # MTCNN returns a float tensor of shape (3, 160, 160) with values in 0-255
        # Resize to (3, 224, 224) for ResNet
        face_resized = torch.nn.functional.interpolate(
            face_tensor.unsqueeze(0), size=(224, 224), mode='bilinear'
        ).squeeze(0)

        # Scale to [0, 1], then normalize with ImageNet stats
        mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
        std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
        face_normalized = (face_resized / 255.0 - mean) / std

        # Convert to numpy for ONNX
        return face_normalized.numpy().astype(np.float32)

    def predict(self, image: np.ndarray) -> List[dict]:
        """
        Detect faces and predict emotions.

        Args:
            image: RGB image as numpy array (H, W, 3)

        Returns:
            List of dicts with 'bbox', 'emotion', 'confidence'
        """
        # Detect faces
        boxes, probs = self.face_detector.detect(image)

        if boxes is None:
            DETECTION_FAILURES.inc()
            return []

        results = []
        for box, prob in zip(boxes, probs):
            if prob < 0.9:  # Confidence threshold
                continue

            # Extract face using MTCNN's extract method. It indexes the boxes
            # as an array, so pass this box with shape (1, 4), not a list.
            face_tensor = self.face_detector.extract(
                image, np.expand_dims(box, axis=0), save_path=None
            )

            if face_tensor is None:
                continue

            # Preprocess and predict
            input_tensor = self.preprocess_face(face_tensor)

            # ONNX inference
            start_time = time.time()
            outputs = self.session.run(
                [self.output_name],
                {self.input_name: np.expand_dims(input_tensor, axis=0)}
            )
            latency = time.time() - start_time

            # Get probabilities (softmax, shifted by the max for numerical stability)
            logits = outputs[0][0]
            exp_logits = np.exp(logits - np.max(logits))
            probabilities = exp_logits / exp_logits.sum()

            # Get top prediction
            pred_idx = np.argmax(probabilities)
            confidence = float(probabilities[pred_idx])

            # Convert box to int list
            bbox = [int(x) for x in box.tolist()]

            results.append({
                'bbox': bbox,
                'face_confidence': float(prob),
                'emotion': self.emotions[pred_idx],
                'confidence': confidence,
                'probabilities': {
                    emotion: float(probabilities[j])
                    for j, emotion in enumerate(self.emotions)
                },
                'latency_ms': round(latency * 1000, 2)
            })

            PREDICTION_COUNTER.inc()
            PREDICTION_LATENCY.observe(latency)

        return results

# Initialize model at startup
model_instance = None

@app.on_event("startup")
async def startup_event():
    global model_instance
    model_instance = EmotionModel()
    logger.info("Emotion model initialized")

# Pydantic models for response
class EmotionResult(BaseModel):
    bbox: List[int] = Field(..., description="Bounding box [x1, y1, x2, y2]")
    face_confidence: float = Field(..., ge=0, le=1)
    emotion: str
    confidence: float = Field(..., ge=0, le=1)
    probabilities: dict
    latency_ms: float

class PredictionResponse(BaseModel):
    faces: List[EmotionResult]
    total_faces: int
    processing_time_ms: float

@app.post("/predict", response_model=PredictionResponse)
async def predict_emotion(file: UploadFile = File(...)):
    """
    Upload an image and get emotion predictions for all detected faces.

    Accepts: JPEG, PNG, WebP
    Max file size: 10MB (configured in nginx/proxy)
    """
    # Validate file type
    if file.content_type not in ['image/jpeg', 'image/png', 'image/webp']:
        raise HTTPException(status_code=400, detail="Unsupported image format")

    # Read image
    contents = await file.read()

    try:
        # Convert to numpy array
        image = Image.open(io.BytesIO(contents))
        image = image.convert('RGB')
        image_np = np.array(image)
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Invalid image: {str(e)}")

    # Predict
    start_time = time.time()
    results = model_instance.predict(image_np)
    processing_time = (time.time() - start_time) * 1000

    return PredictionResponse(
        faces=[EmotionResult(**r) for r in results],
        total_faces=len(results),
        processing_time_ms=round(processing_time, 2)
    )

@app.post("/predict_batch")
async def predict_batch(files: List[UploadFile] = File(...)):
    """
    Batch prediction for multiple images.
    Useful for video frame analysis or bulk processing.
    """
    all_results = []

    for file in files:
        contents = await file.read()

        try:
            image = Image.open(io.BytesIO(contents)).convert('RGB')
            image_np = np.array(image)
        except Exception as e:
            # Reject the request with a 400 instead of failing with a 500
            raise HTTPException(
                status_code=400,
                detail=f"Invalid image {file.filename}: {str(e)}"
            )

        results = model_instance.predict(image_np)
        all_results.append({
            'filename': file.filename,
            'faces': results
        })

    return {'results': all_results, 'total_images': len(files)}

@app.get("/health")
async def health_check():
    """Health check endpoint for Kubernetes liveness probe."""
    return {
        'status': 'healthy',
        'model_loaded': model_instance is not None,
        'device': 'cuda' if torch.cuda.is_available() else 'cpu'
    }

@app.get("/metrics")
async def metrics():
    """Prometheus metrics endpoint."""
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)

if __name__ == '__main__':
    uvicorn.run(
        'api:app',
        host='0.0.0.0',
        port=8000,
        workers=4,  # Adjust based on CPU cores
        log_level='info'
    )

Critical production considerations in the API code:

Face detection threshold: We discard any detection whose MTCNN face probability is below 0.9, on top of MTCNN's own internal stage thresholds. This reduces false positives but may miss some faces in challenging conditions. In production, you might want to make this configurable via environment variables.
ONNX Runtime providers: The code automatically selects CUDA if available. On CPU-only systems, it falls back to CPUExecutionProvider. For edge deployment, consider using TensorRT or OpenVINO providers.
Batch processing: The /predict_batch endpoint allows processing multiple images in a single request. This is useful for video analysis where you extract frames at regular intervals.
Prometheus metrics: We track prediction count, latency, and detection failures. These can be scraped by Prometheus and visualized in Grafana for monitoring.
File validation: We check content type and handle malformed images gracefully. In production, add file size limits at the reverse proxy level (nginx/AWS ALB).
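
One way to make the face-confidence threshold configurable is sketched below. This is an assumption-laden example, not part of the API code above: the variable name `FACE_CONFIDENCE_THRESHOLD` and the helper `read_threshold` are hypothetical, and the default of 0.9 mirrors the hard-coded value.

```python
import os


def read_threshold(env=None, name="FACE_CONFIDENCE_THRESHOLD", default=0.9):
    """Read a probability threshold from the environment, with validation.

    `name` is a hypothetical variable; `env` can be any mapping, which makes
    the function easy to test without touching the real environment.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{name} must be a number, got {raw!r}") from None
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return value


# Falls back to the default when the variable is unset.
print(read_threshold(env={}))
# Uses the configured value when present.
print(read_threshold(env={"FACE_CONFIDENCE_THRESHOLD": "0.75"}))
```

Validating at startup means a mistyped value fails loudly when the service boots rather than silently filtering every face at request time.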

Edge Cases and Error Handling

Real-world emotion recognition systems face numerous edge cases that can break naive implementations:

1. Multiple faces with varying sizes: Our MTCNN detector handles this naturally, but large group photos may cause memory issues. Consider adding a maximum face count (e.g., 20 faces per image) to prevent resource exhaustion.

2. Occluded faces: Sunglasses, masks, or hands covering parts of the face reduce accuracy. The model was trained on mostly frontal faces, so profile views will perform poorly. Consider adding a head pose estimation module to filter non-frontal faces.

3. Low-light conditions: FER2013 images are grayscale and relatively well-lit. In dark environments, consider preprocessing with histogram equalization or using a denoising autoencoder.

4. Children vs adults: The model was trained primarily on adult faces. Children's facial proportions differ significantly, leading to lower accuracy. If your use case involves pediatric populations, consider fine-tuning [3] on a dataset like AffectNet.

5. Cultural differences: Emotional expression varies across cultures. For example, East Asian cultures may suppress outward displays of sadness. The FER2013 dataset is predominantly Western, so accuracy may degrade for non-Western populations.
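
The maximum face count suggested in item 1 could be enforced with a small filter like the one below. This is a sketch under stated assumptions: `cap_faces` is a hypothetical helper, boxes are `[x1, y1, x2, y2]` lists, and keeping the largest faces first is one possible policy among several (ranking by detection probability is another).

```python
def cap_faces(boxes, probs, max_faces=20):
    """Keep at most max_faces detections, preferring the largest boxes."""
    def area(box):
        x1, y1, x2, y2 = box
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)

    # Sort detections by box area, largest first, then truncate.
    ranked = sorted(zip(boxes, probs), key=lambda pair: area(pair[0]), reverse=True)
    kept = ranked[:max_faces]
    return [box for box, _ in kept], [prob for _, prob in kept]


# Three made-up detections, capped at two.
boxes = [[0, 0, 10, 10], [0, 0, 50, 50], [0, 0, 20, 20]]
probs = [0.99, 0.95, 0.97]
kept_boxes, kept_probs = cap_faces(boxes, probs, max_faces=2)
print(kept_boxes, kept_probs)
```

Applying the cap right after detection, before any crops are extracted, bounds the per-request cost regardless of how crowded the image is.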

Error handling strategy:

Return empty results (not an error) when no face is detected
Log detection failures separately from prediction errors
Implement circuit breakers for downstream services
Use exponential backoff for retries on transient failures

Deployment and Scaling

For production deployment, consider the following architecture:

# docker-compose.yml
version: '3.8'
services:
  emotion-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - ONNX_PROVIDER=CUDAExecutionProvider
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

Scaling considerations:

Use a load balancer (nginx, AWS ALB) in front of multiple API instances
Cache face detection results for identical images (e.g., video frames)
Consider using Redis for request queuing during traffic spikes
Implement rate limiting per API key to prevent abuse
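
Per-key rate limiting is often built on a token bucket. The sketch below is an in-memory, single-process illustration with hypothetical class names (`TokenBucket`, `RateLimiter`). A deployment with several workers would need shared state (for example in Redis), which this example does not cover.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        # Refill based on elapsed time, then try to spend one token.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Keeps one bucket per API key."""

    def __init__(self, rate=5.0, capacity=10, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, api_key):
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[api_key] = bucket
        return bucket.allow()


limiter = RateLimiter(rate=5.0, capacity=10)
print(limiter.allow("demo-key"))
```

In a FastAPI service, a check like `limiter.allow(api_key)` would typically run in a dependency or middleware and return HTTP 429 when it yields `False`.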

What's Next

This tutorial provides a production-ready foundation for emotion recognition, but several improvements can enhance accuracy and robustness:

Multi-modal fusion: Combine facial expressions with voice tone and text sentiment for more accurate emotion detection. Research from MIT Media Lab shows that multi-modal systems achieve 15-20% higher accuracy than vision-only systems.
Temporal modeling: For video analysis, use a 3D CNN or LSTM to capture emotional transitions over time. This is critical for detecting micro-expressions, which can last as little as 1/25th of a second.
Federated learning: If deploying on edge devices, consider federated learning to improve the model without centralizing sensitive facial data. Google's TensorFlow [4] Federated framework supports this.
Explainability: Add Grad-CAM visualizations to show which facial regions influenced the prediction. This builds trust with users and helps debug misclassifications.
Privacy preservation: Implement on-device processing to avoid transmitting facial images over the network. Apple's Vision framework already does this for face detection on iOS.
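
As a much simpler stand-in for the learned temporal models mentioned above, per-frame probabilities can be smoothed with an exponential moving average before picking a label. This is not a 3D CNN or LSTM and will blur very short expressions; it only illustrates how information can be carried across frames. The function name `smooth_probabilities` and the value of `alpha` are assumptions.

```python
def smooth_probabilities(frames, alpha=0.3):
    """Exponentially smooth a sequence of per-frame probability vectors.

    Each output is alpha * current + (1 - alpha) * previous output, so every
    smoothed vector still sums to 1 when the inputs do.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    state = None
    for probs in frames:
        if state is None:
            state = list(probs)  # The first frame passes through unchanged.
        else:
            state = [alpha * p + (1 - alpha) * s for p, s in zip(probs, state)]
        smoothed.append(list(state))
    return smoothed


# Two-class toy sequence with a single-frame flicker in the middle.
frames = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]
for vector in smooth_probabilities(frames, alpha=0.5):
    print([round(v, 3) for v in vector])
```

A smaller `alpha` gives steadier labels at the cost of slower reaction to genuine changes in expression.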

The complete source code for this tutorial is available on GitHub. For further reading, check out our guides on deploying PyTorch models with ONNX Runtime and building scalable computer vision APIs.

References

1. Wikipedia - TensorFlow. Wikipedia.

2. Wikipedia - Rag. Wikipedia.

3. Wikipedia - Fine-tuning. Wikipedia.

4. GitHub - tensorflow/tensorflow. GitHub.

5. GitHub - Shubhamsaboo/awesome-llm-apps. GitHub.

6. GitHub - hiyouga/LlamaFactory. GitHub.

7. GitHub - pytorch/pytorch. GitHub.
