11  Model Evaluation & Debugging

11.1 Learning Curves

Learning curves plot training and validation metrics over epochs. They reveal whether your model is overfitting, underfitting, or fitting well.

import matplotlib.pyplot as plt

train_losses = [0.8, 0.5, 0.3, 0.2, 0.15, 0.12, 0.10]
val_losses = [0.7, 0.45, 0.35, 0.4, 0.45, 0.48, 0.50]

plt.figure(figsize=(10, 4))
plt.plot(train_losses, label='Training Loss', marker='o')
plt.plot(val_losses, label='Validation Loss', marker='s')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Learning Curves')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print("⚠️ Overfitting detected!")
print("Training loss decreases but validation loss increases after epoch 2")
In Keras, model.fit returns a History object whose history dict records these metrics per epoch:

import matplotlib.pyplot as plt

# After training
# history = model.fit(...)

# Plot learning curves
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

11.2 Diagnosing Problems

11.2.1 Underfitting

Symptoms:

  • Low training accuracy (< 80%)
  • Training and validation accuracy both low

Solutions:

  • Increase model capacity (more layers/neurons)
  • Train longer
  • Reduce regularization

11.2.2 Overfitting

Symptoms:

  • High training accuracy (> 95%)
  • Low validation accuracy (< 80%)
  • Large gap between training and validation

Solutions:

  • Add regularization (dropout, weight decay)
  • Get more data
  • Data augmentation
  • Early stopping
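Dropout is available as a layer in every major framework, but the mechanism itself is small enough to show directly. A minimal NumPy sketch of inverted dropout (the function and array names here are illustrative, not a framework API): each unit is zeroed with probability p during training, and the survivors are rescaled by 1/(1-p) so the expected activation stays the same.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p and
    rescale survivors by 1/(1-p) to preserve the expected activation."""
    if not training or p == 0.0:
        return x                         # no-op at inference time
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= p      # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

activations = np.ones((4, 8))
out = dropout(activations, p=0.25, rng=np.random.default_rng(0))
# Each entry is either 0 (dropped) or 1/0.75 (kept and rescaled)
```

At inference time the function returns its input unchanged, which is why frameworks distinguish training and evaluation modes.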

11.2.3 Just Right

Symptoms:

  • High training accuracy (~95%)
  • High validation accuracy (~92%)
  • Small gap between training and validation

Action: You’re done! 🎉
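The heuristics above can be wrapped into a quick diagnostic helper. The cutoffs below are this chapter's rough rules of thumb, not universal constants; tune them for your task:

```python
def diagnose_fit(train_acc, val_acc, low=0.80, gap=0.10):
    """Classify a training run using rough accuracy heuristics."""
    if train_acc < low:
        return "underfitting"   # model can't even fit the training set
    if train_acc - val_acc > gap:
        return "overfitting"    # large train/val gap
    return "good fit"

print(diagnose_fit(0.70, 0.68))  # underfitting
print(diagnose_fit(0.98, 0.75))  # overfitting
print(diagnose_fit(0.95, 0.92))  # good fit
```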

11.3 TensorBoard Visualization

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/experiment_1')

# In training loop:
# writer.add_scalar('Loss/train', train_loss, epoch)
# writer.add_scalar('Loss/val', val_loss, epoch)
# writer.add_scalar('Accuracy/train', train_acc, epoch)

# writer.close()

# View in terminal: tensorboard --logdir=runs
print("✅ TensorBoard logging setup")
print("Run: tensorboard --logdir=runs")
In Keras, the built-in TensorBoard callback handles the logging:

tensorboard_callback = keras.callbacks.TensorBoard(log_dir='logs/')

# model.fit(..., callbacks=[tensorboard_callback])

# View in terminal: tensorboard --logdir=logs
print("✅ TensorBoard logging setup")
print("Run: tensorboard --logdir=logs")

11.4 Confusion Matrix

For classification, see where your model makes mistakes:

from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# After inference
# y_true = ...
# y_pred = ...

# Example data
y_true = [0, 1, 2, 2, 1, 0, 1, 2]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
In Keras, model.predict returns class probabilities, so convert them to class labels first:

from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# After inference
# y_pred = model.predict(x_test)
# y_pred_classes = np.argmax(y_pred, axis=1)

# Example data
y_true = [0, 1, 2, 2, 1, 0, 1, 2]
y_pred_classes = [0, 2, 2, 2, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred_classes)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
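Each row of the confusion matrix sums to the true count for that class, so per-class recall falls out of the diagonal. A small NumPy sketch using the same example labels as above:

```python
import numpy as np

y_true = np.array([0, 1, 2, 2, 1, 0, 1, 2])
y_pred = np.array([0, 2, 2, 2, 1, 0, 1, 1])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1                      # rows = true class, cols = predicted

recall = np.diag(cm) / cm.sum(axis=1)  # correct predictions / true count per class
print(recall)                          # class 0 is perfect; classes 1 and 2 each miss one
```

Reading the matrix this way tells you not just how many mistakes the model makes, but which classes it confuses with each other.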

11.5 Model Checkpointing

Save best model during training:

import torch

# In the training loop (initialize best_val_loss = float('inf') beforehand)
if val_loss < best_val_loss:
    best_val_loss = val_loss
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': val_loss,
    }, 'checkpoint.pth')
    print(f"✅ Model saved at epoch {epoch}")

# Load checkpoint
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
In Keras, the ModelCheckpoint callback saves the best model automatically:

checkpoint_callback = keras.callbacks.ModelCheckpoint(
    'best_model.h5',
    monitor='val_loss',
    save_best_only=True,
    verbose=1
)

# model.fit(..., callbacks=[checkpoint_callback])

# Load checkpoint
model = keras.models.load_model('best_model.h5')

11.6 Hyperparameter Tuning

Key hyperparameters to tune:

  1. Learning rate (most important!)
  2. Batch size
  3. Number of layers
  4. Neurons per layer
  5. Dropout rate

Simple grid search:

learning_rates = [0.1, 0.01, 0.001, 0.0001]
batch_sizes = [16, 32, 64, 128]

for lr in learning_rates:
    for bs in batch_sizes:
        train_model(lr=lr, batch_size=bs)
        # Track results
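To make the search useful you need to record a score for each combination and keep the best one. A self-contained sketch of that bookkeeping, where evaluate is a hypothetical stand-in for a real train-then-validate run (here it returns a fabricated score that peaks at lr=0.01, batch_size=32 just so the example runs):

```python
import itertools

def evaluate(lr, batch_size):
    """Hypothetical stand-in for training a model and returning
    its validation accuracy; fabricated to peak at (0.01, 32)."""
    return 1.0 - abs(lr - 0.01) * 10 - abs(batch_size - 32) / 1000

learning_rates = [0.1, 0.01, 0.001, 0.0001]
batch_sizes = [16, 32, 64, 128]

results = {}
for lr, bs in itertools.product(learning_rates, batch_sizes):
    results[(lr, bs)] = evaluate(lr, batch_size=bs)

best = max(results, key=results.get)
print(f"Best combo: lr={best[0]}, batch_size={best[1]}")
```

Note that grid search cost grows multiplicatively: 4 learning rates × 4 batch sizes already means 16 full training runs, which is why learning rate, the most important knob, is usually tuned first.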

11.7 Common Issues & Fixes

| Problem             | Symptom             | Solution                                         |
|---------------------|---------------------|--------------------------------------------------|
| NaN loss            | Loss becomes NaN    | Lower learning rate; check data                  |
| Exploding gradients | Loss spikes         | Gradient clipping; lower learning rate           |
| Slow convergence    | Loss plateaus early | Increase learning rate; check data normalization |
| No learning         | Loss doesn't change | Check loss function; verify data flow            |
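Gradient clipping caps the combined gradient norm before the optimizer step (in PyTorch this is torch.nn.utils.clip_grad_norm_). The mechanism itself, sketched in plain NumPy with illustrative names: if the global L2 norm of all gradients exceeds a threshold, scale them all down together so their direction is preserved.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients down together if their combined L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]   # global norm = sqrt(9+16+144) = 13
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(norm)  # 13.0 — clipped gradients now have global norm 1.0
```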

11.8 Summary

  • Learning curves diagnose overfitting/underfitting
  • TensorBoard visualizes training in real-time
  • Confusion matrix shows classification errors
  • Checkpointing saves best models
  • Hyperparameter tuning improves performance

11.9 What’s Next?

Chapter 12: Building real-world projects and deploying models!