11  Model Evaluation & Debugging

11.1 Learning Curves

Learning curves plot training and validation metrics over epochs. They reveal whether your model is overfitting, underfitting, or fitting well.

import matplotlib.pyplot as plt

train_losses = [0.8, 0.5, 0.3, 0.2, 0.15, 0.12, 0.10]
val_losses = [0.7, 0.45, 0.35, 0.4, 0.45, 0.48, 0.50]

plt.figure(figsize=(10, 4))
plt.plot(train_losses, label='Training Loss', marker='o')
plt.plot(val_losses, label='Validation Loss', marker='s')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Learning Curves')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print("⚠️ Overfitting detected!")
print("Training loss decreases but validation loss increases after epoch 2")
In Keras, model.fit returns a History object whose history dict records these metrics per epoch:

import matplotlib.pyplot as plt

# After training
# history = model.fit(...)

# Plot learning curves
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

11.2 Diagnosing Problems

11.2.1 Underfitting

Symptoms:

  • Low training accuracy (< 80%)
  • Training and validation accuracy both low

Solutions:

  • Increase model capacity (more layers/neurons)
  • Train longer
  • Reduce regularization

11.2.2 Overfitting

Symptoms:

  • High training accuracy (> 95%)
  • Low validation accuracy (< 80%)
  • Large gap between training and validation

Solutions:

  • Add regularization (dropout, weight decay)
  • Get more data
  • Data augmentation
  • Early stopping
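Dropout is available as a layer in every major framework, but the mechanism itself is small enough to show directly. A minimal NumPy sketch of inverted dropout (the function and array names here are illustrative, not a framework API): each unit is zeroed with probability p during training, and the survivors are rescaled by 1/(1-p) so the expected activation stays the same.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p and
    rescale survivors by 1/(1-p) to preserve the expected activation."""
    if not training or p == 0.0:
        return x                         # no-op at inference time
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= p      # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

activations = np.ones((4, 8))
out = dropout(activations, p=0.25, rng=np.random.default_rng(0))
# Each entry is either 0 (dropped) or 1/0.75 (kept and rescaled)
```

At inference time the function returns its input unchanged, which is why frameworks distinguish training and evaluation modes.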

11.2.3 Just Right

Symptoms:

  • High training accuracy (~95%)
  • High validation accuracy (~92%)
  • Small gap between training and validation

Action: You’re done! 🎉
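The heuristics above can be wrapped into a quick diagnostic helper. The cutoffs below are this chapter's rough rules of thumb, not universal constants; tune them for your task:

```python
def diagnose_fit(train_acc, val_acc, low=0.80, gap=0.10):
    """Classify a training run using rough accuracy heuristics."""
    if train_acc < low:
        return "underfitting"   # model can't even fit the training set
    if train_acc - val_acc > gap:
        return "overfitting"    # large train/val gap
    return "good fit"

print(diagnose_fit(0.70, 0.68))  # underfitting
print(diagnose_fit(0.98, 0.75))  # overfitting
print(diagnose_fit(0.95, 0.92))  # good fit
```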

11.3 TensorBoard Visualization

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/experiment_1')

# In training loop:
# writer.add_scalar('Loss/train', train_loss, epoch)
# writer.add_scalar('Loss/val', val_loss, epoch)
# writer.add_scalar('Accuracy/train', train_acc, epoch)

# writer.close()

# View in terminal: tensorboard --logdir=runs
print("✅ TensorBoard logging setup")
print("Run: tensorboard --logdir=runs")
In Keras, the built-in TensorBoard callback handles the logging:

tensorboard_callback = keras.callbacks.TensorBoard(log_dir='logs/')

# model.fit(..., callbacks=[tensorboard_callback])

# View in terminal: tensorboard --logdir=logs
print("✅ TensorBoard logging setup")
print("Run: tensorboard --logdir=logs")

11.4 Confusion Matrix

For classification, see where your model makes mistakes:

from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# After inference
# y_true = ...
# y_pred = ...

# Example data
y_true = [0, 1, 2, 2, 1, 0, 1, 2]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
In Keras, model.predict returns class probabilities, so convert them to class labels first:

from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# After inference
# y_pred = model.predict(x_test)
# y_pred_classes = np.argmax(y_pred, axis=1)

# Example data
y_true = [0, 1, 2, 2, 1, 0, 1, 2]
y_pred_classes = [0, 2, 2, 2, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred_classes)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
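Each row of the confusion matrix sums to the true count for that class, so per-class recall falls out of the diagonal. A small NumPy sketch using the same example labels as above:

```python
import numpy as np

y_true = np.array([0, 1, 2, 2, 1, 0, 1, 2])
y_pred = np.array([0, 2, 2, 2, 1, 0, 1, 1])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1                      # rows = true class, cols = predicted

recall = np.diag(cm) / cm.sum(axis=1)  # correct predictions / true count per class
print(recall)                          # class 0 is perfect; classes 1 and 2 each miss one
```

Reading the matrix this way tells you not just how many mistakes the model makes, but which classes it confuses with each other.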

11.5 Model Checkpointing

Save best model during training:

import torch

# In the training loop (initialize best_val_loss = float('inf') beforehand)
if val_loss < best_val_loss:
    best_val_loss = val_loss
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': val_loss,
    }, 'checkpoint.pth')
    print(f"✅ Model saved at epoch {epoch}")

# Load checkpoint
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']
In Keras, the ModelCheckpoint callback saves the best model automatically:

checkpoint_callback = keras.callbacks.ModelCheckpoint(
    'best_model.h5',
    monitor='val_loss',
    save_best_only=True,
    verbose=1
)

# model.fit(..., callbacks=[checkpoint_callback])

# Load checkpoint
model = keras.models.load_model('best_model.h5')

11.6 Hyperparameter Tuning

Key hyperparameters to tune:

  1. Learning rate (most important!)
  2. Batch size
  3. Number of layers
  4. Neurons per layer
  5. Dropout rate

Simple grid search:

learning_rates = [0.1, 0.01, 0.001, 0.0001]
batch_sizes = [16, 32, 64, 128]

for lr in learning_rates:
    for bs in batch_sizes:
        train_model(lr=lr, batch_size=bs)
        # Track results
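To make the search useful you need to record a score for each combination and keep the best one. A self-contained sketch of that bookkeeping, where evaluate is a hypothetical stand-in for a real train-then-validate run (here it returns a fabricated score that peaks at lr=0.01, batch_size=32 just so the example runs):

```python
import itertools

def evaluate(lr, batch_size):
    """Hypothetical stand-in for training a model and returning
    its validation accuracy; fabricated to peak at (0.01, 32)."""
    return 1.0 - abs(lr - 0.01) * 10 - abs(batch_size - 32) / 1000

learning_rates = [0.1, 0.01, 0.001, 0.0001]
batch_sizes = [16, 32, 64, 128]

results = {}
for lr, bs in itertools.product(learning_rates, batch_sizes):
    results[(lr, bs)] = evaluate(lr, batch_size=bs)

best = max(results, key=results.get)
print(f"Best combo: lr={best[0]}, batch_size={best[1]}")
```

Note that grid search cost grows multiplicatively: 4 learning rates × 4 batch sizes already means 16 full training runs, which is why learning rate, the most important knob, is usually tuned first.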

11.7 Common Issues & Fixes

| Problem             | Symptom             | Solution                                         |
|---------------------|---------------------|--------------------------------------------------|
| NaN loss            | Loss becomes NaN    | Lower learning rate; check data                  |
| Exploding gradients | Loss spikes         | Gradient clipping; lower learning rate           |
| Slow convergence    | Loss plateaus early | Increase learning rate; check data normalization |
| No learning         | Loss doesn't change | Check loss function; verify data flow            |
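Gradient clipping caps the combined gradient norm before the optimizer step (in PyTorch this is torch.nn.utils.clip_grad_norm_). The mechanism itself, sketched in plain NumPy with illustrative names: if the global L2 norm of all gradients exceeds a threshold, scale them all down together so their direction is preserved.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients down together if their combined L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

grads = [np.array([3.0, 4.0]), np.array([12.0])]   # global norm = sqrt(9+16+144) = 13
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(norm)  # 13.0 — clipped gradients now have global norm 1.0
```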

11.8 Summary

  • Learning curves diagnose overfitting/underfitting
  • TensorBoard visualizes training in real-time
  • Confusion matrix shows classification errors
  • Checkpointing saves best models
  • Hyperparameter tuning improves performance

11.9 What’s Next?

Chapter 12: Building real-world projects and deploying models!