10  Regularization Techniques

Regularization prevents overfitting: the situation where a model memorizes the training data but fails to generalize to new, unseen data.

10.1 Signs of Overfitting

  • Training accuracy: 98% ✅
  • Validation accuracy: 75% ❌
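
A quick way to see this during training is to track the gap between the two accuracies. A minimal sketch (the accuracy values and the 0.1 threshold are illustrative assumptions):

train_acc, val_acc = 0.98, 0.75   # hypothetical per-epoch metrics

gap = train_acc - val_acc
if gap > 0.1:  # threshold is an assumption; tune it for your problem
    print(f"Train/val gap is {gap:.2f}: likely overfitting, consider regularization")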

Solution: Regularization techniques!

10.2 1. Dropout

Dropout randomly “drops” (zeroes out) neurons during training, forcing the network to learn robust features that do not rely on any single neuron.

import torch
import torch.nn as nn

class RegularizedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.dropout1 = nn.Dropout(0.5)  # Drop 50% of neurons
        self.fc2 = nn.Linear(256, 128)
        self.dropout2 = nn.Dropout(0.3)  # Drop 30% of neurons
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout1(x)  # Apply dropout
        x = torch.relu(self.fc2(x))
        x = self.dropout2(x)
        x = self.fc3(x)
        return x

The equivalent model in Keras:

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(256, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.5),  # Drop 50% of neurons
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.3),  # Drop 30% of neurons
    keras.layers.Dense(10, activation='softmax')
])

When to use: after dense layers, with a dropout rate of 0.2-0.5.
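
Note that dropout should only be active during training. In PyTorch this is controlled by the module's mode; a minimal sketch using the RegularizedNet defined above:

model = RegularizedNet()
model.train()  # training mode: dropout randomly zeroes activations (and rescales the rest)
# ... run the training loop here ...
model.eval()   # evaluation mode: dropout is disabled and all neurons are used

Keras handles this automatically: Dropout layers are active during fit() and disabled during predict() and evaluate().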

10.3 2. Batch Normalization

Normalizes activations between layers. Stabilizes training and acts as regularization.

class BatchNormNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.bn1 = nn.BatchNorm2d(32)  # Normalize conv output
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.bn2 = nn.BatchNorm2d(64)
        self.fc1 = nn.Linear(64 * 28 * 28, 128)  # 28x28 feature maps after two 3x3 convs (no padding) on 32x32 inputs
        self.bn3 = nn.BatchNorm1d(128)  # Normalize dense output
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.bn1(self.conv1(x)))
        x = torch.relu(self.bn2(self.conv2(x)))
        x = x.view(-1, 64 * 28 * 28)
        x = torch.relu(self.bn3(self.fc1(x)))
        return self.fc2(x)

The Keras equivalent:

model = keras.Sequential([
    keras.layers.Conv2D(32, 3, input_shape=(32, 32, 3)),
    keras.layers.BatchNormalization(),  # Normalize conv output
    keras.layers.Activation('relu'),
    keras.layers.Conv2D(64, 3),
    keras.layers.BatchNormalization(),
    keras.layers.Activation('relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(128),
    keras.layers.BatchNormalization(),  # Normalize dense output
    keras.layers.Activation('relu'),
    keras.layers.Dense(10, activation='softmax')
])

10.4 3. L2 Regularization (Weight Decay)

Penalizes large weights: loss = loss + λ * Σ(weights²)

# PyTorch: add weight_decay to the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.01)

# Keras: add an L2 regularizer to individual layers
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu',
                      kernel_regularizer=keras.regularizers.l2(0.01)),
    keras.layers.Dense(10, activation='softmax')
])
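
For reference, the same penalty can be added to the loss by hand, which makes the formula above explicit. A minimal PyTorch sketch (model, criterion, outputs, and targets are assumed to already exist):

l2_lambda = 0.01
l2_penalty = sum((p ** 2).sum() for p in model.parameters())  # Σ(weights²) over all parameters
loss = criterion(outputs, targets) + l2_lambda * l2_penalty   # loss + λ * Σ(weights²)

Note that this version, like weight_decay above, also penalizes bias terms unless you filter them out.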

10.5 4. Early Stopping

Stop training when validation loss stops improving.

# PyTorch: a manual early-stopping loop (train() and validate() are your own helpers)
best_val_loss = float('inf')
patience = 5
patience_counter = 0

for epoch in range(epochs):
    train(...)
    val_loss = validate(...)

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), 'best_model.pth')
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print("Early stopping!")
            break

Keras provides a built-in callback for this:

early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

model.fit(x_train, y_train, validation_split=0.2, callbacks=[early_stop])

10.6 5. Data Augmentation

Create variations of training data (covered in Chapter 5).
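
As a quick reminder of what this looks like in code, a minimal torchvision sketch (the specific transforms and the 32x32 image size are illustrative assumptions):

from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),       # random left-right mirroring
    transforms.RandomCrop(32, padding=4),    # random crops from padded 32x32 images
    transforms.ColorJitter(brightness=0.2),  # small random brightness changes
    transforms.ToTensor(),
])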

10.7 Layer Normalization

An alternative to batch norm that normalizes across the features of each sample rather than across the batch, which makes it work better for RNNs and small batch sizes:

layer_norm = nn.LayerNorm(128)       # PyTorch: normalize across the feature dimension
keras.layers.LayerNormalization()    # Keras equivalent
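
A minimal PyTorch sketch of where layer norm sits in an RNN (the sizes here are illustrative assumptions):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
layer_norm = nn.LayerNorm(128)

x = torch.randn(8, 20, 64)   # (batch, sequence length, features)
out, _ = lstm(x)             # (8, 20, 128)
out = layer_norm(out)        # normalized over the feature dimension of every timestep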

10.8 Combining Techniques

Best practices:

# For CNNs (see the sketch below):
Conv → BatchNorm → ReLU → Pool → Dropout(0.25) → ...

# For Dense layers:
Dense → BatchNorm → ReLU → Dropout(0.5) → ...

# For RNNs:
Embedding → LSTM(dropout=0.2, recurrent_dropout=0.2) → Dense
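
A concrete PyTorch sketch of the CNN recipe above (channel counts and rates are illustrative assumptions):

import torch.nn as nn

cnn_block = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),  # Conv
    nn.BatchNorm2d(32),                          # BatchNorm
    nn.ReLU(),                                   # ReLU
    nn.MaxPool2d(2),                             # Pool
    nn.Dropout(0.25),                            # Dropout
)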

10.9 Summary

Technique           When to Use         Typical Value
Dropout             Dense layers        0.2-0.5
BatchNorm           After Conv/Dense    Default params
Weight Decay        All models          0.001-0.01
Early Stopping      Always              Patience 3-10
Data Augmentation   Images              Always

Rule of thumb: Start with dropout + weight decay, add batch norm if training is unstable.

10.10 What’s Next?

Chapter 11: Model evaluation and debugging!