10 Regularization Techniques
Regularization prevents overfitting—when your model memorizes training data but fails on new data.
10.1 Signs of Overfitting
- Training accuracy: 98% ✅
- Validation accuracy: 75% ❌
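One way to spot this gap in practice is to compare the training and validation curves that Keras records during training. A minimal sketch, assuming a compiled `model` (with `metrics=['accuracy']`) and `x_train`/`y_train` arrays:

```python
history = model.fit(x_train, y_train, validation_split=0.2, epochs=20, verbose=0)

# A large, growing gap between training and validation accuracy signals overfitting
for epoch, (acc, val_acc) in enumerate(zip(history.history['accuracy'],
                                           history.history['val_accuracy'])):
    print(f"epoch {epoch:2d}  train={acc:.3f}  val={val_acc:.3f}  gap={acc - val_acc:.3f}")
```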
Solution: Regularization techniques!
10.2 1. Dropout
Randomly “drops” neurons during training. This forces the network to learn robust features that don’t rely on any single unit.
```python
import torch
import torch.nn as nn

class RegularizedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.dropout1 = nn.Dropout(0.5)   # drop each unit with probability 0.5
        self.fc2 = nn.Linear(256, 128)
        self.dropout2 = nn.Dropout(0.3)   # drop each unit with probability 0.3
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout1(x)              # apply dropout after the activation
        x = torch.relu(self.fc2(x))
        x = self.dropout2(x)
        x = self.fc3(x)
        return x
```

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(256, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.5),   # drop each unit with probability 0.5
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.3),   # drop each unit with probability 0.3
    keras.layers.Dense(10, activation='softmax')
])
```

When to use: after dense layers, with a dropout rate of 0.2-0.5.
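Dropout should only be active during training; at evaluation time all units are kept. Keras handles this automatically inside fit()/predict(), while in PyTorch you switch modes explicitly. A minimal sketch using the RegularizedNet above:

```python
model = RegularizedNet()

model.train()   # training mode: dropout randomly zeroes units
# ... run your training loop here ...

model.eval()    # evaluation mode: dropout is a no-op, all units are kept
# ... run validation / inference here ...
```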
10.3 2. Batch Normalization
Normalizes each layer’s activations to zero mean and unit variance, then rescales them with learned parameters. Stabilizes training and acts as a mild regularizer.
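To make “normalize” concrete, here is a rough sketch of what a batch-norm layer computes for a batch of dense activations during training (running statistics omitted; `gamma` and `beta` stand in for the learned scale and shift):

```python
import torch

x = torch.randn(32, 128)                    # batch of 32 samples, 128 features each
gamma = torch.ones(128)                     # learned scale, initialized to 1
beta = torch.zeros(128)                     # learned shift, initialized to 0
eps = 1e-5

mean = x.mean(dim=0)                        # per-feature mean over the batch
var = x.var(dim=0, unbiased=False)          # per-feature variance over the batch
x_hat = (x - mean) / torch.sqrt(var + eps)  # zero mean, unit variance per feature
out = gamma * x_hat + beta                  # rescale and shift
```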
```python
import torch
import torch.nn as nn

class BatchNormNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.bn1 = nn.BatchNorm2d(32)              # normalize conv output
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.bn2 = nn.BatchNorm2d(64)
        self.fc1 = nn.Linear(64 * 28 * 28, 128)    # 32x32 RGB input: two 3x3 convs give 28x28 maps
        self.bn3 = nn.BatchNorm1d(128)             # normalize dense output
        self.fc2 = nn.Linear(128, 10)              # output layer

    def forward(self, x):
        x = torch.relu(self.bn1(self.conv1(x)))
        x = torch.relu(self.bn2(self.conv2(x)))
        x = x.view(-1, 64 * 28 * 28)
        x = torch.relu(self.bn3(self.fc1(x)))
        return self.fc2(x)
```

```python
model = keras.Sequential([
    keras.layers.Conv2D(32, 3, input_shape=(32, 32, 3)),
    keras.layers.BatchNormalization(),   # normalize conv output
    keras.layers.Activation('relu'),
    keras.layers.Conv2D(64, 3),
    keras.layers.BatchNormalization(),
    keras.layers.Activation('relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(128),
    keras.layers.BatchNormalization(),   # normalize dense output
    keras.layers.Activation('relu'),
    keras.layers.Dense(10, activation='softmax')
])
```

10.4 3. L2 Regularization (Weight Decay)
Penalizes large weights: loss = loss + λ * Σ(weights²)
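To see where the formula shows up, here is a hedged sketch that adds the penalty to the loss by hand in PyTorch (`model`, `criterion`, `outputs`, and `targets` are assumed to exist). In practice you pass weight_decay to the optimizer instead, as shown below.

```python
l2_lambda = 0.01

# Sum of squared weights across all parameters -- the Σ(weights²) term
l2_penalty = sum((p ** 2).sum() for p in model.parameters())

loss = criterion(outputs, targets) + l2_lambda * l2_penalty
loss.backward()
```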
```python
# PyTorch: add weight_decay to the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.01)
```

```python
# Keras: add an L2 regularizer to individual layers
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu',
                       kernel_regularizer=keras.regularizers.l2(0.01)),
    keras.layers.Dense(10, activation='softmax')
])
```

10.5 4. Early Stopping
Stop training when validation loss stops improving.
```python
best_val_loss = float('inf')
patience = 5
patience_counter = 0

for epoch in range(epochs):
    train(...)                  # placeholder: one epoch of training
    val_loss = validate(...)    # placeholder: compute validation loss

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), 'best_model.pth')   # checkpoint the best model so far
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print("Early stopping!")
            break
```
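After the loop stops, you normally reload the best checkpoint saved above before evaluating (this mirrors restore_best_weights=True in the Keras callback below):

```python
# Restore the weights from the best epoch
model.load_state_dict(torch.load('best_model.pth'))
model.eval()
```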
```python
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

model.fit(x_train, y_train, validation_split=0.2, callbacks=[early_stop])
```

10.6 5. Data Augmentation
Create variations of training data (covered in Chapter 5).
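As a quick reminder, recent Keras versions (TensorFlow 2.6+) ship augmentation as preprocessing layers; a minimal sketch with illustrative layer choices, where `x_batch` stands for a hypothetical batch of images:

```python
data_augmentation = keras.Sequential([
    keras.layers.RandomFlip('horizontal'),   # random horizontal flips
    keras.layers.RandomRotation(0.1),        # small random rotations
    keras.layers.RandomZoom(0.1),            # random zooms up to ±10%
])

augmented = data_augmentation(x_batch, training=True)  # only active in training mode
```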
10.7 Layer Normalization
An alternative to batch norm that normalizes each sample across its features instead of across the batch; it works better for RNNs and small batch sizes:
```python
layer_norm = nn.LayerNorm(128)   # PyTorch: normalize each sample across its 128 features
```

```python
keras.layers.LayerNormalization()
```
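To see the difference from batch norm, note that each sample is normalized on its own, so the result does not depend on the rest of the batch:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 128)        # 4 samples, 128 features
out = nn.LayerNorm(128)(x)

print(out.mean(dim=-1))        # ≈ 0 for every sample
print(out.std(dim=-1))         # ≈ 1 for every sample
```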
10.8 Combining Techniques

Best practices:
```
# For CNNs:
Conv → BatchNorm → ReLU → Pool → Dropout(0.25) → ...

# For Dense layers:
Dense → BatchNorm → ReLU → Dropout(0.5) → ...

# For RNNs:
Embedding → LSTM(dropout=0.2, recurrent_dropout=0.2) → Dense
```
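Putting the CNN pattern above into code, a hedged Keras sketch (layer sizes are arbitrary; weight decay and early stopping are added at compile/fit time as shown earlier):

```python
from tensorflow import keras

model = keras.Sequential([
    # Conv -> BatchNorm -> ReLU -> Pool -> Dropout
    keras.layers.Conv2D(32, 3, input_shape=(32, 32, 3)),
    keras.layers.BatchNormalization(),
    keras.layers.Activation('relu'),
    keras.layers.MaxPooling2D(2),
    keras.layers.Dropout(0.25),

    # Dense -> BatchNorm -> ReLU -> Dropout
    keras.layers.Flatten(),
    keras.layers.Dense(128),
    keras.layers.BatchNormalization(),
    keras.layers.Activation('relu'),
    keras.layers.Dropout(0.5),

    keras.layers.Dense(10, activation='softmax')
])
```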
10.9 Summary

| Technique | When to Use | Typical Value |
|---|---|---|
| Dropout | Dense layers | 0.2-0.5 |
| BatchNorm | After Conv/Dense | Default params |
| Weight Decay | All models | 0.001-0.01 |
| Early Stopping | Always | Patience 3-10 |
| Data Augmentation | Images | Always |
Rule of thumb: Start with dropout + weight decay, add batch norm if training is unstable.
10.10 What’s Next?
Chapter 11: Model evaluation and debugging!