2 Neural Network Fundamentals

2.1 The Building Block: Artificial Neuron

An artificial neuron loosely mimics a biological neuron: it receives inputs, processes them, and produces an output.

Mathematical view:

output = activation(w₁×x₁ + w₂×x₂ + ... + wₙ×xₙ + bias)

Intuitive view: think of it as a tiny decision maker that:

1. Takes multiple inputs (features)
2. Weighs each input (some matter more than others)
3. Sums everything up (plus a bias term)
4. Applies an activation function (introduces non-linearity)
5. Produces an output

Let’s implement a single neuron:

PyTorch:

```python
import torch

# Manual neuron implementation
class SingleNeuron:
    def __init__(self, n_inputs):
        # Initialize weights and bias randomly
        self.weights = torch.randn(n_inputs)
        self.bias = torch.randn(1)

    def forward(self, x):
        # Compute: w·x + b
        return torch.sum(self.weights * x) + self.bias

# Example: 3 inputs
neuron = SingleNeuron(n_inputs=3)
x = torch.tensor([1.0, 2.0, 3.0])
output = neuron.forward(x)

print(f"Weights: {neuron.weights}")
print(f"Bias: {neuron.bias.item():.4f}")
print(f"Output: {output.item():.4f}")
```
TensorFlow:

```python
import tensorflow as tf
import numpy as np

# Manual neuron implementation
class SingleNeuron:
    def __init__(self, n_inputs):
        # Initialize weights and bias randomly
        self.weights = tf.Variable(tf.random.normal([n_inputs]))
        self.bias = tf.Variable(tf.random.normal([1]))

    def forward(self, x):
        # Compute: w·x + b
        return tf.reduce_sum(self.weights * x) + self.bias

# Example: 3 inputs
neuron = SingleNeuron(n_inputs=3)
x = tf.constant([1.0, 2.0, 3.0])
output = neuron.forward(x)

print(f"Weights: {neuron.weights.numpy()}")
print(f"Bias: {neuron.bias.numpy()[0]:.4f}")
print(f"Output: {output.numpy()[0]:.4f}")
```

2.2 Activation Functions
Without activation functions, a neural network would just be a chain of linear transformations, which collapses into a single linear map and cannot capture complex patterns. Activation functions introduce non-linearity, allowing networks to learn complex relationships.
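To see why, two linear layers applied back to back are equivalent to one linear layer whose weight matrix is the product of the two. A quick numerical check (a minimal sketch; the shapes and values here are arbitrary):

```python
import torch

x = torch.randn(5, 3)        # batch of 5 inputs with 3 features (arbitrary sizes)
W1 = torch.randn(3, 4)       # "layer 1" weights
W2 = torch.randn(4, 2)       # "layer 2" weights

two_layers = (x @ W1) @ W2   # two linear layers with no activation in between
one_layer = x @ (W1 @ W2)    # a single equivalent linear layer

# The results match (up to floating-point error), so the extra layer added nothing
print(torch.allclose(two_layers, one_layer, atol=1e-5))
```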
2.2.1 Common Activation Functions
2.2.1.1 1. ReLU (Rectified Linear Unit)
Most popular in deep learning!
ReLU(x) = max(0, x)
Why it’s great:

- Fast to compute
- Doesn’t saturate for positive values
- Works well in practice
PyTorch:

```python
import torch
import matplotlib.pyplot as plt
import numpy as np

x = torch.linspace(-5, 5, 100)
relu = torch.relu(x)

plt.figure(figsize=(10, 4))
plt.plot(x.numpy(), relu.numpy(), label='ReLU', linewidth=2)
plt.grid(True, alpha=0.3)
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('ReLU Activation Function')
plt.legend()
plt.axhline(y=0, color='k', linestyle='--', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='--', alpha=0.3)
plt.show()

print(f"ReLU(-2) = {torch.relu(torch.tensor(-2.0))}")
print(f"ReLU(2) = {torch.relu(torch.tensor(2.0))}")
```

TensorFlow:

```python
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
x = tf.linspace(-5.0, 5.0, 100)
relu = tf.nn.relu(x)
plt.figure(figsize=(10, 4))
plt.plot(x.numpy(), relu.numpy(), label='ReLU', linewidth=2)
plt.grid(True, alpha=0.3)
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('ReLU Activation Function')
plt.legend()
plt.axhline(y=0, color='k', linestyle='--', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='--', alpha=0.3)
plt.show()
print(f"ReLU(-2) = {tf.nn.relu(tf.constant(-2.0)).numpy()}")
print(f"ReLU(2) = {tf.nn.relu(tf.constant(2.0)).numpy()}")2.2.1.2 2. Sigmoid
Squashes values between 0 and 1. Useful for binary classification output.
Sigmoid(x) = 1 / (1 + e^(-x))
PyTorch:

```python
x = torch.linspace(-5, 5, 100)
sigmoid = torch.sigmoid(x)
plt.figure(figsize=(10, 4))
plt.plot(x.numpy(), sigmoid.numpy(), label='Sigmoid', linewidth=2, color='orange')
plt.grid(True, alpha=0.3)
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Sigmoid Activation Function')
plt.legend()
plt.axhline(y=0.5, color='k', linestyle='--', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='--', alpha=0.3)
plt.show()
print(f"Sigmoid(-5) = {torch.sigmoid(torch.tensor(-5.0)):.4f}")
print(f"Sigmoid(0) = {torch.sigmoid(torch.tensor(0.0)):.4f}")
print(f"Sigmoid(5) = {torch.sigmoid(torch.tensor(5.0)):.4f}")x = tf.linspace(-5.0, 5.0, 100)
sigmoid = tf.nn.sigmoid(x)
plt.figure(figsize=(10, 4))
plt.plot(x.numpy(), sigmoid.numpy(), label='Sigmoid', linewidth=2, color='orange')
plt.grid(True, alpha=0.3)
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Sigmoid Activation Function')
plt.legend()
plt.axhline(y=0.5, color='k', linestyle='--', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='--', alpha=0.3)
plt.show()
print(f"Sigmoid(-5) = {tf.nn.sigmoid(tf.constant(-5.0)).numpy():.4f}")
print(f"Sigmoid(0) = {tf.nn.sigmoid(tf.constant(0.0)).numpy():.4f}")
print(f"Sigmoid(5) = {tf.nn.sigmoid(tf.constant(5.0)).numpy():.4f}")2.2.1.3 3. Tanh (Hyperbolic Tangent)
Similar to sigmoid but outputs range from -1 to 1.
Tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
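A quick check of its output range (a minimal PyTorch sketch; tf.tanh is the TensorFlow equivalent):

```python
import torch

x = torch.tensor([-5.0, -1.0, 0.0, 1.0, 5.0])
print(torch.tanh(x))  # values lie in (-1, 1): roughly -1.0, -0.76, 0.0, 0.76, 1.0
# Unlike sigmoid (centered around 0.5), tanh outputs are centered around 0
```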
2.2.1.4 4. Softmax
Used for multi-class classification in the output layer. Converts logits to probabilities that sum to 1.

Softmax(xᵢ) = e^(xᵢ) / (e^(x₁) + e^(x₂) + ... + e^(xₙ))

PyTorch:

```python
# Example: 3-class classification
logits = torch.tensor([2.0, 1.0, 0.1])
probabilities = torch.softmax(logits, dim=0)
print("Logits (raw outputs):", logits.numpy())
print("Probabilities:", probabilities.numpy())
print(f"Sum of probabilities: {probabilities.sum():.4f}")
print(f"Predicted class: {probabilities.argmax()}")# Example: 3-class classification
logits = tf.constant([2.0, 1.0, 0.1])
probabilities = tf.nn.softmax(logits)
print("Logits (raw outputs):", logits.numpy())
print("Probabilities:", probabilities.numpy())
print(f"Sum of probabilities: {tf.reduce_sum(probabilities).numpy():.4f}")
print(f"Predicted class: {tf.argmax(probabilities).numpy()}")2.2.2 When to Use Which?
| Activation | Where to Use | Why |
|---|---|---|
| ReLU | Hidden layers | Fast, works well, default choice |
| Sigmoid | Binary classification output | Outputs probability (0-1) |
| Softmax | Multi-class classification output | Outputs probability distribution |
| Tanh | Hidden layers (RNNs) | Centered around 0 |
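As a rough illustration of the table (a sketch with arbitrary layer sizes, not a complete model), this is where each activation typically sits in PyTorch:

```python
import torch.nn as nn

# Hidden layers use ReLU; the output activation depends on the task
binary_model = nn.Sequential(
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid()       # single probability for binary classification
)

multiclass_model = nn.Sequential(
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 3), nn.Softmax(dim=1)  # probability distribution over 3 classes
)
# Note: when training with nn.CrossEntropyLoss (Section 2.4), the model usually
# outputs raw logits and the softmax is applied inside the loss function instead.
```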
2.3 Forward Propagation
Forward propagation is the process of passing input through the network to get an output.
Steps:

1. Input enters first layer
2. Each neuron computes: activation(weights × inputs + bias)
3. Output becomes input for next layer
4. Repeat until final layer
5. Final layer produces prediction
Let’s build a 2-layer network from scratch:
PyTorch:

```python
import torch

# Simple 2-layer network: 3 inputs → 4 hidden → 2 outputs
class TwoLayerNet:
    def __init__(self):
        self.W1 = torch.randn(3, 4) * 0.1  # Weights: input→hidden
        self.b1 = torch.zeros(4)           # Bias: hidden layer
        self.W2 = torch.randn(4, 2) * 0.1  # Weights: hidden→output
        self.b2 = torch.zeros(2)           # Bias: output layer

    def forward(self, x):
        # Layer 1: input → hidden
        hidden = torch.relu(x @ self.W1 + self.b1)
        print(f"Hidden layer output: {hidden}")
        # Layer 2: hidden → output
        output = hidden @ self.W2 + self.b2
        print(f"Final output: {output}")
        return output

# Test the network
net = TwoLayerNet()
x = torch.tensor([1.0, 2.0, 3.0])
prediction = net.forward(x)

print(f"\nInput shape: {x.shape}")
print(f"Output shape: {prediction.shape}")
```

TensorFlow:

```python
import tensorflow as tf
# Simple 2-layer network: 3 inputs → 4 hidden → 2 outputs
class TwoLayerNet:
    def __init__(self):
        self.W1 = tf.Variable(tf.random.normal([3, 4]) * 0.1)  # Weights: input→hidden
        self.b1 = tf.Variable(tf.zeros([4]))                    # Bias: hidden layer
        self.W2 = tf.Variable(tf.random.normal([4, 2]) * 0.1)   # Weights: hidden→output
        self.b2 = tf.Variable(tf.zeros([2]))                    # Bias: output layer

    def forward(self, x):
        # Layer 1: input → hidden
        hidden = tf.nn.relu(tf.matmul(x, self.W1) + self.b1)
        print(f"Hidden layer output: {hidden.numpy()}")
        # Layer 2: hidden → output
        output = tf.matmul(hidden, self.W2) + self.b2
        print(f"Final output: {output.numpy()}")
        return output

# Test the network
net = TwoLayerNet()
x = tf.constant([[1.0, 2.0, 3.0]])  # Note: batch dimension
prediction = net.forward(x)

print(f"\nInput shape: {x.shape}")
print(f"Output shape: {prediction.shape}")
```

2.4 Loss Functions
Loss functions measure how wrong our predictions are. The network learns by minimizing this loss.
2.4.1 Classification Loss Functions
2.4.1.1 Cross-Entropy Loss
Used for classification. Measures the difference between predicted probabilities and true labels.
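Under the hood, the cross-entropy for one sample is simply the negative log of the probability the model assigns to the true class. A minimal hand-computed check for the first sample used below (PyTorch assumed; nn.CrossEntropyLoss computes this per-sample value and then averages over the batch):

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])   # raw outputs for one sample
target = 0                               # the true class index

probs = torch.softmax(logits, dim=0)     # convert logits to probabilities
manual_loss = -torch.log(probs[target])  # negative log-probability of the true class

print(f"Probability of true class: {probs[target]:.4f}")
print(f"Manual cross-entropy for this sample: {manual_loss:.4f}")
```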
PyTorch:

```python
import torch
import torch.nn as nn

# Example: 3-class classification
# Predictions (raw logits before softmax)
predictions = torch.tensor([[2.0, 1.0, 0.1],   # Sample 1
                            [0.5, 2.5, 0.2]])  # Sample 2

# True labels (class indices)
targets = torch.tensor([0, 1])  # Sample 1 is class 0, Sample 2 is class 1

# Compute cross-entropy loss
criterion = nn.CrossEntropyLoss()
loss = criterion(predictions, targets)

print(f"Predictions:\n{predictions}")
print(f"True labels: {targets}")
print(f"Cross-entropy loss: {loss.item():.4f}")
```

TensorFlow:

```python
import tensorflow as tf
# Example: 3-class classification
# Predictions (raw logits before softmax)
predictions = tf.constant([[2.0, 1.0, 0.1],   # Sample 1
                           [0.5, 2.5, 0.2]])  # Sample 2

# True labels (class indices)
targets = tf.constant([0, 1])  # Sample 1 is class 0, Sample 2 is class 1

# Compute cross-entropy loss
criterion = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss = criterion(targets, predictions)

print(f"Predictions:\n{predictions.numpy()}")
print(f"True labels: {targets.numpy()}")
print(f"Cross-entropy loss: {loss.numpy():.4f}")
```

2.4.2 Regression Loss Functions
2.4.2.1 Mean Squared Error (MSE)
Used for regression. Measures average squared difference between predictions and targets.
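In formula form, MSE = mean((prediction − target)²). A quick hand computation (PyTorch assumed) that should match the library results below:

```python
import torch

predictions = torch.tensor([2.5, 3.8, 1.2])
targets = torch.tensor([3.0, 4.0, 1.0])

# Squared errors: (-0.5)^2, (-0.2)^2, (0.2)^2 = 0.25, 0.04, 0.04
manual_mse = ((predictions - targets) ** 2).mean()
print(f"Manual MSE: {manual_mse:.4f}")  # (0.25 + 0.04 + 0.04) / 3 ≈ 0.11
```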
PyTorch:

```python
import torch
import torch.nn as nn

# Predictions and true values
predictions = torch.tensor([2.5, 3.8, 1.2])
targets = torch.tensor([3.0, 4.0, 1.0])

# Compute MSE
criterion = nn.MSELoss()
loss = criterion(predictions, targets)

print(f"Predictions: {predictions.numpy()}")
print(f"Targets: {targets.numpy()}")
print(f"MSE Loss: {loss.item():.4f}")
```

TensorFlow:

```python
import tensorflow as tf
# Predictions and true values
predictions = tf.constant([2.5, 3.8, 1.2])
targets = tf.constant([3.0, 4.0, 1.0])

# Compute MSE
criterion = tf.keras.losses.MeanSquaredError()
loss = criterion(targets, predictions)

print(f"Predictions: {predictions.numpy()}")
print(f"Targets: {targets.numpy()}")
print(f"MSE Loss: {loss.numpy():.4f}")
```

2.5 Backpropagation (Intuition)
The key idea: Adjust weights to reduce the loss.
How it works (see the sketch below):

1. Make a prediction (forward pass)
2. Calculate loss (how wrong we were)
3. Calculate gradients (how to change each weight to reduce loss)
4. Update weights using gradients
5. Repeat!
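To make these steps concrete, here is one hand-worked update cycle for a single weight (plain Python with made-up numbers; a real network does the same gradient step for millions of weights at once):

```python
# One manual update cycle for a model with a single weight: prediction = w * x
w, x, target, lr = 0.5, 2.0, 3.0, 0.1

prediction = w * x                    # 1. forward pass -> 1.0
loss = (prediction - target) ** 2     # 2. loss -> 4.0
grad = 2 * (prediction - target) * x  # 3. gradient dloss/dw -> -8.0
w = w - lr * grad                     # 4. update -> 0.5 - 0.1 * (-8.0) = 1.3

new_loss = (w * x - target) ** 2      # 5. repeat: the loss dropped from 4.0 to 0.16
print(f"Updated weight: {w}, new loss: {new_loss:.2f}")
```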
Good news: PyTorch and TensorFlow handle backpropagation automatically! You just need to call .backward() (PyTorch) or use GradientTape (TensorFlow).
Let’s see automatic differentiation in action:
PyTorch:

```python
import torch

# Simple example: y = x^2 + 3x + 1
# What's the gradient dy/dx?
x = torch.tensor(2.0, requires_grad=True)  # Track gradients
y = x**2 + 3*x + 1

print(f"x = {x.item()}")
print(f"y = {y.item()}")

# Compute gradient automatically
y.backward()
print(f"Gradient dy/dx = {x.grad.item()}")
print(f"Analytical gradient = {2*x.item() + 3}")  # Should match!
```

TensorFlow:

```python
import tensorflow as tf
# Simple example: y = x^2 + 3x + 1
# What's the gradient dy/dx?
x = tf.Variable(2.0)

# Track operations
with tf.GradientTape() as tape:
    y = x**2 + 3*x + 1

print(f"x = {x.numpy()}")
print(f"y = {y.numpy()}")

# Compute gradient automatically
gradient = tape.gradient(y, x)
print(f"Gradient dy/dx = {gradient.numpy()}")
print(f"Analytical gradient = {2*x.numpy() + 3}")  # Should match!
```

2.6 Putting It All Together: Mini Training Loop
Here’s a complete training loop with all concepts:
PyTorch:

```python
import torch
import torch.nn as nn

# 1. Create simple dataset
X = torch.randn(100, 3)          # 100 samples, 3 features
y = torch.randint(0, 2, (100,))  # Binary labels (0 or 1)

# 2. Define model
model = nn.Sequential(
    nn.Linear(3, 4),  # 3 inputs → 4 hidden neurons
    nn.ReLU(),
    nn.Linear(4, 2)   # 4 hidden → 2 outputs (binary classification)
)

# 3. Define loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# 4. Training loop (5 epochs)
for epoch in range(5):
    # Forward pass
    outputs = model(X)
    loss = criterion(outputs, y)

    # Backward pass
    optimizer.zero_grad()  # Clear previous gradients
    loss.backward()        # Compute gradients
    optimizer.step()       # Update weights

    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

print("\n✅ Training complete!")
```

TensorFlow:

```python
import tensorflow as tf
from tensorflow import keras
# 1. Create simple dataset
X = tf.random.normal([100, 3])                      # 100 samples, 3 features
y = tf.random.uniform([100], 0, 2, dtype=tf.int32)  # Binary labels (0 or 1)

# 2. Define model
model = keras.Sequential([
    keras.layers.Dense(4, activation='relu', input_shape=(3,)),  # 3 inputs → 4 hidden
    keras.layers.Dense(2, activation='softmax')                  # 4 hidden → 2 outputs
])

# 3. Compile model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# 4. Training (5 epochs)
history = model.fit(X, y, epochs=5, verbose=1, batch_size=32)

print("\n✅ Training complete!")
```

2.7 Key Concepts Comparison: PyTorch vs TensorFlow
| Concept | PyTorch | TensorFlow/Keras |
|---|---|---|
| Model Definition | `nn.Module` class | `keras.Sequential` or `keras.Model` |
| Forward Pass | Define `forward()` method | Automatic via `model(x)` |
| Loss Calculation | Manual in training loop | Included in `model.compile()` |
| Backprop | `loss.backward()` | Automatic in `model.fit()` |
| Optimizer Step | `optimizer.step()` | Automatic in `model.fit()` |
| Style | More explicit control | Higher-level, less code |
Which is better? Neither! Both are excellent. PyTorch gives you more explicit control, while TensorFlow/Keras is more streamlined.
2.8 Summary
- Neurons compute weighted sums + bias, then apply activation
- Activation functions introduce non-linearity (ReLU most common)
- Forward propagation passes data through the network
- Loss functions measure prediction error
- Backpropagation computes gradients automatically
- Both frameworks handle the hard math for you!
2.9 What’s Next?
In Chapter 3, we’ll build our first real deep neural network and train it on the MNIST dataset, putting all these concepts into practice!