Neural Networks

Lecture 3 · The computational backbone of Generative AI

The Neuron

An artificial neuron computes a weighted sum of its inputs, adds a bias, and passes the result through a non-linear activation function:

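y = σ(w₁x₁ + w₂x₂ + … + wₙxₙ + b)

where x₁…xₙ are the inputs, w₁…wₙ the learned weights, b the bias, and σ the activation function (any of the functions listed below, not only the sigmoid).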
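The formula above can be written out directly. This is a minimal plain-Python sketch that assumes the sigmoid as σ; the names `neuron` and `sigmoid` are illustrative and not taken from any library.

```python
import math
from typing import Callable, Sequence

def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x: Sequence[float], w: Sequence[float], b: float,
           activation: Callable[[float], float] = sigmoid) -> float:
    """Compute activation(w1*x1 + ... + wn*xn + b) for a single neuron."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum plus bias
    return activation(z)

# 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
print(neuron([1.0, 2.0], [0.5, -0.25], b=0.0))  # 0.5
```

Passing a different `activation` (for example a ReLU) changes σ without touching the weighted sum.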

Common Activations

Function	Formula	Use Case
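Sigmoid	1 / (1 + e⁻ˣ)	Binary outputs, gates (e.g. in LSTMs)
ReLU	max(0, x)	Hidden layers (common default)
GELU	x·Φ(x), where Φ is the standard normal CDF	Transformer feed-forward layers
Softmax	e^(xᵢ) / Σⱼ e^(xⱼ)	Output layer: probability distribution over classes / tokens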
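The four activations in the table can be sketched in plain Python as follows. This is an illustrative version, not PyTorch's implementation; the sign branch in `sigmoid` and the max subtraction in `softmax` are standard numerical-stability tricks added here, and they do not change the mathematical result.

```python
import math
from typing import List, Sequence

def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never receives a large positive argument
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def relu(x: float) -> float:
    return max(0.0, x)

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), with standard normal CDF Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def softmax(xs: Sequence[float]) -> List[float]:
    # Subtracting the max leaves the result unchanged but prevents overflow in exp
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-3.0), relu(2.0))      # 0.0 2.0
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]; the naive formula would overflow
```

PyTorch's `nn.GELU` uses this exact erf form by default; its `approximate="tanh"` option switches to the tanh approximation.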

Training Loop in Python

Python · PyTorch

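import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data so the example runs: 1,000 random 784-feature inputs
# (e.g. flattened 28x28 images) with integer labels in [0, 10).
# Replace with a real dataset such as MNIST.
X_all = torch.randn(1000, 784)
y_all = torch.randint(0, 10, (1000,))
dataloader = DataLoader(TensorDataset(X_all, y_all), batch_size=64, shuffle=True)

# Simple two-layer network: 784 inputs -> 256 hidden units -> 10 class logits
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()  # takes raw logits; applies log-softmax internally

for epoch in range(10):
    model.train()
    total_loss = 0.0
    for X, y in dataloader:
        logits = model(X)            # forward pass: shape (batch, 10)
        loss = criterion(logits, y)  # mean loss over the batch
        optimizer.zero_grad()        # clear gradients left from the previous batch
        loss.backward()              # backpropagate: fill each parameter's .grad
        optimizer.step()             # update the weights using .grad
        total_loss += loss.item()
    # Report the mean over all batches, not just the last batch's loss
    print(f"Epoch {epoch}: mean loss={total_loss / len(dataloader):.4f}")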
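With integer class targets and default settings (mean reduction, no class weights, no label smoothing), nn.CrossEntropyLoss applies a log-softmax to the logits and averages the negative log-probability of the correct class. A plain-Python sketch of that computation follows; the function name `cross_entropy` is illustrative.

```python
import math
from typing import Sequence

def cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean over the batch of -log(softmax(row)[target])."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)
        # log(sum(exp(z))) computed stably by factoring out the max
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[t]  # equals -log(softmax(row)[t])
    return total / len(targets)

# Equal logits over 10 classes give a uniform prediction, so the loss is log(10)
print(cross_entropy([[0.0] * 10], [3]))  # about 2.3026
```

A freshly initialised 10-class classifier usually produces near-uniform predictions, so a first-epoch loss close to log(10) ≈ 2.30 is a useful sanity check.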

⚠️ Common Pitfall

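By default, PyTorch adds each new gradient to the existing .grad of every parameter. If you forget optimizer.zero_grad() before loss.backward(), gradients from earlier batches leak into every update. The code still runs without any error, which makes this a frequent and hard-to-spot bug.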
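The accumulation behaviour can be mimicked in plain Python to show what goes wrong. The sketch below uses a hypothetical `train` helper with one weight and full-batch gradient descent (not Adam). It keeps an explicit gradient buffer and can optionally skip the reset.

```python
from typing import List, Tuple

def train(data: List[Tuple[float, float]], steps: int, lr: float,
          reset_grad: bool) -> List[float]:
    """Fit y ~ w * x by gradient descent on mean squared error.

    `grad` plays the role of a parameter's .grad buffer: every backward pass
    ADDS to it, so it must be cleared before each step. Returns w after each step.
    """
    w, grad = 0.0, 0.0
    history: List[float] = []
    n = len(data)
    for _ in range(steps):
        if reset_grad:
            grad = 0.0  # analogue of optimizer.zero_grad()
        # "backward": add d/dw of mean((w*x - y)^2) into the buffer
        for x, y in data:
            grad += 2.0 * (w * x - y) * x / n
        w -= lr * grad  # analogue of optimizer.step() for plain SGD
        history.append(w)
    return history

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
good = train(data, steps=200, lr=0.05, reset_grad=True)
bad = train(data, steps=200, lr=0.05, reset_grad=False)
print(f"with zero_grad:    w = {good[-1]:.4f}")  # w = 2.0000
print(f"without zero_grad: last 20 values span {min(bad[-20:]):.2f} to {max(bad[-20:]):.2f}")
```

With the reset, w converges to 2.0. Without it, each update uses the sum of every gradient computed so far, so w overshoots and keeps swinging around 2.0 instead of settling.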