Neural Networks
Lecture 3 · The computational backbone of Generative AI
The Neuron
An artificial neuron computes a weighted sum of its inputs x₁…xₙ, adds a bias b, and passes the result through a non-linear activation function σ:
y = σ(w₁x₁ + w₂x₂ + … + wₙxₙ + b)
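The formula above can be traced by hand in a few lines of plain Python. This is a minimal sketch that assumes the sigmoid as the activation σ; the helper names `sigmoid` and `neuron` and the input values are illustrative and not part of the lecture.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, used here as the activation σ (an assumption)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """Weighted sum of the inputs plus a bias, passed through σ."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# Illustrative values, chosen so the weighted sum is exactly zero
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.0

# z = 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, so y = σ(0) = 0.5
y = neuron(x, w, b)
print(y)
```

Swapping `sigmoid` for another activation changes only the last step; the weighted sum and bias stay the same.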
Common Activations
| Function | Formula | Use Case |
|---|---|---|
| Sigmoid | 1 / (1 + e⁻ˣ) | Binary output, gates |
| ReLU | max(0, x) | Hidden layers (default) |
| GELU | x·Φ(x), where Φ is the standard normal CDF | Transformers |
| Softmax | e^(xᵢ) / Σⱼ e^(xⱼ) | Probability over classes / tokens |
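The four formulas in the table translate almost directly into code. The sketch below uses only the standard library; the function names are illustrative. GELU is written with the exact normal CDF via `math.erf`, and the softmax subtracts the maximum input before exponentiating, a common numerical-stability step that is not in the table and does not change the result.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def gelu(x):
    # Φ(x), the standard normal CDF, expressed through the error function
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi


def softmax(xs):
    # Subtracting max(xs) avoids overflow in exp; the ratios are unchanged
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(sigmoid(0.0), relu(-2.0), gelu(0.0), probs)
```

Sigmoid, ReLU and GELU act on one value at a time, while softmax needs the whole vector because each output depends on the sum over all inputs.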
Training Loop in Python
Python · PyTorch
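import torch
import torch.nn as nn

# Simple two-layer network: 784 inputs -> 256 hidden units -> 10 class logits
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()  # takes raw logits and integer class labels

# `dataloader` is assumed to be defined elsewhere and to yield (X, y) batches,
# with X of shape (batch, 784) and y holding class indices in 0..9
for epoch in range(10):
    for X, y in dataloader:
        pred = model(X)            # forward pass
        loss = criterion(pred, y)  # compare logits with the labels
        optimizer.zero_grad()      # clear gradients from the previous batch
        loss.backward()            # backpropagate
        optimizer.step()           # update the weights
    # Reports the loss of the last batch of the epoch only
    print(f"Epoch {epoch}: loss={loss.item():.4f}")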
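The loop iterates over a `dataloader` that the snippet does not define. One way to try the loop without a real dataset is to build a loader from random tensors, as in the sketch below. The sample count, batch size and seed are arbitrary assumptions; only the 784 input features and 10 classes come from the model above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# Synthetic stand-in data: 512 samples with 784 features each, labels in 0..9
X_all = torch.randn(512, 784)
y_all = torch.randint(0, 10, (512,))

dataloader = DataLoader(TensorDataset(X_all, y_all), batch_size=64, shuffle=True)

# Inspect one batch to confirm the shapes the training loop expects
X, y = next(iter(dataloader))
print(X.shape, y.shape, y.dtype)
```

With random labels the loss will not fall in any meaningful way; the sketch only checks that shapes and dtypes line up with `nn.Linear(784, 256)` and `nn.CrossEntropyLoss()`.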
⚠️ Common Pitfall
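Forgetting optimizer.zero_grad() before loss.backward() makes PyTorch add each batch's gradients to those already stored in .grad, so every update uses a running sum over earlier batches instead of the current batch's gradient. This is a frequent source of bugs.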
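The accumulation can be observed directly on a single parameter. In this minimal sketch the tensor `w` and the expression `3 * w` are illustrative, and the gradient is reset by hand with `w.grad.zero_()` as a stand-in for what `optimizer.zero_grad()` does for every parameter the optimizer manages.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# d(3*w)/dw is 3 on every backward pass
(3 * w).backward()
first = w.grad.item()        # 3.0

(3 * w).backward()           # no reset in between
accumulated = w.grad.item()  # 6.0: the new gradient was added to the old one

w.grad.zero_()               # manual reset of the stored gradient
(3 * w).backward()
after_reset = w.grad.item()  # 3.0 again

print(first, accumulated, after_reset)
```

Accumulation is sometimes used on purpose, for example to sum gradients over several small batches before one optimizer step, which is why PyTorch does not reset gradients automatically.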