§ 01 — Foundation

What is a convolution, really?

A convolution slides a small grid — the kernel — across a larger grid (your image), multiplying overlapping cells and summing the result. Each sum becomes one pixel of the output. Press Step to watch the kernel walk.

Demo 01 · Sliding Window
Input
Kernel position
Output cell
Input · Kernel · Output
output[i,j] = Σ input[i+k, j+l] · kernel[k,l]
Position (0,0) — click Step to begin.

A 5×5 input with a 3×3 kernel produces a 3×3 output. The shape shrinks — that's where padding comes in.

§ 02 — The Operator

Kernels are tiny instruction sets.

The numbers inside the kernel decide what the convolution detects: edges, blurs, sharpens, embosses. Edit the 3×3 grid below or pick a preset to see how a real image of a cat transforms under different operators.

Demo 02 · Kernel Workshop
Original · Filtered output
Edit kernel weights Identity
Edge detection uses a kernel where the center is positive and surroundings are negative — when neighbors disagree (an edge!), the result is large. Blur averages neighbors. Sharpen amplifies the center against its surroundings.

In a real CNN, these weights aren't hand-designed — they're learned from data through gradient descent.

§ 03 — Geometry

Stride controls speed. Padding controls shape.

Stride is how many cells the kernel jumps each step — bigger strides skip pixels and shrink output faster. Padding wraps the input in zeros so the kernel can reach edge pixels and the output stays the same size.

Demo 03 · Geometry Lab
Input
Padding (zeros)
Kernel
Output
Input (padded) · Output
Input size N 8×8
5678910
Kernel size K 3×3
1357
Stride S 1
1234
Padding P 0
01234
output = ⌊(N + 2PK) / S⌋ + 1
N=8 K=3 P=0 S=1 → output = 6×6
§ 04 — Downsampling

Pooling compresses what matters.

After convolution, feature maps are still large. Pooling shrinks them by replacing each window with a single number — either the max (preserve strongest signal) or the average (preserve overall energy). It also makes the network slightly translation-invariant.

Demo 04 · Pooling Comparison
Input · Max-pooled output
Window size 2×2
234
Pool stride 2
1234
max-pool: output[i,j] = max over window
Window (0,0) — Click Step.
Max pooling is the favorite for sharp features — it keeps the strongest activation and discards the rest. Average pooling smooths and is gentler on small variations.