ch 06 / 12Architectural blocks
slide 01 / 12
Inductive bias

Why we need more than dense layers

A dense layer connects every input to every output. For an HD image that's 3×1080×19206M3 \times 1080 \times 1920 \approx 6 \text{M} inputs per output unit — and the layer learns nothing about where a pattern occurs in the image.

Different domains have different structure. The right block encodes that structure into the architecture so the network does not have to discover it from scratch:

  • · Images — local correlations, translation invariance → convolution.
  • · Sequences (text, audio) — temporal order, long-range dependence → recurrence or attention.
  • · Sets — order-invariance → attention, deep sets.
  • · Graphs — message passing on edges → graph neural networks.