Inductive bias
Why we need more than dense layers
A dense layer connects every input to every output. For an HD image that's inputs per output unit — and the layer learns nothing about where a pattern occurs in the image.
Different domains have different structure. The right block encodes that structure into the architecture so the network does not have to discover it from scratch:
- · Images — local correlations, translation invariance → convolution.
- · Sequences (text, audio) — temporal order, long-range dependence → recurrence or attention.
- · Sets — order-invariance → attention, deep sets.
- · Graphs — message passing on edges → graph neural networks.