AI for Aerial Robotics

Inductive bias

Why we need more than dense layers

A dense layer connects every input to every output. For an HD image that's $3 \times 1080 \times 1920 \approx 6 \text{M}$ inputs per output unit — and the layer learns nothing about where a pattern occurs in the image.

Different domains have different structure. The right block encodes that structure into the architecture so the network does not have to discover it from scratch:

· Images — local correlations, translation invariance → convolution.
· Sequences (text, audio) — temporal order, long-range dependence → recurrence or attention.
· Sets — order-invariance → attention, deep sets.
· Graphs — message passing on edges → graph neural networks.