ch 02 / 12 · Data
Why data first

Garbage in, garbage out

A learning algorithm cannot recover signal that was never recorded. Both panels use the same model and the same underlying truth; the only thing that changes in the right-hand panel is the noise on the labels:

$$y_i = w^* x_i + b^* + \varepsilon_i$$

The fit drifts every time the noise is resampled. The dashed line is the truth; the blue line is what the model recovers.

[Figure: two panels. Clean labels: ŵ 0.60 · b̂ 1.23. Noisy labels: ŵ 0.51 · b̂ 1.44.]
same model, same truth — noise alone moves the fit
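
The effect can be reproduced with a minimal sketch that uses only the Python standard library. It draws labels from the linear model with Gaussian noise, refits a line by ordinary least squares on each resample, and reports how much the recovered slope and intercept vary. The true parameters (`w_true=0.6`, `b_true=1.2`), the noise level `sigma`, the input grid, and all function names are illustrative assumptions, not values taken from the slide.

```python
import random
import statistics


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y ~ w*x + b; returns (w_hat, b_hat)."""
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w_hat = sxy / sxx
    b_hat = y_mean - w_hat * x_mean
    return w_hat, b_hat


def resampled_fits(sigma, w_true=0.6, b_true=1.2, n_points=30,
                   n_resamples=200, seed=0):
    """Refit the line on fresh label noise; the inputs x_i stay fixed.

    w_true, b_true and the x grid on [0, 5] are assumed values for illustration.
    """
    rng = random.Random(seed)
    xs = [5.0 * i / (n_points - 1) for i in range(n_points)]
    fits = []
    for _ in range(n_resamples):
        # y_i = w* x_i + b* + eps_i, with eps_i ~ Normal(0, sigma^2)
        ys = [w_true * x + b_true + rng.gauss(0.0, sigma) for x in xs]
        fits.append(fit_line(xs, ys))
    return fits


def spread(fits):
    """Standard deviation of w_hat and b_hat across resamples."""
    w_hats = [w for w, _ in fits]
    b_hats = [b for _, b in fits]
    return statistics.pstdev(w_hats), statistics.pstdev(b_hats)


clean_spread = spread(resampled_fits(sigma=0.0))  # noise-free labels
noisy_spread = spread(resampled_fits(sigma=1.0))  # noisy labels

print("clean labels: std(w_hat), std(b_hat) =", clean_spread)
print("noisy labels: std(w_hat), std(b_hat) =", noisy_spread)
```

With `sigma=0.0` every resample yields the same fit, so the spread is zero up to floating-point error. With `sigma=1.0` the estimates scatter around the true values, even though the model and the truth are unchanged.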