ch 07 / 12Computer vision tasks
slide 01 / 07
What is in this image

Classification

One label per image. Output is a probability distribution over a fixed vocabulary, produced by softmax over the network logits:

p^k=ezkjezj\hat p_k = \frac{e^{z_k}}{\sum_j e^{z_j}}

The prediction is argmaxkp^k\arg\max_k \hat p_k; loss is categorical cross-entropy. Top-1 and top-5 accuracy are the standard metrics. ImageNet (1000 classes) was the long-standing benchmark.

The bars on the right show the full distribution, not just the argmax. Image 5 is the same person as image 1; the model splits probability between two classes once the appearance changes.

team member 1
predicted distribution
que mulher maravilhosa (UAU)94%
readstone3.0%
modelo de comercial de shampoo1.2%
sereia1.1%
engenheira da computação0.7%
softmax over a class vocabulary — argmax wins