AI for Aerial Robotics

What is in this image

Classification

One label per image. Output is a probability distribution over a fixed vocabulary, produced by softmax over the network logits $z_k$:

$$\hat p_k = \frac{e^{z_k}}{\sum_j e^{z_j}}$$

The prediction is $\arg\max_k \hat p_k$; the loss is categorical cross-entropy. Top-1 and top-5 accuracy are the standard metrics. ImageNet (1000 classes) was the long-standing benchmark.
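
The softmax, argmax, cross-entropy and top-k steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a training pipeline: the logit values and the 4-class vocabulary are made up, the function names are hypothetical, and subtracting the maximum logit before exponentiating is a common numerical-stability trick that the formula above does not show (it leaves the result unchanged).

```python
import math

def softmax(logits):
    """Softmax over a list of logits, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Categorical cross-entropy for one example with an integer class label."""
    return -math.log(probs[target])

def top_k_correct(probs, target, k):
    """True if the target class is among the k highest-probability classes."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return target in ranked[:k]

# Hypothetical logits for a 4-class vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.1, -1.0]
probs = softmax(logits)

# The predicted class is the index with the highest probability.
prediction = max(range(len(probs)), key=lambda i: probs[i])

# Assume the true label is class 1: the top-1 prediction misses, top-2 hits.
loss = cross_entropy(probs, target=1)
print(prediction, top_k_correct(probs, 1, 1), top_k_correct(probs, 1, 2))
```

Top-k accuracy over a dataset is the fraction of examples for which `top_k_correct` returns true; top-1 is the special case where only the argmax counts.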

The bars on the right show the full distribution, not just the argmax. Image 5 is the same person as image 1; the model splits probability between two classes once the appearance changes.

team member 1 — softmax over a class vocabulary — argmax wins