When evaluating the vision model alone and the automated suggestions as a whole, we compare against the observation taxon of each observation we're using to test, not against any "expert" standard. The test sets for both the vision model and the whole system are drawn from Research Grade observations and from observations that would be Research Grade if they weren't captive (RG+Captive for short).
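To make that comparison concrete, here's a minimal Python sketch of scoring the top automated suggestion against the observation taxon on an RG+Captive test pool. The `Observation` fields, including the `would_be_research_if_wild` flag, are hypothetical stand-ins. This is not iNaturalist's actual data model or evaluation code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Observation:
    # Hypothetical fields for illustration only, not the real iNat schema.
    taxon_id: int                     # the observation taxon we score against
    quality_grade: str                # "research", "needs_id", or "casual"
    captive: bool
    would_be_research_if_wild: bool   # assumed flag: RG if it weren't captive
    top_suggestion_id: Optional[int]  # top automated suggestion, if any


def in_test_pool(obs: Observation) -> bool:
    """True for Research Grade obs, or captive obs that would otherwise be RG."""
    if obs.quality_grade == "research":
        return True
    return obs.captive and obs.would_be_research_if_wild


def top1_accuracy(observations: Iterable[Observation]) -> float:
    """Fraction of RG+Captive observations whose top suggestion matches the observation taxon."""
    pool = [o for o in observations if in_test_pool(o)]
    if not pool:
        return 0.0
    hits = sum(1 for o in pool if o.top_suggestion_id == o.taxon_id)
    return hits / len(pool)
```

Note that the "ground truth" here is just `taxon_id`, the community-derived observation taxon. Nothing in the sketch consults an outside expert label, matching the approach described above.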
Sorry, a bit of jargon: "iconic taxa" are higher-level taxa covering "iconic," hopefully recognizable swaths of the tree of life. Basically, they're the groups you see in obs search.
There are lots of problems with this concept, but it's useful for breaking down stats like these (e.g., bird accuracy might differ from mollusk accuracy).
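A breakdown like bird vs. mollusk accuracy could be computed by grouping results by iconic taxon. Here's a hypothetical sketch. The `(iconic_taxon, predicted_id, observation_taxon_id)` tuple layout is an assumption for illustration, not how iNaturalist actually stores evaluation results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed record shape: (iconic taxon name, predicted taxon id, observation taxon id)
Result = Tuple[str, int, int]


def accuracy_by_iconic_taxon(results: Iterable[Result]) -> Dict[str, float]:
    """Per-iconic-taxon fraction of predictions that match the observation taxon."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for iconic, predicted, actual in results:
        totals[iconic] += 1
        if predicted == actual:
            hits[iconic] += 1
    return {name: hits[name] / totals[name] for name in totals}


if __name__ == "__main__":
    sample = [("Aves", 1, 1), ("Aves", 2, 3), ("Mollusca", 4, 4)]
    print(accuracy_by_iconic_taxon(sample))  # {'Aves': 0.5, 'Mollusca': 1.0}
```

Keeping totals per group means a group with few test observations still gets its own number, though small groups will naturally have noisier accuracy.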