then the main unresolved problem would be situations where the real answer isn’t actually in the list. hopefully those situations would more often look like your bottom example, with lots of red, than your top example above, with bright green choices…
I think this is already a problem, given the “We’re pretty sure…” text (in fact, I don’t understand why that feature exists at all if what @kueda said above about the score only being good for ordinal ranking among suggestions is true). For example, computer vision is pretty sure that this golden paper wasp is in fact a Syngamia moth, and the UI is already heavily influencing people to choose that ID. What the colors would add is an additional push to select the species Syngamia florella.
The Chrome extension is published, and the code is on GitHub. You can choose between the two display modes (sidebar vs. gradient), as well as a color-blind mode that changes the gradient from hues 0->120 (red to green) to hues 240->120 (blue to green).
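The extension’s own source isn’t reproduced here. As a minimal sketch of the two hue ranges just described, assuming a score on a 0 to 100 scale and a straight linear interpolation between the end hues (the function names and the fixed saturation/lightness values are made up for illustration):

```python
def score_to_hue(score: float, colorblind: bool = False) -> float:
    """Map a 0-100 score to an HSL hue in degrees (assumed linear mapping).

    Normal mode:      score 0 -> hue 0 (red),   score 100 -> hue 120 (green).
    Color-blind mode: score 0 -> hue 240 (blue), score 100 -> hue 120 (green).
    """
    score = max(0.0, min(100.0, score))  # clamp out-of-range scores
    start, end = (240.0, 120.0) if colorblind else (0.0, 120.0)
    return start + (end - start) * score / 100.0


def score_to_css(score: float, colorblind: bool = False) -> str:
    """Build a CSS hsl() color; the 80% saturation and 45% lightness are arbitrary guesses."""
    return f"hsl({score_to_hue(score, colorblind):.0f}, 80%, 45%)"


print(score_to_css(64.3))                   # hsl(77, 80%, 45%)
print(score_to_css(64.3, colorblind=True))  # hsl(163, 80%, 45%)
```

Both modes end at the same green for a perfect score; only the “bad match” end of the scale moves from red to blue.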
I like what @psium suggested. Perhaps it could be a % similarity [to other photos that are Research Grade (RG)]:
98% similar to [photos of] genus A
92% similar to [photos of] species X
91% similar to [photos of] species Y
75% similar to [photos of] species Z
With an option to view the top ~10 ranking IDs in the Identotron on a separate tab, this would also show where the nearest observations of each taxon have been, which would help eliminate Australian species suggestions for African observations.
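How the Identotron actually handles location isn’t described in this thread. As a rough sketch of the idea, assuming we already have, for each suggested taxon, a list of coordinates where it has been observed, one could drop any suggestion with no known observation within some radius of the new observation. The taxon names, coordinates, and the `radius_km` cutoff below are all hypothetical:

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_suggestions(suggestions, known_obs, lat, lon, radius_km=1000.0):
    """Keep only suggested taxa with at least one known observation within radius_km.

    suggestions: list of (taxon_name, score) pairs, best first.
    known_obs:   dict mapping taxon_name -> list of (lat, lon) observation points.
    radius_km:   hypothetical cutoff, not an actual iNaturalist setting.
    """
    kept = []
    for name, score in suggestions:
        points = known_obs.get(name, [])
        if any(haversine_km(lat, lon, p_lat, p_lon) <= radius_km for p_lat, p_lon in points):
            kept.append((name, score))
    return kept


suggestions = [("Species A", 0.6), ("Species B", 0.3)]
known_obs = {
    "Species A": [(-33.87, 151.21)],  # near Sydney, Australia
    "Species B": [(0.35, 32.58)],     # near Kampala, Uganda
}
# An observation near Nairobi, Kenya keeps only the African species.
print(nearby_suggestions(suggestions, known_obs, lat=-1.29, lon=36.82))  # [('Species B', 0.3)]
```

A hard cutoff is the simplest choice; a softer alternative would be to down-weight distant taxa instead of removing them, so that genuine range extensions can still be suggested.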
I’m new to this community, so my opinion may not carry much weight here, but I think this might be a bad idea.
We have some Formicidae screenshots; they are a good example of why this is a bad idea.
Currently the CV is very bad at suggesting and identifying ants. I don’t recall the algorithm ever suggesting the right species as its top choice. I even got “we are pretty sure it is a [genus]” on an insect from a totally different order. Also, almost every black ant is suggested as “Camponotus”, because they all share the same basic look: black, six legs, two antennae, one head, one thorax, a “wasp waist” and a big gaster. Most don’t have striking patterns like the ones butterflies or birds can have.
Such a feature would give a false sense of confidence, as other members have said. But that is already the case with “We are pretty sure it is”.
Worse, a red highlight would give the impression that the correct ID is unlikely to be among these “uncertain” suggestions.
Simply put: it’s a confusing system, especially for beginners.
yes. i think that’s exactly why showing the underlying scores would be enlightening – because the #1 choice in a list of bad choices is not the same as the #1 choice in a list of good choices. so if you can see the actual scores, you have better insight into whether you’re being presented good choices or if you’re being presented bad choices.
just for example, here are 3 of my own ant observations, along with the actual computer vision scores:
our top suggestions (combined score, vision score):
(64.3, 61.3) Shimmering Golden Sugar Ant (C. sericeiventris)
(7.0, 1.8) Eastern Black Carpenter Ant (C. pennsylvanicus)
(6.9, 7.3) Giant Turtle Ant (Cephalotes atratus)
(4.0, 4.3) Eciton genus
(2.6, 2.8) Bullet Ant (Paraponera clavata)
(2.0, 2.2) Diacamma genus
(1.9, 2.0) Giant Forest Ant (Dinomyrmex gigas)
(1.8, 1.9) Hairy Panther Ant (Neoponera villosa)
if using sessilefielder’s red-to-green gradient, remember that:
so most of the “top suggestions” above would be red to yellow, whereas the “we’re pretty sure” suggestion would be more green. hopefully in such cases, that would push most folks to select the green rather than the yellow or orange, if they were simply choosing blindly based on the system’s suggestions.
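To make this concrete with the ant scores listed above, here is a small sketch that runs them through a red-to-green gradient. It assumes a linear mapping from score 0 (hue 0, red) to score 100 (hue 120, green), which may not match the extension exactly. It also shows that ordering by the vision score alone gives a different ranking than the combined score:

```python
# (combined score, vision score, suggestion), copied from the ant example above
SCORES = [
    (64.3, 61.3, "Shimmering Golden Sugar Ant (C. sericeiventris)"),
    (7.0, 1.8, "Eastern Black Carpenter Ant (C. pennsylvanicus)"),
    (6.9, 7.3, "Giant Turtle Ant (Cephalotes atratus)"),
    (4.0, 4.3, "Eciton genus"),
    (2.6, 2.8, "Bullet Ant (Paraponera clavata)"),
    (2.0, 2.2, "Diacamma genus"),
    (1.9, 2.0, "Giant Forest Ant (Dinomyrmex gigas)"),
    (1.8, 1.9, "Hairy Panther Ant (Neoponera villosa)"),
]


def hue(score: float) -> float:
    """Assumed linear red-to-green gradient: score 0 -> hue 0, score 100 -> hue 120."""
    return 120.0 * max(0.0, min(100.0, score)) / 100.0


# Print each suggestion with the hue its combined score would get.
for combined, vision, name in SCORES:
    print(f"hue {hue(combined):5.1f}  combined {combined:5.1f}  vision {vision:5.1f}  {name}")

# Re-rank by vision score only; the carpenter ant drops from 2nd to last.
by_vision = [name for _, _, name in sorted(SCORES, key=lambda s: s[1], reverse=True)]
print(by_vision)
```

Only the first suggestion lands anywhere near green; every other combined score maps to a hue below 10, i.e. deep red.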
i also think if people could see that, say, bird suggestions tend to be very green, while, say, spider suggestions tend to be very red, then they would also be much more careful about relying on the computer vision for spiders.
or if they see two equally green bird suggestions, they might pause for a moment to consider why both are equally green before blindly selecting the first choice.
of course, computer vision suggestions will never be perfect. there will always be mistakes, but i think showing the computer vision scores will help reduce (rather than increase) the likelihood that the community will adopt those mistakes.
Thanks, that’s interesting. I played with some observations to measure how it actually performs. I’m surprised by how good the software is at avoiding high-confidence false positives, which were my biggest fear. But there are still some errors. Take this recent ID: https://www.inaturalist.org/observations/52616986 . The CV predicts a Formica at 99.2%; the picture is of decent quality, with no distracting patterns in the background or foreground, and I think it would allow an ID down to a species complex. It is not a Formica but a member of Crematogastrini, which is very far off.
I still find a color code loaded with meaning (“red” = “don’t”, “green” = “go for it”), but that’s clearly a matter of taste, and I totally hear you.
the color-blind option for sessilefielder’s browser extension uses a hue gradient that goes from blue (bad match) to green (good match). blue might have less of a negative connotation than red.
it would also be technically possible to do a saturation gradient, where you could go from, say, green (good match) to gray (bad match). or maybe a saturation + lightness gradient, where you could go from, say, green (good match) through light greenish-gray to white (bad match). there are lots of different ways to represent the data with color.
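The two alternatives just mentioned can be sketched as CSS `hsl()` strings. This assumes a 0 to 100 score, a fixed green hue of 120, and a 45% lightness for the “good match” end; those numbers and the function names are illustrative choices, not anything the extension is known to use:

```python
def saturation_gradient(score: float, hue: float = 120.0) -> str:
    """Green (good match) fading to gray (bad match): only saturation varies."""
    s = max(0.0, min(1.0, score / 100.0))  # clamp to 0..1
    return f"hsl({hue:.0f}, {s * 100:.0f}%, 45%)"


def saturation_lightness_gradient(score: float, hue: float = 120.0) -> str:
    """Green (good match) through light greenish-gray to white (bad match)."""
    s = max(0.0, min(1.0, score / 100.0))
    lightness = 100.0 - 55.0 * s  # 100% lightness (white) at score 0, 45% at score 100
    return f"hsl({hue:.0f}, {s * 100:.0f}%, {lightness:.0f}%)"


for score in (100, 50, 0):
    print(score, saturation_gradient(score), saturation_lightness_gradient(score))
```

In HSL, 0% saturation is always a gray and 100% lightness is always white, whatever the hue, so the “bad match” end of both scales carries no red/green meaning at all.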
if you have thoughts on the best way to represent the data, whether with plain numbers, or color, or something else, please feel free to describe your preferred approach.
I’m not a UX/UI designer or a good data visualization artist, but I’d be happy to give feedback.
The white/grey to blue scale seems very culturally neutral at first glance, with the advantage of being visible to everyone. The main disadvantage I see is that the function of the blue-tipped left bar is not as obvious as with a “green-red” or any other “good-bad” multicolor scale.
I would have assumed that the scores are the classification scores in the final “softmax” layer of the neural network. In that case, every taxon would be given a score from 0.0 to 1.0, and all scores would sum to 1.0, like probabilities.
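For readers unfamiliar with the term, a softmax layer turns the network’s raw outputs (logits) into non-negative scores that sum to 1. A minimal sketch with three made-up logits; having the shape of a probability distribution does not by itself make these scores calibrated probabilities:

```python
import math


def softmax(logits):
    """Turn raw network outputs (logits) into scores in [0, 1] that sum to 1."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


scores = softmax([4.0, 2.0, 1.0])  # hypothetical logits for three taxa
print([round(s, 3) for s in scores])
print(sum(scores))
```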
i think that’s hard in this case because of the hierarchical nature of the taxa. suppose you had an observation of a blue jay, and the existing algorithm assigned a score of .90 for blue jay, .95 for bird, and 1.0 for animal. what kinds of scores would your assumed implementation assign those taxa?
EDIT: nevermind – alex’s response below changes the way i have to look at things…
Alex here, I’m one of the people who trains the computer vision system for iNat.
As tpollard mentions, the softmax function outputs in a format that is shaped like a probability. However, the output of the softmax function is strongly influenced by its input distribution. Given the imbalanced nature of the iNat dataset (we have a lot more images for some taxa than for others), these scores should absolutely not be interpreted as statistical probabilities.
I know tpollard didn’t suggest that they should be interpreted this way. I just wanted to make sure that the format similarity didn’t encourage someone to think about the scores in a way that isn’t warranted.
If the CV thinks it’s 75% sure, the system should probably recommend identifying at a higher level. Guessing species IDs when not sure is not helpful, in my opinion. It frequently leads to wrong IDs, which often have to be overridden by multiple identifiers if the original ID is not corrected (which happens all too often).
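One way “recommend identifying at a higher level” could work is sketched below. It assumes leaf scores that sum to 1, rolls them up so each ancestor gets the sum of its descendants’ scores, then walks up from the top species until some taxon reaches a threshold. The taxonomy is trimmed to a few cicadas, and both the scores and the 0.9 threshold are made up; this is not a description of how iNat’s “We’re pretty sure” logic actually works:

```python
from collections import defaultdict

# Hypothetical, trimmed taxonomy: child -> parent (None marks the root).
PARENT = {
    "Neotibicen superbus": "Neotibicen",
    "Neotibicen pruinosus": "Neotibicen",
    "Megatibicen resh": "Megatibicen",
    "Neotibicen": "Cicadidae",
    "Megatibicen": "Cicadidae",
    "Cicadidae": None,
}


def lineage(taxon):
    """Return [taxon, parent, grandparent, ...] up to the root."""
    chain = []
    while taxon is not None:
        chain.append(taxon)
        taxon = PARENT[taxon]
    return chain


def rollup(leaf_scores):
    """Give every ancestor the sum of its descendant leaf scores."""
    totals = defaultdict(float)
    for leaf, score in leaf_scores.items():
        for taxon in lineage(leaf):
            totals[taxon] += score
    return dict(totals)


def recommend(leaf_scores, threshold=0.9):
    """Walk up from the top-scoring leaf until a taxon's rolled-up score reaches threshold."""
    totals = rollup(leaf_scores)
    best_leaf = max(leaf_scores, key=leaf_scores.get)
    for taxon in lineage(best_leaf):
        if totals[taxon] >= threshold:
            return taxon
    return None  # nothing, not even the root, is confident enough


scores = {"Neotibicen superbus": 0.75, "Neotibicen pruinosus": 0.20, "Megatibicen resh": 0.05}
print(recommend(scores))  # Neotibicen
```

With these made-up numbers the species alone (0.75) falls short, but the genus (0.75 + 0.20) clears the threshold, so the sketch recommends the genus. Raising or lowering the threshold trades precision of the ID against the risk of a wrong one.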
With cicadas, a more frequent scenario is that the CV is 95% sure of the top pick, yet the user picks something else, presumably because they are matching unimportant details (like color rather than certain pattern elements). If they saw that their pick has a 1% chance of being correct, they would probably be less likely to select it.
It would improve things a lot if, when the CV is pretty sure an ID is a particular species, it would say so. Currently, for the cicada Neotibicen superbus, which is easily identified, it just says it’s pretty sure it’s Neotibicen. See for example here: https://www.inaturalist.org/observations/94684158. Does the CV ever say it’s sure of a species? It definitely should, in my opinion.
No it doesn’t, and I think that’s a good thing for two related reasons.
The CV doesn’t know all species. It doesn’t even know all species with records on iNat! It only knows the species which had at least 100 observations when the model was last trained. This means that the AI’s calculated level of certainty may be totally inappropriate if a picture matches only one species in the training data set but would also match 10 or 100 similar species that didn’t make it in (e.g., they may be rarely observed or hard/impossible to ID from photos).
“Pretty Sure” suggestions don’t take location into account at all. This means that these recommendations can be based on a match with a species that doesn’t live anywhere near you. Usually it’s pretty good for North America and Europe, but sometimes it’s still wrong. I can only imagine that places with relatively few observations, like South America and Africa, are much worse. I’m sure that leaving “Pretty Sure” at genus level greatly reduces the number of geographically inappropriate IDs on iNat.
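A toy illustration of the first reason (the CV only knowing species with enough observations): if a photo fits three look-alike species about equally well, a softmax over all of them splits the score, but if two of them were never in the training set, the same score piles onto the one species the model does know. The logits are invented, and the sketch ignores that a model trained without those species would also have learned somewhat different features:

```python
import math


def softmax(logits):
    """Softmax over a dict of taxon -> logit; returns taxon -> score summing to 1."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Hypothetical logits: the photo fits three look-alike species about equally well.
all_species = {"species A": 3.0, "species B": 3.0, "species C": 3.0, "unrelated": 0.0}

# Suppose B and C had too few observations and were left out of training.
trained_only = {k: v for k, v in all_species.items() if k not in ("species B", "species C")}

print(round(softmax(all_species)["species A"], 2))
print(round(softmax(trained_only)["species A"], 2))
```

The score for species A roughly triples even though nothing about the photo changed, which is why a high score for a species can overstate how sure the ID really is.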
Inappropriate CV-based IDs are a perennial problem here. They can flood CV-included taxa with clearly incorrect junk, and trigger-happy agreers can easily push them into Research Grade, where they get forwarded to GBIF’s database, and, critically, no longer show up to IDers by default. I understand your desire for “Pretty Sure” species suggestions, but I don’t think we’re close to being ready for that. I’d much rather have correct but vague IDs than precise but wrong ones.