I’ve got a quick question for everyone. Does anyone know how the percentages for the AI work on the new iNat app? They show up before an observation like (98.4%) and list many species. They seem frivolous and randomly generated.
why would they be randomly generated?
isn’t this just a numerical display of what used to be that five-dot visualization? it’s the same measure, some sort of quantification of certainty. I certainly couldn’t tell you how it works since it’s basically a machine-learning program (not my expertise at all) but I had the same question as above: why would you assume they’d be random?
Updated the title to “percentages”
screenshot, please
The numbers appear to be pulled from an RNG with higher odds near 100%.
I used ten random observations of plants and got (99.7, 90.9, 69.6, 99.2, 99.8, 89.1, 99.8, 99.8, 98.2, 72.1, 94.6). The 89.1% happened to be incorrect, and it doesn’t make sense to me why one of the 99.8% scores (a Venus flytrap with no other species in the photo) wouldn’t be 100%.
I use statistics on a daily basis for iNat and other projects, and it doesn’t really make sense why percentages are given to a tenth of a percent unless the AI is giving a range (likely correct between 84% and 97.7%) and arbitrarily assigning a random value within it.
I probably am wrong but it appears that way.
In this example, this Trillium erectum is given a 29.9% chance of being correct while the genus is 84.2% likely (exactly 1 standard deviation away).
for any given observation, the entire list of computer vision suggestions at the species level will add up to 100% (though suggestions below some threshold may not be returned). the higher-than-species suggestion is the sum of all its descendant species-level suggestions.
in this case, if you add the numbers for the species-level suggestions returned to you, you get 84.0%, which is just below the 84.2% genus-level number. (the excess 0.2% would be associated with other Trillium species not shown in this list.)
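as a toy illustration of that summing, in code (the numbers for the other suggestions are made up so that the totals match the screenshot; this isn’t actual model output):

```python
# toy illustration: a genus-level score is the sum of its descendant
# species-level scores. numbers other than 29.9% are made up to match
# the totals mentioned above, not actual model output.
species_scores = {
    "Trillium erectum": 0.299,                 # 29.9% from the screenshot
    "Trillium sp. B (shown)": 0.301,           # hypothetical
    "Trillium sp. C (shown)": 0.240,           # hypothetical; shown species sum to 84.0%
    "other Trillium spp. (not shown)": 0.002,  # the excess 0.2%
}

genus_score = sum(species_scores.values())
print(f"Trillium (genus): {genus_score:.1%}")  # 84.2%
```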
I am aware of that; I would like to know where the percentages come from, like how the AI gets 84.2% confidence and not 100% or a range.
it’s a visual match score, weighted by a geomodel score and normalized back to 1.0 across all scores.
you’re unlikely to ever get 100% on any given species because there will always be other species with similar features. (since the total must always add up to 100%, you’re almost always going to end up below 100% on any given suggestion, because the total pie will almost always have more than 1 slice.) you won’t get a range because each species gets a specific score.
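conceptually, something like this (just a sketch of the weight-and-normalize idea with made-up numbers; the real model’s scores and combination rule will differ):

```python
# sketch: combine a visual match score with a geomodel score, then
# normalize so that all suggestions sum to 1.0 (numbers are made up)
visual = {"species A": 0.92, "species B": 0.45, "species C": 0.10}
geo    = {"species A": 0.80, "species B": 0.05, "species C": 0.60}

combined = {sp: visual[sp] * geo[sp] for sp in visual}
total = sum(combined.values())
normalized = {sp: score / total for sp, score in combined.items()}

for sp, score in sorted(normalized.items(), key=lambda kv: -kv[1]):
    print(f"{sp}: {score:.1%}")
# species A lands around 90%, not 100%, even with a strong visual match,
# because the other "slices" still take part of the total
```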
Maybe it’s less confusing if we think about how CV (computer vision) selects its suggestions.
As I understand it (and maybe with some gross simplifications), iNat submits one photo via the API, which is scored against the pre-built identification model. The model returns a ranked list of up to 10 weighted suggestions of possible species matches; as far as I’m aware, the list never goes beyond 10. There’s probably also a minimum similarity threshold to prevent it from returning really unlikely matches, which would explain why you sometimes get fewer than 10 suggestions.
The items in the list do not have equal weight: the CV algorithm ranks them in order of confidence and gives each a score, which is apparently calculated as a percentage of all suggestions. This shouldn’t be mistaken for a “percentage probability” because (a) there will almost always be species that CV doesn’t know about, and (b) any real probability figure would need to be based on comparison to photos we assume are accurate, which is a much higher bar than the ID standard for training data.
But using a percentage scale is kinda intuitive (we have a gut feeling that 95% is a lot more confident than 25%) and it also normalizes the scoring so you can use the number more easily. The weird thing is this appears to be a percentage of “all suggestions”.
If iNat had some way of estimating a genuine probability figure, then citing a range might make sense; but these are just relative scores out of 100% total, so quoting a range would have even less meaning.
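To make that concrete, here’s a rough sketch of the selection step as I’ve described it (the threshold, the cap of 10, and the raw scores are all made up; this is how I picture it, not iNat’s actual code):

```python
# rough sketch of the suggestion-selection step described above
# (threshold, cap, and raw scores are made up for illustration)
MAX_SUGGESTIONS = 10
MIN_SHARE = 0.01  # hypothetical cutoff for displaying a suggestion

raw_scores = {  # model output before filtering (made-up numbers)
    "species A": 0.62, "species B": 0.21, "species C": 0.09,
    "species D": 0.05, "species E": 0.008,  # falls below the cutoff
}

# each displayed number is a share of all suggestions, not a true
# probability of being correct
total = sum(raw_scores.values())
shares = sorted(((sp, s / total) for sp, s in raw_scores.items()),
                key=lambda kv: -kv[1])
displayed = [(sp, p) for sp, p in shares if p >= MIN_SHARE][:MAX_SUGGESTIONS]

for sp, p in displayed:
    print(f"{sp}: {p:.1%}")
# the displayed list sums to slightly less than 100% because the
# filtered-out suggestions keep their small share of the total
```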
I used ten random observations of plants and got (99.7, 90.9, 69.6, 99.2, 99.8, 89.1, 99.8, 99.8, 98.2, 72.1, 94.6). The 89.1% happened to be incorrect, and it doesn’t make sense to me why one of the 99.8% scores (a Venus flytrap with no other species in the photo) wouldn’t be 100%.
So, out of the 11 scores you list, iNat CV was extremely confident with its #1 ID for seven of them, all of which you apparently agree were correct. It was very confident for two others, one of which you believe was incorrect (was the #2 or #3 suggestion correct?). And CV was moderately confident for two more, both apparently correct.
Overall the accuracy rate of #1 suggestions from your sample appears to be ~91%, which closely matches the accuracy reported when each model is released. For the correctly identified Venus flytrap, CV thought some other species (maybe a sundew?) was very slightly similar, but still ranked it 500 times lower than Dionaea muscipula—that seems pretty reasonable to me.
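Just to show the arithmetic behind those two figures (a quick back-of-the-envelope check, nothing more):

```python
# back-of-the-envelope check of the two figures above
scores = [99.7, 90.9, 69.6, 99.2, 99.8, 89.1, 99.8, 99.8, 98.2, 72.1, 94.6]
correct = 10  # every #1 suggestion except the 89.1%, per the earlier post
print(f"accuracy of #1 suggestions: {correct / len(scores):.0%}")  # ~91%

# Venus flytrap: 99.8% for Dionaea vs. at most 0.2% for any other species
print(f"ratio: {99.8 / 0.2:.0f} to 1")  # ~499:1
```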
Thanks! I hope there’s a way that the percentages could be shown on the website as well for identifying.
since the apps are used much more than the website to record observations nowadays, i think website development gets lower priority, but there is an existing request for this: https://forum.inaturalist.org/t/computer-vision-should-tell-us-how-sure-it-is-of-its-suggestions/1230.
in the meantime, sessilefielder has created a Chrome browser extension that visualizes the scores as colors (among other things): https://forum.inaturalist.org/t/chrome-extension-showing-computer-vision-confidence/35171/8.