Is there any way for us to flag recurrent or common problems with the AI ID? Quite a few Melaleuca (Bottlebrushes - Myrtaceae) are being identified as Banksia ericifolia (Proteaceae) even though they don't look much alike; the AI seems to think that they do.
((On second thoughts, perhaps I should visit the Melaleuca viminalis observations and check whether some are incorrectly identified and thus affecting the training of the AI?))
Are you requesting a warning flag on taxon pages for taxa that have computer vision identifications that are frequently manually corrected? A label on IDs? Or are you asking for a way to manually flag taxa that get misIDed frequently to alert other identifiers? (Or something else?)
What are the short-term goals of the proposed flagging system: to prevent these IDs from occurring in the first place, to attract identifiers to help fix them, both, or something else?
Some ways people have dealt with it (on an individual basis) are to assemble a list, whether mental or written, e.g. @treegrow’s List of Computer Vision Traps, or to use the existing flagging system to discuss and attract identifiers to help out, e.g. Trombidium holosericeum.
I am requesting a way to add a flag when, as an identifier, I note a common, recurring misidentification that should not be occurring. Not an ambiguous or understandable one (such as indistinguishable species or user confusion), but one most likely due to bad training or inadequate material used in training (or perhaps a bad algorithm).
How it would be used after flagging - for instance, to alert future users - is not something I had thought about.
I was thinking more about it being used for future machine-recognition training sessions - including checking the current IDs before training (to prevent “tautologous” IDs), checking that there are adequate pictures for training in both taxa, and checking for possible glitches in the training algorithm.
I can ask our computer vision team about this.
I wonder if something like this could be done just based on ID statistics already in the data? Analyze how often CV identifications end up not matching the eventual community ID for each taxon the CV knows, pick a threshold along the resulting “bell curve” (or whatever shape it manifests), and “flag” in some way those taxa with especially unreliable CV results. Maybe a little “caution” flag when displayed among CV suggestions, and maybe a filter parameter to support/inform the manual flagging option that @tonyrebelo has in mind? Just thinking out loud here…
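To sketch what I mean (purely illustrative: the record fields, sample-size cutoff, and 25% threshold below are all made up, not anything from the real iNaturalist data model):

```python
# Hypothetical sketch: flag taxa whose CV suggestions often end up
# disagreeing with the eventual community ID. Field names and thresholds
# are assumptions for illustration only.
from collections import defaultdict

def flag_unreliable_cv_taxa(records, min_samples=50, disagreement_threshold=0.25):
    """records: iterable of dicts with 'cv_taxon_id' (the taxon the CV
    suggested) and 'community_taxon_id' (what the community settled on,
    or None if unresolved)."""
    totals = defaultdict(int)
    disagreements = defaultdict(int)
    for rec in records:
        community = rec["community_taxon_id"]
        if community is None:
            continue  # no settled community ID yet, so nothing to compare
        cv = rec["cv_taxon_id"]
        totals[cv] += 1
        if cv != community:
            disagreements[cv] += 1

    flagged = {}
    for taxon, n in totals.items():
        if n < min_samples:
            continue  # too few samples to judge this taxon's reliability
        rate = disagreements[taxon] / n
        if rate >= disagreement_threshold:
            flagged[taxon] = rate  # candidate for a "caution" flag
    return flagged
```

In practice you would want to look at the actual distribution of disagreement rates before hard-coding any threshold.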
As you say, this should be in the data already, i.e. a 51% chance it’s this, 49% it’s that, between the two similar organisms. It shouldn’t be too hard to detect and flag those situations, even without taking the eventual community ID into account. As usual, the devil is in the details. What are the criteria that determine when the warning appears, and how does that work if there are three or more similar species, like in a mimicry complex, or under other odd circumstances?
Edit: …or do you just present the confidence percentages and let the identifier decide what they mean?
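Just to make the 51%/49% case concrete, here is a toy version of one possible criterion (the 0.10 margin is an arbitrary placeholder - picking it well is exactly where the details get devilish). Grouping everything within a margin of the top score handles three-way or larger confusions, not just pairs:

```python
# Hypothetical sketch: flag a "near tie" when several CV suggestions score
# within some margin of the top suggestion. The margin value is an assumption.
def ambiguous_suggestions(suggestions, margin=0.10):
    """suggestions: list of (taxon, score) pairs from the CV.
    Returns all contenders within `margin` of the top score,
    or [] if the top suggestion stands alone."""
    ranked = sorted(suggestions, key=lambda s: s[1], reverse=True)
    top_score = ranked[0][1]
    contenders = [(taxon, score) for taxon, score in ranked
                  if top_score - score <= margin]
    return contenders if len(contenders) > 1 else []

# A mimicry-complex-style three-way confusion flags all three taxa:
print(ambiguous_suggestions([("A", 0.41), ("B", 0.38), ("C", 0.35), ("D", 0.05)]))
# -> [('A', 0.41), ('B', 0.38), ('C', 0.35)]
```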
@mtank - One simple and unobtrusive way you could make the confidence data available would be to add it as a title attribute to the “Visually similar” span in the suggestion interface, so that if someone hovered over it, it would show “89% confidence” or whatever:
```html
<span class="subtitle vision" title="89% confidence">Visually Similar</span>
```
I love the idea of seeing some indication of certainty from the computer vision. I am often amazed at how well it does, and occasionally amused at how poorly. When entering observations I routinely ID with a suggestion even when I know what the species is, and occasionally I see it getting the result wrong. It would be interesting to see just what the odds were for each of the alternate suggestions.
I’m going through some old feature requests and just wanted to note that this one was suggested:
https://forum.inaturalist.org/t/computer-vision-should-tell-us-how-sure-it-is-of-its-suggestions/1230
I checked with the staff on this one and it doesn’t look like they’re going to move forward with adding a different kind of flag or alert system for computer vision training errors. I’d recommend adding species here: https://forum.inaturalist.org/t/computer-vision-clean-up-wiki/7281 and using the existing @ mention and flagging system to get help with people reidentifying incorrect observations, which in turn helps retrain future versions of computer vision.
Ken-ichi had a nice post yesterday partly about computer vision and the thresholds for accuracy/number of training photos. Might be a good place to ask some follow-up questions: https://forum.inaturalist.org/t/identification-quality-on-inaturalist/7507