I’ve seen versions of this question before but not quite the answer I need.
iNat’s AI suggests “Crome Sphagnum” for nearly everything it can perceive as Sphagnum (peat moss). Crome Sphagnum is just one of several hundred Sphagnum species, but we now have about 2000 Sphagnums identified as Crome Sphagnum, and nearly all of them are wrong.
I’m willing to go through them and over time bump them back to just “Sphagnum,” but in the meantime it would be nice if the AI would stop doing this for new observations. Is there a way to get it to quit?
I think this is a particular example of a general phenomenon, and the underlying cause of most of the criticisms of CV. When species are difficult to distinguish in images, CV in effect just recommends one of the most commonly seen species, or at least one of the names most often used within the iNaturalist data. The user sees that it looks “the same” and thinks it’s the right ID, and now there’s one more observation that entrenches CV in the same error. It’s a feedback loop that entrenches wrong IDs.
iNat’s AI works really well with some taxa, but not so well with others. It is basically trying to sort out things that the human brain knows intuitively. With moths, I see a lot of Peridroma saucia IDs that are likely picked off the top suggestion. The AI is an interesting feature, but I believe (as I have stated many times in the past) that it is up to the ‘confirmer’ or a researcher to ensure that IDs are correct. Unless the AI becomes a lot ‘smarter’, I don’t think it is possible to stop incorrect IDs. I know nothing about moss, so if I did have an observation, I might be tempted to stick in the first suggestion and let others sort it out.
One solution might be to cut CV’s feedback loop. For instance, if taxon X was chosen from the CV selections, that observation can never go into the CV training pool for that taxon. This would kick out some good training data along the way, of course, and may not be feasible in terms of data management / programming… but basically we’ve figured out how to engineer confirmation bias into CV, and it’s going to keep making the same errors so long as this is the case.
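To make the idea concrete, here is a minimal sketch of that filter. It assumes each observation records which taxon (if any) the observer picked from the CV suggestions. The `Observation` class and its fields (`taxon`, `cv_picked_taxon`) are hypothetical names for illustration, not iNaturalist's actual data model or training pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observation:
    obs_id: int
    taxon: str                      # current ID on the observation
    cv_picked_taxon: Optional[str]  # hypothetical: taxon chosen from CV suggestions, if any


def training_pool(observations: List[Observation], taxon: str) -> List[Observation]:
    """Observations usable to train `taxon`: the current ID matches the taxon,
    and that ID was not itself picked from the CV suggestions."""
    return [
        o for o in observations
        if o.taxon == taxon and o.cv_picked_taxon != taxon
    ]


observations = [
    Observation(1, "Crome Sphagnum", "Crome Sphagnum"),  # picked from CV: excluded
    Observation(2, "Crome Sphagnum", None),              # typed in by hand: kept
    Observation(3, "Sphagnum", "Crome Sphagnum"),        # since corrected to genus: usable for genus
]

crome_pool = training_pool(observations, "Crome Sphagnum")
genus_pool = training_pool(observations, "Sphagnum")
```

Here only observation 2 would feed training for Crome Sphagnum, which is exactly the "kicks out some good data" trade-off: observation 1 is dropped even if its ID happens to be right.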
Similar problem with Usnea longissima. AI thinks all long Usnea are the moderately rare, relatively distinctive U. longissima (and so do some individuals, independent of AI). So there are lots and lots of wrong U. longissima photos, and they’re used to train the AI, so . . . . My attempts to clean this up have made only a small dent in the problem, so it persists, and the next time the AI is trained, it will learn that all long Usnea are U. longissima . . . .
It would be nice if the computer could be forced to stop making any suggestion for these known problem taxa, for at least one round of AI training. (With people encouraged to clean up the problem in the interim.)
This works really well. I have spent a lot of time going through plants which were ID’d wrong by the CV and adding a disagreeing Dicots ID and now those plants don’t need constant fixing since the AI has fixed itself when it was retrained.
I’ve noticed a lot of people are afraid to add disagreeing IDs. Don’t be! It’s absolutely required to clean up these CV messes.
Aha, good question. At least currently, it looks like the actual suggestion is “Sphagnum,” with Crome Sphagnum heading the list of visually similar. I think that is different from in the past. If that’s true, I shouldn’t blame the AI for a missed ID on the recent submissions. That’s encouraging!
Yes, I like this idea, @sedgequeen. A temporary hold would probably solve this general problem. As I’m figuring out from the questions, this particular instance may be healing itself through time. I just have to go back and correct those 2000+ mistakes from the past, ha ha.
This is a good point, that there just need to be some other species with more than 50 RG /100 total. I think we may be getting to that point soon. We have one super active identifier (not me!) who is building our visual knowledge of Sphagnum fast. Thanks for the insight.
The only real option here that isn’t tilting at windmills is functionality change by the site to not recommend species level identifications for certain things.
Yes, you can add dissenting identifications but it will stop being impactful once there are 100 legitimate, accurate records, then it goes straight back into the list of suggestions.
Even a functionality change to not recommend taxon A will likely have limited impact as people will still use the CV, and just pick whatever is suggested from the list with or without that taxon being there.
I actually prefer it when the CV takes me to genus or maybe just family but not to species, since the species ID might be tough and often wrong. For plants, I prefer a human to weigh in on the species ID. At least if I have a fairly reliable genus-level ID I can do my own homework to propose a likely species ID based on RG records from my area.
If we’re relying on annotations to correct the situation (IMO, not the best solution to a systemic problem, but perhaps a reasonable expedient), I think it would be helpful to have an option to annotate observations en masse. @egordon88’s offer to knock observations back to genus in places where Sphagnum squarrosum does not occur made me think–well, the proper way to do it is to upload a polygon for the range of Sphagnum squarrosum, select all observations identified as Sphagnum squarrosum that do not intersect that polygon, and annotate them all as Sphagnum. Obviously this would not be something we would want to allow the average user to do willy-nilly, and if we don’t have a decent range map for Sphagnum squarrosum it wouldn’t help us in this particular case. If we can help it, though, we shouldn’t have people spending hours and hours going through thousands of misIDs manually when there’s a better solution that could be applied in minutes.
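For what it's worth, the selection step is simple to sketch. The following is only an illustration under some assumptions: observations are plain dicts with hypothetical `id`, `taxon`, `lon`, and `lat` keys, the range is a single simple polygon given as (lon, lat) vertices, and the geometry is treated as flat (no antimeridian or spherical handling). The rectangle used as a "range" is made up for the example and is not a real range map.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test. `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def out_of_range(observations, taxon, range_polygon):
    """Observations identified as `taxon` that fall outside the range polygon."""
    return [
        o for o in observations
        if o["taxon"] == taxon
        and not point_in_polygon(o["lon"], o["lat"], range_polygon)
    ]


# Made-up rectangular "range" and observations, purely for illustration
range_polygon = [(0, 0), (10, 0), (10, 10), (0, 10)]
observations = [
    {"id": 1, "taxon": "Sphagnum squarrosum", "lon": 5, "lat": 5},   # inside range
    {"id": 2, "taxon": "Sphagnum squarrosum", "lon": 20, "lat": 5},  # outside range
    {"id": 3, "taxon": "Sphagnum", "lon": 20, "lat": 5},             # already at genus
]

flagged = out_of_range(observations, "Sphagnum squarrosum", range_polygon)
flagged_ids = [o["id"] for o in flagged]
```

The flagged list is what would then be knocked back to genus in one pass. The hard part is not the selection but having a trustworthy range polygon to begin with.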
It may be that the taxon split functionality could be kluged into filling this role, though this is not a good solution for some of the same reasons that iNaturalist’s handling of taxonomy is not a good solution in general…