iNat’s AI works really well with some taxa, but not so well with others. It is basically trying to sort out things that the human brain knows intuitively. With moths, I see a lot of Peridroma saucia ID’s that are likely picked off the top suggestion. The AI is an interesting feature, but I believe (as I have stated many times in the past) that it is up to the ‘confirmer’ or a researcher to ensure that ID’s are correct. Unless the AI becomes a lot ‘smarter’, I don’t think it is possible to stop incorrect ID’s. I know nothing about moss, so if I did have an observation, I might be tempted to stick in the first suggestion and let others sort it out.
One solution might be to cut CV’s feedback loop. For instance, if taxon X was chosen from the CV selections, that observation can never go into the CV training pool for that taxon. This would kick out some good training data along the way, of course, and may not be feasible in terms of data management / programming… but basically we’ve figured out how to engineer confirmation bias into CV, and it’s going to keep making the same errors so long as this is the case.
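To make the idea concrete, here is a minimal sketch of that filter. It assumes each identification records whether it was picked from the CV suggestions; the `from_cv_suggestion` flag and the `training_pool` helper are hypothetical names for illustration, not anything iNaturalist is known to expose.

```python
from dataclasses import dataclass


@dataclass
class Identification:
    observation_id: int
    taxon: str
    # Hypothetical flag: True if the ID was picked from the CV suggestion list.
    from_cv_suggestion: bool


def training_pool(identifications, taxon):
    """Return observation ids usable as training data for `taxon`,
    skipping any ID that was chosen from the CV's own suggestions."""
    return [
        ident.observation_id
        for ident in identifications
        if ident.taxon == taxon and not ident.from_cv_suggestion
    ]


ids = [
    Identification(1, "Usnea longissima", True),   # picked from CV list: excluded
    Identification(2, "Usnea longissima", False),  # typed in independently: kept
    Identification(3, "Usnea", False),             # different taxon: not relevant
]
pool = training_pool(ids, "Usnea longissima")
print(pool)
```

As noted above, this throws away good data too: observation 1 might be a perfectly correct ID that simply happened to be entered via the suggestion list.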
I see the same thing in the UK with red Russula: every one becomes Russula sanguinaria. I think people are just picking the top recommended species because they will change it to whatever you say anyway.
Similar problem with Usnea longissima. AI thinks all long Usnea are the moderately rare, relatively distinctive U. longissima (and so do some individuals, independent of AI). So there are lots and lots of wrong U. longissima photos, and they’re used to train the AI, so . . . . My attempts to clean this up have made only a small dent, so the problem persists, and the next time the AI is trained it will learn that all long Usnea are U. longissima . . . .
It would be nice if the computer could be forced to stop making any suggestion for these known problem taxa, for at least one round of AI training. (With people encouraged to clean up the problem in the interim.)
Fix them and keep a handle on the taxon until the next CV model is trained so it’s not included.
Add enough observations of another species from the genus (50 RG / 100 total) so that the CV model realises there are similar species.
Well, you post them in the wiki (https://forum.inaturalist.org/t/computer-vision-clean-up-wiki/7281), then a knowledgeable person should go and check all those observations. Mosses and fungi are just one pile of wrong IDs; if you are willing to change that, it’s possible by adding correct IDs and then waiting for the new CV model.
This works really well. I have spent a lot of time going through plants which were ID’d wrong by the CV and adding a disagreeing Dicots ID and now those plants don’t need constant fixing since the AI has fixed itself when it was retrained.
I’ve noticed a lot of people are afraid to add disagreeing ID’s. Don’t be! It’s absolutely required to clean up these CV messes.
Aha, good question. At least currently, it looks like the actual suggestion is “Sphagnum,” with Crome Sphagnum heading the list of visually similar species. I think that is different from the past. IF that’s true, I shouldn’t blame the AI for a missed ID on the recent submissions. That’s encouraging!
So it’s those pesky users?
If you send me a few locations where Crome Sphagnum (great name by the way) definitely doesn’t occur, I’m happy to knock several pages back to genus.
Yes, I like this idea, @sedgequeen. A temporary hold would probably solve this general problem. As I’m figuring out from the questions, this particular instance may be healing itself through time. I just have to go back and correct those 2000+ mistakes from the past, ha ha.
This is a good point, that there just need to be some other species with more than 50 RG /100 total. I think we may be getting to that point soon. We have one super active identifier (not me!) who is building our visual knowledge of Sphagnum fast. Thanks for the insight.
Thanks, @egordon88! I’m not sure enough of the range right now to do that, but that’s a great suggestion for quick corrections. If I figure it out I may take you up on it!
I’m happy to help too if this is a large problem (but simple enough that I can actually ID them).
The only real option here that isn’t tilting at windmills is functionality change by the site to not recommend species level identifications for certain things.
Yes, you can add dissenting identifications but it will stop being impactful once there are 100 legitimate, accurate records, then it goes straight back into the list of suggestions.
Even a functionality change to not recommend taxon A will likely have limited impact, as people will still use the CV and just pick whatever is suggested from the list, with or without that taxon being there.
I actually prefer it when the CV takes me to genus or maybe just family but not to species, since the species ID might be tough and often wrong. For plants, I prefer a human to weigh in on the species ID. At least if I have a fairly reliable genus-level ID I can do my own homework to propose a likely species ID based on RG records from my area.
If we’re relying on annotations to correct the situation (IMO, not the best solution to a systemic problem, but perhaps a reasonable expedient), I think it would be helpful to have an option to annotate observations en masse. @egordon88’s offer to knock observations back to genus in places where Sphagnum squarrosum does not occur made me think–well, the proper way to do it is to upload a polygon for the range of Sphagnum squarrosum, select all observations identified as Sphagnum squarrosum that do not intersect that polygon, and annotate them all as Sphagnum. Obviously this would not be something we would want to allow the average user to do willy-nilly, and if we don’t have a decent range map for Sphagnum squarrosum it wouldn’t help us in this particular case. If we can help it, though, we shouldn’t have people spending hours and hours going through thousands of misIDs manually when there’s a better solution that could be applied in minutes.
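A rough sketch of that selection step, assuming observations are available as simple records with a taxon name and coordinates and that the range is one simple polygon of (lon, lat) vertices. The record layout and function names are hypothetical, and the point-in-polygon test is a plain ray-casting check, not whatever GIS tooling the site actually uses. It only returns a list of candidates; it does not change any identification.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: count how many polygon edges a ray running east
    from (lon, lat) crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def out_of_range(observations, taxon, range_polygon):
    """Ids of observations identified as `taxon` that lie outside the range."""
    return [
        obs["id"]
        for obs in observations
        if obs["taxon"] == taxon
        and not point_in_polygon(obs["lon"], obs["lat"], range_polygon)
    ]


# Made-up square "range" and made-up observations, purely for illustration.
toy_range = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
observations = [
    {"id": 1, "taxon": "Sphagnum squarrosum", "lon": 5.0, "lat": 5.0},
    {"id": 2, "taxon": "Sphagnum squarrosum", "lon": 20.0, "lat": 5.0},
    {"id": 3, "taxon": "Sphagnum", "lon": 20.0, "lat": 5.0},
]
candidates = out_of_range(observations, "Sphagnum squarrosum", toy_range)
print(candidates)
```

The output is only as good as the range polygon, which is the caveat already raised above, so the candidate list is probably best treated as a queue for someone to look through.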
It may be that the taxon split functionality could be kluged into filling this role, though this is not a good solution for some of the same reasons that iNaturalist’s handling of taxonomy is not a good solution in general…
What do you mean we need annotations for that? We need to re-ID them, not annotate them.
As written before, there are groups of taxa that are practically impossible to identify from photos. This is mainly because we cannot expect users to collect a sample and take photos at the microscope, or because some genera are so complicated as to require a specialist for their identification.
One simple solution would be to automatically warn users that the proposed id is likely wrong and that it should be taken just as a suggestion. Alternatively, another possibility could be to keep the cv from identifying at the species level but to limit the id to the genus level or to a higher rank depending on the complexity of the subject.
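As a sketch of how the two options could combine, assuming the site kept a hand-maintained hold list of problem species: the `PROBLEM_TAXA` set, the `displayed_suggestion` function, and the warning text below are all hypothetical, and the genus is taken naively as the first word of the binomial.

```python
# Hypothetical hold list, filled with species named in this thread.
PROBLEM_TAXA = {
    "Sphagnum squarrosum",
    "Usnea longissima",
    "Russula sanguinaria",
}

WARNING = (
    "Species in this group are hard to identify from photos; "
    "treat this as a suggestion only."
)


def displayed_suggestion(species, hold_list=PROBLEM_TAXA):
    """Return (taxon_to_show, warning_or_None) for a CV species suggestion.
    Species on the hold list are rolled back to genus and given a warning."""
    if species in hold_list:
        genus = species.split()[0]  # naive: first word of the binomial
        return genus, WARNING
    return species, None


print(displayed_suggestion("Usnea longissima"))
print(displayed_suggestion("Peridroma saucia"))
```

A real version would presumably pick the fallback rank per group (genus, family, or higher) rather than always stopping at genus.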
I do not see anything bad in warning users that what they photographed is something complex. On the other hand, we have many beginners who are convinced they have photographed a certain taxon when they have not.
I agree that knocking a lot of observations back to genus level is tedious, but I think I’d be wary of doing it in bulk fashion, for several reasons. (1) I’ve encountered a few “Sphagnum squarrosum” that look awfully good even though in the “wrong” place. I’d prefer to leave those as is, with a comment that they appear to be way out of range. We may learn something. (2) I keep an eye out for observers who are way better botanists than I am and may be right where I’m wrong. The bulk approach would miss them. (3) “Corrections” generate replies. By doing just a few each day, I’m not flooded with comments and questions (or criticisms) the next day, but get enough that I can handle them. I guess I feel each observation at least deserves a look. But thanks for your creative ideas.
Sorry, I forget that “annotation” has another meaning here. In the herbarium world, if I add an identification to a specimen, that’s an annotation.