EDIT: I just went ahead and added it in. I can take it off if I misinterpreted the rules.
I’m not sure if this would fit the criteria exactly, but should I add Pennisetum clandestinum to the list?
I ask because it’s a CV misID: its numbers jump drastically whenever students use iNat to ID grass, which means it’s spreading as a suggestion every time someone tries to ID a grass species. I can’t always tell which IDs are legitimately this species and which are not, but I do know it can’t survive the temperatures that are common in northern climates.
Some time ago, Haworthia was split into several genera: Haworthia, Haworthiopsis, and Tulista. However, many of the species split off into Haworthiopsis are very commonly grown as houseplants, so it looks like CV tends to over-suggest Haworthia, and that may be affecting wild observations as well.
Just wanted to say that I really appreciate all of the work that goes into this wiki issue.
It’s been an article of faith for those of us who train these vision systems that the easy answer to failed predictions/suggestions is simply more data. I’m not totally sure how well placed that faith is, but it’s broadly been true for vision suggestions at iNat over the past few years.
At any rate, as we continue to train a new model, I’m starting to think about how to evaluate our current model and the new model using the taxa on this list to try to understand things like: did we increase our dataset size on these problem taxa? Did that image count growth correlate with an increase in our suggestion accuracy? If not, can we figure out why?
Thanks for the hard work. I’ll keep the forum in the loop once the new model is finished training (a few months away) and we can evaluate the new model, and hopefully we can show some progress.
Kashubian Buttercup, Ranunculus cassubicus, has basal leaves very different from stem leaves. The computer seems to be confused, suggesting this species for plants as different as Garlic Mustard and Goldenrods. I cleaned up the North American records; the actual species isn’t known from this area. I’m not sure this is a common problem, but it is a problem.
Great to hear that the model is being updated soon!
I think the issues with problem taxa highlighted on this thread aren’t necessarily just the result of suggestion inaccuracy by the model though. I think they are also dependent on the current UI design.
For example, on complex taxa the model’s “pretty sure” suggestion might be 100% correct and place the observation at genus level (where it belongs without a more detailed image)… but the observer nevertheless opts for the top species-level ID instead, as they know no better.
Like on this observation, for example.
This is also coupled with the issue mentioned here.
The problem taxa tend to be high up on the list of species-level suggestions when the model isn’t “pretty sure” of anything whatsoever. Ideally, if it’s not pretty sure of anything, the UI should present this as a red flag (or autosuggest a very coarse ID), not make things worse by still suggesting something wildly specific.
Perhaps small alterations in the design of the UI around the autosuggest might go further towards fixing many of the taxa on this thread than the development of the model itself.
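The fallback behaviour being suggested here could look something like the sketch below. The threshold value, function names, and score format are all assumptions for illustration; this is not iNat’s actual suggestion logic.

```python
# If no species clears a confidence threshold, surface the common
# ancestor (e.g. genus) instead of a list of specific but unreliable
# species-level names. Threshold and interface are hypothetical.

PRETTY_SURE = 0.8  # assumed cutoff, not iNat's real value

def suggest(species_scores, common_ancestor):
    """Return (label, is_confident) for the UI to display."""
    best_species, best_score = max(species_scores.items(), key=lambda kv: kv[1])
    if best_score >= PRETTY_SURE:
        return best_species, True
    # Red-flag case: nothing is "pretty sure", so suggest the coarse ID
    # and let the observer refine it themselves.
    return common_ancestor, False

label, confident = suggest(
    {"Lupinus nanus": 0.35, "Lupinus bicolor": 0.33, "Lupinus succulentus": 0.12},
    common_ancestor="Lupinus",
)
print(label, confident)  # -> Lupinus False
```

The point is that the change is in how scores are presented, not in the model itself: the same scores can drive either a misleading species list or a safe genus-level suggestion.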
Though in many cases the system can’t decide what is best, it doesn’t show a genus to choose from, only a list of species; sometimes the top one is right, but often all of them are wrong, or the right one is at the bottom. Maybe adding an extra line like “We’re not sure what this is, add your own ID” would help.
This is a good suggestion. The prompt should spell out “You should check this suggestion before you use it”, and it should explicitly say how to check it. As it is, most users just click on the first thumbnail that “looks right”.
Yes… this sort of statement is already a feature on the iOS app I think, so at the least should be replicated on the main browser version.
I guess the problem is that explaining to the average user how to check complex taxa is not so simple in a few words. Just this week I came across an identifier with thousands of IDs, many of which were noted as having been made because they looked visually similar to an autosuggest, resulting in many incorrect and maverick species-level IDs. At least two of the Diptera species are on the list in this wiki… visual similarity just doesn’t cut it.
Perhaps the use of an autosuggested species-level ID could trigger a prompt of some sort, checking if the observation actually shows the diagnostic features.
As so few of the most common Diptera are possible to ID from a quick visual comparison, I wonder if it would be better in some ways if there were no species-level autosuggestions whatsoever in complex taxa. Or at least, if they were more hidden from novice access.
Hi, I know Lupinus nanus and Lupinus bicolor are difficult for the computer vision to tell apart. I’m not sure whether more than 100 observations have been misidentified because of this, but I’d guess so. I stopped fighting that fight a few years ago after being swamped and burned out on correcting them. So maybe they should be on the list; I just don’t know without looking through 2000+ observations.
Referring to this taxon: Blue-bordered Pedunculate Ground Beetle (Pasimachus elongatus)
This species has a number of records from the Eastern/Southeastern US here on iNaturalist. I believe that is a systematic error based on over-specific IDs originating in computer vision suggestions.
Species ID in this genus is quite difficult. Regional checklists and references are useful to narrow down possibilities. References I have do not show P. elongatus as occurring in the Eastern US. BugGuide shows records from the Midwest and Northern Great Plains up into Canada.
Bousquet gives the distribution of P. elongatus: “This species ranges from the southern part of the Prairie Provinces south to northern Sonora, western and northern Texas, and southeastern Louisiana, east to Indiana and southwestern Michigan.”
Ciegler, for instance, lists several species of Pasimachus for South Carolina, but not P. elongatus.
There are several similar species in the East/Southeast, and they are difficult to differentiate without very detailed images and/or examination under a microscope. The iNaturalist computer vision/artificial intelligence system is not able to make distinctions among species in this group, which are all rather similar in habitus and somewhat variable. I think the eastern P. elongatus listed here on iNaturalist are simply over-specific IDs based on the computer vision suggestions. It happens with these difficult groups of species!
I have corrected/commented on research grade observations from North Carolina. (Edit) Others might want to go and make suggestions on other observations that are clearly out of range here. (End edit.)
References:
Bousquet Y. Catalogue of Geadephaga (Coleoptera, Adephaga) of America, north of Mexico. Zookeys. 2012;(245):1-1722 (p. 396). https://doi.org/10.3897/zookeys.245.3416
Ciegler, Ground Beetles and Wrinkled Bark Beetles of South Carolina (Clemson University Press, 2000), pp. 10, 36-38
I would prefer this to be closed. Someone can create a new topic if they would like. Thanks!
Edit to add: I just think that since the model is updated so frequently now it’s not worth having a separate topic like this which gets stale and hard to keep up with.
I think flags (or journal posts) work fine as discussion locations!