Just wanted to thank all of the ID’ers on this thread! This is inspiration to me that my IDs matter.
Also, maybe this thread highlights one of the (inherent) flaws in iNat: range creep. As long as CV suggests a species, some percent of observers will pick that, and some percent of those will be confirmed to RG. Then the CV and geo model learn from those observations and start suggesting the erroneous species farther and farther afield.
Maybe this flaw is also related to iNat’s tendency to “lump” observations under common taxa. If there are, say, several rare species and one common species that look similar, at least some of the rare species will be ID’d as the common species, and those IDs will be used to train the next model…because there are so many more observations of the common species, over time the CV will learn to prefer it over the rare ones…
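The feedback loop described above can be sketched as a toy simulation. Everything in it is an assumption made for illustration: the function name, the rates, and the idea that the learned range grows in proportion to the share of erroneous Research Grade observations. It is not how iNat's CV or geo model is actually trained.

```python
def simulate_range_creep(rounds, accept_rate, confirm_rate, reach_km,
                         start_range_km=100.0):
    """Toy model of range creep (hypothetical, not iNat's real training loop).

    Each retraining round, the model suggests the species up to `reach_km`
    beyond its currently learned range. A share of those wrong suggestions
    is accepted by observers and then confirmed to Research Grade, and the
    learned range is assumed to grow in proportion to that share.
    """
    learned_range_km = start_range_km
    history = [learned_range_km]
    for _ in range(rounds):
        # Share of out-of-range observations that become erroneous RG records.
        bad_rg_share = accept_rate * confirm_rate
        # Assumption: the next model's range expands by that share of the reach.
        learned_range_km += reach_km * bad_rg_share
        history.append(learned_range_km)
    return history


# Made-up numbers: half of observers accept the suggestion, a fifth of
# those observations get confirmed, and the model reaches 50 km past its range.
for round_number, radius in enumerate(simulate_range_creep(5, 0.5, 0.2, 50.0)):
    print(f"round {round_number}: learned range ~{radius:.0f} km")
```

The point of the sketch is only that the growth never reverses on its own: as long as the accept and confirm rates are both above zero, each round starts from a slightly larger range than the last.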
This can be compounded when a species is frequently misidentified. The native Plantago rugelii and the invasive Plantago major look so similar that most published floras have conflated them as P. major. Early this week, I observed what I believe is P. rugelii. The CV suggestions for “expected nearby” did not include Plantago rugelii but put Plantago major at the top. When I selected “Include species not seen nearby,” Plantago rugelii appeared above Plantago major, suggesting that it is a better match morphologically but was discriminated against because of a lack of nearby identifications. Maybe I need to go through “Plantago major” observations in my area and see how many of them are really P. rugelii.
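A minimal sketch of how a "seen nearby" filter could flip the order of two suggestions. The scores, the `seen_nearby` set, and the function name are all made up for illustration, and the real iNat ranking is certainly more involved than a simple filter.

```python
def rank_suggestions(visual_scores, seen_nearby, include_not_nearby=False):
    """Rank taxa by visual score, optionally dropping taxa with no nearby records."""
    candidates = {
        taxon: score
        for taxon, score in visual_scores.items()
        if include_not_nearby or taxon in seen_nearby
    }
    return sorted(candidates, key=candidates.get, reverse=True)


# Hypothetical visual-similarity scores for one photo.
visual_scores = {"Plantago rugelii": 0.55, "Plantago major": 0.40}
# Hypothetical: nobody nearby has an identified P. rugelii observation.
seen_nearby = {"Plantago major"}

print(rank_suggestions(visual_scores, seen_nearby))
# ['Plantago major']
print(rank_suggestions(visual_scores, seen_nearby, include_not_nearby=True))
# ['Plantago rugelii', 'Plantago major']
```

Under these assumptions the better visual match only shows up once the nearby filter is switched off, which matches what happened with the two Plantago species.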
Actually it depends. Under certain conditions, the CV training can lead it to suggest rare species rather than common ones. This is because it compares photos, not the actual traits of the organisms depicted in those photos.
It suggests rare bees all the time when the photo is in fact a common species.
There are many bee species which only collect pollen from specific plants (oligolectic); they are usually fairly rare compared to generalist species because they are tied to a specific plant. The CV also tends to learn the host plant association because photos of these bees are frequently taken while they are visiting the flowers of these plants.
The problem is that most plants visited by rare oligolectic bees are also visited by common generalist species. Because they are generalists, the photos of these species show them visiting a large variety of flowers, and for really common species the CV training only includes a small percentage of the total observations. As a result, it does not learn to recognize the generalist species when they are visiting the host plants of oligolectic bees: these photos are numerically overwhelmed by all the other photos where the generalist is visiting different plants.
So when the CV looks for a match in its training set, it will suggest the rare bee if the flower being visited is a certain color, because it finds a stronger correlation here. It cannot assess whether a species is common vs. rare.
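The effect can be illustrated with a toy count-based stand-in for the CV. A simple tally over (flower, bee) pairs is of course not an image model, and every number and name below is invented, but it shows how capping the number of training photos for a common species can leave the rare specialist as the strongest match on its host flower.

```python
from collections import Counter


def build_training_set(host_flower, rare_count, common_total,
                       common_on_host_share, cap):
    """Return (flower, bee) pairs; the common species is capped at `cap` photos."""
    rare = [(host_flower, "rare specialist")] * rare_count
    common_on_host = int(common_total * common_on_host_share)
    common = ([(host_flower, "common generalist")] * common_on_host
              + [("other flower", "common generalist")]
              * (common_total - common_on_host))
    # Keep every k-th photo to mimic an even subsample under the cap.
    step = max(1, len(common) // cap)
    return rare + common[::step][:cap]


def most_likely_bee(training, flower):
    """Pick the bee most often paired with this flower in the training set."""
    counts = Counter(bee for f, bee in training if f == flower)
    return counts.most_common(1)[0][0]


# Hypothetical numbers: 200 photos of the specialist, 100,000 of the
# generalist, of which 2% are on the specialist's host flower.
capped = build_training_set("host flower", 200, 100_000, 0.02, cap=1_000)
uncapped = build_training_set("host flower", 200, 100_000, 0.02, cap=100_000)

print(most_likely_bee(capped, "host flower"))
print(most_likely_bee(uncapped, "host flower"))
```

With these made-up numbers, the capped set holds 200 specialist photos against only 20 generalist photos on the host flower, so the tally favors the rare bee. With the full set the generalist has 2,000 photos on that flower and wins instead.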
This isn’t limited to bees. I believe one reason that hybrids were removed from the CV was that it was frequently suggesting hybrid birds instead of the far more common non-hybrid species.
6 months after first starting this thread, I’m happy to report that, after my 2 thousand or so identifications of this group, the CV is no longer suggesting Palaearctic taxa when I work in the Identify module. Still many to go, but the CV appears to be improving, and observers are now often picking the right taxa.
HOWEVER: when not in the Identify module, and just entering a name in the “Suggest an Identification” line to make an ID, say “Donacia”, the name suggestions that pop up are still all Palaearctic, and I have to actually type in the correct Nearctic taxon name until it pops up so I can select it. Any ideas on this related but different issue? Will this silly result go away in the fullness of time?
BUT: if I click the “Compare” arrows, the suggestions popping up are correctly Nearctic.
Yes, human IDers have biases as well. But I don’t see the relevance in a thread about CV suggestions.
(I can’t say I’ve encountered the fondness among IDers for suggesting hybrids at any opportunity that you seem to have experienced. Maybe it depends on the taxon: I mostly ID arthropods and plants. For the former, hybrids are virtually non-existent, and for the latter, apart from hybrid cultivars, I can think of only a handful of cases among the several thousand observations I’ve interacted with where there has been discussion about whether the observation is a hybrid or not, and always in connection with cases where there is a known, named hybrid that is relevant.)
I think a thread about CV is mighty pertinent. It’s how the whole thing works! Since I’m not coming from a tech-wizard background, I struggle to understand how these things work. I appreciate the feedback and explanations a great deal eh!