There’s a misidentification issue that seems to arise among certain ichneumon wasps. While Enicospilus is common and frequently recognized by CV from past training sets (particularly the common species E. purgatus), the similar genus Ophion has few or no species that are identifiable to species level, and none that CV has been trained on. But there are now well over a thousand RG observations of Ophion at genus level which have been appropriately marked “as good as it can be”. Yet CV still doesn’t offer this genus as an ID suggestion.
Q: Is CV ever trained on genus-level RG observations? I suspect the answer is “no”. If it were, it should by now be able to distinguish Enicospilus and Ophion.
Ophion peregrinus got learned, so the CV unlearned the genus. The only way to really get the genus to appear again outside of New Zealand is to get more species in, or to remove Ophion peregrinus from the CV by identifying it to genus.
The CV can technically be trained on taxa at any rank, but it is only trained on leaf taxa.
Examples from just Chironomids
Diamesinae
Harnischia Group
Glyptotendipes
Axarus festivus complex
Tanytarsini
Axarus rogersi complex
With the system the CV currently uses, it’s important to realize that identifying to species can actually reduce accuracy if your IDs get only one species of a genus into the model.
Observations do not have to be RG to be used in training the CV, so genus-level observations are potentially eligible regardless of whether they are RG or not.
However, there are rules about what taxa qualify to be included in the CV. If there are no species in a genus that meet the minimum requirements (number of photos) for training, it may be trained on a higher-level taxon instead. But once a single species in the genus is included in the CV, it will no longer be trained on the genus – this means that on several occasions it has “unlearned” a genus that it was previously able to recognize.
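To make the rule above concrete, here is a minimal sketch of how leaf selection might work. It is only my reading of the behaviour described in this thread, not iNaturalist's actual code: the `Taxon` class, the `training_leaves` function, the `MIN_PHOTOS` value, and all photo counts are made up for illustration.

```python
from dataclasses import dataclass, field

MIN_PHOTOS = 100  # hypothetical threshold; the real requirement is not given in this thread


@dataclass
class Taxon:
    name: str
    photos: int = 0  # photos identified exactly to this taxon (illustrative numbers only)
    children: list = field(default_factory=list)


def total_photos(taxon):
    """Photos at this taxon plus everything identified below it."""
    return taxon.photos + sum(total_photos(child) for child in taxon.children)


def training_leaves(taxon, min_photos=MIN_PHOTOS):
    """Names of the taxa that would be trained under the assumed 'lowest eligible taxon' rule."""
    leaves = []
    for child in taxon.children:
        leaves.extend(training_leaves(child, min_photos))
    if leaves:
        # A descendant qualified, so this taxon is no longer a leaf and is dropped.
        return leaves
    if total_photos(taxon) >= min_photos:
        return [taxon.name]
    return []


# Invented counts: plenty of genus-level photos, one species just below the threshold.
ophion = Taxon("Ophion", photos=1500, children=[Taxon("Ophion peregrinus", photos=20)])
print(training_leaves(ophion))  # ['Ophion']

# Once the species crosses the threshold, the genus disappears from the training set.
ophion.children[0].photos = 150
print(training_leaves(ophion))  # ['Ophion peregrinus']
```

The second call is the "unlearning" people are describing: nothing about the genus-level photos changed, but the genus is no longer a leaf.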
This is a huge problem for those of us who ID taxa that often cannot be identified to species from photos, resulting in exactly the situation you are describing. This means that e.g., for several European bee genera, it suggests the completely wrong genus and sometimes even the wrong family a substantial percentage of the time. I spend a significant proportion of my time correcting bad CV IDs, which is extremely frustrating. I suspect the situation is even worse with ichneumonids.
I’ve noticed this is a common problem with spiders as well. There are genera that have dozens of species but only one or two species distinct enough to be IDed to species from photographs. So any observations of the more “generic” species will not be offered the correct genus ID. I think insects and arachnids need a different CV training model due to this problem.
I unfortunately don’t have much hope things will improve a ton anytime soon. The issue is fundamentally with the system itself, and you cannot escape the fact that some taxa have only a handful of readily IDable species, or even none at all.
This is a problem in fungi as well. Some genera have members more identifiable than the groups mentioned above, but the rare ones are swamped out by computer vision suggestions of the single common species as soon as that makes it into the training (e.g. Blumeria was once a suggestion but now only B. graminicola is a suggestion; in fact the other species, with orangey or brownish mycelium, are equally likely to be misidentified by the CV as Puccinia instead). I haven’t thought much about this issue but I’ve been aware it’s causing problems for a while. Smaller, ultra-diverse groups of organisms like arthropods and fungi are inevitably the most problematic taxa for naturalists attempting to identify with high specificity from often low-detail field photos, but this certainly can compound the challenge.
I think changing this could be one of the most beneficial potential improvements to the CV system, as it would:
give users a much more “realistic” suggestion for IDing (especially if the probability of the genus is higher than the species, which it should be - the lowest they should ever be is tied)
reduce “overconfident” misidentifications and
reduce “feedback loops” that reinforce “overconfident” CV suggestion IDs (which require massive identifier effort to fix, in the cases where this is even possible).
The potential cost is that some observations where the CV species suggestion is correct would require another ID to reach RG, but I think that is worth it.
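A rough sketch of what this could look like, with heavy caveats: it assumes a hypothetical model that keeps a genus-only class ("Ophion (genus only)") alongside the species classes, which per this thread is not how the current system behaves. The function names, the scores, and the `species_cutoff` value are all invented for illustration.

```python
from collections import defaultdict


def genus_scores(class_scores, genus_of):
    """Sum the scores of every class in a genus, so a genus never scores below its species."""
    totals = defaultdict(float)
    for label, score in class_scores.items():
        totals[genus_of[label]] += score
    return dict(totals)


def suggest(class_scores, genus_of, species_cutoff=0.8):
    """Offer a species only when it is confident enough; otherwise fall back to the best genus."""
    best_label = max(class_scores, key=class_scores.get)
    if best_label != genus_of[best_label] and class_scores[best_label] >= species_cutoff:
        return best_label
    genera = genus_scores(class_scores, genus_of)
    return max(genera, key=genera.get)


# Invented scores for a single photo.
scores = {
    "Enicospilus purgatus": 0.45,
    "Ophion peregrinus": 0.25,
    "Ophion (genus only)": 0.30,  # hypothetical genus-level class
}
genus_of = {
    "Enicospilus purgatus": "Enicospilus",
    "Ophion peregrinus": "Ophion",
    "Ophion (genus only)": "Ophion",
}

print(suggest(scores, genus_of))  # Ophion
```

The top single class here is the Enicospilus species, but no species clears the cutoff, so the suggestion falls back to the genus with the highest combined score. That fallback is also where the cost comes from: a correct species suggestion below the cutoff would be shown as a genus and need another ID to reach RG.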
That said, while I am not an expert in the CV model, it does seem like it would require some fundamental changes to address. @alex any thoughts?
Yeah, for a select few species. I am the only major IDer of Chironomids on the site, and I do not want to deal with destroying the CV suggestion for some select taxon.
This perfectly illustrates how big an issue this is. As an identifier on a wildlife-observing citizen science organization, I should not feel like I can’t ID something because an AI will otherwise cause me much more work in the future through misidentifications. It sounds ridiculous when you think of it like this, because it is. But that is the system we have.
Make a feature request? Asking if certain genera can be elected for inclusion in the CV training? I mean, a genus where this is a serious problem would be flagged, the problem explained, and then a curator could add it to the mix for CV training. Not all genera would get included! If the programming can be changed so this is done, it would help.
Interesting. Do you write the species as a comment, or just skip it entirely?
I totally see where you’re coming from…but yes, it’s a ridiculous position to have to be in haha
I am pretty limited in how much I could contribute to anything much here in Iceland probably
But it would be good to have a log of species like this where more obs were needed to counteract a single species dominating the training pool.
This always seems to me like something that should be addressable in the design of the model though in theory…
At the very least, again it just supports the notion that for me the autosuggest dropdown should only really operate at a genus and higher level …even if species level suggestions are available somewhere for more experienced users or in a different space to the upload portal.
Species level IDs are rarely reliable at present…and just cause more issues in complex taxa where there are anyway fewer identifiers. Species level autosuggests just exacerbate issues on the whole it seems. Perhaps that’s hard to quantify though.
Alternatively, because of the way this system works, taxonomy, if available, can help create a solution. If there is an Ophion complex anywhere in Europe or North America that is not yet on iNaturalist, adding it could allow the complex to be learned (if observations are identified to complex), assuming the problem species isn’t in that complex. This is one reason why complexes are important for iNaturalist.
For example, all Chironomids
In CV now
Axarus festivus complex
Axarus rogersi complex
Chironomus striatipennis complex
Chironomus decorus complex
Harnischia generic (genus) complex
On track to be trained
Chironomus australis complex
Chironomus plumosus complex
Polypedilum fallax complex (will get unlearned when species in it get learned)
Yes, but the taxa for which they are reliable lean towards the ones where we have adequate identifiers to support identification anyway. This inordinately affects complex taxa and countries where there are fewer users.
It’s the inverse of what we need support for.
As such, I’m not convinced the cost-benefit is warranted across the entire system.
I see the point you’re making here. But, (a) this is an explanation of what happens and not a solution, and (b) this CV behavior is simply undesirable. Your suggestions for “solutions” to the problem are unfortunately untenable. If CV can’t ID an observation as one trained taxon in a genus of several untrained/untrainable taxa, it should default to the genus level and not offer a completely different genus!
What? That’s not how the CV works. It trains the lowest leaf taxon that is eligible for training; any taxon above a learned leaf taxon is forgotten.
In this situation, the options are:
You can make the CV unlearn that species. This will let the CV relearn the genus.
You can ID other species anywhere in the world and get them into the system. This will give the CV a general idea of what Ophion looks like, allowing it, in a roundabout way, to offer the genus as the very top suggestion (“we are pretty sure it’s in x”).
Add a complex if available and ID observations to it to train it, assuming the problem species isn’t in it.
Advocate to staff to change the system, maybe even create an awareness campaign to get a change.
As it stands, it is not possible for the CV to learn the genus, since one of its species has been learned. Only in a roundabout way can you get the genus to even show up in CV suggestions outside of New Zealand.
This is why I say that sometimes not IDing a species is the better alternative. Here, a species endemic to New Zealand has been learned. It is probably IDable because it’s on a small island. This is good for New Zealand, but the rest of the world suffers: Ophion is now not really suggested anywhere but NZ, despite being more commonly seen in other countries.
The CV is both an amazing and a terrible system. I’m not staff, so I can’t be called an expert on the CV, but I have spent a lot of time learning to understand it and how to influence it. So far the CV has gone from knowing just 7 Chironomids to 63 in a year, with many more on the way. The biggest delay in adding new Chironomids is actually the lack of observations of taxa I have set as targets. This amount of success has required understanding the system and developing strategies for dealing with it.
To put it another way, as identifiers, our main job is not to optimize the CV’s training data, but to provide the best IDs that we can. When the CV can help with that, great, but it was never meant to replace human reviewers.
Not just insects and arachnids but also plants and probably many other taxa.
Maybe it could be revised. Every now and then, the algorithm would be trained ignoring “leaf taxa” higher than family or genus, for example, and the user would have the option to get suggestions at that lower level instead of the highest available level.
If the algorithm (or its implementation) is ever revised, it would also be nice to make it learn from manual ID corrections made by users: if observations initially identified as belonging to a certain taxon X often have their IDs corrected to taxon Y, then for every new observation that looks like X the algorithm could mention that it “might also be” Y (perhaps in a separate tab in the interface), even if taxon Y is not yet recognized by the algorithm. This could have a huge impact in making users more aware of taxa that are perhaps very common but not yet recognized by the CV algorithm.
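As a very rough illustration of the “might also be” idea, here is a sketch that builds hints purely from a history of (initial ID, final ID) pairs. Everything in it is an assumption for the sake of the example: the `might_also_be` function, the `min_share` cutoff, and the made-up correction history. It does not describe any existing iNaturalist feature.

```python
from collections import Counter, defaultdict


def might_also_be(corrections, min_share=0.2):
    """Map each initially suggested taxon to the taxa it was frequently corrected to.

    corrections: iterable of (initial_taxon, final_taxon) pairs, one per observation.
    min_share: hypothetical cutoff, the minimum fraction of observations of the
               initial taxon that must have been corrected to a given taxon.
    """
    totals = Counter()
    changed = defaultdict(Counter)
    for initial, final in corrections:
        totals[initial] += 1
        if final != initial:
            changed[initial][final] += 1

    hints = {}
    for initial, finals in changed.items():
        frequent = [
            taxon
            for taxon, count in finals.most_common()
            if count / totals[initial] >= min_share
        ]
        if frequent:
            hints[initial] = frequent
    return hints


# Invented history: 10 observations first identified as Enicospilus purgatus.
history = (
    [("Enicospilus purgatus", "Enicospilus purgatus")] * 6
    + [("Enicospilus purgatus", "Ophion")] * 3
    + [("Enicospilus purgatus", "Ichneumonidae")] * 1
)

print(might_also_be(history))  # {'Enicospilus purgatus': ['Ophion']}
```

Note that Ophion shows up as a hint without the model ever having been trained on it, since the hint comes from identifier corrections rather than from the image classifier.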