CV not suggesting species-complexes

This post is inspired by another CV-related post I saw recently. My identification almost exclusively revolves around ants, so I don’t know if this issue is restricted to that taxon or if it applies to others as well.

I’ve noticed that iNat’s Computer Vision tends to ignore species complexes completely - I can’t remember a single time I’ve gotten a species complex suggestion from CV. This is a pretty major issue in certain taxa, for example the genera Formica and Solenopsis.

Formica is a very large genus of ants with many diverse species, which have been broken down into much more easily identifiable species-groups (species-complexes in iNat’s database). A very significant portion of all Formica observations can be IDed to species-group/complex quite easily, but not to species. Despite this, CV will only suggest the genus, followed by a few species. The issue is, most species in the genus are incredibly difficult to identify, requiring very specific knowledge and highly magnified photos.

Certain species, either those that are more common or those that are easier to identify, get recommended instead, and iNat is flooded with inaccurate or unidentifiable species-level observations, such as Formica obscuripes in western NA and Formica subsericea in eastern NA. With Solenopsis, certain complexes are very easy to tell apart, like the molesta complex from the geminata and saevissima complexes. CV only gives options at genus and species level - the former of which contains many very different-looking species - which prompts users to pick something more specific, which often ends up being inaccurate.

This creates huge pile-ups of inaccurate or non-specific observations that IDers have to sort through, as both of these genera are very common. This seems like an easy problem to fix, so I’m wondering if there’s a reason CV doesn’t suggest these (and perhaps any) complexes. Is this an issue other people are also running into?

2 Likes

The current CV model doesn’t recognize complexes, though that could be changed if staff decided to. Species will continue to be suggested by the CV if there are enough observations to train the model.

For complexes with no known visual/range differences (e.g., Complex Narceus americanus), the goal is to roll back species-level observations such that <100 remain so those species no longer appear as CV suggestions (the CV would then provide only the genus-level suggestion; it still wouldn’t suggest the complex under the current model).

4 Likes

It does occasionally bring up Section, though I’ve never seen it on anything other than a couple of leaf miners:

https://www.inaturalist.org/observations/247541885

3 Likes

Thanks for the response! Is it likely that’s going to happen anytime soon? It would be a monumental difference for certain genera.

This is really up to staff deciding to do it. The CV models are trained fairly frequently, so complexes should show up pretty quickly if a decision was made to include them.

1 Like

At least one of the Formica complexes, Complex Formica difficilis, is currently a leaf node in the CV and can be recommended. For example, it is the top suggestion for this observation: https://www.inaturalist.org/observations/225805253.

I am a little puzzled why it doesn’t say “we’re pretty sure” for, say, “Complex Formica fusca” in this observation: https://www.inaturalist.org/observations/254998839, where the CV only returns two options at all, F. subsericea and F. fusca, both in the Complex, and the returned scores for those taxa combined add up to >95%, which I thought was the standard for ‘pretty sure’.
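To make concrete what I mean by the scores adding up, here is a rough sketch of the rollup I had in mind. It is only my guess at the logic, not iNat's actual code: the function name, the lineage lists, and the example scores (0.70 and 0.27) are all made up for illustration, and only the 95% cutoff comes from my understanding of the 'pretty sure' standard.

```python
# Hypothetical sketch: sum species scores up the taxonomy and pick the
# narrowest taxon whose combined score reaches the cutoff.
# Not iNaturalist's implementation; names and numbers are assumptions.

def common_ancestor(scores, lineages, cutoff=0.95):
    """scores: {species: score}; lineages: {species: [root-first ancestors..., species]}.
    Returns the deepest taxon whose summed score is >= cutoff, or None."""
    totals = {}
    depth = {}
    for species, score in scores.items():
        for level, taxon in enumerate(lineages[species]):
            totals[taxon] = totals.get(taxon, 0.0) + score
            depth[taxon] = level
    candidates = [taxon for taxon, total in totals.items() if total >= cutoff]
    if not candidates:
        return None
    # Deepest level in the lineage means the most specific taxon.
    return max(candidates, key=lambda taxon: depth[taxon])


# Illustrative scores only; the real values for the observation are not shown here.
scores = {"Formica subsericea": 0.70, "Formica fusca": 0.27}
lineages = {
    "Formica subsericea": ["Formica", "Complex Formica fusca", "Formica subsericea"],
    "Formica fusca": ["Formica", "Complex Formica fusca", "Formica fusca"],
}
result = common_ancestor(scores, lineages)
print(result)
```

Under that assumed logic the complex would win over the genus, since both reach the cutoff and the complex is the narrower of the two.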

4 Likes

In the moth world on iNat, we seem to have the opposite frustration: A great many “sections” have been created (even with unofficial status) for certain hard-to-separate species pairs or groups. The problem is, frequently the “section” name (scientific or common name) is identical to one of the included species or is otherwise indistinguishable when CV offers a suggested ID. We only find out that we selected the “section”, as opposed to a desired species, when the observation is posted. It’s a continuing frustration when we really meant to select one species.

3 Likes

Interesting that the difficilis group is recognized and the fusca group (as well as the neogagates and pallidefulva groups) is not. The former is significantly less common and harder to identify. CV has an incredibly high success rate for identifying Formica to the correct species group in the three latter groups I listed. But rather than list the pallidefulva group, it lists pallidefulva and incerta, for example.

The CV does a great job of differentiating the groups and identifying to the correct group, which is why it’s so odd to me it wouldn’t suggest species groups.

I don’t know a whole lot about how CV works, but it seems that it takes a lot more manual input from staff than I originally thought. That would explain why those groups aren’t recognized. But it’s baffling to me that a pretty obscure group like difficilis would get recognized rather than any of the more common groups, especially F. fusca. Thanks for linking those observations.

https://help.inaturalist.org/en/support/solutions/articles/151000170368-which-taxa-are-included-in-the-computer-vision-suggestions-

The threshold is 100 pictures, which is about 60 obs.

CV is updated about once a month
3 Jan 2025
https://www.inaturalist.org/blog/104015-new-computer-vision-model-v2-18-with-nearly-1-500-new-species

It is actually being included because it is less common and harder to identify. The computer vision can only have a complex as a leaf node if no species in it has >100 pictures, but the complex as a whole does have that many. For the difficilis complex, there aren’t any species in it that have >100 pictures. The other complex contains several species with more than 100 pictures, so those species are added to the model. Therefore, the way it currently works, the complex itself can’t be added as a leaf node, because the leaf nodes are supposed to be mutually exclusive, but the species are not mutually exclusive from the complex that contains them.
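Here is a minimal sketch of the rule as I understand it, in case it helps. This is my reading of the inclusion criteria, not iNat's code: the `Taxon` class, the function names, the photo counts, and the placeholder species names are all hypothetical, and whether exactly 100 pictures counts as enough is an assumption on my part.

```python
# Hypothetical sketch of the leaf-node rule described above.
# Not iNaturalist's implementation; all names and counts are illustrative.
from __future__ import annotations
from dataclasses import dataclass, field

PHOTO_THRESHOLD = 100  # assumed cutoff; the exact boundary handling is a guess


@dataclass
class Taxon:
    name: str
    photos: int = 0  # photos identified directly at this taxon
    children: list[Taxon] = field(default_factory=list)


def total_photos(taxon: Taxon) -> int:
    """Photos at this taxon plus all of its descendants."""
    return taxon.photos + sum(total_photos(child) for child in taxon.children)


def leaf_nodes(taxon: Taxon, threshold: int = PHOTO_THRESHOLD) -> list[str]:
    """Names of taxa that would become leaf nodes under the assumed rule."""
    leaves: list[str] = []
    for child in taxon.children:
        leaves.extend(leaf_nodes(child, threshold))
    if leaves:
        # A descendant already qualifies, so this taxon cannot also be a leaf
        # (leaf nodes have to be mutually exclusive).
        return leaves
    if total_photos(taxon) >= threshold:
        return [taxon.name]
    return []


# Made-up counts: no species reaches the threshold, but the complex does.
difficilis = Taxon("Complex Formica difficilis", 50,
                   [Taxon("species A", 40), Taxon("species B", 30)])
# Made-up counts: two species qualify on their own, so the complex is excluded.
fusca = Taxon("Complex Formica fusca", 200,
              [Taxon("Formica subsericea", 500), Taxon("Formica fusca", 300),
               Taxon("species C", 10)])

print(leaf_nodes(difficilis))
print(leaf_nodes(fusca))
```

With these made-up numbers the first complex comes out as a single leaf, while the second is replaced by its two well-photographed species, which matches what we see in the suggestions.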

None of this is manual afaik; the inclusion criteria are all computed by applying relatively simple rules. I do wonder whether, to better handle cases like this, the rules for including taxa could be tweaked a little to increase the likelihood of recommending the complexes.

While that explains why it can’t recommend the complex as a leaf node, I’m not sure why it isn’t recommending it as a common ancestor. @tiwane is it intended behavior that in e.g. the observation I linked above the CV is ‘pretty sure’ about the genus but not the complex, even though it only returns two species both in the complex and the sum of the scores reported for those species is >95%?

1 Like

iNat never claims to be “pretty sure” about a species, or complex, or anything below genus. Even when a genus only has one species, and that species is the only CV suggestion, it will still say “We’re pretty sure this is in the genus: [monotypic genus]. Here are our top suggestions: [that one species]”

The new iNaturalist Next app skips all the way past being ‘pretty sure’ at either genus or complex and makes its ‘top ID suggestion’ the species F. subsericea for that observation. So I’m not sure that not recommending taxa narrower than genus is consistently the policy any more, or if it is, maybe it should be reconsidered.

1 Like