CV suggestions have gotten much, much worse at San Jacinto Mountain

My frustrations with the CV definitely picked up around the time the summer flood of observations started coming in (maybe late May/early June?).

To pick on a specific example that is currently the bane of my existence, let’s take Agelenopsis potteri. Over in eastern Europe and in some parts of Canada, that species is at least possible to ID from typical user-submitted photos, which means there are a lot of species-level observations for the CV algorithm to be trained on.

But in much of the US, there are extreme lookalikes that are effectively indistinguishable or require an incredibly discerning eye to separate, making species IDs of genus Agelenopsis a much taller task in the USA. As a result, many of those other species in the genus have nowhere near enough photos to be put into the CV as competitors for suggestions.* We will be sidestepping the whole “but wait they actually can’t even be separated by external photos anyway” concern for now.

*This is another aside that is not directly relevant to Tony’s question, but I would like to get it off my chest. Many of the observations that do exist for those other species are of preserved specimens, which have a distinctly different look that I certainly wouldn’t trust the CV to try to extrapolate out to “the appearance of live spiders that haven’t been dunked in alcohol,” so even if there happened to be enough collected specimen observations to get into the algorithm, it wouldn’t help for all of the live photos that people are taking.

I have noticed a consistent uptick in observations with Agelenopsis potteri CV suggestions this summer that are completely unjustified for the level of detail visible in the observation (presumably because the CV algorithm looks at all the European and Canadian observations and goes “Oh, this must always be easily identifiable to species by image matching”), so I’ve had to kick more and more of them back to genus all summer long.

3 Likes

The last update made chironomids worse overall, even with the inclusion of some new taxa. It’s as if the weight put on the range score was increased and the generated range maps became stricter. A number of chironomids now have strange cases of not being recommended at all within their own range, so you have to click “not expected nearby,” even when they are surrounded by many other observations. This is happening to species that rarely had that problem before.

This has been much more detrimental for places with few taxa in the model, like Africa, South America, etc.

@chuuuuung what you describe is the largest issue with the CV now. It can only learn leaf taxa and forgets everything above them. It is such an extreme pain that it affects even how I identify. Unfortunately, I’m not sure anyone inside iNaturalist is working to fix this. It’s not very transparent what is being worked on at any given time.

My current workaround is that I just don’t identify identifiable species if the genus is very difficult and that species is the only one easy to ID. I also will wait (when necessary) so I can ID two species in a genus at around the same time before IDing either one. Glyptotendipes is an example: the plan is to get Glyptotendipes barbipes and Glyptotendipes testaceus in at around the same time. I went about a year strategically not IDing Glyptotendipes testaceus to preserve the genus in the CV.

2 Likes

Also, there are an annoying number of differences in CV suggestions depending on where you ID.

Analysis of one observation, comparing the CV suggestions from each platform (screenshots):

iNaturalist Classic Android

iNaturalist website, on the observation page

iNaturalist Identify tab

iNaturalist Next Android. Seems to be the best version of the CV so far, being more accurate than the web and iNat Classic for chironomids.

It seems the hexagons have an impact? I suppose any observation a bit close to the coast, or in a strange hole, just won’t get S. poecilopterus recommended because it’s “not nearby”?

2 Likes

What’s the URL of the observation?

https://www.inaturalist.org/observations/301717786

1 Like

@tiwane If you need more examples from San Jacinto Mountain, I can readily provide them. But they all come from a common cause: a bad new geomodel.

Here’s another example:

Ceanothus cordulatus
https://www.inaturalist.org/geo_model/62228/explain

If you zoom into the San Jacinto Mountain area, the polygon with Mt. San Jacinto in it leaves out a huge population of C. cordulatus to the east of Saddle Junction, and instead covers a huge area in which it is not found, in the Hemet, Menifee, and Anza area.

2 Likes

I’m so sorry you have to go through all that, dude.

The fact that these gymnastics are required to keep the CV from being trained into making incorrect IDs just reinforces the feeling I get that iNaturalist devs and leadership are focusing solely on getting more users and observations, at the complete expense of the identifiers who have to clean up the mess.

3 Likes

Yes, they do. I’ll note I’m not an engineer, though, so I don’t know what all the trade-offs would be for using smaller hexagons. I assume using smaller hexagons would require a lot more computation time and power, and the current ones are also large enough that they should be bigger than obscuration cells. Having a finer map may mean more potential for revealing locations of threatened species. There are always going to be trade-offs regardless of approach.
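Just to put rough numbers on the computation point, assuming the hexagons are something like the H3 grid (an assumption on my part; I don’t know what iNat actually uses), each finer resolution level multiplies the global cell count by about 7:

```python
# Back-of-envelope only: H3-style grids start from 122 base cells, and each
# resolution step splits every cell into roughly 7 children, so cell counts
# (and therefore storage/compute per species) grow about 7x per level.
EARTH_AREA_KM2 = 510_000_000  # approximate surface area of the Earth

for res in range(3, 8):
    n_cells = 122 * 7 ** res             # approximate global cell count at this resolution
    avg_area = EARTH_AREA_KM2 / n_cells  # rough average cell size
    print(f"res {res}: ~{n_cells:,} cells, ~{avg_area:,.0f} km² each")
```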

FWIW, if you turn on the unthresholded map, you’ll see it’s close to meeting the threshold for the geomodel to include that hexagon.

The model uses a combination of elevation, observations of Stenochironomus poecilopterus, and observations of other species that co-occur with Stenochironomus poecilopterus. It’s hard to say exactly what’s going on with this particular hexagon, at least at my level of knowledge.
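And here is my rough mental model of the thresholded vs. unthresholded maps, as a tiny sketch. This is not iNat’s actual code; the cell names, scores, and the 0.5 threshold are all invented for illustration:

```python
# The geomodel produces a continuous score per hexagon; the "expected nearby"
# map is (as I understand it) just that score cut at some threshold, so a cell
# can narrowly miss inclusion the way this one appears to.
def expected_nearby(hex_scores: dict[str, float], threshold: float = 0.5) -> set[str]:
    """Binarize continuous per-hexagon scores into the set of 'expected nearby' cells."""
    return {h for h, score in hex_scores.items() if score >= threshold}

scores = {"hex_near_coast": 0.48, "hex_inland": 0.91}
print(expected_nearby(scores))  # only 'hex_inland'; the coastal hex just misses the cut
```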

Thanks. If you come across any specific observations please let us know.

The goal with this approach was to improve the geomodel and also make it easier to iterate on as we keep working on it. If it’s worse, or needs improvement somewhere, we’ll do what we can to fix it.

2 Likes

This is terrible for Central American marine species. Many that are confined to the Pacific Ocean are “expected” in the Caribbean and the Gulf of Mexico.

For example:
Stone Oyster (Striostrea prismatica) · iNaturalist

1 Like

This risks turning into a whole new thread.

I have certain feelings about it, more so in the past few months. There are many, many large issues, some years old, and quite a few that compound one another. I think the decline of identifiers and the loss of expertise on iNaturalist is increasingly becoming something I’m worried about.

After becoming the main identifier of chironomids over a year ago, I truly understand the reasoning for some to leave and take their expertise elsewhere. It’s just so much work with very little support from the iNaturalist team. What I mean by that is that all users besides curators have the same available actions. In my case, really none of the issues with Chironomidae were helped.

I couldn’t bulk-fix Diamesa, which were 90% incorrect; I had to spend many hours over two weeks pushing 3,000 observations back to family individually, by myself.

There are no measures in place to manage a CV taxon that is out of control, even if the CV is 90% or more incorrect. There is no way to put a stop or hold on it by pleading a case for it to be removed from the CV. These taxa are just allowed to balloon into unmanageable, ever-growing piles. There isn’t even a warning message for people using the CV that the suggestion is reportedly incredibly inaccurate.

There’s no stopping the CV from forgetting higher taxa unless you avoid identifying lower taxa.

There’s no way to prevent a species from being learned by the CV, even if identifying it requires dissection, DNA, or a high-power microscope. Every taxon, species and above, is available to be learned.

With the new taxon image system, there’s no way to lock the first taxon image used by the CV, for any reason. This means an image showing the diagnostic feature of a taxon risks not showing up. Really, there’s very little control over taxon photos.

There’s no way for users to edit geomodel-assigned ranges. If the model has assigned a taxon an incorrect range, you can’t edit it manually to fix it.

No hybrids are learned.

There are taxonomic ranks that haven’t been added, forcing you to either not add certain groupings or make a clunky workaround.

There is no official, comprehensive list of what is needed for a taxon to be eligible for the CV; one has to go to GitHub and look through code to find out.

There are so many other issues; the above is hardly comprehensive. Many of them are complicated and probably very hard to solve.
But my real issue at this point isn’t even that iNaturalist has problems, it’s the lack of supporting tools, procedures, and methods to combat those problems.

For the example here, there’s nothing anybody can do to fix this in the meantime, because the range maps the CV uses to decide where to recommend a certain taxon are completely controlled by an algorithm. No human intervention is possible after it releases its resulting maps, so its mistakes can’t be corrected by hand. To fix this, the code needs to be updated by engineers and then released with the next CV release, which could be months away.

6 Likes

I would say “look at literally any Agelenopsis observation in the northern US where I have submitted a disagreeing ID”, but on second thought, the example I gave is an issue of “a genus that is unidentifiable to species in only part of its range.” That is a phenomenon that, as far as I can tell, the CV model completely fails to take into account.

However, that is a separate issue from the inconsistent geomodel maps that are the original motivation of this thread.

1 Like

I think we all may be talking about a couple of different changes to CV behavior. Among the taxa that I monitor (e.g. lichen moths and aquatic crambid moths), the CV has not declined in precision. It’s a more nuanced change in its behavior: I’ve noticed over the past month or two that the CV seems to be more confident in offering IDs, most of which are correct, but it more often overlooks sibling/cryptic/rarer species. This is an over-confidence by the CV and a bias toward more commonly IDed/uploaded taxa (maybe “over-fitting” the data, in the terminology of computer modeling). I recall this corresponding to about the same time that staff (in a blog post somewhere?) announced a change in the geographic modeling protocol. I have associated the two: more confident IDs, but more errors when sibling/cryptic species are involved, especially when one is more numerous or widespread than the others.
EDIT: See @dianastuder’s link to the June 30, 2025 blog post below. Thanks, Diana.

4 Likes

I posted about a few more examples of this issue (not the geomodel maps, but the isolate leaf taxon dominating the CV like with the chironomids and spiders) a few weeks ago on this other thread:
https://forum.inaturalist.org/t/cv-not-suggesting-species-complexes/60301/15?u=tristanmcknight

And I shared two examples of the geomodel causing trouble because of rough hexagons ignoring part of a mountain two years ago on this thread:
https://www.inaturalist.org/posts/84677-introducing-the-inaturalist-geomodel

3 Likes

Somewhere in here

1 Like

For ease of reference, here are the relevant paragraphs of that recent CV update blog post which described the changes in the geomodel. Warning: These are written in serious modeling geek-speak.

"When we update the computer vision model, we also update the geomodel. With this release, we’ve updated our geomodel using a new training approach called SINR (spatially implicit neural representation). Our previous geomodels predicted species distributions based on spatial grid cells. All iNat observations were aggregated into multi label presence sets per cell, then a model was trained on these aggregations using multi label binary cross entropy over species presence in the cell. It produced sharp geo priors which make it useful in downstream computer vision tasks, and is straightforward to develop and debug.

In contrast, a SINR model learns directly from individual observations and carefully constructed pseudo absences, and uses negative sampling loss to distinguish where species are likely or unlikely to occur. It provides better generalization and avoids discretization artifacts, aligns with the work of our research collaborators, and is easier to adapt to interesting new directions for producing pseudo absences. It’s empirically stronger on mapping tasks, and we believe that as we keep working on it, it will catch up on the geo prior / CV task."
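If it helps, here is how I read those two paragraphs, as a toy numpy sketch. To be clear, this is my own paraphrase and not iNaturalist’s actual code: the function names are made up, and real SINR training involves loss weighting and other details I’m glossing over.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Previous approach (as I understand the quote): observations are aggregated
# into grid cells, and the model is trained with multi-label binary
# cross-entropy over per-cell species presence.
def cell_bce_loss(cell_logits, cell_presence):
    """cell_logits: (n_cells, n_species) model scores; cell_presence: 0/1 matrix."""
    p = sigmoid(cell_logits)
    eps = 1e-9
    return -np.mean(cell_presence * np.log(p + eps)
                    + (1 - cell_presence) * np.log(1 - p + eps))

# SINR-style approach (again, my paraphrase): learn directly from individual
# observation locations plus sampled pseudo-absences, pushing scores up at
# presences and down at pseudo-absences.
def sinr_negative_sampling_loss(presence_logits, pseudo_absence_logits):
    eps = 1e-9
    pos_term = -np.mean(np.log(sigmoid(presence_logits) + eps))
    neg_term = -np.mean(np.log(1.0 - sigmoid(pseudo_absence_logits) + eps))
    return pos_term + neg_term
```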

1 Like

The apparent failures of the newer geomodel with respect to cryptic species and speciose genera also highlight the previous feature requests to allow for CV training on genus-level observations. For instance, see this Feature Request and @zygy’s link therein to previous requests.

2 Likes

Eriogonum kennedyi is another example of the expected nearby polygons being crazy but in the opposite direction.

It seems like it should be easy to do a hexagon-vs-observation analysis that surfaces “expected nearby” polygon issues. Species with relatively few observations should probably have more errors, but should there be polygons that already contain 300 observations yet are not marked “expected nearby”? And should there be a huge number of “expected nearby” polygons, like in the map above, where the species is definitely not expected? Some sort of analysis should be able to detect how prevalent these issues are and maybe point to how to refine the model. One obvious partial solution is to automatically mark any polygon with a bunch of observations of a species as “expected nearby.” Likewise, maybe exclude all polygons that are more than some predetermined distance from any observation.
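To make that concrete, here is a hypothetical sketch of what such an audit could look like. It assumes you could export, for one species, the set of “expected nearby” cell IDs and a per-cell count of its observations; as far as I know neither export exists as a documented tool today, so the data structures and the threshold are placeholders.

```python
# Hypothetical audit of geomodel "expected nearby" cells against observation counts.
def audit_geomodel(expected_cells: set[str],
                   obs_counts: dict[str, int],
                   min_obs_for_flag: int = 50) -> dict[str, list[str]]:
    """Flag cells that look like geomodel errors in either direction."""
    # Cells with plenty of observations that the model still calls "not expected nearby".
    observed_but_not_expected = [cell for cell, n in obs_counts.items()
                                 if n >= min_obs_for_flag and cell not in expected_cells]
    # "Expected nearby" cells with no observations at all (possible over-prediction;
    # a distance-to-nearest-observation check would be a better filter).
    expected_but_unobserved = [cell for cell in expected_cells
                               if obs_counts.get(cell, 0) == 0]
    return {"observed_but_not_expected": observed_but_not_expected,
            "expected_but_unobserved": expected_but_unobserved}

# Toy usage with made-up cell IDs:
print(audit_geomodel({"a", "b"}, {"a": 12, "c": 300}))
# {'observed_but_not_expected': ['c'], 'expected_but_unobserved': ['b']}
```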

3 Likes

Here’s another very strange geomodel map:

The species only has observations in the eastern United States, yet the geomodel says it is expected in Hispaniola (Haiti and the Dominican Republic) and even Baja California, which is bizarre. Clearly the model needs some tweaking.

3 Likes

Here’s a funny pair of geomodel maps for rare Arctostaphylos species in central CA, where one includes a bunch of hexagons where the species is absent and the other excludes a bunch of hexagons where it is present. The calculated probability of a true vs. false absence could probably be tweaked based on the number of observations of close relatives within a hex. Hexes with no observations of anything are possibly false absences, whereas hexes with many observations of related species are much more likely to be true absences.
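A rough sketch of that weighting idea, with entirely made-up numbers, just to show the shape of the heuristic:

```python
# Score how confident we should be that a hexagon with zero observations of the
# target species is a true absence, using observations of close relatives as a
# proxy for "people have actually looked here". All constants are invented.
def absence_confidence(target_obs: int, relative_obs: int, total_obs: int) -> float:
    if target_obs > 0:
        return 0.0   # the species has been seen here, so this isn't an absence at all
    if total_obs == 0:
        return 0.1   # nobody has recorded anything: very plausibly a false absence
    # More relatives recorded without the target -> more likely a true absence.
    return min(0.9, relative_obs / (relative_obs + 20))

print(absence_confidence(0, 0, 5))      # 0.0: no relatives seen, weak evidence of absence
print(absence_confidence(0, 200, 500))  # 0.9: lots of relatives, probably truly absent
```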

4 Likes

Thanks for the examples. We found some potential errors in elevation encoding (in my layman’s understanding) which may be contributing to the problem, so we’re looking into those.

5 Likes