CV suggestions have gotten much, much worse at San Jacinto Mountain

Speaking of Ribes problems in S California - iNat’s CV misidentifies all the spiny Ribes in the San Jacinto Mtns. (R. roezlii and R. montigenum, both with red flowers, both common in their respective habitats) as R. quercetorum (yellow flowers, does not occur in the San Jacinto Mtns. and scarce in valleys to the west). I’m no model builder and I understand there are inherent difficulties in dividing geography up into cells (of any shape), yet nonetheless you probably need to do that to use geography as part of the model. But it makes no sense to me that hexagons where a plant occurs with many verified observations would be excluded from the “occurs nearby” set.

1 Like

If you go to the taxon pages for those 3 species, go to the map, and check the geomodel by turning on the “Expected Nearby Map” in the upper right corner, you’ll see it’s the exact same geomodel problem that’s been raised a ton in this thread already - the blue hexagon map for R. quercetorum completely smothers the entire San Jacinto Mountains, while the blue hexagon maps for R. roezlii and R. montigenum miss significant portions of the San Jacinto Mountains.

Really, all of the problems with CV and the San Jacinto mountains are just a complete microcosm of all of the issues with the geomodel right now, and the very real problems that occur when iNat’s computer vision suggestions put waaaaaaay too much faith in the geomodel given how obviously flawed it is.

The geomodel clearly does not deserve that level of faith, because right now it’s doing a lot of hammering of square pegs into round holes. Better to just turn the whole thing off until it works better.

I think the thresholds for whatever arbitrary cutoffs are used to determine whether a hexagon is included in the range map or not are far too strict for how incredibly massive each hexagon is - when each one is over 20 miles across by my estimation and can contain massive variations in climate and elevation, how are they supposed to be usable units for a geomodel?
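To put a rough number on that size estimate, here is a small sketch that computes the area of a regular hexagon from its flat-to-flat width. The 20-mile width is only the estimate given above, not a measured geomodel cell size, and the function name is made up for illustration.

```python
import math


def hexagon_area_from_width(width: float) -> float:
    """Area of a regular hexagon, given its flat-to-flat width."""
    # With apothem a = width / 2, the area is 2 * sqrt(3) * a**2,
    # which simplifies to (sqrt(3) / 2) * width**2.
    return (math.sqrt(3) / 2) * width ** 2


# Assumption: 20 miles across, the rough estimate from the post above.
width_miles = 20.0
area_sq_miles = hexagon_area_from_width(width_miles)
print(f"{area_sq_miles:.0f} square miles")  # prints "346 square miles"
```

So a cell "over 20 miles across" would cover at least about 346 square miles, which in mountainous terrain can easily span desert floor to subalpine forest.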

1 Like

I just had another thought - all of these geomodels are predicated on the assumption that the underlying observations are, at least to a large extent, correctly identified.

There are taxa out there where that is very much not the case, especially in parts of the world and in taxa where a dearth of knowledgeable identifiers means misidentifications run rampant and remain uncorrected (and even reach RG), and therefore affect the geomodel.

Take, for example, one of CV’s favorite suggestions for blurry photos of unidentifiable tiny spiders, Oecobius navus. I have zero faith that observations outside of the ones I have personally reviewed are correct, and I started checking O. navus observations, and only those in the USA, fairly recently in the grand scheme of iNat’s existence. There are going to be a lot of misidentifications in that data set. The geomodel is blue for nearly all populated parts of the world.

Although O. navus is a widely distributed and highly synanthropic species, so a very broad geomodel map is to be expected, I strongly suspect that the old computational axiom of garbage in = garbage out is also in play here.

4 Likes

Situations like this have also become more common. There’s only one species of Coenonympha like this in the entire continental US (two in the genus, the second is far, far away and totally different in appearance). But it will not give a species, only genus, and then it gives a species from a totally unrelated genus that isn’t close to looking similar.

Almost nothing in this upload batch had the correct species in the top 3 positions, or at all. It’s puzzling.

5 Likes

What do the suggestions for that observation look like with “expected nearby” turned off?

Helps to see how much geomodel is being factored in.

1 Like

I have compiled my thoughts on this continuing CV/geomodel problem in a new journal post:

https://www.inaturalist.org/posts/115290-cv-and-geomodel-predictions-visibility-identifiability-seasonality-apples-oranges

12 Likes

That’s a classic problem with almost all predictive range mapping. It is very hard to include biogeographic barriers in the models except on a population by population basis.

5 Likes

I’m glad to see this thread because I’ve seen what seem like some crazy suggestions as well lately. Not for everything, but certainly for some things. Sometimes it is a “way off the mark” suggestion and other times the correct ID is not listed at all (when I’m sure in the past it has been). It has just seemed to not be working very well over the past couple of months (at least). I’m glad to see this post and realize it is not my imagination…and also that it is being brought to the attention of iNat staff.

2 Likes

Post edited to fix a misunderstanding of mine, replies may no longer make sense

Chuck, this is a lovely post, and I agree with everything you’ve said.

There are some issues just in general with using iNaturalist observations as a source of training data for a computer vision model. That comes with a number of implicit assumptions that are not always true - the chief ones that come to my mind are (1) that everything is identifiable to species from the information typically present in an iNaturalist observation and (2) that taxa included in the geomodel have enough knowledgeable identifiers across the taxon’s entire range to produce a representative sample of correctly-identified observations.

Your observations about the numerous types of sampling bias inherent in iNaturalist observation data are also an extremely important point. iNaturalist observation data will have strong biases towards big population centers and wild spaces close to said population centers, biases towards times of year when people are likely to go outdoors, biases against places that are plain old hard to get to, biases towards charismatic organisms that are more likely to be observed and identified, biases towards organisms that are easier to ID, and there will also be seasonal variations in how easily identifiable an organism is based on lifecycle stage.

I cannot shake the feeling that this SINR geomodel is a house of cards built by stacking biases upon faulty assumptions upon factors not taken into account.

4 Likes

This is not possible as a number of taxa would have no eligible data for a range map. I think the Chironomid Omisus has 1 RG observation.

Edit. 2 RG observations couldn’t make a range map like this

I think we are misstating what is meant when the Geomodel “requires 50+ RG observations”. As I understand SINR (which is haltingly), this constraint is for a species to be included in the training set of all species (all plants/animals) on which a species distribution model is built. It does not matter to the predicted range that a given species being modeled has fewer than 50 observations. The predicted range (from all species in the SINR training set) then serves as an underpinning for the actual set of observations (and photos) utilized in the CV process of offering IDs.

I hope I’ve properly characterized this “50+ RG” criterion.

3 Likes

Ah, OK, that’s definitely on me for misunderstanding that as I was quickly reading through your post.

Still seems like an oversight regardless, as I’m sure some of the generally-unidentifiable-to-species taxa would be useful for refining the geomodel.

I actually have used that, when I was the first to observe a taxon within the “expected nearby” range. In that situation, its top “visually similar” suggestion was even more visually similar than its top “visually similar - expected nearby” suggestion.

My latest exploration of these issues: “It’s the Hexagons”:
https://www.inaturalist.org/journal/gcwarbler/115452-it-s-the-hexagons

6 Likes

Glad to see that Tchester brought this up! Suggestions have been way off on really easy things lately, or sometimes it will suggest the right genus and then throw out a bunch of random suggestions that do not include that genus like this. It should be an easy one.

Parish’s catchfly (Silene parishii) from Mount San Jacinto State Park, Riverside County, US-CA, US on July 31, 2025 at 12:37 PM by Susan Fawcett · iNaturalist

2 Likes

No Silene are ‘in range’ apparently. But it looks so much like the out-of-range Silene that it recommends the genus anyway.

Something like that.

It’s trying to find the common ancestor of visually similar results, I believe. So I suspect there are a bunch of visually similar Silene, but none “expected nearby” in the geomodel. What’s the URL of the observation?
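As a rough illustration of the "common ancestor" idea, here is a minimal sketch that walks down the lineages of a few visually similar candidates and returns the deepest taxon they all share. The lineages are simplified and written by hand, the function name is made up, and none of this is iNaturalist's actual code, which likely also weighs scores.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified lineages (root first); written by hand for this
# example, not pulled from the iNaturalist taxonomy.
LINEAGES: Dict[str, List[str]] = {
    "Silene parishii": ["Plantae", "Caryophyllaceae", "Silene", "Silene parishii"],
    "Silene verecunda": ["Plantae", "Caryophyllaceae", "Silene", "Silene verecunda"],
    "Penstemon rostriflorus": ["Plantae", "Plantaginaceae", "Penstemon", "Penstemon rostriflorus"],
}


def common_ancestor(names: List[str], lineages: Dict[str, List[str]]) -> Optional[str]:
    """Return the deepest taxon shared by every candidate, or None if there is none."""
    paths = [lineages[name] for name in names]
    ancestor: Optional[str] = None
    # Compare the lineages level by level, stopping at the first disagreement.
    for level in zip(*paths):
        if all(taxon == level[0] for taxon in level):
            ancestor = level[0]
        else:
            break
    return ancestor


print(common_ancestor(["Silene parishii", "Silene verecunda"], LINEAGES))
print(common_ancestor(["Silene parishii", "Penstemon rostriflorus"], LINEAGES))
```

With two Silene candidates the shared ancestor is the genus; add a visually similar plant from another family and it falls back to a much higher rank.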

Yeah, if you turn off expected nearby from results, two of the top suggestions, based only on visual similarity, are in Silene:

2 Likes

I have seen this recently in the Santa Catalinas (Tucson, AZ) with the very common Taxiles and Silver-Spotted Skippers and the Nais Metalmarks. (Recently because these butterflies just came out.) It will suggest the genus Lon but not include Taxiles in the species list, so people select from the more specific suggestions, which are badly wrong: not skippers at all, or butterflies not found anywhere nearby. (I upload through the website, by the way, so that screenshot is very similar to the kind of thing I have seen.)

In the spring, I was wondering about the wasp and bee suggestions because it seemed like the CV could get to Bee and then just offered one particular furrow bee no matter what. It was really frustrating knowing that this was not true of pollinator suggestions last summer.

I wonder, too, if this is part of what I’ve been seeing as an identifier this year where some pretty experienced observers have been punting to a general ID (Butterflies or Bees) when I know in the past those observations would have been to species level. Bluntly, I have done that for Bees and try to go back later because they are not all furrow bees and it is super frustrating. I did not pay attention to the terrain to know if this was the hexagon-on-a-mountain problem but have to assume it is for my observations down here with the sky islands.

1 Like

Regarding the butterflies you mentioned - yes, absolutely the same issue. Go to https://www.inaturalist.org/taxa/1081330-Lon-taxiles . On the map, click on the layers icon in the very upper right, and select “Expected Nearby Map.” You will see this, with the hexagons not covering most observations in the Tucson region:

The analogous geomodel map for Silver-spotted skippers is basically the same, also not covering those mountains. The Nais Metalmark geomodel map is even worse.

If you check other species that aren’t being suggested when they should, you will probably see similar issues.

As for the bees/wasps, you’ll have to do some digging on your own there - I assume you have at least some idea of which species “should” be showing up in the suggestions but aren’t, and you can do similar checks of geomodel maps.

If the “one particular furrow bee” you mentioned that the CV is pushing is either Halictus ligatus or H. tripartitus, those species have geomodel maps that cover pretty much all of Arizona, and I’d have to assume that the other species you’d expect suggestions for have similar geomodel failures to the butterfly examples I screenshotted. That would cause the CV to push the furrow bee IDs.
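To make that mechanism concrete, here is a toy sketch of how a hard "expected nearby" filter could push a weak visual match to the top. It assumes the geomodel acts as a simple yes/no mask on the vision scores, which is a simplification on my part; the real system may weight rather than filter, and the names and scores below are invented.

```python
from typing import Dict, List, Set


def suggest(
    vision_scores: Dict[str, float],
    expected_nearby: Set[str],
    use_geo: bool = True,
    top_n: int = 3,
) -> List[str]:
    """Rank taxa by vision score, optionally keeping only "expected nearby" taxa."""
    candidates = {
        taxon: score
        for taxon, score in vision_scores.items()
        if not use_geo or taxon in expected_nearby
    }
    return sorted(candidates, key=lambda taxon: candidates[taxon], reverse=True)[:top_n]


# Invented scores: the correct species is the best visual match, but the
# (hypothetical) geomodel map only covers the furrow bee at this location.
scores = {"correct_species": 0.70, "similar_species": 0.20, "furrow_bee": 0.10}
nearby = {"furrow_bee"}

print(suggest(scores, nearby))                 # geomodel mask on
print(suggest(scores, nearby, use_geo=False))  # visual similarity only
```

With the mask on, only the furrow bee survives, even though it is the weakest visual match of the three. That matches the pattern of one species being offered "no matter what".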

1 Like