I’ve been appalled lately that the iNat CV is now offering completely out-to-lunch suggestions for some observations at San Jacinto Mountain, with the true identification not among the suggestions. I mainly use the iNat CV to save typing, but lately it hasn’t been doing that.
This example is TYPICAL of what is happening now in some significant number of observations:
The plant is pine drops, Pterospora andromedea, which is very distinctive, and which the iNat CV used to nail all the time. It looks nothing like most of the suggestions, and especially nothing like Rumex salicifolius.
I could easily find 20 to 30 additional examples of such crazy suggestions for different species.
It still comes up with the correct suggestion perhaps the majority of the time, but it used to have an accuracy of 80%, and now it has dropped significantly from that.
Diana, you nailed the source! Those hexagons have excluded most of the territory at San Jacinto Mountain that contains pinedrops! I bet this is the source of the problems for other species as well.
Clearly, for San Jacinto Mountain, the previous geomodel approach worked much, much better.
The polygon approach is especially unsuitable in areas with large elevation differences. For example, the polygon for pinedrops that touches San Jacinto Mountain includes low elevation desert areas, so pinedrops might be suggested for those regions, but won’t be suggested for the high elevation areas south of San Jacinto peak where it lives.
I suspect this is a general problem with the new geomodel, with a polygon containing the San Bernardino Mountain high country and just touching San Jacinto Peak. That would nicely explain why the geomodel no longer suggests a number of species for San Jacinto Mountain that live there.
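To make the elevation point concrete, here is a minimal sketch of how a coarse cell can lump desert and high country together. It uses a plain square lat/lon grid as a stand-in for the hexagons, and the coordinates, cell size, and function names are all made up for illustration; this is not how iNat actually builds its geomodel.

```python
import math

def cell_of(lat, lon, cell_deg):
    """Map a coordinate to a square grid cell index (a stand-in for a hexagon)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def is_expected(lat, lon, observations, cell_deg):
    """A taxon is 'expected' anywhere inside a cell that holds at least one observation."""
    occupied = {cell_of(o_lat, o_lon, cell_deg) for o_lat, o_lon in observations}
    return cell_of(lat, lon, cell_deg) in occupied

# Hypothetical data: one high-country observation, and two query points.
observations = [(34.10, -116.85)]      # made-up high-elevation record
desert_query = (34.05, -116.55)        # made-up low desert point in the same coarse cell
high_query = (33.80, -116.68)          # made-up high-elevation point in a neighboring cell

print(is_expected(*desert_query, observations, cell_deg=0.5))  # True
print(is_expected(*high_query, observations, cell_deg=0.5))    # False
```

The cell knows nothing about elevation, so the desert point inherits the taxon while equally suitable high country one cell over gets nothing.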
The last release said “It offers better alignment with expert range maps and more robust predictions for sparsely observed species.” If Ribes amarum actually were sparsely observed relative to its range, you could see this as a benefit. But how does it know whether the gaps between observations exist because the species is sparsely observed or because it doesn’t grow in the gaps?
Either way, they know the model has issues and said “We’ll continue to refine how the geo model supports the computer vision system and hope to see quick improvements with the geo prior improvement in the next few releases, without losing the mapping/range evaluation wins.” So, hopefully the worst of the issues from this round will be fixed in the next.
When new methods are first used, they are often worse than what came before. The iNat team has made remarkable technical progress over the years, but there have been numerous instances where something new was not impressive until the details had been fiddled with several times. These geomodels will get better.
First things first: The single pine drop image is probably not large or detailed enough for the CV to place it properly. The first image of an organism should be close enough and detailed enough to show identifiable features. I downloaded the pine drop image, cropped it to show the upper half of the plant with the inflorescences, and did a test upload; the CV readily IDed it as Pterospora andromedea.
I see that many of that observer’s uploads accomplish this, but others are general images of a plant from some distance. I always suggest, when uploading, thinking in terms of what details might be most useful in a botanical key (a close-up of an inflorescence, flower, foliage, etc.), and offering those images as the first images in an observation. You don’t need to know in advance what details might be important, just get close!
I have noticed something similar for several of our preserves here in Pinellas County, Florida. Previously the CV recommendations were pretty spot on (at least getting you in the right direction) but now they’re often very hit-or-miss or even very wrong. It’s pretty common that it will predict “We’re pretty certain it’s in this Genus:” and then none of the taxa listed are from that genus. Oftentimes the CV model makes repeat predictions for species that are wildly out of range (for instance, many Hypericums are repeatedly predicted as a Hypericum that’s narrowly endemic to the Florida panhandle, more than 200 miles away from my observation).
If you select “include options not expected nearby” pinedrops is the top suggestion. This is a very typical quick snapshot of pinedrops and as such the CV algorithm itself has no issue recognizing it, but pinedrops is being ruled out by the geomodel.
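For anyone who wants to see why the toggle makes such a difference, here is a toy sketch of a vision ranking with a geo filter applied on top. The scores, the threshold, and the `suggest` function are all hypothetical and only meant to illustrate the idea; the real pipeline may combine the two signals quite differently.

```python
def suggest(vision_scores, geo_scores, geo_threshold=0.1, expected_nearby_only=True):
    """Rank taxa by vision score, optionally dropping taxa the geo prior calls unexpected."""
    kept = {}
    for taxon, vision_score in vision_scores.items():
        geo_score = geo_scores.get(taxon, 0.0)
        if expected_nearby_only and geo_score < geo_threshold:
            continue  # the taxon never reaches the list, however good the photo match
        kept[taxon] = vision_score
    return sorted(kept, key=kept.get, reverse=True)

# Invented numbers: vision is confident about pinedrops, the geo prior is not.
vision_scores = {
    "Pterospora andromedea": 0.85,
    "Rumex salicifolius": 0.05,
    "Ribes amarum": 0.03,
}
geo_scores = {
    "Pterospora andromedea": 0.02,
    "Rumex salicifolius": 0.60,
    "Ribes amarum": 0.30,
}

print(suggest(vision_scores, geo_scores))
# ['Rumex salicifolius', 'Ribes amarum']
print(suggest(vision_scores, geo_scores, expected_nearby_only=False))
# ['Pterospora andromedea', 'Rumex salicifolius', 'Ribes amarum']
```

With the filter on, a weak visual match ends up on top simply because the strong match was removed before ranking.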
Thanks y’all! I apologize that I can’t jump into this right away since I haven’t had a chance to dig into it, but I take stuff like what’s reported in this thread very seriously and will be doing an in-depth investigation this week.
Whatever algorithmic logic came up with these hilariously bad geomodel hexagons was clearly not designed properly for the constraints of reality. And what’s worse, we’re seeing direct evidence in this thread alone of it screwing up both in the “the modeled regions are too big” and “the modeled regions are too small” directions.
I have to suspect that the bones of this algorithm either came from some other use case that was not explicitly “estimating the established ranges of wildlife”, or were poorly optimized for iNat’s intended use.
The bitter gooseberry example suggests that it either (1) isn’t properly handling disjunct ranges or (2), since the geomodel is trained using just observations and an elevation map, is not accounting for the fact that the mountains around San Luis Obispo and Bakersfield might be the right height while bitter gooseberry hasn’t become established there for other reasons (e.g. unfavorable climates or soil types). Elevation alone is not enough to properly draw conclusions about ecological niches.
The pinedrop example indicates that the hexagons are just too large to properly account for some of the hyper-local ranges that exist, and at least in this case the arbitrary cutoff values for [insert algorithm calculation here] are causing hexagons that clearly contain the taxon in question to be excluded from the geomodel area.
Getting back to the original complaint, combining (1) an iffy computer vision model with (2) an iffy geomodel is just compounding algorithmic inaccuracy with more algorithmic inaccuracy, which is clearly just making the suggestions worse in situations like these.
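A back-of-the-envelope version of that compounding, as a rough sketch: it assumes the two failure modes are independent, which may well not hold in practice. The 0.80 echoes the accuracy figure quoted earlier in the thread; the 0.90 is invented purely for illustration.

```python
def combined_hit_rate(vision_top_rate, geo_retention_rate):
    """Chance the true taxon is ranked first by vision AND survives the geo filter,
    assuming the two events are independent."""
    return vision_top_rate * geo_retention_rate

# Hypothetical inputs: vision is right 80% of the time, and the geo filter
# keeps the true taxon 90% of the time.
rate = combined_hit_rate(0.80, 0.90)
print(f"{rate:.2f}")  # prints 0.72
```

This only shows the loss side. A good geo prior also removes wrong lookalikes, which is the whole reason for having one, so the net effect depends on how often it wrongly excludes the true taxon.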
It would be nice if you guys weren’t beta-testing in the production environment.
Yes, the CV suggestions have become significantly worse this year. One thing I notice is that it manages both to stop suggesting options that aren’t that common and to suggest options that are common “somewhere nearby” (usually a county away or in the same state).
For instance, there are uncommon plants for which it won’t even suggest the right genus anymore, or for which it at best favours wrong options and puts the right one way down the list.
The “expected nearby” is probably one of the biggest problems I have though: it suggests out-of-county or out-of-region options as the first selection when they are not at all likely. It’s not great.
I can’t give you an exact date. I just noticed there was a point in time this year where things decided to go awry.
I am not sure what changes have been made to the CV in general. I notice in particular my phone photos of plants (which are not always close and optimal) are often no longer identified correctly. In previous years it always nailed those.
If I see examples in the near future I’ll report back.
One thing I do remember it used to do more was suggest the genus as a separate option for the top-level suggestion. Now it seems to confidently suggest one species (which may or may not be correct) and have no genus as a quick backup choice.
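As a rough sketch of what a genus fallback could look like: roll the species scores up to genus and offer the genus when no single species is convincing. The scores, the cutoff, the placeholder species names, and the function names are all hypothetical; I don’t know how iNat actually decides this.

```python
from collections import defaultdict

def genus_rollup(species_scores):
    """Sum species scores per genus, taking the first word of each name as the genus."""
    totals = defaultdict(float)
    for species, score in species_scores.items():
        totals[species.split()[0]] += score
    return dict(totals)

def top_suggestion(species_scores, species_cutoff=0.6):
    """Return the best species if it clears the cutoff, otherwise the best genus."""
    best_species = max(species_scores, key=species_scores.get)
    if species_scores[best_species] >= species_cutoff:
        return best_species
    genus_totals = genus_rollup(species_scores)
    return max(genus_totals, key=genus_totals.get)

# Invented scores; "sp1" to "sp3" are placeholders, not real species.
ambiguous = {
    "Hypericum sp1": 0.35,
    "Hypericum sp2": 0.30,
    "Hypericum sp3": 0.20,
    "Rumex salicifolius": 0.15,
}
confident = {"Pterospora andromedea": 0.90, "Rumex salicifolius": 0.10}

print(top_suggestion(ambiguous))   # Hypericum
print(top_suggestion(confident))   # Pterospora andromedea
```

In the ambiguous case no species is convincing on its own, but the genus as a whole is, which is exactly when a genus-level quick pick is most useful.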