Apparently, iNat’s AI is unable to learn about the geographical limitations of certain endemic species. For me, this is AI in the sense of “artificial idiocy”; unfortunately, I have to put it that harshly.
Example from the Canary Islands:
Micromeria herpyllomorpha, an endemic species that grows exclusively on the island of La Palma.
However, it has now become the standard AI suggestion for any Canary Micromeria, especially for plants from Tenerife.
This leads to an ongoing falsification of the species’ distribution and a lot of extra work (e.g. for me) to correct all this.
Any idea how this could be solved?
iNaturalist users influence future computer vision suggestions by adding identifications and observations.
I have big doubts that this will work.
Recently I corrected all the wrong IDs. But that didn’t last long, because new ones keep coming in. So this looks very much like a vicious circle to me.
To me, AI’s “geographical blindness” seems to be an inborn error.
The way to improve the CV (i.e. to stop the AI making false suggestions) is to identify things correctly, or to identify enough observations that the missing species get added to the CV; the next update will then eventually correct the issues. It is a computer program that has a false assumption and just needs to be retrained. Don’t think about it in terms of sentience: you can’t just tell it that it’s wrong.
Those who do not put in work to correct the system have no right to complain about the system (though any and all of your effort is appreciated).
As long as other users do not agree with the incorrect IDs, the cycle should be avoided. Only observations with a “community taxon” are used in the CV training, so if, at the time of training, an observation only has an (incorrect) CV-suggested ID added by the observer, it won’t be included and won’t feed the cycle.
Sorting through incorrect Research Grade observations is thus far more important.
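To illustrate why a lone CV-suggested ID does not feed the cycle, here is a minimal sketch of the filtering step. It is a simplification, not iNat's actual code: it assumes a community taxon needs at least two identifications with more than two-thirds agreement, and it ignores the taxonomy tree (the real algorithm can fall back to a shared genus or family). The function name and the sample observations are made up for the example.

```python
from collections import Counter

def community_taxon(identifications):
    """Simplified community taxon: needs at least two IDs and
    more than two-thirds of them agreeing on the same taxon.
    Assumption: the taxonomy tree (genus-level fallback) is ignored."""
    if len(identifications) < 2:
        return None
    taxon, votes = Counter(identifications).most_common(1)[0]
    # Integer comparison avoids floating-point trouble at exactly 2/3.
    return taxon if votes * 3 > 2 * len(identifications) else None

# Hypothetical observations from Tenerife, each with its list of IDs.
observations = [
    {"id": 1, "ids": ["Micromeria herpyllomorpha"]},        # CV suggestion only
    {"id": 2, "ids": ["Micromeria herpyllomorpha"] * 2},    # someone agreed
    {"id": 3, "ids": ["Micromeria herpyllomorpha", "Micromeria varia"]},
]

# Only observations with a community taxon would reach the training set.
training_set = [o["id"] for o in observations
                if community_taxon(o["ids"]) is not None]
print(training_set)
```

In this sketch only observation 2, where a second user agreed with the wrong ID, would end up in training. That is why clearing wrongly confirmed observations matters more than chasing every unconfirmed one.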
I have corrected multiple cases of incorrect species being recommended in my area, such as Nephroma helveticum instead of N. tropicum, or Pulvigera lyellii instead of P. papillosa. Both took a few months of checking on them and clearing errors, but now there are no more errors.
It can and does learn about geographic limitations of species, but with a limited level of granularity. It divides the planet into little boxes and will only suggest something as occurring “nearby” if it’s been seen within the same box. However, for hyper-endemism, as in “found on one of several close islands” or “found on one of several close mountains”, it can’t distinguish between those locations if they fall within one box. This has come up a lot regarding California mountain endemics before, and an alternate method of calculating “nearby” was attempted, but it just made things worse. This is an active area where improvements are being considered. But the automated suggestions that pop up are based on image-matching, not true “AI”, and incorporating any new Geo Model into the suggestions is much more challenging than it seems.
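The granularity problem can be shown with a toy grid. This is only a sketch under loud assumptions: it uses square latitude/longitude cells rather than whatever grid iNat really uses, the cell sizes are arbitrary, and the island coordinates are rough approximations. The helper names are made up for the example.

```python
import math

def cell(lat, lng, size_deg):
    """Index of the square grid cell containing a point (toy grid, not iNat's)."""
    return (math.floor(lat / size_deg), math.floor(lng / size_deg))

def expected_nearby(observed_cells, lat, lng, size_deg):
    """A species counts as 'nearby' if it was ever seen in the same cell."""
    return cell(lat, lng, size_deg) in observed_cells

# Approximate island centres (assumed values for illustration).
LA_PALMA = (28.7, -17.9)
TENERIFE = (28.3, -16.6)

for size in (2.0, 0.5):  # coarse vs. fine cells, both hypothetical
    seen = {cell(*LA_PALMA, size)}  # species only ever recorded on La Palma
    print(size, expected_nearby(seen, *TENERIFE, size))
```

With the coarse 2-degree cells both islands share one cell, so a La Palma endemic is offered as "nearby" on Tenerife; with 0.5-degree cells they separate. The real system has the same trade-off, only with a different cell shape and size.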
As for how to get the other species suggested instead, one major issue is that a bunch of the other similar species haven’t been observed enough times for the CV to be trained on them yet. There’s a note on several of their species pages that says:
“The current Computer Vision Model does not know about this taxon, so while it might be included in automated suggestions with the “Expected Nearby” label, it will not have the “Visually Similar” label. While the requirements for model inclusion change with each model, generally inclusion is based on number of observations, so to increase the chance of this taxon getting included during the next model training, add or identify more observations of this taxon.”
So they need to be observed and identified more times before the CV gets trained on them.
So you’re telling me I have no right to complain when things go wrong? That’s not a fair answer.
I’ve already done so many ID corrections, so I did put in work to correct the system. Or what else do you mean?
That statement does not apply to you, since you have put in effort. And I noticed your effort and thanked you for it.
That being said, more work could be put in as there are still issues:
I found some observations of Micromeria herpyllomorpha outside of its range (now corrected).
And there are more than 10 species of Micromeria on the Canary Islands that are not in the CV.
When these are added, the situation will improve. The CV only recommends leaf taxa (a known downside), so it cannot say “oh, it is in this genus”. It instead “thinks”: “oh, it is most similar to this thing that I know about that occurs sort of nearby”.
The issue does seem to be largely due to the Geo Model stretching out to other islands because of past incorrect observations, and to the other islands’ Micromeria species not being included in the CV. (Requirements for a species to be added to the CV, from the iNat help pages: there must be at least 100 photos of the species and 60 observations of the species that are at that species for the community taxon, and we don’t choose more than 5 photos from an observation to train the model.)
Micromeria varia is the closest to being added, with 69 photos of the species, and there are more that could be identified: my count only included the 29 observations that meet the requirements, and there are 93 in total. It could probably be added in the next update if this is done soon enough.
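As a rough worked example of those thresholds, the sketch below computes how far a species is from inclusion. It assumes the 69 photos quoted for Micromeria varia are already counted with the 5-photos-per-observation cap, and the function names and the short photo list are hypothetical.

```python
MIN_PHOTOS = 100          # thresholds as quoted from the help pages above
MIN_OBSERVATIONS = 60
MAX_PHOTOS_PER_OBS = 5

def usable_photos(photos_per_observation):
    """Count photos, taking at most five from any single observation."""
    return sum(min(n, MAX_PHOTOS_PER_OBS) for n in photos_per_observation)

def shortfall(n_observations, n_photos):
    """How many more qualifying observations and photos are still needed."""
    return {
        "observations": max(0, MIN_OBSERVATIONS - n_observations),
        "photos": max(0, MIN_PHOTOS - n_photos),
    }

# Hypothetical observations with 1, 7 and 3 photos: the 7 is capped at 5.
print(usable_photos([1, 7, 3]))

# Numbers quoted for Micromeria varia: 29 qualifying observations, 69 photos.
varia = shortfall(n_observations=29, n_photos=69)
print(varia)

# 93 observations exist in total, so this many still lack a species-level
# community taxon and could be reviewed.
print(93 - 29)
```

On these numbers the species needs 31 more qualifying observations and 31 more photos, and there are 64 observations still waiting for identification, so the gap could in principle be closed by identification work alone.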
Thank you for your friendly words; this should be a peaceful and constructive discussion.
Here is the current status of the Geo Model for the species in question:
The good news is that it looks like the Hexagons for the island where the species actually occurs do not intersect any other islands. The bad news is that the species is currently being suggested in the other islands’ Hexagons.
I’m sure someone else with deeper knowledge of the Geo Model can answer this, but I know the model was, at least for a while, including additional Hexagons in a species’ Expected Nearby range if they seemed “suitable” for the species to occur in. I’m not sure if this is still the case, or if the other islands’ Hexes are being included because of the misidentifications that have occurred there.
Again, this is mirroring similar discussions about Mountain endemics in California where similar issues happened. In conclusion, “It’s the Hexagons”. It’s always the Hexagons.
Having been on iNat since before computer vision was implemented and then seeing it evolve throughout the years, and having helped with resolving many issues throughout that time, I can tell you it does work.
But yes, it can take a lot of work, and it can be frustrating, and it can take some time. The model is updated regularly.
Oh yeah, it works. I’ve personally gone through a moth genus and corrected a 5-digit number of IDs and after a year or so of work, the CV now gives correct suggestions nearly all the time for that genus. It wasn’t an overnight fix though.
I see two fundamental problems with the Micromeria example:
a) Micromeria is a difficult genus.
- For La Palma it’s easy, there is only 1 species.
- But for Tenerife it’s highly complicated: several species and recent taxonomy changes or controversies. So for the time being one can only say that a Tenerife Micromeria cannot be M. herpyllomorpha, but usually not what the correct name should be instead.
b) The geographical granularity.
The shown hexagons give cause for hope. And the demarcation should actually be easier for islands, since the coast is the undisputed border there, while it is more difficult for mountains (e.g. California). But I wonder if the mentioned
could cause some trouble?
We’ll see what happens in the long run when I continue with my corrections (which I interrupted to start this discussion).
Adding correct IDs is the best way to fix the problem, but note that there’s a delay between when you make the IDs, when the updated IDs are exported for training a new model, and when the new model is released. That whole process takes at least two months.
Oh hopefully not - the name is now considered obsolete.
Unfortunately, it took POWO quite a long time to publish the newly recognized names. And of course, iNat needs to be updated first. I’ll set some corresponding flags soon. This is going to be a very difficult operation: since the new rules, curators no longer seem to like making big swaps, but that is what would be necessary.
If you can add that as a copypasta (using a text expander or whatever) then more people who care about the taxon will know and - like ripples in a pond - your effort will spread wider. And it will no longer always be you who makes the needed corrections.
Sorry - don’t understand what you mean by
?
When you add a correct ID, if you also leave a comment (copy and paste or however you prefer to do it) you mentor others to help you. I try to keep my English translatable, but sometimes I slip - sorry.
That may be useful in other geographical regions. However, as for the plants of the Canary Islands, the bulk of the identification work has been on me for years now. Other competent users have withdrawn or are doing relatively little.
And who wants to do a lot of stupid correction work?
Even I don’t know when my frustration threshold will be reached.
So in this case (M. herp.), any extra comment would cost valuable time and will rarely encourage others to help.
Exception: thx @dgwdoesthings
Those who want to stop seeing stupid mistakes
What @DianaStuder was suggesting would take a fraction of a second for each observation, especially since this is a common occurrence and you can set up standard “copypastas”. I don’t think it is fair to assume that explaining why you are disagreeing will rarely encourage others to help. (Your complaint on the forum told me about a situation I was unaware of, and that got me to help; I’m sure there are more people like that out there.)
This can be as simple as a document you maintain where you copy set responses from or a simple downloaded program that inputs a block of text with a keystroke. There is info elsewhere on the forum on how to set that up.