I found that this is suggested almost everywhere worldwide with the exception of some extreme climates, regardless of observations. Diving deeper, I found that a lot of the observations appear to be mis-IDs. I am working on reviewing many of them to the best of my novice ability. I understand the CV updates monthly, but the fact is it shouldn’t have been showing that suggestion to begin with.
TL;DR: Is there something individuals can do to improve the geomodel besides correcting identifications and adding more IDs to reach 100 of a given species?
Is there, or could we make, a running list of species with extensive mis-IDs? I have noticed mention of a few in another thread. I wonder if it would be especially helpful for IDers to work on that list instead of sorting unknowns, etc.
A related question as well: when does the geomodel actually update? I had a similar but less extensive problem with the katydid Ephippitytha trigintiduoguttata, and I cleaned up several hundred incorrect sightings. However, that was more than two months ago and the geomodel still hasn’t been corrected, which means it continues to suggest the species well outside its expected range, and I’m not entirely certain it will ever fix itself:
There were observations all across that blue area which I’ve corrected, but it feels like I’m fighting a losing battle! (And for those wondering, that single observation down south is of a stowaway)
FWIW, I’m viewing the geomodel like a fine wine…it needs to age some and may improve with time as they tweak it. Right now, I personally have no confidence in any of the predicted ranges coming out of the geomodel; it is just too simplistic. Several of us have offered suggestions on alternative variables or datasets on which to base such a model. That’s all in the future. For the time being, I just don’t even bother looking at them except for amusement.
It is worth being absolutely clear what the geomodel is and isn’t. Even if it does update, it will not necessarily restrict the blue area to ‘where the red dots are’. It is based also on things like altitude etc. Still got some improving to do no doubt, but a vast improvement on the ‘seen nearby’ approach used previously.
Most marine mollusk species that live only in the Gulf of Mexico and Atlantic have geomodels that have looked or still look like this, and it’s leading to a noticeable amount of misidentifications in the Pacific where those species don’t occur at all.
Help me understand how this is happening: Is iNaturalist’s Computer Vision offering the erroneous suggestions (e.g. a Pacific Coast location in your example) based on the geomodel that are subsequently being agreed to by OPs? I don’t think I understand fully how the geomodel is being rolled out and incorporated into CV.
Yes, the Computer Vision, as others have already mentioned, is far from perfect. Practically every identifier knows of one or more species with a suggested range so large that it leads to large numbers of incorrect identifications that need to be fixed. If I remember correctly, Bombus has a lot of those. Here is an example of a species with a wildly incorrect range:
Latrodectus mactans’ westernmost range limit is Texas, and yet it is predicted all the way into Central America. Virginia is also about as far north as the species would get, yet it is predicted in Massachusetts.
Generally, just think of the CV as a helper and do your own research on the species it suggests before choosing them.
@loarie I have downloaded and attempted to read the papers by Aodha et al. (2019) and Cole et al. (2023), but I’m hampered a bit by a lack of familiarity with some of the modeling terminology. One word used frequently by Aodha et al. is “prior”, used as a noun to describe certain data inputs such as “presence-only geographical priors” and “spatio-temporal priors”. Could you explain in layman’s terms what that noun refers to?
This is indeed an ongoing problem with many species. It normally happens when a species appears frequently in textbooks or nature guides from well-studied places and people mistake it for a ubiquitous species. This happens a lot with Ulva lactuca and Corallina officinalis (algae). It also tends to happen when a particular species is very common in one country and both the AI and some identifiers expand it after seeing it in other geographical areas, misidentifying local native species as species from their own geographical area of expertise. This happened with Sargassum fluitans all over the world.

To my surprise, this wrong data even made its way to GBIF. I contacted them to let them know that the distribution in GBIF was wrong (which I did because there were far too many wrong identifications resulting from misidentifications on iNaturalist), and the response of the “curator” who got back to me was to manually change each observation to genus level. We managed to cast it out of Europe and some parts of Asia, so I guess there isn’t an easier way to bypass this.

What I also tried was adding a comment with my suggestion (as I normally do when I’m suggesting something else) so that people would learn that the species is not present in the area and avoid the mistake in the future (as well as learning something new about nature, which is always nice!)
The geomodel is what decides the “Expected Nearby” tag for the CV.
I believe this specific example started because the geomodel partially uses altitude to give suggestions, and beaches in the Atlantic are at the same altitude as the ones in the Pacific. This may have led the geomodel (and therefore the CV) to suggest an Atlantic species for a similar-looking Pacific species. An observer could go with this ID, and an identifier might incorrectly agree with it. Alternatively, an identifier may suggest the wrong ID first, and the OP would agree with it. Either way, this would lead to a research grade observation in the area, which tells the geomodel that the species supposedly occurs there, leading to a positive feedback loop.
Basically, the geomodel incorrectly gives a suggestion, at least two users agree with it (making the observation RG), and the geomodel then uses this to give even more incorrect suggestions.
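To make that loop concrete, here’s a purely illustrative toy simulation in Python. It is not how iNaturalist actually retrains anything; the “include a region once it has any research-grade record” rule, the 30% blind-agreement rate, and the region names are all invented:

```python
# Toy illustration only (not iNaturalist's actual pipeline): one accepted
# mis-ID keeps re-entering the training data. The retraining rule, the 30%
# "blind agreement" rate, and all counts here are invented for illustration.
import random

random.seed(0)

rg_counts = {"atlantic": 50, "pacific": 0}   # research-grade records per region
true_presence = {"atlantic"}                 # the species really only lives here

def model_predicts(region):
    # assume the model flags a region once it has any research-grade records
    return rg_counts[region] > 0

# Seed: a single mis-ID in the Pacific is agreed with and reaches research grade.
rg_counts["pacific"] += 1

for cycle in range(4):                       # pretend monthly retraining cycles
    for region in rg_counts:
        if not model_predicts(region):
            continue                         # the CV never suggests it here
        # Ten observers see the "Expected Nearby" suggestion this cycle;
        # some accept it even where the species is absent.
        accepted = sum(
            1 for _ in range(10)
            if region in true_presence or random.random() < 0.3
        )
        rg_counts[region] += accepted
    print(cycle, rg_counts)                  # the Pacific count keeps growing
```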
I think the main problem in the Atlantic/Pacific example is that even though the beaches are at the same elevation, the geomodel doesn’t recognize that they are completely different ecosystems.
This is also part of the case for Pacific mollusks, with many of them being identified as their Atlantic counterparts. For example, the Atlantic Anadara brasiliana has 1271 observations. For a while it was known as the only Anadara with small beads/nodules covering its shell (at least, before I went through and identified the similar Anadara chemnitzii). In the Pacific, however, numerous species have beads as well, so some identifiers have made the mistake of identifying them as A. brasiliana or A. chemnitzii (myself included).
By the way, here’s a Pacific observation where the CV suggests 3 Atlantic species as “expected nearby:”
Anadara brasiliana
Anadara chemnitzii (a species recently added to the CV)
and Noetia ponderosa
Priors in general are a way to incorporate previous knowledge/understanding/theory/hypotheses into models. They are often starting points, or are used to calculate starting points, for certain parameters when fitting models iteratively (often millions of times). The idea is often that they are a reasonable starting point so that they get the model “in the ballpark” since the parameter space of a model is so huge that it can never be fully explored when training the model to search for an optimal solution.
You can also understand the value of priors by thinking about the other options for fitting a model. You could start with an arbitrary value or a random value as well. In some cases, an arbitrary value (like 1) might work ok, but in many cases it won’t. A random value would often lead to garbage output.
I haven’t read the papers you’re referencing, so I don’t know specifically how they are applied, but a presence-only geographical prior probably describes data composed of 1s (known presences) in whatever spatial grid is being used for the modeling process. The model will start from that and generate a series of probabilities for all the locations in the spatial grid (something between 0 and 1), I would think. But this is just based on my general knowledge of how some types of models work, not the Geomodel specifically.
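For what it’s worth, here is a small numpy sketch of what a presence-only prior over a spatial grid might look like as data, based only on the general description above and not on the actual geomodel or the Aodha et al. / Cole et al. code; the grid size, the example cells, and the distance-decay step are placeholders, not their method:

```python
# A rough numpy sketch, assuming (not knowing) that a presence-only prior is
# essentially a grid of 1s at cells with known records; the distance-decay step
# below just stands in for whatever the real fitted model would do.
import numpy as np

grid = np.zeros((5, 5))                      # 5x5 spatial grid of cells
presence_cells = [(0, 1), (1, 1), (2, 3)]    # cells with known records
for r, c in presence_cells:
    grid[r, c] = 1.0                         # presence-only: 1 = recorded here

# Crude stand-in for a fitted model: scores decay with distance from records.
rows, cols = np.indices(grid.shape)
prob = np.zeros_like(grid)
for r, c in presence_cells:
    dist = np.sqrt((rows - r) ** 2 + (cols - c) ** 2)
    prob = np.maximum(prob, np.exp(-dist))   # nearer cells score higher

print(np.round(prob, 2))                     # values in [0, 1], peaking at the 1s
```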
This is almost certainly because of a long-standing issue with iNaturalist known to marine biologists: iNaturalist doesn’t really recognize oceanic areas like it does terrestrial areas. It’s a bias that causes a lot of mis-IDs and problems in the marine realm. As far as I can tell, the staff have no plans to change this, since this is a years-old problem. Ironically, a lot of the time the CV will not suggest a species that is right nearby, making you manually type it in, but will make a suggestion that is totally inappropriate. Then inexperienced users get ahold of it and it causes a feedback loop. Sargassum fluitans was already mentioned, but Clathria prolifera has been a nightmare for about a decade or more due to this issue.
I wonder if a potential solution could be to give the geomodel a negative bias when an observation is misidentified and then corrected.
For example:
A Pacific species gets misidentified as an Atlantic one, leading to the geomodel thinking the Atlantic one might be found in the Pacific.
When the wrong ID is corrected, the geomodel wouldn’t just forget about that area but would also leave behind a negative imprint, so that when someone adds another false identification, the area doesn’t immediately get added back to the model.
In general I think it would be good to have the model be rather conservative and not extend the range because of one observation (honestly not sure if it does)
But on the other hand, it might be interesting to take into account observations that get confirmed not just once but multiple times. This could be useful for species extending their range (due to shifting climate or human transportation).
So if a species gets observed in an area previously not covered by the model but gets multiple confirming IDs, it could still be added to the model.
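For concreteness, here is a hedged sketch of how those two ideas (a negative imprint from corrected mis-IDs plus a multiple-confirmation requirement) could be combined; this is not an existing iNaturalist feature, and the weights and threshold are invented:

```python
# Hedged sketch of the two ideas above; nothing like this exists in iNaturalist,
# and the weights/threshold are made-up numbers purely for illustration.
from collections import defaultdict

CONFIRM_WEIGHT = 1.0       # each independent confirming ID adds weight
CORRECTION_WEIGHT = -2.0   # a corrected mis-ID leaves a negative imprint
INCLUDE_THRESHOLD = 3.0    # a cell needs several confirmations, not just one

cell_scores = defaultdict(float)

def record_confirmation(cell):
    cell_scores[cell] += CONFIRM_WEIGHT

def record_correction(cell):
    cell_scores[cell] += CORRECTION_WEIGHT

def in_predicted_range(cell):
    return cell_scores[cell] >= INCLUDE_THRESHOLD

# A corrected Pacific mis-ID drags the cell below zero, so one new false ID
# will not immediately add the area back.
record_correction("pacific_cell")
record_confirmation("pacific_cell")
print(in_predicted_range("pacific_cell"))    # False

# A genuine range expansion: several independent confirmations clear the
# threshold, so the cell can still be added to the model.
for _ in range(4):
    record_confirmation("new_range_cell")
print(in_predicted_range("new_range_cell"))  # True
```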
The problem is, with common species like Clathria prolifera and the way the AI vision works, I don’t think there is a way to correct this within the programming of the model itself. The model is working exactly as programmed. This is one of the inherent flaws of modern “AI”. It doesn’t have the sense to read up on native ranges or the latest paper about an introduction of a non-native species. It’s listening to beachcombers who don’t know a sponge from a seaweed and then just endlessly telling other beachcombers that everything is the most common suggestion it gets. As far as I can tell the only way to fix this is for an actual person to come along and repair the thousands of mis-IDs. But that’s very taxing, so it often doesn’t get done.
As for the marine issue, what really needs to be done is for all the user-made polygons that were once a thing and got grandfathered in to be deleted in their entirety, and for iNaturalist to start over with better ones. For example, there are multiple “Caribbean” polygons, but none of them actually encompass the Caribbean, and most of the edges of these user-made zones are way off. The admins have mentioned that the polygons used a lot of data, which is why new ones can no longer be made, but I’m not certain what would happen if they deleted the ones that already exist and replaced them with better ones.
This would be “throwing the baby out with the bathwater.” I have made several very precise polygons for certain projects. I recognize that there are some fuzzy/flawed/incorrect polygons on iNat all over the world, but I would certainly not want my carefully crafted ones discarded. A finer, more surgical solution would be desirable.