There are a number of discussions going on about inappropriate species being suggested by the app and accepted by novice users, particularly outside of data-rich areas such as California. The computer vision suggestions are based on visual similarity, and while several changes have been made to address the problem, it still persists.
This request is for better incorporation of geographic location into the suggestions presented by the app, to reduce the number of suggestions of species not found within thousands of km and/or to make it even more obvious that some suggested taxa are not expected to occur in the area.
Positive changes that have already been made include only marking species as “Seen Nearby” if there is a Research Grade observation from the region, and we can now see which IDs have been added from the Computer Vision suggestions. However, based on the discussions I’ve seen and my experience as an identifier in Brazil, further change is still needed.
I agree that the suggestion of species from the other side of the globe is frustrating, but I think in (rare) cases it can be ‘useful’, e.g. tracking the spread of invasives whether due to human-mediated transport or expansion via climate change.
I agree wholeheartedly that geography should have greater weight for the suggested identifications. I routinely see suggestions (especially for insects, but also for more commonly identified groups such as birds) where the suggested ID is correct (usually to family or genus), but the suggested species-level identifications are all over the place, frequently of species from different continents. I strongly suspect that many naive observers suggest the first species level ID that they think looks right, with no knowledge of where that species is found, resulting in very out-of-range records. And yes, I’m aware of the “seen nearby” flag, but that doesn’t always seem to work well in my experience. At least, it seems like the “nearby” records need to be quite close, geographically, for the flag to show up. Some sort of way to down-weight geographically distant species from the suggested IDs would be awesome, and would, I think, greatly improve initial identifications.
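To make the down-weighting idea concrete, here is a minimal sketch in Python. It is not how iNaturalist's suggestions actually work: the function names, the exponential decay, and the 500 km scale are all assumptions chosen for illustration, and the taxon labels are placeholders. Each suggestion carries a visual score plus the distance to the nearest known record, and the score is multiplied by a weight that shrinks with distance.

```python
import math

def geo_weight(distance_km, scale_km=500.0):
    """Hypothetical weight: 1.0 at distance 0, decaying exponentially with distance."""
    return math.exp(-distance_km / scale_km)

def rerank(suggestions, scale_km=500.0):
    """Re-rank (taxon, vision_score, nearest_record_km) tuples by weighted score."""
    weighted = [
        (taxon, vision_score * geo_weight(nearest_km, scale_km))
        for taxon, vision_score, nearest_km in suggestions
    ]
    # Highest combined score first.
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)

# Placeholder data: the lookalike scores higher visually but is very far away.
suggestions = [
    ("lookalike from another continent", 0.55, 7000.0),
    ("local species", 0.40, 12.0),
]
ranked = rerank(suggestions)
```

With these made-up numbers the local species moves to the top, while the distant lookalike stays in the list with a very small score instead of being removed outright, which would still leave room for the rare invasive case.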
I’d like to revive this and I’m mostly repeating much of what has already been said here, but I’d like to throw some additional weight behind this request. I constantly see visually similar species from disparate continents being identified far outside their native range. A great example of two visually similar species is Enemion biternatum (eastern North America) and Isopyrum thalictroides (Europe), which are constantly being submitted on the wrong continent.
Could there be a warning built into the app that says something like “This organism has not been reported within 500 km of your location. Are you sure?” This could help clean up some of the most absurd observations, which become more common during CNC and school bioblitzes.
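A rough sketch of what such a check might look like, in Python. Everything here is an assumption for illustration: the function names, the idea of comparing against a list of known record coordinates, and the sample coordinates are hypothetical and not taken from the app. The distance is the standard great-circle (haversine) distance.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def range_warning(obs_lat, obs_lon, known_records, limit_km=500.0):
    """Return a warning string if no known record lies within limit_km, else None."""
    nearest = min(
        (haversine_km(obs_lat, obs_lon, lat, lon) for lat, lon in known_records),
        default=math.inf,  # no records at all counts as "not reported nearby"
    )
    if nearest > limit_km:
        return (f"This organism has not been reported within {limit_km:.0f} km "
                "of your location. Are you sure?")
    return None

# Made-up example: one known record in central Europe.
records = [(48.2, 16.4)]
far_away = range_warning(40.0, -83.0, records)   # observer in eastern North America
close_by = range_warning(48.0, 16.0, records)    # observer a few tens of km away
```

The observer can still confirm the ID, so a genuine out-of-range find is not blocked, only questioned.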
Are there any massive species distribution datasets that iNat can pull in to provide appropriate IDs based on a locality?
EDIT: I was able to find this, which provides some large databases over large geographic swaths. However, integrating all of it into iNaturalist would be worth a PhD imho. As per usual (and since I can see replies covering it), there isn’t a sufficient dataset for all sorts of things, but biology/ecology is part art, part science, isn’t it?
Another example that appears to have started with the latest computer vision update: Astragalus danicus (Eurasia) and Astragalus agrestis (North America and Northeast Asia). They are very similar (differing in the density and length of hairs), and a case could be made for treating the latter as a variety of the former, but I would prefer hearing this from a taxonomist, and not a computer.
Indeed, this feature is quite annoying and is filling iNat with so many crazy and, of course, wrong ID suggestions…
I am tired of seeing really weird/crazy IDs of phasmids (my area of expertise). I even tested the suggestions myself in 10 cases, uploading my own high-quality pictures in which the insect is shown very clearly and not in a strange position. I gave the location, of course, but it seems to have no effect on the suggestions. These are the results:
In every single case, only 1-2 out of 8 suggestions were within the right order and continent! And, of course, the correct order was never among the suggestions.
Btw, there was not a single case where the suggestion was just the order, Phasmida in this case. This would have been a correct suggestion in every case. And I guess many other invertebrate orders have similar problems with this feature. So, I wonder if it really makes sense to allow suggestions below “Order” (at least for invertebrates). I have the feeling that it would be easier from a technical point of view, and most of the weird suggested IDs would be avoided.
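As an illustration of suggesting only at order level, here is a small Python sketch. It assumes the model returns a score per species and that a species-to-order lookup is available; the species labels, scores, and function name are invented for the example and say nothing about how the real model is built.

```python
from collections import defaultdict

def roll_up(species_scores, order_of):
    """Sum species-level scores into their orders and rank the orders."""
    totals = defaultdict(float)
    for species, score in species_scores.items():
        totals[order_of[species]] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical scores: the single best species match is a mantis,
# but the stick insects together carry more of the score.
species_scores = {
    "mantis sp. 1": 0.30,
    "stick insect sp. 1": 0.25,
    "stick insect sp. 2": 0.20,
}
order_of = {
    "mantis sp. 1": "Mantodea",
    "stick insect sp. 1": "Phasmida",
    "stick insect sp. 2": "Phasmida",
}
by_order = roll_up(species_scores, order_of)
```

In this made-up case the top suggestion becomes Phasmida even though no single phasmid species had the highest score.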
It is not actually that surprising that you got these results. Keep in mind that only species with over 100 submitted, identified photographs are included in the dataset used for training.
Right now there are 14 phasmid species with 100+ observations, and likely a few more below that observation count but with enough photos.
Nor is it surprising that giving a location does not impact the list of suggestions. You can quibble with the design (that is the entire point of this request) but the algorithm is a perceived visual similarity tool. It simply lists the taxa which it has been trained on that it thinks the photo most resembles.
Yes, getting better inclusion of location in the algorithm would help, and, even more, an update from the site on their plans here would be nice.
But no one should assume adding this is a trivial task, in terms of programming, data entry, data management etc.
Before any work can move forward, the site needs to decide which of the at least 4 different ways distribution data is stored on the site (range maps, checklists, atlases, submitted observations) will be used as the source. Then, assuming submitted records are not chosen, a massive effort to populate those ranges is needed.
Require location data to be entered before ID. When uploading via the desktop site, ID is the first field and location the third. Therefore AI can’t offer “Seen nearby” because it doesn’t know where it is.
I became aware of this when I got a new camera that doesn’t have GPS, so I now have to enter my positions manually. Previously, the location from the photo metadata was filled in automatically before the ID.
AI should offer suggestions preferably to family, or genus at the most. It should not offer species-level suggestions. This would stop the range creep of wrongly identified species.
Sorry for not responding here in forever. We discussed how to address this recently and are in the early stages of some redesigns. First (and easiest) change would be to only show “seen nearby” taxa by default. If nothing has been “seen nearby”, show the results the way we do now. I suspect that’d cut down a decent number of incorrect IDs. But some design work is needed, which we’ve sketched out.
It’s not too far along, we’d start out with mobile first, most likely. Currently the plan is to have an option to toggle between the two in the app. I imagine that could be made “sticky” so you just have to toggle it once and you’ll then see the standard list by default.
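The behaviour described above could be sketched roughly like this in Python. This is only an illustration under assumptions: the `seen_nearby` field, the function, and the settings class are hypothetical names, not the app's actual code or design.

```python
def visible_suggestions(suggestions, nearby_only=True):
    """Show only 'seen nearby' taxa by default; fall back to the full list if none qualify."""
    if nearby_only:
        nearby = [s for s in suggestions if s["seen_nearby"]]
        if nearby:
            return nearby
    return list(suggestions)

class SuggestionSettings:
    """Hypothetical 'sticky' preference: the choice persists until toggled again."""
    def __init__(self):
        self.nearby_only = True

    def toggle(self):
        self.nearby_only = not self.nearby_only
        return self.nearby_only

suggestions = [
    {"taxon": "species A", "seen_nearby": True},
    {"taxon": "species B", "seen_nearby": False},
]
settings = SuggestionSettings()
default_view = visible_suggestions(suggestions, settings.nearby_only)  # nearby taxa only
settings.toggle()
full_view = visible_suggestions(suggestions, settings.nearby_only)     # standard list
```

In a real app the setting would be stored with the user's preferences rather than in memory, which is what would make the toggle sticky across sessions.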
Has anyone suggested the removal of the “Seen Nearby” label for observations that are outside of atlased ranges (when applicable)? Or removing the suggestion altogether if it is outside the atlased range? “Nearby” is somewhat relative; 100 km is a large distance for many species. This seems particularly relevant for taxa with very restricted ranges that have visually similar relatives found “nearby” (e.g. terrestrial gastropods or island endemics). Naturally, taxa would need to be atlased for this to work, and maybe there are some issues with atlases anyway. But atlasing taxa seems like a way to make “nearby” more objective and species-specific. Not sure how hard that would be or how intensive it would be to check for each observation.
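Roughly what I have in mind, as a Python sketch. The data shapes are assumptions for illustration: an atlas is treated as a set of place identifiers per taxon, and the observation comes with the set of places that contain it. The names and identifiers are made up and do not reflect how atlases are stored on the site.

```python
def show_seen_nearby(taxon, seen_nearby, obs_places, atlases):
    """Keep the 'Seen Nearby' label only if the observation falls inside the taxon's atlas."""
    atlas_places = atlases.get(taxon)
    if atlas_places is None:
        # Taxon is not atlased: leave the current behaviour unchanged.
        return seen_nearby
    return seen_nearby and not atlas_places.isdisjoint(obs_places)

# Hypothetical atlas: an endemic snail known from a single island.
atlases = {"island endemic snail": {"island-1"}}

on_island = show_seen_nearby("island endemic snail", True, {"island-1", "country-x"}, atlases)
off_island = show_seen_nearby("island endemic snail", True, {"island-2", "country-x"}, atlases)
not_atlased = show_seen_nearby("widespread snail", True, {"island-2"}, atlases)
```

The per-observation check itself is just a set lookup; the cost would be in building and maintaining the atlases.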
The biggest issue with that is atlases only work on land since they follow political boundaries, so it is an approach that does not work for two thirds of the Earth’s area (a smaller percent of observations yes, but still a bunch).
It is also unclear how much of a server hit having so many atlases would incur. By this I mean not the lookup on an individual record but the maintenance tasks to maintain them and the out of range functions.