Better use of location in Computer Vision suggestions

I agree wholeheartedly that geography should have greater weight for the suggested identifications. I routinely see suggestions (especially for insects, but also for more commonly identified groups such as birds) where the suggested ID is correct (usually to family or genus), but the suggested species-level identifications are all over the place, frequently including species from different continents. I strongly suspect that many naive observers pick the first species-level ID that they think looks right, with no knowledge of where that species is found, resulting in records far out of range. And yes, I’m aware of the “seen nearby” flag, but in my experience it doesn’t always work well. At least, it seems like the “nearby” records need to be quite close geographically for the flag to show up. Some way to down-weight geographically distant species in the suggested IDs would be awesome and would, I think, greatly improve initial identifications.

Reopening until implemented (or the staff decide not to move forward with it).

I’d like to revive this. I’m mostly repeating what has already been said here, but I want to throw some additional weight behind this request. I constantly see visually similar species from different continents being identified far outside their native range. A great example of two visually similar species is Enemion biternatum (eastern North America) and Isopyrum thalictroides (Europe), which are constantly being submitted on the wrong continent.

Could there be a warning built into the app that says something like “This organism has not been reported within 500 km of your location. Are you sure?” This could help clean up some of the most absurd observations, which become more common during CNC and school bioblitzes.
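
A check like the one proposed could be sketched as below. This is a minimal illustration, not how iNat works: it assumes a list of (latitude, longitude) coordinates of existing records for the taxon is available, uses the haversine great-circle distance, and takes the 500 km radius from the suggestion above. `haversine_km`, `range_warning` and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def range_warning(obs_lat, obs_lon, known_records, radius_km=500.0):
    """Return a warning if no known record of the taxon lies within radius_km.

    known_records: iterable of (lat, lon) tuples for existing records
    (hypothetical input; where such data would come from is an open question).
    """
    for lat, lon in known_records:
        if haversine_km(obs_lat, obs_lon, lat, lon) <= radius_km:
            return None  # at least one record nearby: no warning
    return (f"This organism has not been reported within {radius_km:.0f} km "
            "of your location. Are you sure?")


# Hypothetical records of an eastern North American species
records = [(38.0, -84.5), (40.7, -74.0)]
print(range_warning(39.1, -84.5, records))  # observation in Ohio: None
print(range_warning(48.2, 16.4, records))   # observation in Vienna: warning
```

A linear scan over records is fine for a sketch; with many records per taxon, a spatial index would be the natural replacement.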

Are there any massive species distribution datasets that iNat can pull in to provide appropriate IDs based on a locality?

EDIT: I was able to find this, which provides some large databases over large geographic swaths. However, integrating all of it into iNaturalist would be worth a PhD imho. As per usual (and since I can see replies covering it), there isn’t a sufficient dataset for all sorts of things, but biology/ecology is part art, part science, isn’t it?

Maybe the GBIF datasets or the checklists could be used for this.

Another example that appears to have started with the latest computer vision update: Astragalus danicus (Eurasia) and Astragalus agrestis (North America and Northeast Asia). They are very similar (differing in the density and length of hairs), and a case could be made for treating the latter as a variety of the former, but I would prefer hearing this from a taxonomist, and not a computer.

Indeed, this feature is quite annoying and is filling iNat with many crazy, and of course wrong, ID suggestions…
I am tired of seeing really weird IDs on phasmids (my area of expertise). I even tested the suggestions myself in 10 cases, uploading my own high-quality pictures in which the insect is shown clearly and not in a strange position. I of course included the location, but it seems to have no effect on the suggestions. These are the results:

In every single case, only 1-2 out of 8 suggestions were in the right order and on the right continent! And, of course, the correct ID was never among the suggestions.
By the way, there was not a single case where the suggestion was just the order, Phasmida in this case, which would have been correct every time. I guess many other invertebrate orders have similar problems with this feature. So I wonder whether it really makes sense to allow suggestions below order (at least for invertebrates). I have the feeling that it would be easier from a technical point of view, and most of the weird suggested IDs would be avoided.

It is not actually that surprising that you got these results. Keep in mind that only species with over 100 submitted, identified photographs are included in the dataset used for training.

Right now there are 14 phasmid species with 100+ observations, and likely a few more below that observation count but with enough photos.

Nor is it surprising that giving a location does not affect the list of suggestions. You can quibble with the design (that is the entire point of this request), but the algorithm is a perceived visual similarity tool. It simply lists the taxa it has been trained on that it thinks the photo most resembles.
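
To make the contrast concrete: a purely visual tool ranks taxa by similarity score alone, whereas the down-weighting asked for at the top of the thread would combine that score with some geographic factor. The sketch below is hypothetical throughout: the scores and distances are made up, and the weight 1 / (1 + d / scale) is just one simple decaying function chosen for illustration, not anything iNat is known to use.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    taxon: str
    visual_score: float       # hypothetical vision-model similarity, 0..1
    nearest_record_km: float  # hypothetical distance to nearest known record


def visual_only(suggestions):
    """Rank purely by visual similarity (the behaviour described above)."""
    return sorted(suggestions, key=lambda s: s.visual_score, reverse=True)


def geo_weighted(suggestions, scale_km=500.0):
    """Rank by visual score down-weighted by distance to the nearest record.

    The weight decays smoothly, so distant taxa are demoted but never removed.
    """
    def combined(s):
        return s.visual_score / (1.0 + s.nearest_record_km / scale_km)
    return sorted(suggestions, key=combined, reverse=True)


# Made-up numbers for an observation in eastern North America
candidates = [
    Suggestion("Isopyrum thalictroides", 0.60, 7000.0),  # European look-alike
    Suggestion("Enemion biternatum", 0.50, 10.0),        # native species
]
print([s.taxon for s in visual_only(candidates)])
# ['Isopyrum thalictroides', 'Enemion biternatum']
print([s.taxon for s in geo_weighted(candidates)])
# ['Enemion biternatum', 'Isopyrum thalictroides']
```

Because nothing is excluded, an introduced species with a strong visual match can still appear in the list, which speaks to the concern about non-native species raised later in the thread.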

Yes, better inclusion of location in the algorithm would help, and an update from the site on its plans here would be even nicer.

But no one should assume adding this is a trivial task, in terms of programming, data entry, data management etc.

Before any work can move forward, the site needs to decide which of the (at least) four ways distribution data is stored on the site (range maps, checklists, atlases, submitted observations) will be used as the source. Then, assuming submitted records are not chosen, a massive effort to populate those ranges is needed.

I suggest these two interventions:

  1. Require location data to be entered before the ID. When uploading via the desktop site, ID is the first field and location the third, so the AI can’t offer “Seen nearby” because it doesn’t know where the observation was made.

I became aware of this when I got a new camera that doesn’t have GPS, so I now have to enter my locations manually. Before, the location metadata in the photo was entered automatically before the ID.

  2. AI should offer suggestions preferably to family, or genus at most. It should not offer species-level suggestions. This would stop the range creep of wrongly identified species.
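
One way to implement the second suggestion (and the order-level idea raised earlier for phasmids) would be to keep a species-level model but sum its scores up to a higher rank before displaying them. This is a hypothetical sketch: the scores are invented, the lineage table is a stand-in for a real taxonomy lookup, and `roll_up` is not an iNat function.

```python
from collections import defaultdict

# Stand-in lineage table; a real system would query its taxonomy
LINEAGE = {
    "Carausius morosus": {"genus": "Carausius", "order": "Phasmida"},
    "Extatosoma tiaratum": {"genus": "Extatosoma", "order": "Phasmida"},
    "Medauroidea extradentata": {"genus": "Medauroidea", "order": "Phasmida"},
}


def roll_up(suggestions, rank):
    """Sum species-level scores into their ancestor at `rank`, best first.

    suggestions: list of (species_name, score) pairs.
    """
    totals = defaultdict(float)
    for species, score in suggestions:
        totals[LINEAGE[species][rank]] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Invented species-level scores for one photo
species_scores = [
    ("Carausius morosus", 0.40),
    ("Extatosoma tiaratum", 0.30),
    ("Medauroidea extradentata", 0.20),
]
print(roll_up(species_scores, "genus"))  # three genera, same order as before
print(roll_up(species_scores, "order"))  # a single Phasmida entry, about 0.9
```

Summing lets a coarser suggestion collect the confidence that is otherwise spread across many look-alike species, one reason it can be more reliable than any single species in it.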

Sorry for not responding here in forever. We discussed how to address this recently and are in the early stages of some redesigns. First (and easiest) change would be to only show “seen nearby” taxa by default. If nothing has been “seen nearby”, show the results the way we do now. I suspect that’d cut down a decent number of incorrect IDs. But some design work is needed, which we’ve sketched out.
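
The default described above (show only “seen nearby” taxa, and fall back to the current list if none qualify) can be sketched in a few lines. Assumptions: suggestions arrive already ranked, the “seen nearby” set is computed elsewhere, and `show_all` stands in for a user-facing toggle; all names are hypothetical.

```python
def default_suggestions(ranked, seen_nearby, show_all=False):
    """Return the suggestion list to display by default.

    ranked: taxa in the vision model's order.
    seen_nearby: set of taxa flagged as seen nearby.
    show_all: hypothetical user toggle to always see the full list.
    """
    if show_all:
        return list(ranked)
    nearby = [t for t in ranked if t in seen_nearby]  # keeps model order
    # Fallback: nothing seen nearby, so show the results as they are now
    return nearby if nearby else list(ranked)


ranked = ["Isopyrum thalictroides", "Enemion biternatum", "Anemone quinquefolia"]
print(default_suggestions(ranked, {"Enemion biternatum"}))
# ['Enemion biternatum']
print(default_suggestions(ranked, set()))
# the full ranked list, unchanged
```

The fallback only triggers when nothing at all is seen nearby, so a genuinely rare or range-edge species can still be hidden behind a nearby look-alike; an opt-out toggle is what would cover that case.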

And some onboarding to explain ‘seen nearby’?

If it is not too far along, have there been any thoughts about making this a configuration option in your account settings, so you can opt in or out of filtering the list based on “seen nearby”?

Applying it across the board may cause issues for users in less-iNatted areas, or at range limits where things are rarer.

It’s not too far along; we’d most likely start with mobile. Currently the plan is to have an option to toggle between the two in the app. I imagine that could be made “sticky” so you just have to toggle it once and you’ll then see the standard list by default.

Has anyone suggested removing the “Seen Nearby” flag for observations that are outside of atlased ranges (when applicable)? Or removing the species from the suggestions altogether if the observation is outside its atlased range? “Nearby” is somewhat relative; 100 km is a large distance for many species. This seems particularly relevant for taxa with very restricted ranges that have visually similar relatives found “nearby” (e.g. terrestrial gastropods or island endemics). Naturally, taxa would need to be atlased for this to work, and maybe there are some issues with atlases anyway. But atlasing taxa seems like a way to make “nearby” more objective and species-specific. I’m not sure how hard that would be, or how intensive it would be to check for each observation.
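
The atlas-based refinement could look roughly like this. Assumptions: an atlas is modelled as a set of place IDs where a taxon is marked present, the places containing the observation are already known, and taxa without an atlas keep the existing distance-based flag. Function names, place IDs and taxon labels are hypothetical.

```python
def show_seen_nearby(taxon, obs_place_ids, atlases, nearby_taxa):
    """Decide whether to show the "Seen nearby" flag for one taxon.

    atlases: dict mapping taxon -> set of place IDs where it is atlased present.
    obs_place_ids: set of place IDs containing the observation.
    nearby_taxa: taxa the existing distance-based check flags as nearby.
    """
    if taxon not in nearby_taxa:
        return False
    atlas = atlases.get(taxon)
    if atlas is None:
        return True  # not atlased: keep the current behaviour
    # Atlased: only flag it if the observation falls inside an atlased place
    return bool(atlas & obs_place_ids)


# Hypothetical island endemic atlased only on island 101
atlases = {"endemic_snail": {101}}
nearby = {"endemic_snail", "mainland_snail"}
print(show_seen_nearby("endemic_snail", {202}, atlases, nearby))   # False
print(show_seen_nearby("endemic_snail", {101}, atlases, nearby))   # True
print(show_seen_nearby("mainland_snail", {202}, atlases, nearby))  # True
```

Since atlases follow political boundaries, marine taxa would mostly end up on the fallback branch, which matches the limitation pointed out in the reply.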

The biggest issue with that is that atlases only work on land, since they follow political boundaries, so the approach does not work for two-thirds of the Earth’s area (a smaller percentage of observations, yes, but still a lot).

It is also unclear how much of a server hit having so many atlases would incur. By this I mean not the lookup on an individual record but the maintenance tasks to maintain them and the out of range functions.

I like the idea over-all, but it would have to be implemented with extreme care.

It should not exclude species based on location, as an ever-increasing number of invasive and non-native species are spreading throughout the world.

The ID suggestion should include non-local species, but perhaps with a geographical note along with it.

Something like: “This looks like XXYZ, which is generally found in eastern North America”
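
Annotating rather than excluding could be sketched as follows. The distance threshold, the range descriptions, the distances and the function name are all hypothetical; the point is only that every suggestion stays in the list and distant ones gain a note.

```python
def annotate(suggestions, threshold_km=500.0):
    """Attach a range note to suggestions with no records near the observation.

    suggestions: list of (taxon, general_range, nearest_record_km) tuples.
    Returns display strings; nothing is removed from the list.
    """
    lines = []
    for taxon, general_range, nearest_km in suggestions:
        if nearest_km > threshold_km:
            # Far from any record: keep it, but add a range note
            lines.append(f"This looks like {taxon}, "
                         f"which is generally found in {general_range}")
        else:
            lines.append(f"This looks like {taxon}")
    return lines


# Hypothetical suggestions for an observation made in Europe
observed_in_europe = [
    ("Enemion biternatum", "eastern North America", 6500.0),
    ("Isopyrum thalictroides", "Europe", 20.0),
]
for line in annotate(observed_in_europe):
    print(line)
```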

Has anything been done regarding this issue yet? Since the “seen nearby” feature already exists, wouldn’t it be an easy fix to restrict auto-suggestions to those taxa only? If users can tell that it’s something not on the list, they can still search for it normally. The problem is that an overwhelming number of users just pick the first thing on the list, which is often something from the wrong continent. As it stands, the system prioritizes species from high population centers (California, etc.) over anything else, regardless of the “seen nearby” function. The algorithm is clearly not good enough to detect an invasive species unless it is extremely common somewhere else anyway, so I wouldn’t worry about that being an issue. Such cases are rare and best left to individual users.

Hi @saucegandhi, welcome to the iNat Forum. The staff recently responded in this topic with an update (see above).

Thanks, I missed that one.

The ‘Seen Nearby’ flag is also skewed by cultivated plants not being marked as casual. Then there’s a domino effect from that first wrong one…
