Better use of location in Computer Vision suggestions

I’d like to revive this and I’m mostly repeating much of what has already been said here, but I’d like to throw some additional weight behind this request. I constantly see visually similar species from disparate continents being identified far outside their native range. A great example of two visually similar species is Enemion biternatum (eastern North America) and Isopyrum thalictroides (Europe), which are constantly being submitted on the wrong continent.

Could there be a warning built into the app that says something like “This organism has not been reported within 500 km of your location. Are you sure?” This could help clean up some of the most absurd observations, which become more common during CNC and school bioblitzes.
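To make the idea concrete, here is a minimal sketch of such a check in Python. It is only an illustration under stated assumptions: `known_records` is a hypothetical list of (latitude, longitude) pairs for the taxon, the function names are invented for this example, and nothing here reflects how iNaturalist actually stores or queries its data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def range_warning(obs_lat, obs_lon, known_records, threshold_km=500.0):
    """Return a warning string if no known record is within threshold_km.

    known_records is a hypothetical iterable of (lat, lon) pairs for the
    suggested taxon. Returns None when at least one record is close enough.
    """
    for lat, lon in known_records:
        if haversine_km(obs_lat, obs_lon, lat, lon) <= threshold_km:
            return None
    return (f"This organism has not been reported within {threshold_km:.0f} km "
            "of your location. Are you sure?")
```

Note that a taxon with no records at all would always trigger the warning in this sketch, so a real version would need to decide how to treat poorly recorded species.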

7 Likes

Are there any massive species distribution datasets that iNat can pull in to provide appropriate IDs based on a locality?

EDIT: I was able to find this, which provides some large databases over large geographic swaths. However, integrating all of it into iNaturalist would be worth a PhD imho. As per usual (and since I can see replies covering it), there isn’t a sufficient dataset for all sorts of things, but biology/ecology is part art, part science, isn’t it?

1 Like

Maybe the GBIF dataset or the checklists could be used for this.

2 Likes

Another example that appears to have started with the latest computer vision update: Astragalus danicus (Eurasia) and Astragalus agrestis (North America and Northeast Asia). They are very similar (differing in the density and length of hairs), and a case could be made for treating the latter as a variety of the former, but I would prefer hearing this from a taxonomist, and not a computer.

1 Like

Indeed, this feature is quite annoying and is filling iNat with so many crazy and, of course, wrong ID suggestions…
I am tired of seeing really weird IDs of phasmids (my area of expertise). I even tested the suggestions myself in 10 cases, uploading my own high-quality pictures in which the insect is shown clearly and not in a strange position. I gave the location, of course, but it seems to have no effect on the suggestions. These are the results:

[image: Untitled-1]

In every single case, only 1-2 out of 8 suggestions were within the right order and continent! And, of course, the correct order was never among the suggestions.
Btw, there was not a single case where just the order (Phasmida in this case) was offered as a suggestion. This would have been a correct suggestion in every case. And I guess many other invertebrate orders have similar problems with this feature. So I wonder if it really makes sense to allow suggestions below “Order” (at least for invertebrates). I have the feeling that it would be easier from a technical point of view, and most of the weird suggested IDs would be avoided.

4 Likes

It is not actually that surprising that you got these results. Keep in mind that only species with over 100 submitted, identified photographs are included in the dataset used for training.

Right now there are 14 phasmid species with 100+ observations, and likely a few more below that observation count but with enough photos.

Nor is it surprising that giving a location does not impact the list of suggestions. You can quibble with the design (that is the entire point of this request) but the algorithm is a perceived visual similarity tool. It simply lists the taxa which it has been trained on that it thinks the photo most resembles.

Yes, better inclusion of location in the algorithm would help, and an update from the site on their plans here would be even nicer.

But no one should assume adding this is a trivial task, in terms of programming, data entry, data management etc.

Before any work can move forward, the site needs to decide which of the at least four ways distribution data is stored on the site (range maps, checklists, atlases, submitted observations) will be used as the source. Then, unless submitted records are chosen, a massive effort to populate those ranges is needed.

8 Likes

I suggest these two interventions:

  1. Require location data to be entered before ID. When uploading via the desktop site, ID is the first field and location the third. Therefore AI can’t offer “Seen nearby” because it doesn’t know where it is.

I became aware of this when I got a new camera that doesn’t have GPS, so I now have to enter my positions manually. Before that, the location from the photo’s metadata was filled in automatically before the ID.

  2. AI should offer suggestions preferably to family, or genus at the most. It should not offer species-level suggestions. This would stop the range creep of wrongly identified species.

4 Likes

Sorry for not responding here in forever. We discussed how to address this recently and are in the early stages of some redesigns. First (and easiest) change would be to only show “seen nearby” taxa by default. If nothing has been “seen nearby”, show the results the way we do now. I suspect that’d cut down a decent number of incorrect IDs. But some design work is needed, which we’ve sketched out.
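Roughly, the default behaviour described above could look like the following sketch. The `Suggestion` type, its fields, and the function name are hypothetical and chosen only for illustration; this is not the actual iNaturalist code.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    taxon: str
    score: float       # visual similarity score, higher means more similar
    seen_nearby: bool  # hypothetical flag: taxon has nearby observations


def default_suggestions(suggestions):
    """Show only "seen nearby" taxa; fall back to the full list if none are."""
    ranked = sorted(suggestions, key=lambda s: s.score, reverse=True)
    nearby = [s for s in ranked if s.seen_nearby]
    # If nothing has been seen nearby, show the results the way we do now.
    return nearby if nearby else ranked
```

For example, given one visually close taxon from another continent and two weaker matches that have been seen nearby, only the two nearby taxa would be listed, still ordered by visual score.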

17 Likes

And some onboarding to explain ‘seen nearby’?

If it is not too far along, have there been any thoughts of making this a configuration option in your account settings, so you can opt in or out of filtering the list based on seen nearby?

Applying it as a blanket rule may cause issues for users in less-iNatted areas, or at range limits where things are rarer.

2 Likes

It’s not too far along, we’d start out with mobile first, most likely. Currently the plan is to have an option to toggle between the two in the app. I imagine that could be made “sticky” so you just have to toggle it once and you’ll then see the standard list by default.

2 Likes

Has anyone suggested the removal of the “Seen Nearby” suggestion for observations that are outside of atlased ranges (when applicable)? Or removing the suggestion altogether if outside the atlased range? “Nearby” is somewhat relative; 100 km is a large distance for many species. This seems particularly relevant for species with very restricted ranges that have visually similar relatives found “nearby” (e.g. terrestrial gastropods or island endemics). Naturally, taxa would need to be atlased for this to work, and maybe there are some issues with atlases anyway. But atlasing taxa seems like a way to make “nearby” more objective and species-specific. Not sure how hard that would be or how intensive it would be to check for each observation.
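As a rough sketch of what I mean, assuming an atlas can be represented as a set of place identifiers per taxon (the `atlases` mapping, the place IDs, and the function name below are all hypothetical, not the real iNaturalist data model):

```python
def show_seen_nearby(taxon, obs_place_ids, nearby_taxa, atlases):
    """Decide whether to label a suggestion as "Seen Nearby".

    nearby_taxa: set of taxa with observations near this one (hypothetical).
    atlases: dict mapping a taxon to the set of place IDs in its atlas
             (hypothetical); taxa without an atlas are simply not restricted.
    """
    if taxon not in nearby_taxa:
        return False
    atlas = atlases.get(taxon)
    if atlas is None:
        # No atlas for this taxon: keep the plain "seen nearby" behaviour.
        return True
    # Atlased taxon: only keep the label if the observation is inside the atlas.
    return bool(atlas & set(obs_place_ids))
```

The per-observation check itself is just a set intersection in this sketch; whether keeping the atlases up to date is feasible is a separate question.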

The biggest issue with that is atlases only work on land since they follow political boundaries, so it is an approach that does not work for two thirds of the Earth’s area (a smaller percent of observations yes, but still a bunch).

It is also unclear how much of a server hit having so many atlases would incur. By this I mean not the lookup for an individual record, but the background tasks needed to maintain the atlases and the out-of-range functions.

I like the idea overall, but it would have to be implemented with extreme care.

It should not exclude species based on location, as there is an ever-increasing number of invasive and non-native species spreading all over the world.

The ID suggestion should include non-local species, but perhaps with a geographical note along with it.

Something like: “This looks like XXYZ, which is generally found in eastern North America”
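A minimal sketch of how such a note could be attached, assuming each taxon carries a short free-text description of its usual range (the `usual_range` field and the function name are hypothetical, made up for this example):

```python
def describe_suggestion(taxon, seen_nearby, usual_range=None):
    """Build the suggestion text, adding a range note for non-local taxa."""
    text = f"This looks like {taxon}"
    # Only add the geographical note when the taxon is not known locally
    # and we actually have a range description to show.
    if not seen_nearby and usual_range:
        text += f", which is generally found in {usual_range}"
    return text


print(describe_suggestion("Enemion biternatum", False, "eastern North America"))
```

The non-local species stays in the list, so a genuinely new invasive can still be picked, but the user is told that it would be unusual.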

3 Likes

Has anything been done regarding this issue yet? Since the “seen nearby” feature already exists, wouldn’t it be an easy fix to restrict auto-suggestions to those taxa only? If users can tell that it’s something not on the list, they can still search for it normally. The problem is that an overwhelming number of users just pick the first thing on the list, which is often something from the wrong continent. As it stands, the system prioritizes species from high population centers (California, etc.) over anything else, regardless of the “seen nearby” function. The algorithm is clearly not good enough to detect an invasive species unless it is an extremely common species somewhere else anyway, so I wouldn’t worry about that being an issue. That is more of a rare occurrence and best left up to individual users.

Hi @saucegandhi, welcome to the iNat Forum. The staff recently responded here in this topic with this update - see above:

3 Likes

Thanks, I missed that one.

1 Like

The ‘Seen Nearby’ is also skewed by cultivated plants not being marked as casual. Then a domino effect from that first wrong one …

5 Likes

This suggestion was made 18 months ago but hasn’t been implemented, and this is surely one of the biggest weaknesses of the system. While suggestions for some taxonomic groups are good, for others they are impossible. For example, many, or even most, British hoverflies (Syrphidae), even with very clear images, get suggestions of species or genera that are only American.

4 Likes

I’m going to share my 2 cents here.

As I see it, the main priority of CV is to help by providing appropriate identifications. It has a second role of speeding up IDs as well (oftentimes waiting for CV to load is far quicker than typing a name and selecting it from the list).

Personally, there have been many times when CV suggested a species that was not “seen nearby” and was correct. It was actually a big help, especially when I was in Australia; a lot of the species there were Asian and weedy, but not otherwise reported on iNat yet.

My ideal vision for CV would incorporate all of these outcomes. The difficult question is “how can you control what is an appropriate suggestion that isn’t seen nearby?”, while retaining the remaining functionality. And I think it could work by using a priority list for how it displays results. Not just cutting “not seen nearby” taxa entirely, because that would be foolish and would remove the ability for CV to provide very helpful input on what something looks like without location bias. One of the main problems I see is the whole “it hasn’t been reported here, so therefore it isn’t”, and people have lower expectations about new county records or “rare” options than they should. But at the same time this still needs to meet the most important focus of CV, which is not a replacement for ID skills, but accurate suggestions.

Here’s what I feel could work. CV runs the images, and develops a number of suggestions, that would be displayed in the following order of priority.

1st priority. Genus name for the top suggestion if that species is “seen nearby”, and if that genus is not monotypic (if it is monotypic, move to 2nd priority). I say genus and not species because oftentimes the genus ID is vastly more accurate (by virtue of being less specific). Species IDs tend to incur the most incorrect suggestions in my experience, especially in genera with a lot of local options.

2nd priority. Species name for the top suggestions, if that species is “seen nearby”. This provides a support to the 1st entry, but without dismissing species suggestion entirely. Very important in my eyes.

3rd priority. This is what I’ll call the “general” suggestion. This would be the family ID of the most similar taxa. By default this would incorporate “seen nearby” taxa, but if there are none, it’ll just choose the next most likely option without location bias. For instance, it looks like Ericameria and Isocoma, which are both Asteraceae, so Asteraceae will be chosen as an option. This is also helpful because for someone like me who might know the plants, and can see that the top option is wrong but is at least mostly right, it is useful having the family ID quickly available to click on.

4th priority. Other suggestions that are similar, but not “seen nearby”. For instance, I post a mint and it closely resembles a species from Europe, that isn’t known from California.

The 2nd and 4th priority lists are not restricted to one entry, though 1st and 3rd would be. At least in my head this makes the most sense in terms of how CV works best.
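Here is a rough Python sketch of that ordering, just to show it is mechanically simple once CV has produced its scores. Everything in it is an assumption for illustration: the `Candidate` fields, the `genus_is_monotypic` flag, and the placeholder taxon names are hypothetical, and it is not how iNaturalist actually builds its list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    species: str
    genus: str
    family: str
    score: float                     # visual similarity, higher is better
    seen_nearby: bool
    genus_is_monotypic: bool = False


def prioritized(candidates):
    """Order CV results by the four priorities described above."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    if not ranked:
        return []
    top = ranked[0]
    nearby = [c for c in ranked if c.seen_nearby]
    out = []
    # 1st priority: genus of the top suggestion, if nearby and not monotypic.
    if top.seen_nearby and not top.genus_is_monotypic:
        out.append(("genus", top.genus))
    # 2nd priority: species suggestions that are "seen nearby".
    out.extend(("species", c.species) for c in nearby)
    # 3rd priority: one family, from nearby taxa if any, else the top match.
    family_source = nearby[0] if nearby else top
    out.append(("family", family_source.family))
    # 4th priority: similar species that are not "seen nearby".
    out.extend(("species", c.species) for c in ranked if not c.seen_nearby)
    return out


results = prioritized([
    Candidate("Ericameria sp. A", "Ericameria", "Asteraceae", 0.6, True),
    Candidate("Isocoma sp. B", "Isocoma", "Asteraceae", 0.3, True),
    Candidate("Genus-X sp. C", "Genus-X", "Lamiaceae", 0.1, False),
])
for rank, name in results:
    print(rank, name)
```

With these placeholder inputs the list comes out as the genus Ericameria, then the two nearby species, then the family Asteraceae, then the non-local species last.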

4 Likes