Better use of location in Computer Vision suggestions

The CV does know T. canadensis, so I’m really not sure why it suggests bacharidis here. But otherwise yeah.

I don’t believe so… Here’s a related staff response: https://forum.inaturalist.org/t/helping-the-computer-vision-is-this-wrong/13825/26

I’ve brought this up several times on the forums, and I’ll do so again here… in its current implementation, the Computer Vision IDs greatly diminish the accuracy of the data on iNaturalist, such that experts are dissuaded from contributing. Here’s a Twitter thread from a contributor, Dr. Derek Hennen, expressing this same sentiment and pointing out how he’s using this site less because of the problems caused by inaccurate IDs.

I know this is an issue that’s been discussed, but is there any progress on improving the CV IDs, particularly with respect to biogeography?

[edit: this was merged into this thread, from a separate thread originally titled "Computer Vision IDs are hurting the reputation of iNaturalist."]

I just wanted to give a quick update on functionality changes to better use location in CV suggestions.

On iNaturalist, we currently use location data to boost visually similar species that are also seen nearby, but we don’t do anything to demote visually similar species that aren’t seen nearby.
Ken-ichi gives a good overview of this in this talk he recently gave to TDWG:
https://www.youtube.com/watch?v=xfbabznYFV0
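To make the difference between boosting and demoting concrete, here is a minimal sketch of the two strategies. The function, the score dictionary, and the boost and demote factors are hypothetical illustrations, not iNaturalist’s actual code or weights. With the default `demote=1.0` it behaves like the boost-only approach; a value below 1 also penalizes visually similar taxa that are not seen nearby.

```python
def combine_scores(vision, nearby, boost=2.0, demote=1.0):
    """Rank taxa by vision score adjusted for 'seen nearby'.

    vision: dict of taxon name -> visual similarity score.
    nearby: set of taxon names observed near the observation.
    boost:  hypothetical multiplier for nearby taxa.
    demote: hypothetical multiplier for non-nearby taxa
            (1.0 = leave them untouched, i.e. boost only).
    """
    combined = {
        taxon: score * (boost if taxon in nearby else demote)
        for taxon, score in vision.items()
    }
    # Highest adjusted score first; ties broken by name for a stable order.
    return sorted(combined, key=lambda t: (-combined[t], t))


vision = {"Species A": 0.9, "Species B": 0.3}  # toy scores
nearby = {"Species B"}                           # only B is seen nearby

print(combine_scores(vision, nearby))               # ['Species A', 'Species B']
print(combine_scores(vision, nearby, demote=0.25))  # ['Species B', 'Species A']
```

With boosting alone, the visually stronger but non-local Species A stays on top; adding demotion lets the local Species B overtake it.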

We’ve learned from model evaluation experiments that demoting visually similar species provides better predictions on average than our current approach of not doing that. But we’ve held off because we want iNaturalist to also work well in situations where location data might not help, such as a garden filled with ornamentals or a remote location without much nearby data to draw from.

We’re currently working on altering the CV suggestions on iNaturalist so that by default it will demote visually similar species that aren’t seen nearby. But there will be a new toggle to have the CV ignore location data to accommodate these situations where location data doesn’t help (e.g. gardens, captivity).

We’re rolling this out in Android first as part of a new, more elaborate ‘species chooser’, which we’re currently testing internally and hope to have in beta sometime in the next month. Why Android? That’s just where we have the development resources right now. Once we’ve figured out how to make it work there, we’ll move on to changing the default/adding the toggle on the website and getting it onto the iOS app in some form.

On Seek, we currently don’t incorporate location data into the CV suggestions because Seek suggestions work offline and doing so requires getting location data on-device (there are a few exceptions related to the camera roll and older versions of iOS which use ‘online’ CV and thus location from the server). We’ve recently made progress on getting location data incorporated into the offline Seek CV suggestions (we have a working Android version) but we don’t yet have a release date. When this update is released, Seek will work in the same way as our plan for iNaturalist: i.e. demote species not seen nearby by default and have an option to ignore location data.

Thanks for your patience and for bearing with us. We hope these features will help reduce the number of wrong IDs suggested by the CV and thus help alleviate identifier burnout.

There are two competing streams for ID on iNat.

  1. A garden plant, not marked as Casual, so iNat offers visually similar and nearby - which is not necessarily helpful or useful, especially to someone new to iNat.

  2. A (genuine) wild plant - where nearby is more useful than visually similar.

I see weird things on the distribution maps - which I try to tweak where I can.

One simple thing that may help in the interim is removing the language “We’re pretty sure this is…”

To inexperienced users, seeing that from the AI is difficult to distinguish from “This is…”

Pretty much every time I upload a pile of images I get a wildly incorrect “We’re pretty sure this is…”

Maybe something like, “Our top suggestion” would encourage fewer blind acceptances.

All that said, I greatly appreciate how quickly iNat has improved, and can hardly imagine how impressed we all would have been with the AI not so many years ago.

100% agree with this; I think a lot of people associate any kind of AI or computer software with infallibility, so when given a list of 10 suggestions, they assume that one of them has to be correct.

@loarie, thanks for sharing about this update. This seems to me like a reasonable solution, and I’m looking forward to seeing the implementation. It will help a lot! I like the idea of having the button to override location restrictions - that may also help for IDs of common (mostly introduced) species in parts of the world that haven’t accumulated many local observations yet. Just hopefully random people won’t be pressing that button too frequently.

Thank you. It is funny to me when I see an ID for Sierra Juniper (J. grandis) in Virginia or in Poland as well as a suggested ID of J. chinensis in the middle of Idaho.

Hi all - we just released a minor change today on the web to remove non-nearby suggestions from the computer vision suggestions by default. We released something similar on Android a week or two ago and a similar change on iOS is coming soon.

We hope this will reduce how often people choose suggestions that don’t occur nearby and are thus likely wrong. But this is a minor change and doesn’t really get at the heart of what’s driving these suggestions, so you may still see weird things like common ancestor suggestions (i.e. “we’re pretty sure it’s in this genus”) that aren’t seen nearby or may not be optimally calculated. We realize we’re probably due to revisit our whole approach to how we present suggestions (which is coming up on 4 years old!), i.e. the common ancestor and 10 suggestions, and how we combine distribution (‘seen nearby’) with the computer vision probabilities (‘visually similar’). We’ve started some experiments to help guide this work over the coming months, so please bear with us.
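The ‘common ancestor’ suggestion can be pictured as rolling leaf scores up the taxonomic tree. The sketch below shows one simple way to do that and is not iNaturalist’s actual algorithm: each taxon’s score is the sum of the scores of the model taxa beneath it, and the suggestion is the deepest taxon whose summed score reaches a threshold. The `parent` map, the 0.8 threshold, and the toy taxon names are all hypothetical.

```python
def rolled_up_scores(leaf_scores, parent):
    """Add each leaf's score to the leaf itself and every ancestor above it."""
    totals = {}
    for leaf, score in leaf_scores.items():
        node = leaf
        while node is not None:
            totals[node] = totals.get(node, 0.0) + score
            node = parent.get(node)
    return totals


def depth(taxon, parent):
    """Number of steps from `taxon` up to the root of the tree."""
    steps = 0
    while parent.get(taxon) is not None:
        taxon = parent[taxon]
        steps += 1
    return steps


def common_ancestor(leaf_scores, parent, threshold=0.8):
    """Deepest taxon whose rolled-up score reaches `threshold`, else None."""
    totals = rolled_up_scores(leaf_scores, parent)
    candidates = [t for t, s in totals.items() if s >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda t: (depth(t, parent), totals[t]))


# Toy tree: Order O > Family F > Genus X / Genus Y > species.
parent = {
    "Species x1": "Genus X", "Species x2": "Genus X",
    "Species y1": "Genus Y",
    "Genus X": "Family F", "Genus Y": "Family F",
    "Family F": "Order O", "Order O": None,
}
leaf_scores = {"Species x1": 0.5, "Species x2": 0.4, "Species y1": 0.1}

print(common_ancestor(leaf_scores, parent))  # Genus X
```

No single species reaches 0.8 here, but Genus X (0.5 + 0.4) does, which is the “we’re pretty sure it’s in this genus” case. Nothing in this toy accounts for ‘seen nearby’, which is exactly the kind of gap a revamp would need to address.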

But let me explain how this change works with a very simple example using the clade of Giant Southern Pill Millipedes. This order has 5 families with very distinct geographic distributions.


But the computer vision model that’s in production right now only includes two taxa in this clade (i.e. the only possible ‘visually similar’ suggestions). They are Family Zephroniidae and the principal genus in the Family Sphaerotheriidae (green circles below). Note: if we could snap our fingers and have a new model trained up on the current database today, it would also include Arthrosphaeridae and the only genus in the Procyliosomatidae (Procyliosoma), which points to a parallel solution to these issues: continue growing the training dataset!

In any case, I hope you can see that this setup, where we have several geographically disjunct choices that look similar but only a few of them available as choices in the model, is ripe for the kind of issues discussed in this thread, where non-nearby taxa are continually being suggested by the computer vision algorithm and clicked on. In fact, this is what we’re seeing: Zephroniidae and Sphaerotherium are being suggested far outside their ranges, in places like southern India.

For example, this observation in South Africa was ID’d as the Southeast Asian family Zephroniidae as a result of non-nearby computer vision suggestions.


But with the new change you’ll see that the suggestions now exclude such non-nearby taxa by default. The remaining pill millipede suggestion, the Visually Similar / Seen Nearby Genus Sphaerotherium is in fact the correct choice:

If you click on the new ‘Include suggestions not nearby’ toggle, you’ll now see these non-nearby suggestions as before, including the suggestion of Zephroniidae that led to this mis-ID.
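In other words, the change is a filter rather than a re-weighting. A minimal sketch, assuming each suggestion carries a hypothetical `seen_nearby` flag (the class and field names are illustrative, not the real API response format, and the scores are invented):

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    taxon: str
    score: float       # computer vision score (hypothetical 0..1 scale)
    seen_nearby: bool  # True if the taxon has been observed near this location


def visible_suggestions(suggestions, include_not_nearby=False):
    """Return the suggestions to display, best score first.

    By default, taxa not seen nearby are hidden; include_not_nearby=True
    plays the role of the toggle and restores the full list.
    """
    shown = [s for s in suggestions if include_not_nearby or s.seen_nearby]
    return sorted(shown, key=lambda s: s.score, reverse=True)


# Toy version of the South African pill millipede example.
south_africa = [
    Suggestion("Zephroniidae", 0.55, seen_nearby=False),
    Suggestion("Sphaerotherium", 0.45, seen_nearby=True),
]

print([s.taxon for s in visible_suggestions(south_africa)])
# ['Sphaerotherium']
print([s.taxon for s in visible_suggestions(south_africa, include_not_nearby=True)])
# ['Zephroniidae', 'Sphaerotherium']
```

Because nothing is re-scored, a taxon that the model cannot suggest at all (like Arthrosphaera in the current model) still cannot appear; filtering can only remove wrong options.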

Similarly, this observation from Sri Lanka is now properly ID’d as Arthrosphaera.

But it has an early ID resulting from a computer vision suggestion of the non-nearby Zephroniidae. Again, with this change Zephroniidae isn’t suggested, even though the correct ID (Arthrosphaeridae/Arthrosphaera) isn’t suggested either, because it’s not available as a suggestion in the current model:

Again, if you click on “Include suggestions not seen nearby”, you’ll see these non-nearby suggestions of Sphaerotherium and Zephroniidae as before:

Even though hiding these non-nearby suggestions by default was a very easy fix to make, we were hesitant to do it because we suspect it may lead people astray in settings where distributions don’t help much with identification, such as gardens. For example, if someone is looking at a garden plant that is observed commonly enough globally to be a suggestion in the computer vision model but isn’t represented by nearby observations (maybe it’s rarely planted in that part of the world), we are now hiding the correct visually similar suggestion by default. The user will have to know to click on “Include suggestions not seen nearby” to reveal it. We expect that in these cases the user might click on an incorrect visually similar/seen nearby native plant instead of the correct visually similar garden plant. So hopefully we aren’t just replacing one set of mis-identifications with another.

As mentioned above, though, we suspect this will be the last tweak to our existing setup before a more major revamp of how we go about mashing up distribution data and computer vision suggestions and rolling probabilities up and down the tree to come up with a suggestion or set of suggestions. So please let us know whether you think this minor tweak is an improvement on the status quo. And stand by for hopefully deeper tweaks to how iNat makes suggestions in the coming months.

This looks very promising @loarie, thanks to you and the iNat team! Let’s see how it plays out. Is there any way to compare its effect systematically, such as the proportion of computer vision suggestions with later disagreements before and after the change? Disagreements would have to be counted within x months, because older observations have had more time to accumulate additional IDs.

Marking this as “solution” although it’s great to see there are some even greater changes in the pipeline.

I agree, this looks promising; thanks for doing it. To address Loarie’s concern above that captive/cultivated species may be mis-IDed at a greater rate with this change, what about having the toggle default to including non-nearby suggestions when the observer checks the box marking the observation as captive/cultivated?

From my experience, when I have an observation that is captive/cultivated, the CV doesn’t work as well anyway, and I know to discount suggestions that are far away geographically, but I can easily see how a new user might not know this, and I would expect the error rate to go down with the change you made in effect.
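The proposal amounts to making the toggle’s starting state depend on the captive/cultivated checkbox. A tiny sketch of that rule; the function name and parameters are hypothetical and not part of any iNaturalist code:

```python
from typing import Optional


def default_include_not_nearby(captive_or_cultivated: bool,
                               user_choice: Optional[bool] = None) -> bool:
    """Initial state of the 'include suggestions not seen nearby' toggle.

    An explicit user choice always wins; otherwise captive/cultivated
    observations start with non-nearby suggestions included, and wild
    observations start with them hidden.
    """
    if user_choice is not None:
        return user_choice
    return captive_or_cultivated


print(default_include_not_nearby(True))   # True: garden plant, show everything
print(default_include_not_nearby(False))  # False: wild observation, nearby only
```

Letting an explicit choice override the default keeps the toggle usable for cases the checkbox doesn’t capture.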

Finally, an unrelated question: how many taxa make up the current iteration of the CV model? Would it be possible to add an icon showing that a taxon is included in the CV, maybe on the taxon page under taxonomy? Would that be useful?

There have been a few odd AI behaviors noted today:

https://forum.inaturalist.org/t/did-something-change-with-the-ai/20834/14

It looks like the cause of that has since been identified and fixed, and was unrelated to this thread on use of location :)

It is not just cultivated plants where “Include suggestions not nearby” is useful. Recently I posted an observation of a bug that the AI suggested was a European Firebug. I rejected this because the species did not occur in Australia according to national databases. It turned out it was indeed a European Firebug, which has been naturalized here since 2018. Very embarrassing - I should have investigated further, since there were already many local observations and it was not likely they were all wrong. But newly invasive species will unfortunately now be missed unless you click the “Include suggestions not nearby” button, which most people will not do. Perhaps species that are a close match should be included in the suggestions, but with a warning that they are outside the previously known range? It could also activate a siren and flashing lights in the office of the local bio-security authorities. The early discovery of one invasive species would by itself justify the existence of iNaturalist.
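The “close match with a warning” idea could look something like this: keep strong non-nearby matches in the default list but label them, instead of hiding them. A sketch only; the 0.7 threshold, the labels, the tuple format, and the toy taxa and scores are all hypothetical.

```python
def annotate_suggestions(suggestions, strong_match=0.7):
    """Return (taxon, label) pairs for display, best score first.

    suggestions: list of (taxon, score, seen_nearby) tuples.
    Nearby taxa are shown normally. Non-nearby taxa are shown only if
    their score is at least `strong_match`, with a range warning;
    weaker non-nearby matches stay hidden, as in the default view.
    """
    shown = []
    for taxon, score, seen_nearby in sorted(suggestions, key=lambda s: s[1], reverse=True):
        if seen_nearby:
            shown.append((taxon, "seen nearby"))
        elif score >= strong_match:
            shown.append((taxon, "outside previously known range"))
    return shown


toy = [
    ("European Firebug", 0.85, False),  # strong match, not yet recorded locally
    ("Native bug A", 0.10, True),
    ("Native bug B", 0.05, False),      # weak and not nearby: stays hidden
]
for taxon, label in annotate_suggestions(toy):
    print(f"{taxon}: {label}")
```

The trade-off is the one raised earlier in the thread: a warning label only helps if users read it, so a badly tuned threshold would bring back some of the non-nearby mis-IDs the change was meant to prevent.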

In that case, I would think that the species would already be detected as being “seen nearby” - unless that has to wait until the next iteration of the AI model.

I think this is a really good change, and the benefit it will have with native species outweighs the drawbacks with cultivated and introduced species. In the reptile world it’s really common for things to get misidentified since a lot of species within a genus look really similar, and a lot of people never come back and fix their initial identification so it ends up taking three correct identifications to get it to research grade instead of just one agreeing with the AI.

I would think common garden plants will probably have nearby observations just about everywhere, so that should help with there already being nearby observations. It’s just going to be getting that first observation of a cultivated/invasive species that’s a little harder.

It looked like this change happened but has now been undone? Is there another thread where that is discussed?

Working for me on web, iOS (version 3.2, which isn’t at full release yet), and Android.

Aaand, version 3.2 of our iOS app has been fully released. So current versions of the iNat Android app, and iOS app (plus the website) show only “Seen Nearby” computer vision results by default. See @loarie’s explanation here. I’m going to close this request.
