Possible increase in CV errors around organism range/location

I consistently review and identify observations in my area (primarily Montana, but sometimes dipping into other Mountain West states), so I’ve gotten familiar with commonly observed species. Usually the CV is pretty good about surfacing local organisms first in its suggestions, but lately I’ve seen more observations than usual tagged as species that are plausibly similar in appearance but wildly different in range, often not even native to the Americas! I’m not familiar with how often the CV model updates or whether there was a change recently, but as a database admin in my IRL job, this pinged my spidey senses.

Anyone else noticed this? Is there a way to fix it or give feedback for the next CV update? Can variable weights be manually tweaked in the model at all, or is it black-box machine learning?

A couple of examples of what I’m talking about (and if I’m wrong and the original IDs were correct, please let me know):

Red deer ID in Glacier NP
Common yellow conch ID in Lolo, MT
Roseleaf bramble ID in Missoula, MT

4 Likes

Two of the three examples were made using the Seek app, which might be helpful to understand what is going on.

4 Likes

That’s definitely important context to add, thank you!

3 Likes

The last couple of times I’ve uploaded observations, I’ve noticed that more of them than usual just have the “visually similar” tag without “seen nearby” and the suggestions are wildly out of range. Maybe about 1 in 10 of the observations I upload has been getting suggestions like that. Usually it’s much better.

5 Likes

I’ve noticed exactly what you describe, but over a longer period of time. I was thinking it went back to the last time the CV was retrained.

1 Like

I use the site basically every day, but lately (not sure how far back, maybe a few days) I’ve noticed that there are fewer suggestions with the ‘seen nearby’ label.

Here is one case that I found particularly striking: I would have expected the CV to have no problem suggesting the correct species (Gryllus campestris), and in fact two species in the correct genus are suggested, but those are from a different continent.

Gryllus campestris has >5000 observations, with hundreds in the close vicinity, so it should be well trained.

6 Likes

And here is another example: the location is given, but still there is not a single ‘seen nearby’ suggestion, and once again the correct genus is suggested twice (Vincetoxicum); however, the very frequently reported Vincetoxicum hirundinaria is not included.

5 Likes

I wonder if whatever is causing these issues also led to this person picking Neopanorpa rather than Panorpa. I’ve used the CV for several scorpionflies and don’t remember it ever suggesting Neopanorpa before.

https://www.inaturalist.org/observations/131308174

1 Like

I remember that observation :grin:

I saw that it had many comments and was hoping for an interesting discussion on Panorpa identification xD

I am wondering about something that may be related, or may just be coincidental, but… I use a program (Lightroom) to tag/index my photos with keywords on my computer in their RAW format, and then export and upload them to iNaturalist.

What I noticed is that if I tag a photo in the program, for example with Fragile Forktail, then that keyword shows up as one of the tags on the photo in iNaturalist and is the first identification suggested by the CV. So I have learned not to add my keywords until after I upload, so that for species I am unsure of, I can get reasonable suggestions.
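If anyone wants to check what their exported files actually carry, here is a minimal sketch (Python with Pillow; the file name is a hypothetical stand-in) that reads the IPTC keywords Lightroom writes on export:

```python
# Minimal sketch: print the IPTC keywords embedded in an exported JPEG.
# Assumes Pillow is installed (pip install Pillow); "export.jpg" is a
# hypothetical stand-in for a Lightroom export.
from PIL import Image, IptcImagePlugin

im = Image.open("export.jpg")
iptc = IptcImagePlugin.getiptcinfo(im)  # returns None if no IPTC block

if iptc:
    # IPTC dataset (2, 25) holds the keywords: a single keyword comes
    # back as bytes, multiple keywords as a list of bytes.
    raw = iptc.get((2, 25), [])
    keywords = [raw] if isinstance(raw, bytes) else raw
    print([k.decode("utf-8") for k in keywords])
else:
    print("no IPTC metadata found")
```

If a keyword like Fragile Forktail shows up here, that’s presumably what the uploader is reading in as a tag.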

I wonder if anyone else has seen that and if it may be affecting your CV suggestions?

1 Like

Sure, that’s how the system works.

1 Like

Meanwhile, we have the newest CV model active.

1 Like

As @DianaStuder just mentioned (thanks!) there is a new CV model. Here is the link to the announcement, including details on how it was trained.

What immediately caught my eye was a part under the “Future Work” heading.

First, we are still working on new approaches to improve suggestions by combining visual similarity and geographic nearness. We still can’t share anything concrete, but we are getting closer.

That sounds to me like they are aware of and working on addressing the issue we’ve been discussing!

3 Likes

Well, if anything changed after my post, it got worse, at least regarding these two specific cases (I used the same photos and locations for this test).

First one: previously, there were at least grylloids on top, and the CV was even confident about the superfamily.
Now, it is mainly beetles…

Second one: now, even the correct genus is not included in the suggestions at all.

It remains to be seen whether this results in an influx of mis-IDs in the near future, now that the CV suggestions have worsened.

1 Like

I’d be happy to take a look if you email those photos to help@inaturalist.org along with approximate dates and locations. It’s difficult to diagnose an issue without being able to replicate it on our end.

1 Like

When I cropped the photos, the IDs looked good. Most photos on iNat are pretty tightly cropped, so it’s best to crop to the organism as much as possible to match the photos the model was trained on.
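For batch workflows, a pre-crop before upload can even be scripted. Here is a minimal sketch using Pillow (the file names and box coordinates are made up for illustration):

```python
# Minimal sketch: tightly crop a photo to the organism before uploading,
# to better match the tightly cropped images the model is trained on.
# Assumes Pillow is installed; file names and coordinates are hypothetical.
from PIL import Image

im = Image.open("pika_scree_slope.jpg")

# (left, upper, right, lower) pixel box around the organism -- in practice
# you'd find this by eye in any photo editor rather than hard-coding it.
box = (1200, 800, 2000, 1600)
im.crop(box).save("pika_cropped.jpg", quality=90)
```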

2 Likes

So, for American pika, just a landscape of mountain scree slopes with no actual pika visible…

2 Likes

These two photos were used as examples to point out a certain CV behaviour: that is, the suggestions were not completely off and included some members of the correct genus, but without including those that were seen nearby.
My expectation would have been that, if the CV already shows a tendency (e.g. the correct superfamily), then the list of species should include the top-observed species seen nearby (in these cases, not Gryllus or Vincetoxicum from other continents, but from Europe).

The other reason for providing you with these images was the deterioration in the quality of the suggestions after the CV update.

1 Like

i think there may be a misunderstanding of how the computer vision does what it does. it always starts with what it thinks are the best visual matches and then adjusts for whether those matches were seen nearby or not. from there, the system will send its top 10 suggestions to you. your browser will display the top 8 of those suggestions. or if you’ve chosen to filter out observations not seen nearby, you’ll get as many – up to 8 – of those same 10 suggestions as have been seen nearby.

it’s hard to reproduce suggestions in the upload screen without having the original photos (it’s easier to get the same suggestions that others get from the observation detail screen), but it looks to me like the system just didn’t think your expected taxa were obvious visual matches to your photos, and even after adjusting for proximity, your expected taxa still didn’t make the top 10 cut for their respective photos. (this is even hinted at where the suggestions for your plant start with a prompt that says the system isn’t sure what this is.)

what’s probably happening is simply that with each training, the model is trained on more and more taxa. so suppose that 3 years ago, the system was trained on only 10 of the most common crickets. if it encountered a photo that resembled a cricket, maybe 9 of its 10 suggestions would be crickets, and those crickets would just happen to be the most commonly observed ones, which would make it relatively likely that your new cricket would get a match in one of these suggestions just based on frequency probability.

suppose that now, the model has not just those 10 most common crickets but also 90 other crickets which are less commonly observed. then when the system makes its suggestions, if the photo being evaluated doesn’t have any obvious characteristics to visually match it to any particular cricket, then the computer vision’s top 10 suggestions (effectively, random selections from what is now a bigger set of crickets) might include, say, 2 really common crickets and 7 not-so-common crickets. so because those suggestions include the not-so-common crickets, it’s less likely that one of those will match your cricket just based on frequency probability. nothing changed in the computer vision’s approach. it’s just that the starting set now includes a lot of less common taxa.
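here’s a toy simulation of that frequency argument, just to make it concrete (all the numbers are invented, and real suggestions aren’t literally random draws):

```python
# toy illustration: if suggestions were effectively random draws from the
# trained cricket set, the chance that one specific common cricket lands
# in the top 10 shrinks as the trained set grows. numbers are invented.
import random

def hit_rate(pool_size: int, suggestions: int = 10, trials: int = 100_000) -> float:
    """probability that one specific taxon appears in a random sample of
    `suggestions` taxa drawn from a pool of `pool_size` trained taxa."""
    target = 0  # arbitrary label for "your" cricket
    hits = sum(
        target in random.sample(range(pool_size), suggestions)
        for _ in range(trials)
    )
    return hits / trials

print(hit_rate(10))   # old model, 10 trained crickets  -> 1.0
print(hit_rate(100))  # new model, 100 trained crickets -> ~0.10
```

(analytically it’s just suggestions / pool_size, so going from 10 to 100 trained crickets drops the hit rate from 100% to about 10%.)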

what tiwane’s cropping suggestion does is provide the computer vision with clearer details, so that it can make better visual matches to begin with and the suggestions are less likely to end up being effectively random guesses.

i think you’re expecting that if you choose to show only nearby suggestions, the computer vision’s top 10 suggestions should include only nearby suggestions. but that’s not the case. the computer vision always just returns its top 10 proximity-adjusted visual matches, and then the page will filter those 10 suggestions based on whether or not you’ve selected the nearby option. (that’s why when you select the nearby option, you consistently end up with many fewer suggestions.)
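here’s roughly how i read that pipeline, sketched as python (the boost factor and data shapes are my own invention, not the actual inat code or api):

```python
# rough sketch of the pipeline described above -- the boost factor and
# data shapes are invented for illustration; this is not the real code.
from typing import Dict, List

NEARBY_BOOST = 1.5  # hypothetical multiplier for the proximity adjustment

def display_suggestions(scored: List[Dict], nearby_only: bool) -> List[Dict]:
    """scored: [{'taxon': str, 'score': float, 'seen_nearby': bool}, ...]"""
    # 1. adjust the raw visual scores for proximity
    adjusted = [
        {**s, "score": s["score"] * (NEARBY_BOOST if s["seen_nearby"] else 1.0)}
        for s in scored
    ]
    # 2. the server returns only the top 10 adjusted matches
    top10 = sorted(adjusted, key=lambda s: s["score"], reverse=True)[:10]
    # 3. the browser filters AFTER the cut -- the nearby toggle never
    #    reaches back into the model for more nearby taxa, which is why
    #    you can end up with far fewer than 8 suggestions
    if nearby_only:
        top10 = [s for s in top10 if s["seen_nearby"]]
    return top10[:8]

# toy usage: a nearby cricket is crowded out of the top 10 entirely when
# 10 out-of-range taxa all score higher visually, even after the boost
crickets = [
    {"taxon": f"cricket_{i}", "score": 0.9 - i * 0.01, "seen_nearby": False}
    for i in range(10)
]
crickets.append({"taxon": "Gryllus campestris", "score": 0.5, "seen_nearby": True})
print([s["taxon"] for s in display_suggestions(crickets, nearby_only=True)])  # []
```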

if you wanted to change the way the computer vision makes those top 10 suggestions though, you could make that a feature request.

6 Likes