Did computer vision get worse lately?

I was just uploading some stuff, and so this isn’t the best example, but it might be a good starting point to explain the problem.

To be fair, this is obviously an unusual observation: my intended observation was the wasp galls, which are out of their element. But the AI picked some curious options…snails, rabbits, or several types of psychedelic mushrooms. Maybe the AI is thinking of opening a very experimental French psychedelic fusion restaurant?

Actually, let me clarify in case that sounded harsh or sarcastic: with lots of normal observations, it is still as good as it ever was. With common flowers photographed from normal angles, it guesses them correctly. But when it is something like this, or perhaps a bird in flight, it throws up a number of guesses with seemingly no connection between them.

It’s a photo without an obvious focal subject. Many other observations also have the organism hiding somewhere in the frame, and the system can’t tell the difference: it assumes it’s just another rabbit behind grass, or another misidentified mushroom somewhere in the vegetation (and it does look like mushrooms in this preview). They simply look similar to the program. In my experience it has always worked this way with photos like this.


Could well be rabbit droppings!


Can you please send the original photo to help@inaturalist.org? I’d be curious to test it out. Please make sure its metadata is intact.

Those seem like logical suggestions to me. That totally looks like a photo of rabbit droppings, mushrooms, or snails. Cropping can help a lot; I suspect you might get very different results if you cropped the photo to the subject — most gall photos on iNat are pretty tight shots, in my experience. Remember that iNat computer vision isn’t trained to identify taxa, it’s trained to identify iNaturalist photos of taxa. It processes your photo and spits out results that say “this looks like other iNaturalist photos of [X taxon]”.
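The “this looks like other iNaturalist photos of [X taxon]” intuition can be sketched as a toy similarity search. Everything below is invented for illustration — the embeddings, taxa, and scores are made up, and iNat’s real model is a deep classifier, not this nearest-neighbor stub — but it shows why a wide grassy shot of galls can land closer to other wide grassy shots (rabbits, mushrooms) than to tight gall close-ups:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical training photos: (feature vector, taxon label).
TRAINING_PHOTOS = [
    ([0.9, 0.1, 0.0], "Oak gall"),   # tight close-up of a gall
    ([0.1, 0.8, 0.3], "Rabbit"),     # small brown blob in grass
    ([0.2, 0.7, 0.4], "Mushroom"),   # small pale blob in grass
]

def suggest(query):
    """Return taxon labels sorted by 'looks like other photos of X'."""
    ranked = sorted(TRAINING_PHOTOS,
                    key=lambda p: cosine(query, p[0]),
                    reverse=True)
    return [label for _, label in ranked]

# A wide shot of galls hiding in grass resembles the grassy photos,
# not the tight gall close-up.
print(suggest([0.15, 0.75, 0.35]))
# A tightly cropped gall photo resembles the gall close-up instead.
print(suggest([0.85, 0.15, 0.05]))
```

Cropping to the subject moves the query toward the “tight close-up” region of the space, which is why it changes the suggestions so much.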


I had a bunch of Orthosia and all of them auto-suggest O. hibisci now. I will see if I can replicate the problem and send the image, but I can’t recall which Orthosia photo had the issue.

Here it is, the “Bambi & bees” French restaurant.

When I first looked at this photo of a bee, a part of me was happily shocked and said to me, “what a snout, it’s Bambi!”

And iNat computer vision was very human-like: it suggested Mustela frenata, with a quite Bambi-like snout indeed.

Yet it was not fully human-like, in that another part of me was scoffing, of course.

Therefore, would it be wild to suggest that, contrary to computer vision, humans might use ‘two brains’ to approach things in rather opposite ways, one looking for the best fit, the other for the best misfit?

And clearly, here a misfit is nothing like a poor fit!

I wonder whether the misfit-detecting brain instead found a hit in data libraries of imagined monsters.

Do computer vision algorithms ever do that? Do curated libraries exist for monsters?

There is plenty of material. I loved a very smart and expert book, ‘After Man’, and of course there is plenty of science fiction, magic, and carnival masks out there, on the internet and in history.

All that would be left for us is a robust algorithm that compares fit and misfit library scores.

And perhaps more prosaically, when computer vision identifies dark elongated stones as seals, maybe mineralogy libraries would be helpful too.
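The fit-vs-misfit idea above can be sketched as a toy score comparison. To be clear, nothing like this exists in iNaturalist; the libraries, names, and scores below are entirely invented to make the proposal concrete:

```python
# Hypothetical similarity scores of one query photo against a "fit"
# library (real taxa) and a "misfit" library (monsters, stones, other
# non-organisms), as imagined in the post above.

def best(library):
    """Return the (name, score) pair with the highest score."""
    return max(library.items(), key=lambda kv: kv[1])

def fit_vs_misfit(fit_scores, misfit_scores):
    """Accept the best taxon only if it beats the best misfit score."""
    taxon, fit = best(fit_scores)
    misfit_name, misfit = best(misfit_scores)
    if fit >= misfit:
        return taxon
    return "not an organism (closer to: %s)" % misfit_name

# The seal-shaped stone example: the mineralogy library wins.
fit = {"Harbor seal": 0.62, "Sea otter": 0.40}
misfit = {"dark elongated stone": 0.88}
print(fit_vs_misfit(fit, misfit))
```

The design choice here is simply that a confident misfit can veto a weak fit, which is one way to read the “two brains” analogy.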


Thanks for the update.

The link to the year-old note on training the model raises the question: when do you anticipate completing an update? I see that it is extremely resource-intensive to run, but I am curious what cycle you are thinking of going forward. I assume there is a trade-off between capturing the amazing new data and the time required to run the model.

Tony - no need to reply, Chris provided the answer below.


I’d have to find the post, but they recently wrote that any new training run is on hold: it requires physically building new servers, and until iNat staff are cleared to return to in-office work, that is not possible.

Post here https://forum.inaturalist.org/t/computer-vision-training-status/21083/10


got it, thanks for the link

Just to clarify – there is a model currently being trained. It’s using the images posted to iNat before September 2020. The model after that one is the one without a start date.


Just to avoid the slightest misunderstanding:

  1. I hope it is clear that in my post above I am not trying to make fun of anybody, neither people in computer vision nor people working in restaurants; the only joke (about French cuisine), inherited from the other post further above, seems basic, fun, and inoffensive (and I am half French).
  2. Why is my post relevant to the topic here on “possibly worsened computer vision”? Because the results of computer vision on iNat are so overtly disparate, and so often disparately and desperately unlike human cognitive patterns, that saying “bad” or “crazy” amounts to much the same thing here. Some IDs are impressively good; others are hopeless. If the hopelessness is increasing, as suggested in this topic, it is worth going back to very basic questions about why there seems to be any hopelessness at all.
  3. Up to here we have talked about situations where human and computer vision are concordant and right, or where humans are right or more meaningful and computer vision is wildly wrong. But my post above is about both of them being wrong in the same way. If such a finding is addressed in depth somewhere, notably with reference to the iNaturalist software, I just wonder if anyone can direct me to papers or to more knowledgeable people. Please!

Is it taking already-added IDs too seriously, or just doing weird things like that? I have no idea what’s confusing it in this case; it’s pretty obviously a moth, or at least a winged insect.


Would anyone be able to help me understand why there has been no answer to my questions above, restated here?

Another way to put the question at a general level could be as follows: are there negative data in the training dataset, so that taxa are compared not only to one another but also to completely irrelevant objects?

If not, is it reasonable to consider additional “negative” training dataset(s) that could be analyzed separately ?

The AI model is trained by showing it a bunch of observation photos and the identifications iNat users have given to them, and it “teaches itself” to distinguish among them with relatively high accuracy. There are no “negative” or “irrelevant” data in the training dataset, and there are lots of “relevant” data that are left out (taxa without at least 100 observations). As far as the computer is concerned, the only possibilities are the subset of organisms that iNat staff have shown it.

I suppose they could add non-identifiable or non-organism training data, and then sometimes the computer vision would say “this is no organism at all!” But what is the value of that? If a person is uploading a photo of a chair, they are not engaging with iNaturalist in good faith, and the suggested non-ID is not helping them. If it’s an unusual organism (such as one that isn’t in the training set) or a low-quality photo, then “no organism” could deprioritize possibly correct, or partially correct, IDs. Notably, despite the low-quality photo, iNat’s computer vision seemed to pick out the face of your “bambi-bee” and suggest an animal, which is partially correct.
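One cheap alternative to training an explicit “no organism” class is simply abstaining when no taxon is confident. Here is a minimal sketch with made-up taxa and raw scores — iNat’s actual suggestion logic is not described in this thread, so this is purely illustrative of the general technique:

```python
import math

# Hypothetical taxa the model was trained on.
TAXA = ["Honey bee", "Long-tailed weasel", "Rabbit"]

def softmax(scores):
    """Convert raw classifier scores into probabilities."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def suggest_or_abstain(raw_scores, threshold=0.5):
    """Return the top taxon, or None ('no confident ID') when the best
    probability falls below the threshold."""
    probs = softmax(raw_scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return TAXA[best] if probs[best] >= threshold else None

print(suggest_or_abstain([4.0, 1.0, 0.5]))  # one score dominates
print(suggest_or_abstain([1.0, 0.9, 0.8]))  # all scores are close
```

The trade-off is exactly the one described above: a high threshold suppresses wild guesses on unusual photos, but it also suppresses partially correct suggestions like “some animal with a snout.”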

Unfortunately, it’s impossible to know with certainty why the AI has made one suggestion over another, because training produces the model’s own internal representation (millions of learned parameters), which is effectively unreadable to humans.


Thank you for your detailed answer. Yet the question of considering negative data per se, and even the analogy with monsters, seems useful in general for making progress, cf. “What monsters might be lurking there?” I will try to provide more detailed arguments soon.

… I tried it now (by copying and saving the image from your post), and now it looks reasonable as far as I can tell:

I have no idea what happened, but I wonder if small details can change the proposed IDs in absurdly divergent ways.

I guess it would have been more useful and meaningful if both moths and plants had been proposed as possible alternative IDs from the beginning, instead of only one possibility, and initially the wrong one.

Similarly in my Bambi-bee example above, computer vision proposed only mammals, the wrong ID, whereas humans could easily see both the wrong and the correct ID (a mammal and an insect).

This monolithic behaviour contrasts with other computer vision proposals of widely divergent taxa, like in the other example above.

This is a reply to:

It is partly similar to human perception, but not correct, and not human-like in the good sense.

So I tried cropping the bambi-bee image around the honey bee’s head, from above, from the side, or from below, in ways that would barely affect human interpretation, but here is what happened with computer vision!

Butterfly IDs appeared next to mammals with some types of cropping (example: left image) but not with others (middle and right images); only mammals appeared with the greatest zoom-in (middle), and only mammals, in fact only mustelids, with high confidence after cropping from above (right).
The only explanation is “God nose”… I apologize, but these results do not speak in favor of robustness.


Well, it looks like a dog’s head, with two ears and a snout, all brown, so I get the AI’s response. Bigger differences can be seen in an image of the full organism, both zoomed in and zoomed out pretty far.


@melody_86 if you truly see only a dog snout there, I would be surprised; but in any case, whatever you see, you do not change your mind just from cropping the background area in different ways!

In fact, it depends, hah. But really, its mandible area is very long for a regular bee photo, so maybe crop another pic and try it?