Did computer vision get worse lately?

Here it is, the “Bambi & bees” French restaurant.

When I first looked at this photo of a bee, a part of me was happily shocked and said to me, “What a snout, it’s Bambi!”

And iNat computer vision was very human-like: it suggested Mustela frenata, with a quite Bambi-like snout indeed.

Yet it was not human in that another part of me was puffing, of course.

Therefore, would it be wild to suggest that, unlike computer vision, humans might use ‘two brains’ to address things in rather opposite ways, one looking for a best fit, the other for a best misfit?

And clearly, a misfit here is nothing like a poor fit!

I wonder whether the misfit-detecting brain instead found a hit in data libraries of imagined monsters.

Do computer vision algorithms ever do that? Do curated libraries exist for monsters?

Material is plentiful. I loved a very smart and expert book, ‘After Man’, and of course there is plenty of science fiction, magic and carnival masks out there, on the internet and in history.

All that would remain is for us to use a robust algorithm that compares fit and misfit library scores.
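
To make the idea of comparing fit and misfit library scores concrete, here is a minimal Python sketch. Everything in it is hypothetical: the two score dictionaries, the label names, and the `margin` threshold are invented for illustration, and nothing here reflects how iNat's computer vision actually works.

```python
def compare_fit_misfit(fit_scores, misfit_scores, margin=0.1):
    """Compare the best score from a 'fit' library with the best from a 'misfit' library.

    fit_scores:    label -> score from a library of real taxa (assumed to lie in [0, 1])
    misfit_scores: label -> score from a library of non-organisms or imagined monsters
    margin:        how far one side must lead before we trust its verdict (an assumption)
    """
    best_fit = max(fit_scores, key=fit_scores.get)
    best_misfit = max(misfit_scores, key=misfit_scores.get)
    gap = fit_scores[best_fit] - misfit_scores[best_misfit]

    if gap >= margin:
        return ("fit", best_fit)
    if gap <= -margin:
        return ("misfit", best_misfit)
    # Neither library clearly wins, so flag the suggestion as uncertain.
    return ("uncertain", best_fit)


# Invented scores for a photo like the Bambi-bee.
fit = {"Mustela frenata": 0.55, "Apis mellifera": 0.30}
misfit = {"carnival mask": 0.70, "dark elongated stone": 0.05}
print(compare_fit_misfit(fit, misfit))
```

With these invented numbers the misfit library wins by more than the margin, so the sketch returns `("misfit", "carnival mask")` instead of silently proposing a weasel. The hard part, which this sketch does not address, is producing scores from the two libraries that are actually comparable.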

And perhaps more prosaically, when computer vision identifies dark elongated stones as seals, maybe mineralogy libraries would be helpful too.

Thanks for the update.

The link to the year-old note on training the model raises the question of when you anticipate completing an update. I see that it is extremely resource-intensive to run, but I am curious about what cycle you are thinking of going forward. I assume there is a trade-off between capturing the amazing new data and the time required to run the model.

Tony - no need to reply, Chris provided the answer below.

I’d have to find the post, but they recently wrote that any new run is on hold, as it requires physically building new servers, and until iNat staff are cleared to return to in-office working, this is not possible.

Post here https://forum.inaturalist.org/t/computer-vision-training-status/21083/10


got it, thanks for the link

Just to clarify – there is a model currently being trained. It’s using the images posted to iNat before September 2020. The model after that one is the one without a start date.


Just to avoid the slightest misunderstanding:

  1. I hope it is clear that in my post above I am not trying to make fun of anybody, not people in computer vision, not people working in restaurants; the only joke (about French cuisine), inherited from the other post further above, seems basic, fun and inoffensive (and I am half French).
  2. Why is my post relevant to the topic here on “possibly worsened computer vision”? Because the results of computer vision in iNat are so overtly disparate and so often disparately and desperately unlike human cognitive patterns, that saying “bad” or “crazy” is not very different here. Some IDs are impressively good, some others are hopeless. If hopelessness increases, as suggested in this topic, it is worth going back to very basic questions about why there seems to be hopelessness at all.
  3. Up to here we have talked about situations where human and computer vision are concordant and right, or about humans being right or more meaningful and computer vision wildly wrong. But my post above is about both of them being wrong in the same way. If such a finding is addressed in depth somewhere, and notably with reference to the iNaturalist software, I just wonder if anyone can direct me to papers or more knowledgeable people. Please!

Is it taking already-added IDs too seriously or just doing weird things like that? I have no idea what’s confusing it in this case, it’s pretty obvious it’s a moth or at least a winged insect.


Could you help me understand why there is no answer to my questions above, restated here?

Another way to put the question at a general level could be as follows: are there negative data in the training dataset, so that taxa are not only compared to one another but also to completely irrelevant objects?

If not, is it reasonable to consider additional “negative” training dataset(s) that could be analyzed separately?

The AI model is trained by showing it a bunch of observation photos and the identifications iNat users have given to them, and it “teaches itself” to distinguish among them with relatively high accuracy. There are no “negative” or “irrelevant” data in the training dataset, and there are lots of “relevant” data that are left out (taxa without at least 100 observations). As far as the computer is concerned, the only possibilities are the subset of organisms that iNat staff have shown it.

I suppose they could add non-identifiable or non-organism training data, and then sometimes the computer vision would say “this is no organism at all!” But what is the value of that? If a person is uploading a photo of a chair, they are not engaging with iNaturalist in good faith, and the suggested non-ID is not helping them. If it’s an unusual organism (such as one that isn’t in the training set) or a low quality photo, then “no organism” could deprioritize possible correct, or partially correct, IDs. Notably, despite the low-quality photo, iNat’s computer vision seemed to pick out the face of your “bambi-bee” and suggest an animal, which is partially correct.
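
The closed-set behaviour described above can be illustrated with a small Python sketch. It assumes a classifier that ends in a softmax over a fixed list of labels, which is a common design but is not confirmed in this thread for iNat's model; the labels and raw scores below are invented.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that always sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(labels, logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical label set and raw scores for a photo of a chair.
taxa = ["Mustela frenata", "Apis mellifera", "Vanessa atalanta"]
chair_logits = [0.2, 0.1, 0.0]

# Closed set: the answer must be one of the taxa, however poor the match.
print(top_label(taxa, chair_logits))

# Same scores plus an invented "not an organism" class that scores highly.
print(top_label(taxa + ["not an organism"], chair_logits + [2.0]))
```

In the first call the model still names a taxon, because the probabilities are forced to sum to 1 over the taxa it knows. In the second call the extra class absorbs most of the probability, which is the trade-off raised above: it can flag a chair, but it could equally push down a partially correct ID for an unusual organism.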

Unfortunately, it’s impossible to know with certainty why the AI has made one suggestion over another, because training produces a large set of learned numerical weights rather than rules a human can read.


Thank you for your detailed answer. Yet the question of considering negative data per se and even the analogy with monsters seem useful in general, to make progress, cf “What monsters might be lurking there?”. I will try to provide more detailed arguments soon.

… I tried it now (by copying and saving the image from your post) and now it looks reasonable as far as I can tell:

I have no idea what happened, but I wonder if small details can change the proposed IDs in absurdly divergent ways.

I guess that it would have been more useful and meaningful if both moths and plants were proposed as possible alternative IDs from the beginning, instead of proposing only one possibility, and the wrong one initially.

Similarly in my Bambi-bee example above, computer vision proposed only mammals, the wrong ID, whereas humans could easily see both the wrong and the correct ID (a mammal and an insect).

This monolithic behaviour contrasts with other computer vision proposals of widely divergent taxa, like in the other example above.

This is a reply to:

It is partly similar to a human, but not correct, and not human-like in the good sense.

So I tried clipping the bambi-bee image around the honey bee’s head, either above it, on the side, or below, which would barely affect human interpretation, but here is what happened with computer vision!

Butterfly IDs appeared next to mammals with some types of clipping (example: left image) but not with others (middle and right images); only mammals appeared with the greatest zoom-in (middle), and only mammals, with only Mustelidae at high confidence, after clipping from above (right).
The only explanation is “God nose”… I apologize, but these results do not speak in favor of robustness…
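
The clipping experiment above could be repeated systematically with a small harness like the following Python sketch. It is only an illustration: `toy_classify` is a made-up stand-in that labels an image by its mean brightness, the image is a tiny grid of numbers, and the crop boxes are arbitrary. A real test would call the actual computer vision model instead.

```python
def crop(image, box):
    """Cut a (top, left, bottom, right) box out of an image stored as a list of rows."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def crop_stability(image, boxes, classify):
    """Classify the full image and each crop, and report how often the crops agree."""
    reference = classify(image)
    labels = [classify(crop(image, box)) for box in boxes]
    agreement = sum(1 for label in labels if label == reference) / len(labels)
    return reference, labels, agreement


def toy_classify(image):
    """Hypothetical classifier: 'mammal' if the mean pixel value exceeds 0.5, else 'insect'."""
    pixels = [value for row in image for value in row]
    return "mammal" if sum(pixels) / len(pixels) > 0.5 else "insect"


# A 4x4 toy image: bright left half, dark right half.
image = [[1, 1, 0, 0] for _ in range(4)]
boxes = [(0, 0, 4, 2), (0, 2, 4, 4)]  # left half, right half
print(crop_stability(image, boxes, toy_classify))
```

An agreement of 1.0 would mean that every crop gets the same label as the full image; lower values quantify the kind of instability reported above. Here the left crop flips the toy label, so the agreement is 0.5.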

Well, it looks like a dog’s head, with two ears and a snout, all brown, so I get the AI’s response. Bigger differences can be seen between images of a full organism zoomed in and zoomed out pretty far.


@melody_86 if you truly see only a dog snout there, I would be surprised, but in any case, whatever you see, you do not change your mind just after clipping the background area in different ways!

In fact, it depends, hah, but really, its mandible area is very long for a regular bee photo, so maybe crop another pic and try it?

Will you be sad if I do not say “yes”? Indeed, this should be done carefully, extensively, and systematically by the iNat team in charge of computer vision.

The lack of robustness above suggests that this computer vision algorithm is very far from having keen eyes for the obvious differences. This seems quite a bad start for endeavours like taxonomic classification, and I think this is related to current debates on neural networks in general.

But indeed some people try to solve them. For example, it has been argued that random forest algorithms are more interpretable than neural networks and should be tested in parallel. @chrisangell would you agree? Did you try? Are you exploring other options?

Then perhaps two other specific replies:

What can one say to this kind of explanation, albeit a standard one? Aren’t predictions better than prophecies when you cross the street? Again, random forests or something equivalent…

This makes me see the difficulty for my proposal to compare scores from multiple training datasets. But then increasing speed should be high priority, at the expense of some other parameters. Again perhaps huge problems should not be ignored or dismissed, but investigated.

In fact even a clear explanation of what the available algorithm can do and cannot do would be a good start.

I agree, but I doubt it will ever be as good as a human, not soon at least. I just think I could show this pic to some non-naturalists and many of them would be confused even if they know how a bee’s face looks; you have to analyze the shape of the eyes, etc., while as I understand it the current system looks at the whole picture and can’t divide it into parts.


“As she went away, the fox remarked, ‘Oh, you aren’t even ripe yet! I don’t need any sour grapes.’”


@odole and @melodi_96 please watch your tone and try not to derail the discussion.

M, what? I didn’t say anything bad, and I get what @odole means, though I think some cases are too hard for any kind of intellect. I don’t get the tone reference at all.