Workings of iNat Artificial Intelligence (AI) aka Computer Vision (CV)

This is rather a technical question: When posting an observation with more than one photo, does the iNat AI ‘look’ beyond the first photo? If it does, is equal ‘weight’ given to subsequent photos?
I have experimented with multiple photos (in the one observation) by changing the order in which they are loaded and it has resulted in quite different ID suggestions coming up via the AI function. Most recently, I posted photos of what was subsequently ID’d as a Squeaking Longhorn Beetle. In my initial posting I used as the first photo one of the beetle taken in-situ on a branch. I thought the photo was reasonably clear although the beetle was not overly distinct against the background. The AI only came up with one suggestion which was a spider (presumably it ‘mistook’ the antennae for an extra pair of legs).
When I changed the order of the photos and used one of the beetle on my hand (where it was much more clearly delineated) as the first photo, the AI gave me a Longhorn Beetle Sub-family, and several different species to choose from (albeit not including the correct one which is endemic to NZ). [As an aside, see recent topic of “Fingers in Photos” - point proven!]
So, obviously it helps to choose, for the first photo, one that shows as many features of the specimen as possible, as clearly as possible, and with as little background clutter as possible.
Are there other principles to follow when posting multiple photos? eg for plants, should it be flowers before leaves? Is the iNat AI able to take account of separate photos of the upper and underside of a leaf (both of which may be crucial to an ID) or is it best to try to capture both in one photo?
Enough questions already! Go for it!

CV looks at the first photo only.


If you are using the web interface, during the upload process you can get the AI to look at each photo before you make them into one observation, if you want to know if different photos give you different results.


I also check individual autosuggestions for multiple photos at times and then use the common denominator.

It would be nice if this were automated at some point so it didn’t just use the first photo but evaluated the first 3, or all of the photos across a set.

But I guess when people add broad habitat shots and things into the mix it might create an even more confusing output.
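
The manual "common denominator" check described above can be sketched in a few lines. This is only an illustration of the idea, not how iNat combines anything: the function name and the example lineages are made up, and it assumes you have noted the top suggestion for each photo as a list of ranks from kingdom downwards.

```python
from typing import List, Sequence


def common_denominator(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the ranks shared by every lineage, from the top rank down."""
    shared = []
    for ranks in zip(*lineages):
        # Stop at the first rank where the photos disagree.
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


# Hypothetical top suggestion for each of three photos of the same beetle.
per_photo_top_suggestion = [
    ["Animalia", "Arthropoda", "Insecta", "Coleoptera", "Cerambycidae", "Lamiinae"],
    ["Animalia", "Arthropoda", "Insecta", "Coleoptera", "Cerambycidae", "Cerambycinae"],
    ["Animalia", "Arthropoda", "Insecta", "Coleoptera", "Chrysomelidae"],
]

shared = common_denominator(per_photo_top_suggestion)
print(shared[-1] if shared else "no shared taxon")
```

With these made-up inputs the three photos only agree down to the order, so that is the level you would enter as the ID. A habitat shot that returns a plant would break the agreement at kingdom level, which is the confusing output mentioned above.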


it all just depends on what was included in the set that the Computer Vision trained on. if all the photos of a particular plant included only flowers, then it would only have a chance to recommend that plant if you also posted a photo with flowers as your first photo in the observation. if the training set for a plant includes a good mix of flowers, both surfaces of the leaves, stems, etc., then it probably wouldn’t matter what part of the plant was included in your own photo.

one thing that others haven’t mentioned is that the computer vision will take into account nearby observations when making suggestions. so adding a location may improve your results. also making a rough initial ID may help improve suggestions from the computer vision, as I’m fairly certain that it will try to limit suggestions to whatever the “iconic taxon” of the observation is. (so in your case, if you had selected beetles as an initial ID, you should have seen subsequent suggestions from the computer vision include only insects, not spiders.)
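
a rough sketch of what limiting suggestions to the iconic taxon could look like. this is an assumption about the behaviour described above, not iNat's actual code: the Suggestion class, the function name and the scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Suggestion:
    name: str
    iconic_taxon: str  # e.g. "Insecta", "Arachnida", "Plantae"
    score: float       # made-up confidence value


def restrict_to_iconic_taxon(
    suggestions: List[Suggestion], iconic_taxon: Optional[str]
) -> List[Suggestion]:
    """Keep only suggestions inside the observation's iconic taxon, if one is set."""
    if iconic_taxon is None:
        return list(suggestions)
    return [s for s in suggestions if s.iconic_taxon == iconic_taxon]


# Hypothetical raw output for a beetle photo with a cluttered background.
raw = [
    Suggestion("a spider", "Arachnida", 0.41),
    Suggestion("Longhorn Beetles", "Insecta", 0.33),
    Suggestion("a stick insect", "Insecta", 0.12),
]

# With a rough initial ID of "beetles", the observation's iconic taxon is Insecta.
for s in restrict_to_iconic_taxon(raw, "Insecta"):
    print(s.name, s.score)
```

in this toy case the spider drops out once the rough ID is in place, even though it had the highest raw score.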

the last thing is that there are different implementations of the computer vision model, if i’m remembering correctly. so for example, if you use Seek, it may offer slightly different suggestions, since it uses a simplified model vs. the iNat website.

if you search the forum, the staff have already addressed most of your questions in various posts.


I see some obs where the first photo is planty and only in the second or third do I realise ‘we are looking at the butterfly’, which is finally clear. If it is sitting in Unknown, not everyone will dig thru to the butterfly - and those who filter for Lepidoptera won’t see it.

Probably worth remembering a point that iNat staff have made previously, Computer Vision does not “see” organisms, it “sees” pictures of organisms. Or maybe more precisely it analyzes arrays of ones and zeroes in files submitted by iNat users and finds patterns in the data, which it then tries to match to patterns in its existing database. So the “best” photo, the one most likely to return the correct identification, would be the one most similar to all of the other photos of the organism that are already in the system.

Of course it is very difficult to know what that looks like in advance, so my approach is to try to include photos that provide enough information to allow a human being to identify the organism. I treat the CV suggestion as just that: a suggestion, and as a starting point to learn more.


Yes, I’ve heard this many times before but it doesn’t make a lot of sense, but maybe I’m missing something. When I look at an organism I do not “see” an organism, I “see” a projection of the organism onto my retina that my brain then interprets (usually incorrectly). The distinction between a “picture of an organism” and an “organism” doesn’t seem to be adequately explained to me… That projection on my retina is, well, a picture at the end of the day


That’s quite different. Your brain analyzes the image and sees where one object ends and another starts. You have more than eyes in your experience, and not only the vision-related parts of the brain are activated when you see an object. A program can’t do that: it just sees patterns, and it doesn’t matter if it’s one object or ten, together or separate. It can’t guess the texture, smell, or anything else that a brain does even when you don’t want it to.


The Android app will also look at each photo (and can offer different suggestions for each). The iPhone app doesn’t let you flip between photos when making an ID, though.


A couple of examples from my experience:

I made an observation of a Rock Wren (bird) on a granite boulder in a talus field. Computer Vision identified it as a Pika (mammal), because it has been trained on hundreds of images that contain granite boulders in talus fields that have been identified as Pika.

Another observation that I encountered that sticks in my mind had a photograph showing a flower as a blurry purplish blob in the foreground superimposed on a different but well-focused plant in the background. Computer Vision “correctly” recognized that the subject of the photograph was intended to be the fuzzy purple blob in the foreground, because iNat users (including me!) have uploaded so many blurry flower photos. Of course there was no detail in the flower that would allow anybody to identify it, but CV made a species-level suggestion for a plant that does not grow on this continent.


I think this video has a decent explanation of how basic image recognition algorithms work. The computer is asked to tell the difference between a rectangle and a circle, and you can see in the video that its method for doing so is nothing like how you would teach a human to tell the difference between a rectangle and a circle.


On the iPhone app, you can toggle which photo is the first one in order to see the different CV suggestions. There are tiny buttons below the photos for toggling. The little buttons are very challenging to hit. It takes us multiple tries.

But how often do people confuse scientific names that are the same for a plant and an animal? So many kingdom disagreements happen because people confuse, for example, Erica the spider with Erica the heather. If people struggle, no wonder CV does.


Yes, but I was pointing out that the Android version works similarly to the web version (as Vireya pointed out), where you can compare CV suggestions for all pics without having to switch back and forth between screens to toggle which pic is first.


PS instead of circle vs rectangle imagine the AI / CV working thru these
Kingdom Disagreements


@twr61 explained it really well, but I’ll add one more example: pinned insects. Let’s say we trained the CV model on only pinned and spread photos of a certain species of moth. Those photos might contain all of the required diagnostic details for that species - maybe some special mark on the underwing. But they would be photos of that species that a) all had a white background and b) showed the moth in a position that almost no iNaturalist user could replicate when photographing the moth in situ.

So if I took a photo of that moth species on a leaf or on some bark, with its wings closed, my photo of the moth would look very different from any of the photos the model was trained on for that species, because the background isn’t white and the wings are closed. So the model might really struggle to get a proper ID from my photo.


The most obvious way that this manifests is that the CV has no concept of “subject” vs. “background”. If an insect is against human skin, it’s likely to suggest a mosquito, regardless of whether the insect looks anything like a mosquito, because most photos of insects against skin are of mosquitoes. (A good way to avoid this bias is to crop out as much of the background as is reasonable).

As other examples, if a lot of pictures of one species are blurry, then the CV may assume a blurry picture is likely to be that species. If a lot of pictures of a nocturnal species are very dark, it may be inclined to suggest that species for any dark picture.

These are all characteristics of the photo, rather than the organism. In some cases they may be pretty good clues for what is likely to be in the photo, but in other cases they can mislead the CV.
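
A deliberately crude toy shows how this kind of bias can arise. It is not how the iNat model works (that is a neural network trained on pixels). Here each photo is reduced to two made-up yes/no features, the training counts are invented, and every training photo simply votes for its own label once per feature it shares with the new photo.

```python
from collections import defaultdict

# Hypothetical training photos: (on_skin_background, has_long_antennae) -> label.
# Mosquito photos are numerous and almost always taken against skin.
TRAINING = (
    [((1, 0), "mosquito")] * 8
    + [((0, 1), "longhorn beetle")] * 3
)


def predict(photo):
    """Each training photo votes for its label, once per feature it shares."""
    votes = defaultdict(int)
    for features, label in TRAINING:
        votes[label] += sum(a == b for a, b in zip(photo, features))
    return max(votes, key=votes.get)


beetle_on_hand = (1, 1)   # skin background, long antennae
beetle_cropped = (0, 1)   # same beetle with the skin cropped out

print(predict(beetle_on_hand))
print(predict(beetle_cropped))
```

In this toy the uncropped beetle is called a mosquito, because the skin background matches eight training photos while the antennae match only three. Cropping out the skin removes that match and the beetle wins.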


You didn’t include a scientific name, but I’m assuming you mean this species. It’s not in the model (go to the About tab on the taxon page, look on the right-hand side). So you would have never gotten the correct species ID from CV.

Yeah, Seek doesn’t use the current model that’s available on the website and the iNat mobile apps. It’s older (so fewer taxa), simplified, it has a higher confidence threshold (as it only shows one suggestion at a time), and it doesn’t take location into account when displaying suggestions.
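
To illustrate the difference between showing a ranked list and showing a single suggestion above a confidence threshold, here is a small sketch. The function names, scores and the 0.9 threshold are made up for the example and are not the values either app uses.

```python
from typing import Dict, List, Optional, Tuple


def ranked_suggestions(scores: Dict[str, float], limit: int = 3) -> List[Tuple[str, float]]:
    """List-style display: the top few suggestions, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:limit]


def single_confident_suggestion(
    scores: Dict[str, float], threshold: float = 0.9
) -> Optional[str]:
    """Single-result display: the best suggestion, only if it clears the threshold."""
    name, score = max(scores.items(), key=lambda kv: kv[1])
    return name if score >= threshold else None


# Hypothetical scores for one photo.
scores = {"Longhorn Beetles": 0.62, "Leaf Beetles": 0.21, "Spiders": 0.09}

print(ranked_suggestions(scores))
print(single_confident_suggestion(scores))
```

With these numbers the list view still offers three options to choose from, while the single-result view returns nothing because 0.62 is below the threshold.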

Yeah, always good to keep in mind. It’s a suggestion, and iNat’s power is really in the community.


There are some mind-boggling examples there!

A proportion of them have arisen because someone accepted an obviously incorrect CV suggestion. Then it takes a bunch of intelligent humans to overrule the CV.
