iNaturalist Computer Vision Ranking and Combined Observations

I am a fairly new user to iNaturalist. I am interested in the efficacy of the Computer Vision in comparison to user comment identifications and my identifications for the same specimens. I have a couple outstanding questions about Computer Vision that I have had a difficult time tracking down.

First, are suggested identifications for a photograph ranked in any particular order? For example, if I post a photo of a lizard and I receive 5 suggested species identifications (from computer vision), is the first identification (identification suggested at the top) the most likely identification? Alternatively, are all five of the suggested identifications similarly/equally accurate? I do understand that the algorithm suggestions do not necessarily correspond to accurate identifications.

Second, when combining multiple photos for a single observation (e.g., 3 photos of the same lizard), how does this impact the algorithm’s suggested identifications? If suggested identification ranking matters (from the first paragraph), then does the algorithm combine suggested identifications from each of the 3 lizard photos for suggested identifications? Alternatively, do I just receive suggested identifications from the first of the 3 lizard images?

Thanks in advance. Please let me know if I can clarify. I have searched for these answers somewhat substantially, but I may not be aware of iNat lingo for optimal searching.


The suggestions towards the top are the more likely identifications.


CV only looks at the first photo in an observation, and yes, the first suggestions are more likely to be true according to CV. In any case, you should add only those IDs you’re sure of. CV can be very accurate on close-up photos that are typical for the taxon, but it can be awful for a new angle on an object, or it can place the correct ID at the bottom of the list.


They should be ranked by visual similarity - e.g. the photo being analyzed is the most visually similar to the taxon at the top, but crucially this is only among taxa in the model. And the model has quite a few biases and is missing a bunch of taxa, so I don’t think “most likely identification” is an accurate way to describe it; you have to take those caveats into account. If there are 10 species in a lizard genus and the model has only been trained on 3 of them, it’s saying “well, it looks like these three lizards and of those three it looks most like species X”. But it’s possible you actually observed one of the other species that the model has not been trained on. I think it’s a good place to start and can often be right (especially in some areas and with some taxa) but I’d use it more as a jumping off point to do a little more digging.
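To make the "only among taxa in the model" caveat concrete, here is a minimal Python sketch. The function name, species names, and scores are all made up for illustration; this is not how iNaturalist's model is actually implemented, only a picture of why an untrained species can never appear in the list.

```python
def rank_suggestions(similarity, top_n=5):
    """Return the top_n (taxon, score) pairs, highest score first.

    `similarity` maps taxon name -> visual similarity score and contains
    only taxa the model was trained on (hypothetical values below).
    """
    ranked = sorted(similarity.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical genus with 10 species, of which the model knows only 3.
# The other 7 species have no score at all, so they cannot be suggested.
trained_scores = {"species Y": 0.27, "species X": 0.62, "species Z": 0.11}

suggestions = rank_suggestions(trained_scores)
top_taxon = suggestions[0][0]
print(suggestions)
```

The top entry is only "the best match among the three trained species", which is a weaker statement than "the most likely identification".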


You only get suggestions for one photo, by default the first one. In the Android app (though not on the website, currently), you can scroll from photo to photo and see the suggestions for each one.


On the website, what I’ve done with observations that I can’t get good suggestions for (and I always treat them as suggestions that drive my further research), is to edit the observation and reorder the photos. That lets the computer vision make a suggestion for each in turn. It’s perhaps a bit fussy but not terribly so (I’m using a desktop computer with a mouse and keyboard). And I only have to do it for a small percentage of observations.


while visual similarity is the main component of the scores/rankings, my understanding is that location does factor into the rankings slightly. specifically, taxa with nearby observations will get a slight boost in their rankings, and, depending on whether you’ve chosen to include or exclude nearby observations, taxa without nearby observations may be excluded from the computer vision suggestions altogether.
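a rough sketch of that idea in Python, under stated assumptions: the function name, the `boost` value, and the `nearby_only` flag are hypothetical, and the real weighting used by iNaturalist is not something i know.

```python
def rerank_with_location(vision_scores, nearby_taxa, boost=1.2, nearby_only=False):
    """Re-rank vision scores using the set of taxa observed nearby.

    `boost` and `nearby_only` are illustrative assumptions, not
    iNaturalist's actual parameters.
    """
    adjusted = {}
    for taxon, score in vision_scores.items():
        is_nearby = taxon in nearby_taxa
        if nearby_only and not is_nearby:
            continue  # dropped from the suggestions altogether
        adjusted[taxon] = score * boost if is_nearby else score
    return sorted(adjusted.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: "lizard A" looks slightly better but is not seen nearby.
vision_scores = {"lizard A": 0.50, "lizard B": 0.45, "lizard C": 0.05}
nearby = {"lizard B", "lizard C"}

boosted = rerank_with_location(vision_scores, nearby)
filtered = rerank_with_location(vision_scores, nearby, nearby_only=True)
```

with these made-up numbers, the slight boost is enough to move "lizard B" above "lizard A", and the nearby-only setting removes "lizard A" entirely.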

also, i believe computer vision suggestions should be limited by the observation’s iconic taxon based on existing identifications at time of the computer vision assessment.

accurate is hard to quantify in this context, but if you want to get a better sense of how the computer vision ranks / scores its suggestions, see this:

there’s also a browser extension that someone developed to help you visualize the scores:


As a couple of other users have said, the CV only looks at the first pic if you have multiple pics. If you have a couple of pics from different angles, you can manually reorder them so as to check the CV suggestion for each pic in turn. I’ve found that if it suggests the same top species each time, with different pics of the same plant or bug, it’s probably right. If you reorder the pics and the CV changes its recommendation, then one of the suggestions may or may not be right.
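If you jot down the top suggestion each time you reorder, the consistency check is easy to express. This is just a small Python sketch of my rule of thumb; the function name and the species names are made up, and the suggestions are typed in by hand rather than fetched from iNaturalist.

```python
from collections import Counter


def top_suggestion_agreement(top_per_photo):
    """Return (most common top suggestion, fraction of photos that agree)."""
    if not top_per_photo:
        raise ValueError("need at least one top suggestion")
    counts = Counter(top_per_photo)
    taxon, votes = counts.most_common(1)[0]
    return taxon, votes / len(top_per_photo)


# Hypothetical top suggestions noted after putting each photo first in turn.
consistent = top_suggestion_agreement(["species X", "species X", "species X"])
mixed = top_suggestion_agreement(["species X", "species Y", "species X"])
```

Full agreement across photos is a reason to take the suggestion more seriously; anything less is a cue to do more digging.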


Computer Vision is a tool (one of multiple), and how you use it in this case depends on circumstances. Are you familiar with lizards? Can you identify (or better yet, have you identified) species of lizards? Have you seen this particular lizard before? If you are unfamiliar with lizards, or if this lizard is new to you, it’s probably best to ignore the CV’s specific suggestions and make a more general suggestion. The CV may or may not know more about this lizard than you do.

If you guess a species of lizard, and your guess is wrong, the observation is just one ID away from becoming an incorrect Research Grade observation. Once it becomes RG, the chances that the error will be discovered and corrected become very small.

As others have noted, if you use the website to upload photos, you can (and should) apply the CV to each photo of the lizard. (I don’t know if the phone apps have this capability, but based on other responses, apparently they do.) If the CV claims that it is “pretty sure” the lizard is in genus X in all three photos, then you could guess genus X and see what happens. In that case, resist the urge to agree with the first species-level ID that comes along. Let someone else confirm the ID.

I routinely apply the CV to every photo I upload, regardless of what I know about the organism. This helps me understand the CV’s strengths and weaknesses, and lets me use it more effectively.


Thank you all for your responses. They are very helpful and have sparked a couple more questions as I think through my goals for understanding the efficacy of CV vs. my ID vs. community commentary.

If I upload three photos of the same lizard (i.e., listing them in the same individual observation) and the observation reaches community/research grade consensus, does each of the photos contribute to training the CV algorithm, or just the first photo? In other words, for species that I want to help train the CV algorithm on, is it much better to put what I would consider to be the most informational photograph first when uploading?

I also want to clarify the qualifications for a “trained species” within CV. I have seen that 100 verifiable observations constitutes a trained species for potential CV suggestions. However, if I started a project within a specific geographic region, do the conditions of a trained species change to 100 verifiable observations within that specific region? Alternatively, does CV training only ever occur at the species level (never location specific)?

The CV is not trained by location, all the photos from a species are lumped together.

The CV may train on multiple photos from the same observation, but if it has many photos of a species (>1000), it won’t use all of them and will pull randomly from the pool of photos of that taxon.
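Roughly, you can picture the selection like this Python sketch. The function name is hypothetical, the cap of 1000 just mirrors the figure above, and the real training pipeline may well differ in its details.

```python
import random


def sample_training_photos(photo_ids, cap=1000, seed=None):
    """Pick at most `cap` photos at random from a taxon's pool.

    Photo order within an observation plays no role here; every photo
    in the pool has the same chance of being chosen.
    """
    rng = random.Random(seed)
    if len(photo_ids) <= cap:
        return list(photo_ids)  # small pool: everything is eligible
    return rng.sample(photo_ids, cap)


# Hypothetical pools: one common species, one rarely photographed species.
common_pool = list(range(2500))
rare_pool = list(range(120))

common_sample = sample_training_photos(common_pool, seed=0)
rare_sample = sample_training_photos(rare_pool, seed=0)
```

For the common species only a random subset is used, while for the rare one the whole pool fits under the cap.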


Putting the most informative photo first is also a courtesy to identifiers. If the first photo is an arty shot of a flower but the ID is requested for the bug, then putting the best bug photo first makes it much more likely that you will get an ID.


As far as training is concerned, the order of the photos doesn’t matter. As far as I know, a photo is just as likely to be chosen for training as any other photo. That said, the order of photos in an observation is important…for the human identifiers that ultimately review your observation.

I don’t know what you mean by “the most informational photograph”.

By the way, consensus is not required for an observation to be eligible for training. If the observer claims that an observation is species Y, but no one else suggests an ID (one way or the other), all of the photos in the observation are eligible to train species Y.
