I am a fairly new user of iNaturalist. I am interested in how the Computer Vision's accuracy compares to identifications from user comments and my own identifications of the same specimens. I have a couple of outstanding questions about Computer Vision that I have had a difficult time tracking down answers to.
First, are suggested identifications for a photograph ranked in any particular order? For example, if I post a photo of a lizard and I receive 5 suggested species identifications from Computer Vision, is the first identification (the one suggested at the top) the most likely identification? Or are all five suggestions roughly equally likely to be accurate? I do understand that the algorithm's suggestions do not necessarily correspond to accurate identifications.
Second, when combining multiple photos into a single observation (e.g., 3 photos of the same lizard), how does this affect the algorithm's suggested identifications? If the ranking of suggestions matters (per my first question), does the algorithm combine suggestions from each of the 3 lizard photos? Or do I just receive suggestions based on the first of the 3 images?
Thanks in advance. Please let me know if I can clarify anything. I have searched fairly extensively for these answers, but I may not know the iNat lingo needed for optimal searching.
The CV only looks at the first photo in an observation, and yes, the first suggestions are the more likely ones according to the CV. In any case, you should only add IDs you're sure of. The CV can be very accurate on close-up, "typical for the taxon" photos, but it can be awful on an unusual angle of a subject, or the correct ID may sit at the bottom of the list.
They should be ranked by visual similarity, i.e. the taxon at the top is the one the analyzed photo most resembles, but crucially this is only among taxa in the model. And the model has quite a few biases and is missing a bunch of taxa, so I don't think "most likely identification" is an accurate way to describe it; you have to take those caveats into account. If there are 10 species in a lizard genus and the model has only been trained on 3 of them, it's saying "well, it looks like these three lizards, and of those three it looks most like species X". But it's possible you actually observed one of the other species that the model has not been trained on. I think it's a good place to start and can often be right (especially in some areas and with some taxa), but I'd use it more as a jumping-off point to do a little more digging.
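The point above can be sketched in a few lines of Python. This is purely illustrative: the real model is a neural network whose internals aren't public, and the species names and scores here are invented.

```python
# Illustrative sketch only: species names and scores are made up.

def rank_suggestions(scores):
    """Rank candidate taxa by score, highest first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Suppose the lizard genus has 10 species but only 3 are in the training
# set. The model can only assign scores to the taxa it knows about:
trained_scores = {"species_A": 0.61, "species_B": 0.27, "species_C": 0.12}

ranked = rank_suggestions(trained_scores)
top_taxon, top_score = ranked[0]

# The "most likely" suggestion is only most likely *among trained taxa*;
# the 7 untrained species can never appear in the list, even if one of
# them is what you actually photographed.
print(top_taxon)  # species_A
```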
On the website, what I’ve done with observations that I can’t get good suggestions for (and I always treat them as suggestions that drive my further research), is to edit the observation and reorder the photos. That lets the computer vision make a suggestion for each in turn. It’s perhaps a bit fussy but not terribly so (I’m using a desktop computer with a mouse and keyboard). And I only have to do it for a small percentage of observations.
It’s useful to assess all user ID comments, which may include more detailed or accurate information and help confirm CV suggestions. CV should be used too, but only trusted in combination with making your own ID assessment. For CV- or user-suggested IDs, use Compare or view their taxon pages to check photos, geographic range, etc. Use Explore to search the location for the broadest taxon that includes all suggestions, then view each in the Species tab (you can do something similar using Compare). Also search external websites (e.g. GBIF and Wikipedia as starting points) or scientific publications (e.g. via Google Scholar). And compare the CV results under both the "nearby" and "all" settings.
You can use the CV to check each of multiple photos for your own observation while editing, before combining the photos (on the iNat website), or by changing which photo is selected as the profile photo in turn (in the mobile app uploader).
Sometimes the CV distinguishes between suggestions, saying it's "pretty sure" of a top one, or noting that it has low confidence in all of them. So when it only shows a list with no explanation, those suggestions fall between its low- and high-confidence cases. Even "pretty sure" suggestions aren't typically near certainty, and accuracy varies; I'd estimate it can be as low as 70% at times.
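Those three kinds of output can be thought of as confidence tiers. A rough sketch, with the caveat that iNaturalist's actual thresholds aren't public, so the cutoffs 0.8 and 0.2 below are invented:

```python
# Hypothetical confidence tiers; the real cutoffs are not public.
PRETTY_SURE = 0.8   # invented high-confidence cutoff
TOO_LOW = 0.2       # invented low-confidence cutoff

def describe_confidence(top_score):
    """Map a top suggestion's score to the kind of message shown."""
    if top_score >= PRETTY_SURE:
        return "pretty sure"
    if top_score < TOO_LOW:
        return "not confident"
    return "plain list, no message"
```

Under this sketch, a plain list with no message simply means the top score landed between the two cutoffs.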
while visual similarity is the main component of the scores/rankings, my understanding is that location does factor into the rankings slightly. specifically, taxa with nearby observations will get a slight boost in their rankings, and, depending on whether you’ve chosen to include or exclude taxa not seen nearby, taxa without nearby observations may be excluded from the computer vision’s suggestions altogether.
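a minimal sketch of how such a location adjustment might work. the actual iNaturalist weighting is not public, so the boost factor and species names here are invented for illustration:

```python
# Hypothetical location adjustment; the real weighting is not public.

def adjust_for_location(vision_scores, nearby_taxa, only_nearby=False):
    """Boost taxa with nearby observations; optionally drop the rest."""
    adjusted = {}
    for taxon, score in vision_scores.items():
        if taxon in nearby_taxa:
            adjusted[taxon] = score * 1.2   # invented boost factor
        elif not only_nearby:
            adjusted[taxon] = score         # kept, but no boost
        # else: excluded entirely ("seen nearby" filter on)
    return adjusted

vision_scores = {"species_A": 0.50, "species_B": 0.45}
nearby = {"species_B"}

# With the boost, the locally observed species_B overtakes species_A
# even though species_A scored higher on visual similarity alone:
adjusted = adjust_for_location(vision_scores, nearby)
```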
also, i believe computer vision suggestions should be limited by the observation’s iconic taxon, based on the identifications that exist at the time of the computer vision assessment.
As a couple of other users have said, the CV only looks at the first pic if you have multiple pics. If you have a couple of pics from different angles, you can manually reorder them so as to check the CV suggestion for each pic in turn. I’ve found that if it suggests the same top species each time, with different pics of the same plant or bug, it’s probably right. If you reorder the pics and the CV changes its recommendation, then one of the suggestions may or may not be right.
Computer Vision is a tool (one of multiple), and how you use it in this case depends on circumstances. Are you familiar with lizards? Can you (or better yet, have you) identified species of lizards? Have you seen this particular lizard before? If you are unfamiliar with lizards, or if this lizard is new to you, it’s probably best to ignore the CV’s specific suggestions and make a more general suggestion. The CV may or may not know more about this lizard than you do.
If you guess a species of lizard, and your guess is wrong, the observation is just one ID away from becoming an incorrect Research Grade observation. Once it becomes RG, the chances that the error will be discovered and corrected become very small.
As others have noted, if you use the website to upload photos, you can (and should) apply the CV to each photo of the lizard. (I don’t know if the phone apps have this capability, but based on other responses, apparently they do.) If the CV claims that it is “pretty sure” the lizard is in genus X in all three photos, then you could guess genus X and see what happens. In that case, resist the urge to agree with the first species-level ID that comes along. Let someone else confirm the ID.
I routinely apply the CV to every photo I upload, regardless of what I know about the organism. This helps me understand the CV’s strengths and weaknesses, and lets me use it more effectively.
Thank you all for your responses. They were very helpful and have sparked a couple more questions as I think through my goals for understanding the efficacy of the CV vs. my IDs vs. community commentary.
If I upload three photos of the same lizard (i.e., in the same observation) and the observation reaches community/research grade consensus, does each of the photos contribute to training the CV algorithm, or just the first photo? In other words, for species whose CV training I want to help, is it much better to put what I consider the most informative photograph first when uploading?
I also want to clarify the qualifications for a “trained species” within the CV. I have seen that 100 verifiable observations qualifies a species for potential CV suggestions. However, if I started a project within a specific geographic region, does the threshold change to 100 verifiable observations within that region? Or does CV training only ever occur at the species level (never location-specific)?
As far as training is concerned, the order of the photos doesn’t matter. As far as I know, a photo is just as likely to be chosen for training as any other photo. That said, the order of photos in an observation is important…for the human identifiers that ultimately review your observation.
I don’t know what you mean by “the most informational photograph”.
By the way, consensus is not required for an observation to be eligible for training. If the observer claims that an observation is species Y, but no one else suggests an ID (one way or the other), all of the photos in the observation are eligible to train species Y.