Use computer vision on each photo in an observation

The Dutch “competitor” waarneming.nl uses all the photos of an observation, so it’s at least possible.

5 Likes

I almost always offer several photos:

  • closeup of flower/fruit,
  • closeup of leaf (or closeups of several leaves, in case of multiple shapes),
  • closeup of details (extrafloral nectary, hairs, spines, …),
  • overview of the entire individual.

Sometimes even that is not enough: a closeup of the calyx may be necessary for identification.

6 Likes

From what I have read about inference, I think something like Bayesian inference would be a better approach, although I am really not competent in computer inference methods.
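
For what it’s worth, here is a minimal sketch of what a Bayesian combination could look like, assuming the CV returns a softmax score per taxon for each photo. The array shapes, the prior, and the conditional-independence assumption are all mine, not how iNat’s model actually works:

```python
import numpy as np

def combine_photo_scores(per_photo_probs, prior=None):
    """Naive-Bayes fusion of per-photo CV scores (hypothetical).

    per_photo_probs: (n_photos, n_taxa) array of per-photo softmax scores.
    prior: (n_taxa,) prior over taxa; defaults to uniform.
    Assumes photos are conditionally independent given the true taxon:
        p(y | x_1..x_n)  proportional to  p(y)^(1 - n) * prod_i p(y | x_i)
    """
    n_photos, n_taxa = per_photo_probs.shape
    if prior is None:
        prior = np.full(n_taxa, 1.0 / n_taxa)
    # Work in log space so multiplying many small scores doesn't underflow.
    log_scores = np.log(np.clip(per_photo_probs, 1e-12, 1.0))
    log_post = (1 - n_photos) * np.log(prior) + log_scores.sum(axis=0)
    log_post -= log_post.max()          # shift for numerical stability
    post = np.exp(log_post)
    return post / post.sum()            # renormalize to probabilities
```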

Could the CV make suggestions based on which photo is selected rather than just the first photo?

5 Likes

Making a bad observation like this is its own problem, and not really relevant to this discussion.

I never said anything about the quality of the observation, though.
I was just commenting that this functionality ALREADY exists in the Android iNat mobile app.

2 Likes

Any movement on this?

If it is impractical for the AI to automatically do all pictures, then what about a button to get the AI to look at all of the images and return a summary of likely IDs, rather than just the first image? (Obviously it won’t apply to single-photo observations.)
It would also allow the possibility of a notification, such as “the different pictures suggest different species: please check that this observation is for a single species, and split it if not”. I don’t think a different summary is needed for each photo, though, just a summed probability. But if a particular photo suggests something too different, it could perhaps be summarized: “4 of 5 photos suggest Aba cadabra, Aba gamma, etc., but pic 3 is likely Beta gamma”.
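
For illustration, a rough sketch of how the summed probability plus an “odd photo out” warning could work, assuming the CV returns a score per taxon for each photo; the function name and the confidence threshold are just placeholders:

```python
import numpy as np

def summarize_observation(per_photo_probs, taxa, outlier_threshold=0.5):
    """Average per-photo scores and flag photos whose top pick disagrees.

    per_photo_probs: (n_photos, n_taxa) array of per-photo CV scores.
    taxa: list of n_taxa taxon names.
    """
    mean_probs = per_photo_probs.mean(axis=0)     # the "summed probability"
    consensus = taxa[int(mean_probs.argmax())]
    flagged = []
    for i, probs in enumerate(per_photo_probs):
        top = taxa[int(probs.argmax())]
        # Flag a photo only if it confidently suggests a different taxon,
        # e.g. "4 of 5 photos suggest X, but pic 3 is likely Y".
        if top != consensus and probs.max() >= outlier_threshold:
            flagged.append((i + 1, top))
    return consensus, flagged
```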

6 Likes

This discussion explains why iNat cannot identify even common moths from ventral photos. I want unusual photos of common species to be the first photo in some of my observations, so others can find an uncommon photo of a common species for comparison. And so AI will eventually be able to ID from all vantage points. According to the above, I will always have to know the species to ID it, or I will use leps.fieldguide.ai for confirmation. Which I often do anyway, because it’s much more accurate (though iNat is improving on Leps).

Not at this time.

I suspect there aren’t many ventral photos of moths on iNat and thus the model probably isn’t trained up on them.

No, not many. But when I’ve needed one to confirm the ID I suspect, I usually can find one, unless the moth is uncommon. I’ve been trying to tag my own. I wish there was a “ventral” identifier to tick (check). But iNat is more than generous in its data, so no worries.

I believe it does already? At least on the Android app.

Regarding the main subject, there are really two decisions here:

  • Whether the CVM should combine the suggestions for the individual pictures within your observation.
  • Whether all pictures should be used for training the CVM, or just the first.

It already selects pictures completely at random for training, so there is no question of using only the first picture; if there were any potentially viable change to the selection system, it would probably have to be to deliberately choose training pictures in a more diverse way than random, not less. For example, there are taxa with distinctive subspecies that have hundreds of observations but nevertheless make up <1% of the observations of the overall species, and those are currently under-represented by the ‘random selection’ system.
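
To make “more diverse than random” concrete, here is a hypothetical sketch that stratifies a taxon’s training sample by subspecies so rare subtaxa are not drowned out. This is purely illustrative, not how iNat’s pipeline actually selects photos:

```python
import random
from collections import defaultdict

def stratified_training_sample(photos, cap):
    """Pick up to `cap` training photos, spread across subtaxa.

    photos: list of (photo_id, subtaxon) pairs; subtaxon may be None.
    Round-robins across subtaxa so a subspecies that is <1% of the
    species' photos still contributes a meaningful share of examples.
    """
    by_subtaxon = defaultdict(list)
    for photo_id, subtaxon in photos:
        by_subtaxon[subtaxon].append(photo_id)
    for group in by_subtaxon.values():
        random.shuffle(group)           # random *within* each subtaxon
    sample, groups = [], list(by_subtaxon.values())
    while len(sample) < cap and any(groups):
        for group in groups:            # one photo per subtaxon per pass
            if group and len(sample) < cap:
                sample.append(group.pop())
    return sample
```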

I think the question for combining suggestions is whether you would just show the ranked lists of suggestions separately or combine the scores post hoc. Combining the scores would probably improve accuracy in many cases, but it might be difficult to quantify how much, because it is not what the CV is trained to do. Of course, training the model to actually score all of the pictures together in the first place would probably be best, but that is not likely to happen in the near future: it would require changes in how the model is conceptualized, and it is challenging partly because the number of pictures the model would need to handle varies significantly.
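
To illustrate why the varying photo count is the awkward part, one generic workaround (not iNat’s architecture, just a textbook pattern) is to pool per-photo embeddings in an order- and count-invariant way before classifying, e.g. in PyTorch:

```python
import torch
import torch.nn as nn

class MultiPhotoClassifier(nn.Module):
    """Score all of an observation's photos together (sketch).

    backbone: any per-image encoder mapping (B, C, H, W) -> (B, D).
    Mean-pooling the per-photo embeddings makes the classifier head
    indifferent to how many photos an observation has.
    """
    def __init__(self, backbone: nn.Module, embed_dim: int, n_taxa: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(embed_dim, n_taxa)

    def forward(self, photos: torch.Tensor) -> torch.Tensor:
        # photos: (n_photos, C, H, W) for a single observation
        embeddings = self.backbone(photos)   # (n_photos, D)
        pooled = embeddings.mean(dim=0)      # (D,) order/count-invariant
        return self.head(pooled)             # (n_taxa,) logits
```

Mean pooling is the simplest choice; attention pooling would let such a model weight a diagnostic closeup more heavily than a blurry habitat shot.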

It should not be hard at all. It actually already works like that in the mobile app. Only the web version looks just at the first photo.

2 Likes

If I’m understanding this as a response to what I said just above your comment: it works like this in Android (no luck on iPhone?), which is part of what makes mobile IDing ironically much easier than desktop in some ways, alongside SIGNIFICANTLY faster loading times between pictures and observations.

I think running on multiple photos would significantly increase the server load and costs to iNaturalist, so from that perspective I understand why it’s not running on multiple photos and coming to a consensus based on them, though it would be the ideal system otherwise.

1 Like

Is this actually true? I had the impression training used all photos.

I think the server/electrical costs would be considerable (and to little benefit). I have no problem using a photo-by-photo approach to the CV suggestions on mobile (it actually may be superior in some ways), and it’d be a real benefit at much lower cost to transfer this capability to desktop as well for those who primarily make and ID observations there.

1 Like

We train on all photos for a taxon when there are one thousand photos or fewer for that taxon. If there are more than one thousand photos for the taxon, then not all are used.
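
In code terms, that selection rule might look roughly like this; the one-thousand-photo cap is from the post above, and the random draw for over-cap taxa follows the earlier “completely at random” comment, but the exact criteria are not spelled out here:

```python
import random

def select_training_photos(taxon_photos, cap=1000):
    """Per-taxon training-photo selection, as described above.

    Uses every photo for taxa at or under the cap; otherwise draws a
    random subset. (The random draw is an assumption on my part.)
    """
    if len(taxon_photos) <= cap:
        return list(taxon_photos)
    return random.sample(taxon_photos, cap)
```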

2 Likes

I wonder about this approach a little. At least when it comes to trees, people tend to upload the first photo as either a very distant shot of the whole tree or a closeup of leaves/fruit, with few shots of diagnostic features like bark, branching pattern, canopy spread, or macro shots of the petiole/leaf hairs/buds.

This change to use only a selected subset (what are the criteria? is it randomized?) when a taxon reaches >1000 photos might explain why, for instance, the CV is relatively bad at recognizing Quercus agrifolia (Coast Live Oak), often suggesting other oaks, Eucalyptus, or trees that don’t bear much resemblance to oaks at all!

Most machine learning models use only a subset of the available data as training data; using all photos to train would not be computationally feasible (some taxa have millions of photos). For training models, there’s a point at which adding more training data is not worth it: the cost of the additional resources does not result in a corresponding increase in performance. Previous posts about training the CV model have indicated that photos from RG observations are prioritized for inclusion in the training set, but the model itself is open/published, so you can probably dig into that to find a more specific answer if you wish.

Regardless, this thread should stay focused on the original feature request, which is about getting results from the CV model, not training. If a thread about CV training is desired, a new one can be made and we can move posts there, or one of the older ones can be reopened.