What Image(s) Are Used for "Training" Computer Vision?

At least twice @tiwane has mentioned in threads that annotating or “marking up” an image could affect the training of the computer vision. I may be assuming incorrectly that doing so is a problem with only one image of the taxon on the observation.

My basic question is: if multiple images are posted in a research grade obs, are all included for training? For instance, one might have dorsal and lateral images of an insect (either being the first image), or necessarily include ventral or other “parts” that are required to ID some of the trickier insects.

Thanks for any clarification.


We do randomize the selection a bit and apply some thresholds, so it’s possible that not all photos in an observation will be used, but a photo’s position within an RG observation doesn’t determine whether it is selected for training.
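For readers who want a concrete picture, here is a minimal sketch of what position-independent selection could look like. This is not iNaturalist's actual pipeline: the function name `select_training_photos`, the `min_per_taxon` and `max_per_taxon` thresholds, and the shape of the observation records are all assumptions made up for illustration. The only point it demonstrates is that every photo in an observation goes into the same pool, so being first or last makes no difference.

```python
import random
from collections import defaultdict


def select_training_photos(observations, min_per_taxon, max_per_taxon, seed=0):
    """Pick training photos per taxon, ignoring photo position (hypothetical)."""
    rng = random.Random(seed)

    # Pool every photo of every observation under its taxon.
    # The index of a photo within its observation is never consulted.
    pool = defaultdict(list)
    for obs in observations:
        for photo_id in obs["photos"]:
            pool[obs["taxon"]].append(photo_id)

    selected = {}
    for taxon, photos in pool.items():
        # Assumed lower threshold: skip taxa with too few photos.
        if len(photos) < min_per_taxon:
            continue
        # Assumed upper threshold: randomly sample down to a cap,
        # which is why some photos in an observation may go unused.
        k = min(max_per_taxon, len(photos))
        selected[taxon] = rng.sample(photos, k)
    return selected


observations = [
    {"taxon": "Corvus brachyrhynchos", "photos": ["a1", "a2", "a3"]},
    {"taxon": "Corvus brachyrhynchos", "photos": ["a4", "a5"]},
    {"taxon": "Corvus caurinus", "photos": ["b1"]},
]
chosen = select_training_photos(observations, min_per_taxon=2, max_per_taxon=3)
```

With these made-up thresholds, three of the five American Crow photos are drawn at random from the pooled list, and the taxon with a single photo is skipped.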


Thanks…I think :)


I was thinking that my not-so-hot images would be good for threshold training :wink: as long as there was something diagnostic present - truth?
I was also thinking that rotations of the subject beyond the classic front-and-centre pose would help with 3D recognition - truth?
One last thing: points of reference that we don’t recognize will be picked up, for instance in recognizing Northwestern vs American Crow from an image - truth?


Machine image recognition works differently than human identification of images does. It doesn’t care if the specific field marks that we use are in focus or even present. It compares unknown images to images that are known and decides how similar they are pixel by pixel.

This is partly why more human identified images of a species improve the accuracy of the machine suggestions - the more identified images there are to compare a new image to, the more likely the comparison will find more similarities with identified images.

This is also why marking up the images introduces confusion. If a particular person adds a red arrow to all their observations to indicate the focus of the id then that red arrow becomes part of their identified images. Other identified images or new images will not find as many similarities with this person’s observations as they lack the red arrow.

Hope this helps.


Nice explanation @marykrieger, spot on. Bottom line: the more images of a taxon, the better.

However, it’s important to remember that our computer vision model is not trained to recognize a certain taxon - it is trained to recognize iNaturalist photos of a certain taxon. Most photos on iNaturalist are taken by amateurs using smartphones or other consumer-grade equipment, and the organisms are almost always in situ. If you try to use computer vision on pinned insects, for example, it probably won’t work that well because most of the images it’s been trained on are in situ. It won’t have many reference images of insects on a white background.

So while uploading close-ups of small diagnostic areas can’t hurt, and is great for IDers to evaluate, I’m not sure how much of a help they’ll be for identifying most iNat photos with computer vision.

Each image that is used for training is randomly cropped and also rotated and/or flipped, and the model is trained on those images as well, so we’re able to get a few more images out of each single image. But as I said above, more is better, so the more angles you have, the better. Remember, though, that iNat isn’t primarily an image recognition training community, so there’s no need to overload here.
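As a rough illustration of the cropping, rotating and flipping described above, here is a minimal sketch in plain Python that treats an image as a grid of pixel values. It is not iNaturalist's training code: the function names, the square crop, the restriction to 90-degree rotations and the 50% flip probability are all assumptions chosen to keep the example short.

```python
import random


def random_crop(image, size, rng):
    """Return a size x size window taken from a random position."""
    top = rng.randint(0, len(image) - size)
    left = rng.randint(0, len(image[0]) - size)
    return [row[left:left + size] for row in image[top:top + size]]


def rotate90(image):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in image]


def augment(image, size, rng):
    """Produce one variant: random crop, then random rotation and flip."""
    out = random_crop(image, size, rng)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns (assumed)
        out = rotate90(out)
    if rng.random() < 0.5:  # flip half of the time (assumed)
        out = flip_horizontal(out)
    return out


def make_variants(image, n, size, seed=0):
    """Get n augmented training variants out of a single source image."""
    rng = random.Random(seed)
    return [augment(image, size, rng) for _ in range(n)]


# A tiny 4x4 "image" whose pixel values are just 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
variants = make_variants(image, n=5, size=2)
```

Each variant shows the same subject from a slightly different framing and orientation, which is how one uploaded photo can stand in for several training examples.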

I’m not exactly sure what you mean here. Aren’t those mainly identified by sound?


I think what is meant is that WE use certain characters, such as sound in this case, but the computer algorithms could theoretically detect secondary characters that human identifiers would not see. We strike this a lot with species that are worked with a lot by certain identifiers, where they can determine species level ID from a photo where normally you would need micro work to ID with certainty. But because they work with those species a lot, they start to recognise subtle differences and the range of variations that exist for the species, enough to be quite certain. The only issue is, the AI is only going to recognise those differences if the differentiation in human identifications is accurate and reliable!



I’m not super familiar with this crow pair but my understanding was that they can only “reliably” be separated by range because of hybridization and variability, so I’m not sure how accurate or reliable iNaturalist IDs would be…


Yes, the Northwestern Crow is ID’d by call and range. They are also slightly smaller than the American Crow: length ~91%, wingspan ~87%, weight ~84%.

I believe there is a morphometric difference that we have yet to detect, and may never be able to detect, but that an AI system would be able to “see”. I keep looking at photos and can’t tell them apart, but I know I can tell by their size relative to the surroundings, so this may be in keeping with the white background issue.

We look at primate faces and in general cannot see a difference, but I believe there are differences, much like the differences between human faces. If the image is being looked at pixel by pixel, then such subtle differences should theoretically be recognized.

Then once “it” knows what the difference is, “it” could possibly teach us and we’ll see it and realize it was right in front of us all along.

Exactly! And any field worker that works with them on a daily basis could recognise them apart from 100 metres away!

Just as we look at something with 6 legs and immediately think “insect”, there will be artifacts in an image that an algorithm is going to recognise as consistent for a given species. Again, assuming the photos are IDd correctly in the first instance!

I have encountered it myself, in that I am watching observations go up of two moth species, and there is a wing venation detail that I can see consistently appearing in one of the species. Experts I have discussed it with don’t themselves treat that as diagnostic, and aren’t sure of the validity of it, but they do agree with my IDs on other visible characters. Of course, the danger is I start to ID everything thus, and then the volume of material IDd by me becomes supporting evidence of the validity of the wing venation character!

The same goes for images mis-identified on iNat because someone saw an image in google etc that was labelled wrong. I think we have all encountered that one!

Humans can be a little too quick to jump to such character-based IDs, because I think we are hard-wired to recognise dangers in our environments, and for survival it is better to jump at shadows than to only react if certain of the danger… computers don’t have that disadvantage, but we can impose our liabilities on them via the processes and models we give them.


@tiwane Do you use the same software as Andre Poremski? Fieldguide.ai or do you have a custom version?

This is further complicated by the fact that the two “species” aren’t discrete units and probably not good species by any working definition ;)


That is why I chose the two species. :) The question is: will computer vision learn to tell the two species apart? Are there relative visual differences that we fail to recognize?


Here’s some info on iNat’s computer vision: https://www.inaturalist.org/pages/computer_vision_demo

I don’t know what software Fieldguide.ai uses, but I do know our model is trained only on images from iNat observations.

@bobmcd As for the two crows, I honestly don’t know if machine learning will be able to differentiate them.

I’ll keep submitting. We’ll wait and see. I’ve got time and there are lots of subjects around here.


I have been trying to include a mm scale in photographs of moths in the hope that it would help others to identify the species. Now I wonder whether including the scale will confuse the AI.
So do I stop including the scale?
Or do I post two photos, one with a scale and one without?

Just my opinion, but keep doing it the way you are doing it!

