Species Photos Required to Train Vision

How would one go about finding out which species need more observations/records in order to provide photos for future training of the Computer Vision model?

Many thanks for any insights



On a species page there’s a badge if the taxon is included in the CV model.

So if there’s no badge, the species probably needs more photos. In any case, new photos are always welcome!


Also we need a lot of good identification work, because there may already be photos of a species, but if they are not identified, then the images can’t yet be fed to the CV.


I don’t think that’s true. An observation need not be Research Grade to be included in the training. An observation doesn’t need a community ID either (one ID is sufficient). It doesn’t even need to be verifiable (e.g., observations marked as not wild are eligible).

If an observation isn’t identified to the correct taxon, or is left at family level or higher, it’s not useful for CV, and that’s the case for tons of taxa and thousands of observations.


Aside from the badge @Marina_Gorbunova mentions, there is also a feature request to make it a bit easier to get a broader overview in future:

Right now the /v1/taxa/{id} API endpoint returns a “vision” field in its results, but the /v1/taxa endpoint does not. If the latter also returned that field, it would be relatively easy to get/view that information for a bunch of taxa. Right now, you’d either have to go in with a list of taxon IDs, or query to get a list of IDs and then query again by that list of IDs. It’s possible to do this, but I’m not motivated enough myself to do it.
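For anyone who wants to try the "go in with a list of taxon IDs" route, here is a minimal sketch. It assumes the API lives at https://api.inaturalist.org/v1, that /v1/taxa/{id} accepts several comma-separated IDs in one request, and that each result carries an "id" and the "vision" field mentioned above. The batch size of 30 is a guess, and the names `vision_flags`, `chunked` and `fetch_json` are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: base URL of the iNaturalist v1 API.
API_ROOT = "https://api.inaturalist.org/v1"


def fetch_json(url):
    """Download a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def vision_flags(taxon_ids, fetch=fetch_json, batch_size=30):
    """Map each taxon ID to True/False according to its "vision" field.

    Assumptions: the endpoint accepts comma-separated IDs, and a missing
    "vision" field means the taxon is not in the model.
    `fetch` is injectable so the logic can be tried without network access.
    """
    flags = {}
    for batch in chunked(list(taxon_ids), batch_size):
        url = f"{API_ROOT}/taxa/{','.join(str(t) for t in batch)}"
        for taxon in fetch(url).get("results", []):
            flags[taxon["id"]] = bool(taxon.get("vision", False))
    return flags
```

Calling `vision_flags(my_ids)` with a list of taxon IDs you have collected elsewhere (for example from a species search) would then return a dictionary you can filter for the taxa that are not yet included. Please keep the request rate low if you try this against the live API.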


How many photos do species need to be included in CV?

Also, there’s an easy way to find species which have no photos at all (although those aren’t all this post is about). Search the taxon, location: world, click Species, and see which boxes are blank.

Nice, thanks for the tip. Maybe I will take a look myself.

The number of photos/observations a species requires to be included is given on the help page:

“as of the model released in March 2020, taxa included in the computer vision training set must have at least 100 observations, at least 50 of which must have a community ID. Photos for training are randomly selected from among the qualifying iNaturalist observations (that is, it is not only the first image of an observation that may be used for training)”
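As a quick way to apply the rule quoted above to your own counts, here is a minimal sketch. The thresholds are taken from the quote (March 2020 model) and may have changed since; the function and constant names are made up for illustration.

```python
# Thresholds as quoted from the help page for the March 2020 model.
MIN_OBSERVATIONS = 100
MIN_WITH_COMMUNITY_ID = 50


def meets_march_2020_threshold(total_observations, with_community_id):
    """Return True if both quoted minimums are met."""
    return (total_observations >= MIN_OBSERVATIONS
            and with_community_id >= MIN_WITH_COMMUNITY_ID)


def shortfall(total_observations, with_community_id):
    """Return how many more observations of each kind are still needed."""
    return {
        "observations": max(0, MIN_OBSERVATIONS - total_observations),
        "with_community_id": max(0, MIN_WITH_COMMUNITY_ID - with_community_id),
    }
```

For example, a taxon with 80 observations, 60 of which have a community ID, fails the check only because it is 20 observations short of the overall minimum.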


It wouldn’t be too difficult to modify this code I wrote for querying conservation status to also provide the taxon CV flag. It uses the algorithm outlined here:



I did not say the photos of a species had to be Research Grade, just that they had to be identified.


Would it be possible for experts to review the photo set that is used for training in order to weed out misidentifications? I think that could speed up the process significantly. I am talking about groups that are hard to ID and still largely inaccessible because of lack of expertise in the community (e.g., I work on wasps).

Another interesting question is how life stages and sexes are being handled. In some cases it is pretty straightforward such as caterpillars and butterflies. I assume two separate sets of images are being used to train CV for each of the two. It gets more complicated with the sexes or with species that gradually transform from juveniles into adults. Depending on the species, sexual dimorphism can vary from extreme to imperceptible, with every possible intermediate condition. I assume a priori decisions have to be made on how many separate entities CV will be trained for. For species with extreme sexual dimorphism it is obvious that there have to be separate sets of images to train CV. But where is the cut-off for species with moderate sexual dimorphism? Many insects fall into this category. To the untrained eye males and females may look quite similar yet taxonomists often use different diagnostic characters for each sex to get to a species ID.


I was under the impression that they are not done separately. From what I’ve read on various forum threads, it sounds like the training model randomly selects photos from among the eligible observations of that taxon, regardless of life stage or sex. But I could be wrong. Doing them separately would make sense in cases like the ones you described.


Like the comment above I don’t know if CV incorporates these yet, but doubt it does at least for sexes. Those would be very useful to distinguish. Currently there are annotations for life stage and sexes, which could maybe make implementing this more feasible in the future.


The life stages and sexes are not fed separately to the CV. It does not learn in the same way that a human learns, so variations like that do not confuse it.


Realised it’s pretty straightforward to see which species are included just by using the observations portal, of course. For example, we can see that we need to find some more Proteroiulus fuscus to gear up in European Blaniulidae! Only Blaniulus guttulatus has > 50 RG and > 100 total obs in this family in Europe at present, likely leading to oversuggestion of Blaniulus sp.



