Is it better to feed the AI more photos or fewer?

I read recently that the autosuggest for species/genus kicks in after 100 research grade observations.

I’ve been told a few times that I include too many images in my observations and I need to edit them better. There’s a range of issues connected to this for me (and I accept that I do need to work on my editing skills) …

But solely in relation to the AI, is there benefit to having a large range of angles and photos in observations to feed more data into the AI? I am presuming that to some extent, the more data the better, but I would like to know if this is misplaced.

Are 50 observations with 2 photos similar/better/worse than 100 observations with 1 photo in terms of training data?

Alternatively, is this a bad thing, as it potentially clogs the system up with more photos and more load than necessary, putting pressure on the infrastructure? Should I be wary of how many I upload in this respect too?

My guess is that this is just a question of maintaining a reasonable balance to account for both ends of the spectrum. But I would like to know for sure.

Thanks!


Observing should be driven primarily by what you find most rewarding. I wouldn’t care much about how any potential computer vision training factors in, since it’s likely the training methods will vary in the future anyway. More photos are better up to a certain point for CV training (especially for rarely observed species), but having more data points on the map is also really useful for a lot of purposes other than computer vision.

For me it’s pretty context dependent: am I trying to document the distribution of the organism (fewer, probably hastily taken photos), am I trying to take good example photos for future use (more, higher quality photos, or particular photos of specific features), am I trying to get help with identification of an organism (lots of photos of different aspects, variable quality), etc.


Thanks for this!
Yes, I have also been considering and weighing up most of the above.

The trade-off for me isn’t datapoints vs number of shots of a single observation
(not sure I actively consider either when in the field right now).
Rather, it’s how much I annoy other identifiers by uploading too many photos!
…and yes, whether my mental argument that it’s a positive thing for the system is true or not.

In regard to context, the trigger for this question is UK Diptera, so I would say yes, these are more on the rarely observed species list. Or rather, rare in iNaturalist at least,
as many fly species do seem to have limited research grade observations…

It was also noticeable to me when I joined last year that many, many UK flies were autosuggested as tiger flies. This is getting better though! Perhaps partly because we seem to have more active dipterists pitching in now.

One example is E. cyanella, which is fairly recognisable when bronzed with its distinctive colour
https://www.inaturalist.org/taxa/547126-Eudasyphora-cyanella
It should be listed at least as one of the autosuggests, but currently is not, I guess as it hasn’t reached 100 observations.

So, it feels valid to actively help the AI in this context.
And indeed rewarding if I can help it ID UK flies better sooner rather than later.


I identify – or try to identify – a lot of plant observations. Having lots of photos from different angles really helps! (Front and side of flower, top and bottom of leaf, a picture showing the whole plant, close ups of other potentially useful features like ligules, fruit, stems, buds, etc.) Keep posting lots of photos!


I think more than ten photos for an observation is likely not necessary, and uploading excessive photos can affect iNat’s infrastructure. It’s also important to remember that computer vision isn’t being trained to recognize an actual taxon, it’s being trained to recognize iNaturalist photos of that taxon. You could upload a ton of macro photos of a tiny identifying feature (which is useful for humans trying to ID the fly), but will most other users take similar photos of that same feature? Probably not, so the AI wouldn’t be particularly helpful there.

I’m not sure if you read our recent blog post about the vision model, but we train the model only twice a year, so even if a taxon reaches that threshold, it can be months before a new model trained on that taxon is released.


Ok interesting, good to know re: infrastructure

Yes, I think the recent blog post must have been where I got the 100 observations number from… didn’t catch the twice a year bit though.

I also wondered…does the model store different sets of images about a single species?
When people post eggs, for example, do those need a specific label in order for the model to sort them? And if not, are they essentially lost in the model, scrapped as outliers?

I asked a similar question last year and got similar responses: “don’t worry about the CV, just use the site for what you want to use it for.” But it seems to me that just by asking questions like the OP, we’re making it clear that training the CV is something we (and many other users) value for its own sake? I was so excited to find iNaturalist last year in large part because the CV feels so magical to use, and finding places where that tool is incomplete and knowing I can expand it for future naturalists in my area makes the act of recording observations, and more diverse observations–life stages, seasonal changes, etc–feel far more rewarding.

You’re right that the training methods vary, but in my case that actually worked in my favor: when I asked, there was a rule against the CV learning species with many observations from few observers. By the time my observations were factored in, that restriction had been removed. And some of the species I was targeting are ones the updated CV can now recognize. So I’m satisfied and will be expanding my efforts accordingly (e.g. right now it can’t recognize several common local trees based only on buds). And like you said, if the CV rules change in some way that invalidates that, those observations are still useful for all sorts of other reasons.


Definitely… it feels like there is a tangible sense of achievement here, and like you’re contributing to a long term goal.

Interesting point about making deliberately diverse observations of life stages, etc…
Again, it makes me wonder how the model deals with these. It would be amazing long term if the autosuggest could also flag something up as being a juvenile, for instance.

I realised there are some great comments below the recent blog post @tiwane linked to, which also mention some of these things and related issues.

I don’t think the CV associates any tags with its images other than taxonomy (I doubt most species have enough annotations to support that yet anyway). But it definitely recognizes a species from many angles, so to speak. There are just a lot more gaps in less commonly uploaded aspects of species. So it can ID some trees from buds but not others, it does better with adult insects than with larvae, etc.


Just realised there are a whole load of similar and related points also raised here
https://forum.inaturalist.org/t/psst-new-vision-model-released/10854/20

and regarding annotation here
https://forum.inaturalist.org/t/use-computer-vision-to-annotate-observations/3331


Well, if you’ve been told to edit your photographs, maybe the issue is the editing rather than the number of photos. The identifier was probably feeling exhausted about something, though it’s hard to judge. I’d say more photos is a big plus if they show different parts in focus.

