How are photos selected for CV training?

I find it encouraging that the pictures of Lunate Crassinella you’ve added since starting this project:
https://www.inaturalist.org/observations?created_d1=2023-06-01&place_id=any&taxon_id=253452&user_id=bonesigh
don’t look immediately distinguishable by eye from the range of Lunate Crassinella observations by other people:
https://www.inaturalist.org/observations?place_id=any&taxon_id=253452&not_user_id=bonesigh
and, if anything, show slightly more diversity in appearance than the pre-existing observations.
So I think you are doing a decent job of not just taking 100 variants of the same picture with your own particular quirks. I think part of this is that with such tiny shells there are only so many ways to take a reasonably ID-able photograph. One note: if it is even possible to get your camera to focus, it might be good to try to include some edge-on or nearly edge-on photographs, for a bit more diversity. Larger shells often have some photos like that in the mix, though maybe such views are inherently less typical for shells so tiny; I don’t know.


Just to clarify slightly, for the dextral shell, the opening of the shell will face you on your (the observer’s) right, and for sinistral it faces your (the observer’s) left.


Yes, that is a good clarification. Thanks.

Just a little context on why diversity matters: a typical classification neural network is a Convolutional Neural Network. It trains filters that activate neurons when they pass over certain features in an image. If too many images are taken at the same focal length with the same camera, the network may unintentionally learn an attribute of the camera, like the bokeh pattern, as contributing to the ID. This is a gross oversimplification, but in my experience building and training classification models, it’s easy to pick up artifacts.
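To make the "filters pass over an image" idea a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration, not how any real model is built: the toy image, the hand-written edge filter, and the `correlate2d` helper are all made up for this example, and a trained network would learn its own filter values rather than use fixed ones.

```python
def correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 4x6 grayscale image (assumption): dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-written filter (assumption) that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

response = correlate2d(image, kernel)
for row in response:
    print(row)
```

The response is zero everywhere except in the column where the dark-to-bright edge sits. The filter does not care whether that edge comes from the shell or from something the camera consistently adds to every photo, which is why a quirk shared by many training images can end up being treated as a feature.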


I love your teamwork approach and spirit @DianaStuder !


@alex Does this mean that you have found that having more than 1000 images per species does not result in significant improvements to the model? So all species are trained on between 100 and 1000 images? When you augment, are you augmenting so that all species end up with a similar number of images even though the augmented images aren’t as useful as distinct images?


https://www.inaturalist.org/observations?locale=nl&place_id=any&preferred_place_id=7506&taxon_id=430140 Over 100 photos, but not in the Computer Vision model. How can this be solved? Is there a topic for this?


14 days later we see 106 Research Grade observations. Waiting to see what happens to the Computer Vision model next month.
https://www.inaturalist.org/observations?locale=nl&place_id=any&preferred_place_id=7506&quality_grade=research&taxon_id=430140&verifiable=any

Seems like a different topic and not really a forum one. On iNat, reach out to people who for sure know the taxon and maybe work together on identifying these observations accurately.

Only 49 of the observations are Research Grade, so I would guess there aren’t enough with Community ID yet to enter the model. But it’s getting there!


Another data point: Image flipping may be why the CV sometimes has problems distinguishing between righteye and lefteye flounders (Pleuronectidae and Paralichthyidae).
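As a rough illustration of why flipping could matter for "handed" organisms, here is a toy sketch in plain Python. Whether and how flipping is actually used in training is not confirmed in this thread; the grid, `flip_horizontal`, and `aperture_side` are hypothetical stand-ins, not anyone's real pipeline.

```python
def flip_horizontal(image):
    """Mirror an image left-to-right, as a flip augmentation would."""
    return [row[::-1] for row in image]


def aperture_side(image):
    """Report which half of the image holds more marked pixels."""
    width = len(image[0])
    left = sum(sum(row[: width // 2]) for row in image)
    right = sum(sum(row[(width + 1) // 2:]) for row in image)
    if right > left:
        return "right"
    if left > right:
        return "left"
    return "center"


# Toy 5x5 "shell" (assumption): 1s mark the shell, widening toward an
# opening on the viewer's right, as described above for a dextral shell.
dextral_like = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

flipped = flip_horizontal(dextral_like)
print(aperture_side(dextral_like))  # right
print(aperture_side(flipped))       # left
```

If a flipped copy keeps the original species label, the model is shown both mirror images as the same taxon, so the one cue that separates a dextral shell from a sinistral look-alike (or a righteye from a lefteye flounder) carries no signal.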


I would recommend you take photographs of some more of the shells of micromollusk species that you have on hand.

The shells of all mollusk species are somewhat variable. Of course the shell varies in shape as the mollusk grows and matures, and usually the shells, even when fresh, vary somewhat in color and maybe even in texture too. There are also sun-bleached shells and beach-worn shells. We humans need to know about these variants, and so does the CV.


@alex any results from the experiment with pics of different “handed” mollusks that you can share?

Was this helpful?

Here are four pairs of gastropod species where the first one has sinistral shell coiling and the second one has dextral shell coiling, but both otherwise look rather similar:

One
Sinistral:
https://www.inaturalist.org/taxa/503453-Sinistrofulgur-sinistrum
versus
Dextral:
https://www.inaturalist.org/taxa/208724-Busycon-carica

Two
Sinistral:
https://www.inaturalist.org/taxa/499861-Sinistrofulgur-pulleyi
versus
Dextral:
https://www.inaturalist.org/taxa/971306-Fulguropsis-spirata

Three
Sinistral:
https://www.inaturalist.org/taxa/292300-Marshallora-modesta
versus
Dextral:
https://www.inaturalist.org/taxa/387350-Cerithiopsis-powelli

Four
Sinistral:
https://www.inaturalist.org/taxa/563819-Physa-acuta
versus
Dextral:
https://www.inaturalist.org/taxa/451596-Oxyloma-elegans


I did some analysis and it looks to me like our models underperform on these taxa, enough to warrant a further experiment, but I’ve been busy (and all of our GPUs have been busy) so we haven’t had the time to make much progress. However, it’s in my queue of experiments to run now that the geo modeling work is out the door.

