Allow for genus-level CV training sets irrespective of species-level participation

Platform(s), such as mobile, website, API, other: Computer Vision

URLs (aka web addresses) of any pages, if relevant: Some of the most recent discussion of this conundrum can be found here:
https://forum.inaturalist.org/t/are-genus-level-rg-observations-used-for-cv-training/63859

Description of need:

This applies to any genus (of anything…moths, beetles, flies, spiders) that has many taxa that cannot reasonably be identified (yet) in photos, i.e. they require dissection or other examination that cannot typically be captured in photos. Presently, in such genera, if even one species is identifiable and is included in a CV run, CV is apparently prone to spit out that one ID or something unrelated. It cannot fall back to a genus-level ID suggestion when one species is well-documented but many/most are not.

Feature request details:

For selected genera, I would like to see CV trained on RG genus-level observations in order for the genus-level ID to become available as a suggested ID. This would probably require some type of nomination/flag process to identify candidate genera. Candidate genera should meet some criterion of “speciosity but unidentifiability”, i.e. they should be populated by a set of taxa that recognized experts agree cannot typically be identified in photos.
Such a training set for a genus wouldn’t preclude species-level taxa in the same genus from also being included in a training set if they qualify (e.g. about 100 images from at least 60 observations).
A discussion is needed to figure out what criteria or algorithm staff might use to select candidate genera.
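
To give that discussion something concrete to poke at, here is a minimal sketch of what a candidate check could look like. Everything in it is hypothetical: the `GenusStats` fields, the `flagged_unidentifiable` flag (standing in for whatever nomination process gets agreed on), and the reuse of the rough "100 images from at least 60 observations" figures for genus-level data are all assumptions, not anything staff have described.

```python
from dataclasses import dataclass

# Rough figures borrowed from the species-level example in this post;
# the real thresholds (if any) would be up to staff.
MIN_IMAGES = 100
MIN_OBSERVATIONS = 60


@dataclass
class GenusStats:
    name: str
    genus_level_images: int        # photos on observations whose community ID sits at genus
    genus_level_observations: int  # number of such observations
    flagged_unidentifiable: bool   # hypothetical: set through a nomination/flag process


def is_candidate_genus(stats: GenusStats) -> bool:
    """Hypothetical rule: a nominated genus with enough genus-level data."""
    return (
        stats.flagged_unidentifiable
        and stats.genus_level_images >= MIN_IMAGES
        and stats.genus_level_observations >= MIN_OBSERVATIONS
    )
```

With a rule of this shape, a nominated genus with 250 images from 120 genus-level observations would qualify, while the same genus without the flag, or with too few images, would not.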

RG should not be required. RG isn’t required for including any taxon in the CV. What actually matters is the community ID. A species observation with two agreeing species IDs becomes RG because the community ID is at species level. A genus observation with two agreeing genus IDs does not, because a genus-level community ID does not automatically become RG. Both observations would be CV eligible.
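
The distinction can be sketched roughly as follows. This is a deliberately simplified model of the rule as described in this reply, not iNaturalist's actual logic: the `Observation` class, its fields, and the "two agreeing IDs" stand-in for a community ID are assumptions made for illustration, and the real RG criteria involve more than rank and ID count.

```python
from dataclasses import dataclass

# Ranks at which a community ID makes an observation RG automatically (simplified).
SPECIES_OR_FINER = {"species", "subspecies"}


@dataclass
class Observation:
    community_rank: str  # rank of the community ID, e.g. "species" or "genus"
    agreeing_ids: int    # identifications supporting the community ID


def has_community_id(obs: Observation) -> bool:
    # Simplified stand-in: two or more agreeing IDs.
    return obs.agreeing_ids >= 2


def is_research_grade(obs: Observation) -> bool:
    # Only species-level (or finer) community IDs become RG automatically.
    return has_community_id(obs) and obs.community_rank in SPECIES_OR_FINER


def is_cv_eligible(obs: Observation) -> bool:
    # What matters here is the community ID, not RG status.
    return has_community_id(obs)


species_obs = Observation(community_rank="species", agreeing_ids=2)
genus_obs = Observation(community_rank="genus", agreeing_ids=2)
```

Under this model `species_obs` is RG and `genus_obs` is not, yet both pass `is_cv_eligible`.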

I feel like the CV should be trained on every rank… or at least more of the basic ones: class, family, genus, etc. I suppose it would add way more taxa to train on, but it’s already training on 100k leaves. It seems to me that having direct knowledge about observations sitting at intermediate ranks would probably be more helpful than inferring it indirectly from relationships between the leaves that it knows.

Only knowing the leaves misses out on a lot of relevant information about what is unidentifiable, because the leaves are necessarily the most identifiable endpoints of the tree. Knowing what isn’t identifiable is just as important as knowing what is, and only knowing about leaves biases it towards thinking that everything should be identifiable. At least theoretically that makes sense to me, and that seems consistent with what happens in practice.
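
To illustrate what I mean by inferring indirectly, here is a toy sketch. It assumes, purely for illustration, that a genus score is obtained by summing the scores of the leaf species the model knows; I don't know that the CV does exactly this, and the taxon names and scores are made up.

```python
from collections import defaultdict


def roll_up_to_genus(leaf_scores: dict[str, float], genus_of: dict[str, str]) -> dict[str, float]:
    """Sum leaf (species) scores into their parent genus."""
    genus_scores: defaultdict[str, float] = defaultdict(float)
    for species, score in leaf_scores.items():
        genus_scores[genus_of[species]] += score
    return dict(genus_scores)


# Hypothetical case: genus "Aus" has many species, but only "Aus bus"
# has enough photos to be a leaf in the model.
genus_of = {"Aus bus": "Aus", "Cus dus": "Cus"}
leaf_scores = {"Aus bus": 0.55, "Cus dus": 0.45}

genus_scores = roll_up_to_genus(leaf_scores, genus_of)
```

In this toy case the score for "Aus" is just the score for "Aus bus", so a genus-level suggestion derived this way carries no information beyond the one species the model happens to know. All the genus-level observations of the unidentifiable species never enter the picture.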
