Computer vision and subspecies

As far as I can tell, computer vision does not consider subspecies. The recent update to CV has been very good for cicadas in the sense that it is identifying the correct species the vast majority of the time for specimens that are identifiable. The problem is that many subspecies exist and I think it’s appropriate to identify them, but I end up having to type in their names rather than simply agreeing with someone’s subspecies ID, which is quite rare considering CV doesn’t suggest them. As it stands, for me as the identifier, it is just as slow as if they had simply IDed at family level or higher. If CV incorporated subspecies in its suggestions, it would save a lot of identifier time. Is this something being considered?

3 Likes

I don’t know how viable this would be. Subspecies for many taxa are purely based on range, and for others, the morphological differences are very slim. Your request seems to be just ‘include ssp in the CV,’ but we need to consider the possible issues that could come up.

2 Likes

For those subspecies you speak of, I would think a proper CV model would simply identify at species.

2 Likes

I think subspecies based purely on range would pose an issue for the CV since, based on my understanding, it’s really trained on the visual data. The geographic info comes into play when suggesting “Seen Nearby” or “pruning” the rankings. So for subspecies IDed exclusively on range (not visual differences), the CV might make a lot of mistaken subspecies-level IDs that would need to be fixed.

Also, subspecies IDs based purely on range have limited value as they don’t really add information to the observation.

This could also be an issue if there are only some subspecies of a given species with enough observations IDed to subspecies level to be in the CV model. If not all subspecies are represented, CV might end up suggesting only the set of more observed subspecies that are in the model, leading to erroneous IDs.
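A rough sketch of why this happens, assuming the model turns its raw scores into probabilities over only the classes it was trained on (a softmax is used here as a stand-in; how the real iNat model works internally is not something this sketch claims to know). The subspecies names and the raw scores are made up for illustration.

```python
import math

def softmax(logits):
    """Normalize raw scores into probabilities over the known classes only."""
    peak = max(logits.values())
    exps = {name: math.exp(value - peak) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

def top_suggestion(logits):
    """Return the best-scoring class and its probability."""
    probs = softmax(logits)
    best = max(probs, key=probs.get)
    return best, probs[best]

# Hypothetical case: the photo actually shows "ssp. c", but only
# "ssp. a" and "ssp. b" have enough observations to be in the model.
logits_for_ssp_c_photo = {"ssp. a": 2.0, "ssp. b": 0.5}

best, prob = top_suggestion(logits_for_ssp_c_photo)
print(best, round(prob, 2))
```

Because the probabilities have to add up to 1 across the subspecies the model knows, all of the weight lands on "ssp. a" and "ssp. b", and the missing subspecies can never be suggested.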

3 Likes

But then it would have to be trained specifically per species to know which should and which shouldn’t be identified to subspecies.

This is a very good point. It’s the same with some genera: one super common species that is observed a lot, and a couple of other species that are also common but aren’t observed as often. The really common one gets recommended a lot, even for the other species, which aren’t in the CV yet though they are common as well. (I just said ‘common’ a lot.)

4 Likes

I think it would be wonderful if ssp. and var. were added to the CV model. Lack of focus on ssp. and var. seems to be one of iNat’s greatest weaknesses. At least in plants, many ssp. and var. are very easy to ID visually, though it is possible most of those should be recognized as distinct species rather than subtaxa within a species. For those that aren’t easy to ID visually, it would be no different than IDing between species that are difficult to tell apart visually. If they can’t be told apart visually, the model should just default to the next higher rank where an ID is more confident. Because many ssp. and var. are so different, including them all together in the CV model as a single taxon probably makes IDs that much less reliable in those cases. I suspect the main problem with ssp. and var. for iNat is that not all species have them, so they don’t play well together data-structure-wise.
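The "default to the next higher rank" idea could look something like the sketch below. It is only an illustration under assumptions: the taxon names, the `parent_of` lookup, the scores, and the 0.7 threshold are all hypothetical, and this is not a description of how iNat’s CV actually picks a rank. It also copes with species that have no subtaxa, since each leaf simply adds its score to whatever ancestors it has.

```python
from collections import defaultdict

def taxon_scores(leaf_scores, parent_of):
    """Add each leaf's score to the leaf itself and to all of its ancestors."""
    totals = defaultdict(float)
    for leaf, score in leaf_scores.items():
        node = leaf
        while node is not None:
            totals[node] += score
            node = parent_of.get(node)
    return dict(totals)

def depth(taxon, parent_of):
    """Number of steps from the taxon up to the root of the lookup."""
    steps = 0
    while parent_of.get(taxon) is not None:
        taxon = parent_of[taxon]
        steps += 1
    return steps

def suggest(leaf_scores, parent_of, threshold=0.7):
    """Pick the most specific taxon whose combined score clears the threshold."""
    totals = taxon_scores(leaf_scores, parent_of)
    confident = [t for t, s in totals.items() if s >= threshold]
    if not confident:
        return None
    return max(confident, key=lambda t: (depth(t, parent_of), totals[t]))

# Hypothetical taxonomy: species A has two varieties, species B has none.
parent_of = {
    "A var. x": "A",
    "A var. y": "A",
    "A": "Genus",
    "B": "Genus",
}

# Visually distinct variety: the suggestion stays at variety level.
clear = suggest({"A var. x": 0.85, "A var. y": 0.10, "B": 0.05}, parent_of)

# Varieties can't be told apart: the suggestion falls back to species A.
unclear = suggest({"A var. x": 0.45, "A var. y": 0.45, "B": 0.10}, parent_of)

print(clear, "|", unclear)
```

With a setup like this, easy cases get a subtaxon suggestion and hard ones quietly stay at species or genus.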

5 Likes

Those issues aren’t really any different than those you have with species in many cases. Is there any potential difference that you could point to that demonstrates an issue present with IDing subspecies that does not exist in parallel at a species level?

The CV could really speed things up. One of the plants that I’m often IDing in my home province is threeleaf foamflower, which has 3 very visually distinct leaf motifs.

Perhaps we should consider a more menu-based approach for the CV suggestions, where one could click on a button within the species suggestion to open a menu to quickly access subspecies.

4 Likes

One thing to consider as a difference between IDing subspecies vs species is that subspecies generally aren’t distinct, clearly-divided entities in the way that species are (otherwise they would be classified as species rather than subspecies). Of course species can sometimes have hybrid intermediates too, but it is usually less “messy” than with subspecies. You get into a grey zone where many individuals may be genetically and/or morphologically intermediate between subspecies, and having the CV make guesses might do more harm than good by misclassifying individuals and artificially simplifying very complex situations. In my opinion, it would be better to keep subspecies ID as something that a user or identifier can choose to do themselves, but that isn’t automatically suggested by CV.
The case of the threeleaf foamflower varieties is a bit unique since the three varieties are so visually distinctive (although genetically they might not really be differentiated). CV would probably classify them well, but it might not be useful to apply CV to subspecies more generally.

2 Likes

I would hope subspecies-by-range would never be included in the AI. I won’t agree with subspecies IDs by range at all, because I don’t think it’s good practice to ID that way.

There are still many subspecies that are very easily divided, especially given how iNat follows taxonomy, with many ssp. actually being distinct species. You can easily separate Motacilla alba personata from the alba group. CV doesn’t have to know/show every subspecies, only those that can be supported by both the image itself and geography. (Even though people are so against it, there are species living on different continents, and there is no problem in marking those at ssp. level, even if someone thinks it “adds nothing”.)

3 Likes