Computer vision and subspecies

dan_johnson · July 22, 2021, 5:37pm

As far as I can tell, computer vision does not consider subspecies. The recent update to CV has been very good for cicadas in the sense that it identifies the correct species the vast majority of the time for specimens that are identifiable. The problem is that many subspecies exist and I think it’s appropriate to identify them, but I end up having to type in their names rather than simply agreeing with someone’s subspecies ID, which is quite rare considering CV doesn’t suggest them. As it stands, for me as the identifier, it is just as slow as if the observer had simply IDed at family level or higher. If CV incorporated subspecies in its suggestions, it would save a lot of identifier time. Is this something being considered?

zdanko · July 22, 2021, 5:48pm

I don’t know how viable this would be. Subspecies for many taxa are purely based on range, and for others, the morphological differences are very slim. Your request seems to be just ‘include ssp in the CV,’ but we need to consider the possible issues that could come up.

dan_johnson · July 22, 2021, 6:02pm

For those subspecies you speak of, I would think a proper CV model would simply identify at species.

cthawley · July 22, 2021, 6:43pm

I think subspecies based purely on range would pose an issue for the CV since, based on my understanding, it’s really trained on the visual data. The geographic info comes into play when suggesting “Seen Nearby” or “pruning” the rankings. So for subspecies IDed exclusively on range (not visual differences), the CV might make a lot of mistaken subspecies-level IDs that would need to be fixed.

Also, subspecies IDs based purely on range have limited value as they don’t really add information to the observation.

This could also be an issue if there are only some subspecies of a given species with enough observations IDed to subspecies level to be in the CV model. If not all subspecies are represented, CV might end up suggesting only the set of more observed subspecies that are in the model, leading to erroneous IDs.
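A rough sketch of the two-stage idea described above, to make the range-only concern concrete. This is not iNaturalist's actual pipeline: the function `rerank_with_nearby`, the `boost` factor, and the taxon names are all hypothetical. It assumes a visual model that outputs one score per taxon and a separate set of taxa seen nearby.

```python
def rerank_with_nearby(visual_scores, seen_nearby, boost=2.0):
    """Rerank visual scores, boosting taxa that were seen nearby.

    Hypothetical rule for illustration only: multiply the score of every
    nearby taxon by `boost`, then renormalise so the scores sum to 1.
    """
    adjusted = {
        taxon: score * (boost if taxon in seen_nearby else 1.0)
        for taxon, score in visual_scores.items()
    }
    total = sum(adjusted.values())
    ranked = [(taxon, score / total) for taxon, score in adjusted.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Two subspecies that look identical: the visual model cannot separate them.
visual_scores = {"ssp_a": 0.5, "ssp_b": 0.5}
# Only ssp_b has been recorded near the observation (made-up data).
ranked = rerank_with_nearby(visual_scores, seen_nearby={"ssp_b"})
print(ranked)
```

In this toy case the photo contributes nothing to the choice between the two subspecies; the ranking is decided entirely by the geographic step.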

zdanko · July 22, 2021, 6:52pm

But then it would have to be trained specifically per species to know which should and which shouldn’t be identified to subspecies.

zdanko · July 22, 2021, 6:54pm

This is a very good point. It’s the same with some genera: one species is super common and heavily observed, while a couple of others are also common but aren’t observed as often. The heavily observed one gets recommended a lot, even for the other species, which aren’t in the CV yet. (I just said ‘common’ a lot.)

keirmorse · July 22, 2021, 6:57pm

I think it would be wonderful if ssp. and var. were added to the CV model. Lack of focus on ssp. and var. seems to be one of iNat’s greatest weaknesses. At least in plants, many ssp. and var. are very easy to ID visually, though it is possible most of those should be recognized as distinct species rather than subtaxa within a species. For those that aren’t easy to ID visually, it would be no different than IDing between species that are difficult to tell apart visually. If they can’t be told apart visually, the model should just default to the next higher rank where an ID is more confident. Because many ssp. and var. are so different, including them all together in the CV model as a single taxon probably makes IDs that much less reliable in those cases. I suspect the main problem with ssp. and var. for iNat is that not all species have them, so they don’t play well together data-structurewise.
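A minimal sketch of the "default to the next higher rank" idea, assuming the model outputs one score per leaf taxon and that a parent lookup is available. The function `suggest_taxon`, the 0.7 threshold, and the taxon names are hypothetical; this is not how iNat's CV is known to work.

```python
def suggest_taxon(leaf_scores, parent, threshold=0.7):
    """Return the most specific taxon whose rolled-up score meets the threshold."""
    # Roll every leaf score up to all of its ancestors.
    totals = {}
    for taxon, score in leaf_scores.items():
        node = taxon
        while node is not None:
            totals[node] = totals.get(node, 0.0) + score
            node = parent.get(node)

    def depth(taxon):
        # Number of steps from this taxon up to the root of the toy tree.
        steps = 0
        while parent.get(taxon) is not None:
            taxon = parent[taxon]
            steps += 1
        return steps

    confident = [t for t, s in totals.items() if s >= threshold]
    if not confident:
        return None
    # Prefer the deepest (most specific) taxon; break ties by score.
    return max(confident, key=lambda t: (depth(t), totals[t]))


# Toy tree: two subspecies under one species, plus a sibling species.
parent = {
    "ssp_a": "species",
    "ssp_b": "species",
    "species": "genus",
    "other_species": "genus",
}

# Subspecies look alike: falls back to the species.
ambiguous = suggest_taxon({"ssp_a": 0.45, "ssp_b": 0.40, "other_species": 0.15}, parent)
# Subspecies are visually distinct: the subspecies itself is suggested.
distinct = suggest_taxon({"ssp_a": 0.90, "ssp_b": 0.05, "other_species": 0.05}, parent)
print(ambiguous, distinct)
```

With a rule like this, species without subspecies are simply leaves one level higher in the tree, so the uneven depth of the taxonomy is not by itself an obstacle.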

murphyslab · July 22, 2021, 9:00pm

Those issues aren’t really any different from those you have with species in many cases. Is there any potential difference that you could point to that demonstrates an issue with IDing subspecies that does not exist in parallel at the species level?

The CV could really speed things up. One of the plants that I’m often IDing in my home province is threeleaf foamflower, which has 3 very visually distinct leaf motifs.

Perhaps we should consider a more menu-based approach for the CV suggestions, where one could click on a button within the species suggestion to open a menu to quickly access subspecies.

elsemikkelsen · July 23, 2021, 7:14am

One thing to consider as a difference between IDing subspecies vs species is that subspecies generally aren’t distinct, clearly-divided entities in the way that species are (otherwise they would be classified as species rather than subspecies). Of course, species can sometimes have hybrid intermediates too, but it is usually less “messy” than with subspecies. You get into a grey zone where many individuals may be genetically and/or morphologically intermediate between subspecies, and having the CV make guesses might do more harm than good by misclassifying individuals and artificially simplifying very complex situations. In my opinion, it would be better to keep subspecies ID as something that a user or identifier can choose to do themselves, but that isn’t automatically suggested by CV.

The case of the threeleaf foamflower varieties is a bit unique since the three varieties are so visually distinctive (although genetically they might not really be differentiated). CV would probably classify them well, but it might not be useful to apply to subspecies more generally.

charlie · July 23, 2021, 2:48pm

i would hope subspecies-by-range would never be included in the AI. I won’t agree with subspecies IDs by range at all, because i don’t think it’s good practice to ID that way.

fffffffff · July 23, 2021, 4:30pm

There are still many subspecies that are very easily separated, especially given that iNat follows a taxonomy in which many ssp. are actually distinct species. You can separate Motacilla alba personata from the alba group easily. CV doesn’t have to know or show every subspecies, only those that can be supported both by the image itself and by geography. (Even though people are so against it, there are species living on different continents, and there’s no problem in marking those at ssp. level, even if someone thinks it “adds nothing”.)

wildskyflower · August 11, 2021, 11:33pm

Here’s an example of a case where CV would be helpful that has come up for me (I originally posted it in https://forum.inaturalist.org/t/new-computer-vision-model-released/24729/35?u=wildskyflower):

There are currently 26965 observations of Sambucus racemosa (red-berried elder) but only 101 observations of Sambucus racemosa melanocarpa (Rocky Mountain elder), a regional variant with black berries. If you randomly select 1000 photos for the CV, only about 3 or 4 of those will be melanocarpa, and some of them will show things like the leaves and not the actual black berries. Consequently, in my region the CV usually suggests something not found locally for a black elderberry, even though melanocarpa is the only black elderberry in this area. Even if it does suggest racemosa based on the leaves, a user inexperienced with the taxon might scroll through the first few photos, decide it can’t be that because seemingly all of the photos have red berries, and click something else. As melanocarpa is now over the threshold where it could be included in CV if iNaturalist taxonomy treated it as its own species, it seems like it would be useful to train on it as one.
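The "3 or 4" figure can be checked with a quick calculation. This assumes the 101 melanocarpa observations are counted within the 26965 and that the 1000 training photos are drawn uniformly at random; both are assumptions rather than confirmed details of how the CV selects photos.

```python
def expected_in_sample(subgroup_count, total_count, sample_size):
    """Expected number of subgroup members in a uniform random sample."""
    return sample_size * subgroup_count / total_count


expected_melanocarpa = expected_in_sample(
    subgroup_count=101,    # observations of S. racemosa melanocarpa
    total_count=26965,     # all observations of S. racemosa (assumed to include the 101)
    sample_size=1000,      # photos sampled for training
)
print(f"{expected_melanocarpa:.2f}")  # 3.75
```

That works out to fewer than 0.4% of the training photos for the species, before discounting the ones that don't show the berries.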

system · October 10, 2021, 11:34pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
