Species split from a taxon in the CV model are also in the CV model even if they are below the threshold

Although I’m not fully against the idea, I’m putting this under “bug reports” as I don’t think this is intended behavior.

Many species were split off from Unio crassus, a species included in the CV model. Each taxon split off of U. crassus is now included in the model as well:


(Suggestions for https://www.inaturalist.org/observations/25900561)

Some species in the Unio crassus complex (e.g. U. nanus) have only 1 observation and are still included in the model:

When one species is suggested, it also appears to suggest many more from the complex:

I presume the species are being suggested based off of the training data of U. crassus s.l.

Maybe another result of the delayed CV update?

I think this is likely the same underlying idea as your other post:
https://forum.inaturalist.org/t/computer-vision-displays-inactive-taxon-taxon-change-occurred-after-training-data-exported/54050

I’m more inclined to think these are bugs after realizing this didn’t use to happen. At least not that I ever noticed.

Details of the suggestions and vision scores for https://www.inaturalist.org/observations/25900561 :

this exact phenomenon also happened recently with the split of Erysiphe cruciferarum. the other two species in its complex are being suggested by the computer vision fairly sensibly, but E. cruciferarum sensu stricto has fallen below the threshold for inclusion (as described above for Unio nanus) and yet is still being suggested. there seems to be some distinction being drawn by the computer vision between the species, but I suspect it is only location-based because the model can’t possibly have been trained on visual distinctions among them (particularly E. alliariicola which did not have a taxon entry before preparation for the split)

I let our CV devs know about this.

Yes, this is quite similar to that reported issue. In this case, though, the behavior reported here is as designed. The taxon mentioned in the initial post, Unio crassus, was involved in a taxon split after the data for the currently deployed CV models was exported. As a result of the split, observations that used to be considered Unio crassus may now be considered one of several separate Unio species. As far as the currently released models are concerned, though, all those observations are still the same species, and the vision model was trained on all of their photos to learn the visual characteristics of Unio crassus.

When taxa change after the training data for a model is exported, a process runs regularly once that model is released to determine the current state of the taxa it knows about. There may be 1-1 mappings (one taxon replaced by another, as was the case in the linked forum thread), 1-many mappings (what is observed here), many-1 mappings (taxon merges), or taxa may be removed entirely.

For 1-1 mappings, if the model thinks an observation is the now inactive taxon, it should recommend the currently active replacement instead.

For 1-many mappings, if the model thinks an observation is the original taxon (which may be inactive or, as in this case, still active), it should replace the original taxon with all active replacements, and all of them will inherit exactly the same score. Any of their photos may have been used to train the vision model to learn about the original taxon, and until we train a new model there isn’t enough information to distinguish between them, so they all share the same score. The replacement taxa are not “in” the models; only the original taxon is. Rather, the replacements are injected into the CV recommendations in a step that runs after the models have made a determination.

For many-1 mappings, if the model thinks an observation is any of the original taxa, they will be replaced with the currently active taxon, which will inherit the highest score if there are multiple matches.

Finally for taxa that are removed entirely without replacement, if the model thinks an observation is that taxon, it will be removed from the recommendations.
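
To make the four cases concrete, here is a rough Python sketch of that kind of post-processing step. It is only an illustration and not our actual code: the function name `remap_scores`, the shape of the mapping (trained taxon name to a list of currently active replacements), the scores, and the placeholder taxon names are all assumptions made up for the example. Only Unio crassus and Unio nanus come from this thread.

```python
from typing import Dict, List


def remap_scores(
    model_scores: Dict[str, float],
    current_taxa: Dict[str, List[str]],
) -> Dict[str, float]:
    """Rewrite raw model scores in terms of currently active taxa.

    current_taxa (hypothetical structure) maps a taxon the model was
    trained on to its active replacements. An empty list means the taxon
    was removed without replacement. Taxa missing from the mapping are
    treated as unchanged.
    """
    remapped: Dict[str, float] = {}
    for trained_taxon, score in model_scores.items():
        replacements = current_taxa.get(trained_taxon, [trained_taxon])
        for taxon in replacements:
            # 1-1 and 1-many: every replacement inherits the same score.
            # many-1: keep the highest score among the merged taxa.
            if taxon not in remapped or score > remapped[taxon]:
                remapped[taxon] = score
    return remapped


# Made-up scores and placeholder names, purely for illustration.
model_scores = {
    "Unio crassus": 0.8,   # split after export, original still active
    "Old name": 0.2,       # 1-1 swap
    "Merged away A": 0.3,  # many-1 merge
    "Merged away B": 0.5,  # many-1 merge
    "Removed taxon": 0.1,  # removed without replacement
}
current_taxa = {
    "Unio crassus": ["Unio crassus", "Unio nanus"],
    "Old name": ["New name"],
    "Merged away A": ["Merged taxon"],
    "Merged away B": ["Merged taxon"],
    "Removed taxon": [],
}

example = remap_scores(model_scores, current_taxa)
```

In this sketch, Unio nanus ends up with the same score as Unio crassus even though the model never saw it as a separate taxon, which matches the behavior reported in the first post.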

The next model is training now and it will know about this split from the start. If any of these species do not have enough photos for training, they will no longer be recommended when that model is released (that is unless taxon changes made from now on result in any of them being a currently active taxon replacement for other taxa that did make the cut).
