The CV training process has grown more complex over time, but it changes very slowly because each change to how it works has unexpected ripple effects. An attempt to introduce hybrids a couple of years ago went poorly, and only recently have there been cautious experiments to reintroduce them. So the options for which taxa can be involved in training and suggestions are still fairly simple and constrained, and they’re manually selected by the developers. There are certain things that it’s predictably incapable of learning because they fall through loopholes in the rules it has to follow.
You can read a rough description of the process used to select taxa for training here, but the most relevant detail is that it only trains on leaf taxa. If no species in a genus is included in the CV, then it will train on a random sample of all observations within the genus. But if any species within the genus is eligible for inclusion, then it will only train on a random sample of observations of those species. It won’t train on any observations identified only to genus level.
So in this example, if all 10 species have exuviae that can’t be identified to species, then the CV won’t know about the exuviae even if all 10 species are in the model, because the exuviae observations are all stuck at genus level, where the CV never sees them. There’s a feature request to also train on non-leaf taxa here.
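To make the selection rule described above concrete, here is a minimal Python sketch of it. This is only an illustration of the rule as I've summarized it, not the actual iNaturalist training code: the `Observation` class, the `select_training_pool` function, the `sample_size` parameter and the taxon names are all hypothetical, and the real pipeline has other eligibility criteria that aren't modeled here.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Observation:
    obs_id: int
    genus: str
    species: Optional[str]  # None = identified only to genus level


def select_training_pool(
    observations: List[Observation],
    species_in_model: Set[str],
    sample_size: int,
    seed: int = 0,
) -> Dict[str, List[Observation]]:
    """Hypothetical sketch of the leaf-taxon rule, grouped by genus."""
    rng = random.Random(seed)

    # Group observations by genus.
    by_genus: Dict[str, List[Observation]] = {}
    for obs in observations:
        by_genus.setdefault(obs.genus, []).append(obs)

    pools: Dict[str, List[Observation]] = {}
    for genus, group in by_genus.items():
        # Observations identified to a species that is in the model.
        leaf_obs = [o for o in group if o.species in species_in_model]
        if leaf_obs:
            # At least one species is included: genus-level observations
            # (species is None) are never candidates.
            candidates = leaf_obs
        else:
            # No species included: the genus itself acts as the leaf,
            # so every observation within it is a candidate.
            candidates = group
        pools[genus] = rng.sample(candidates, min(sample_size, len(candidates)))
    return pools


# Made-up data: "Genus A" has species in the model, "Genus B" has none.
observations = [
    Observation(1, "Genus A", "Genus A sp1"),
    Observation(2, "Genus A", "Genus A sp2"),
    Observation(3, "Genus A", None),  # e.g. an exuvia stuck at genus level
    Observation(4, "Genus B", None),
    Observation(5, "Genus B", "Genus B sp1"),
]
pools = select_training_pool(
    observations, {"Genus A sp1", "Genus A sp2"}, sample_size=10
)
```

With this made-up data, observation 3 can never end up in the pool for "Genus A", however large the sample is. That's the exuviae problem in miniature. In "Genus B", where no species is in the model, the genus-level observation 4 is a candidate.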
So far the CV doesn’t get any feedback about the mistakes it makes. A couple of options have already been proposed here: