i checked, and it looks like the cutoff is actually 100 observations (with photos, i assume), including 50 research grade.
so if that’s correct, you could tell if a taxon will be included in the next training run by looking at the combination of the two sets:
- (look for count >= 100) https://jumear.github.io/stirfry/iNatAPIv1_observations_species_counts?taxon_id=48571&photos=true
- (look for count >= 50) https://jumear.github.io/stirfry/iNatAPIv1_observations_species_counts?taxon_id=48571&quality_grade=research
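
if you'd rather script that check than eyeball the two pages, here's a rough sketch in python. it assumes the pages above are wrapping the iNaturalist API v1 `observations/species_counts` endpoint with the same parameters, and that the 100 / 50 thresholds are right (that's just what i found, not anything official). the function names `fetch_counts` and `likely_included` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# assumption: the stirfry pages above wrap this iNaturalist API v1 endpoint
API = "https://api.inaturalist.org/v1/observations/species_counts"


def fetch_counts(**params):
    """Return {taxon name: observation count} for one query, across all pages."""
    counts, page = {}, 1
    while True:
        query = urllib.parse.urlencode({**params, "per_page": 500, "page": page})
        with urllib.request.urlopen(f"{API}?{query}") as resp:
            results = json.load(resp)["results"]
        for row in results:
            counts[row["taxon"]["name"]] = row["count"]
        if len(results) < 500:  # a short page means there is nothing left to fetch
            return counts
        page += 1


def likely_included(photo_counts, rg_counts, min_photos=100, min_rg=50):
    """Names that meet both thresholds (thresholds are assumed, not official)."""
    return sorted(
        name
        for name, n in photo_counts.items()
        if n >= min_photos and rg_counts.get(name, 0) >= min_rg
    )


if __name__ == "__main__":
    # same two queries as the links above
    with_photos = fetch_counts(taxon_id=48571, photos="true")
    research = fetch_counts(taxon_id=48571, quality_grade="research")
    print(likely_included(with_photos, research))
```

the comparison is kept separate from the fetching so you can swap in counts from a saved export, or change the thresholds if they turn out to be different.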
you could probably use the experimental compare tool to combine those two sets in most cases, like so:
the 50 research grade limit explains the ones not included.
the other situations might be due to any number of things. there is a known bug that seems to affect some taxa. but maybe there have just been a lot of changes to those taxa since the model was last changed?
looking at C. elliottii, it seems like if you include withdrawn ids, there could have been close to 100 observations, and maybe someone just did a lot of cleanup on that taxon since the last CV training?
C. hochstetteriana and C. peregrina have fewer observations, even if you include withdrawn ids. however, they are both limited to the Azores. so maybe an observer disappeared? or was there some sort of weird taxon change? someone can do more research here if they like, but i’m not super interested in digging much deeper.