Label species that are included in the CV training model

There is sometimes confusion about which species and taxonomic ranks (for simplicity, I will just use species from now on) have been included in the CV model. I propose that for each species included in the CV model, on that species’ Taxon page, a label/icon is provided. Ideally this would indicate which CV model(s) have been trained on the species in question.
Finally, it would be good if the species included in the CV training model could be made searchable in some way. There is a nice wiki for species often erroneously suggested by the CV (https://forum.inaturalist.org/t/computer-vision-clean-up-wiki/7281), but I would like to have a more comprehensive way to find potential problem areas.

I’m debating with myself about what would be more useful - labeling the species that are in the CV model, or the ones that aren’t.


I think something like 20-30K “leaves” (species and higher taxonomic ranks) are included in the CV model that is in effect right now. No idea about the model that is being trained currently, but definitely more than what we have now. Considering that there are over 327K species in the iNat database right now (and I think but am not positive that that number is just species, not higher ranks), I think it’s more useful to label what has been trained on by the CV than what hasn’t. Of course, I’m fine either way!

Can someone provide an example of how you would use this information? Like, how would knowing whether or not a taxon is in the CV model change your behavior?


I personally wouldn’t use it, but I suspect the main use case is the inverse: knowing which taxa are not included, either to explain why a taxon is not suggested or to prioritize adding records if you have them available.


Three reasons:

  • there’s an element of curiosity - I just plain want to know if a species is in the model or not. I see enough chatter in the forum to know that I’m not the only one who wonders about this.

  • It doesn’t happen much, but sometimes a species is included in the model and then it turns out that it can’t be reliably IDed to species based on the photos typically posted to iNat, e.g. Sarcophaga carnaria:
    “Every day, there are new observations of the Common Flesh Fly, Sarcophaga carnaria submitted, due to the CV suggestions.
    So there must have been many observations in the past to be included in the learning process, because after a thorough curation process, right now the number of observations on species level is down to 5, and other members of this genus have a maximum of 19 observations.”
    So in the above case it would be interesting to know which model(s) were trained on Sarcophaga carnaria and whether the most recent model includes it.

  • The wiki that bouteloua created is great, but it certainly is not comprehensive. I can imagine that if I had a list of all plants in the model that occur in my state, for example, I might try to go through them systematically to see if there are glaring examples of CV errors. I realize there is probably a way to create a query that would accomplish that objective (plants that occur in X location with # of observations > Y), but I don’t know how to construct it.
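
For what it’s worth, the query described in the last bullet (plants in location X with more than Y observations) could be sketched roughly as follows against the public iNaturalist API. This is only a minimal sketch under some assumptions: it uses the `observations/species_counts` endpoint, the function names and the `place_id` value are hypothetical placeholders, and the threshold Y is whatever you pick, since nothing in this thread says what cutoff the model actually uses. Counts returned here are for the chosen place only, so this finds candidates to review rather than telling you what is in the model.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.inaturalist.org/v1"
PLANTAE_TAXON_ID = 47126  # kingdom Plantae on iNaturalist


def species_counts_url(place_id, taxon_id=PLANTAE_TAXON_ID, per_page=500, page=1):
    """Build a species_counts URL for one place and one parent taxon."""
    params = {
        "place_id": place_id,          # X: the iNat place ID for your state/region
        "taxon_id": taxon_id,          # restrict to plants by default
        "quality_grade": "research",   # assumption: only count research-grade records
        "per_page": per_page,
        "page": page,
    }
    return f"{API_BASE}/observations/species_counts?{urlencode(params)}"


def fetch_species_counts(url):
    """Fetch one page of results (makes a network request)."""
    with urlopen(url) as response:
        return json.load(response)["results"]


def taxa_above_threshold(results, min_observations):
    """Keep (name, count) pairs with more than min_observations (Y)."""
    return [
        (row["taxon"]["name"], row["count"])
        for row in results
        if row["count"] > min_observations
    ]


# Example usage (place_id=14 is just a placeholder; look up your own place ID):
# results = fetch_species_counts(species_counts_url(place_id=14))
# for name, count in taxa_above_threshold(results, min_observations=100):
#     print(name, count)
```

Keeping the URL building and the filtering separate from the network call means the threshold can be changed and re-applied to results that were already downloaded, without hitting the API again.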


I know the City Nature Challenge is simply hosted by iNat, but before a new city / region joins in, it would be wonderful to make sure that, say, 10 obvious and common species there were already in the CV model, to make it easier for identifiers.
I can see the improvements in CV for Cape Town since the first round - when even something easy and obvious like Protea cynaroides mystified the iNat elves.

I’ve actively tried to fill in the blanks before and wanted to know where more observations / photos are needed to get a species included.

And the inverse… for those in the CV clean-up wiki we want to get out of the model, it would be good to

  1. know if it’s in the current model
  2. have some warning before a new model is trained so we can have a group push to fix it

e.g. the S. carnaria example @matthias55 mentions, which I just coincidentally posted about in a similar context…


this part of the request has been addressed:

not sure if there’s any intent to address the other item:


that’s great! excited about the new model and also labelling taxa that are included. From some brief checking, it looks like only species have been labelled so far. E.g.
Bleptina (genus) is not labeled
https://www.inaturalist.org/taxa/173334-Bleptina (though it was just suggested when I ran a recent observation)
But Bleptina caradrinalis is labeled.
https://www.inaturalist.org/taxa/215271-Bleptina-caradrinalis


this is intended at the moment. From https://www.inaturalist.org/blog/54236-new-computer-vision-model, “We’ve also released a new feature for taxon pages on the website which allows you to see which taxa are included in the model. This badge only appears on species pages, not pages of genera, families, etc.”

(also see Alex’s comment on that post for context/explanation)


got it. Thank you!


Excited to see this one implemented. Could be good to extend a version of it to genera I think, though I get what @alex is saying about it being less straightforward to explain.

For species labels it could also be nice to have the icons present in places where you have views of the entire species list, to readily see how many from a genus are already covered.
E.g. on the dropdowns :-

One pattern that seems to add to the entries on the CV clean-up wiki is genera where only a single species made it into the model. It could be helpful to highlight these, to encourage getting other species included and balance out the data.


I would vote for the suggestion to have a small symbol on the species drop-down list, as I’d consider it quite valuable and probably use it regularly. Could you propose it as a separate request?


done!