Every month new species are being added to the computer vision algorithm. My question is, which country/region is gaining the most species? And in a slightly similar vein, which region is receiving the most improvement in CV accuracy from monthly updates?
My intuition is torn here. On one hand, it would seem an area in the tropics like the Amazon, the Congo, or possibly Oceania would gain the most species, since these areas are incredibly diverse species-wise. On the other hand, an area like the United States has relatively fewer species but many more iNatters observing, increasing the likelihood that rare species are documented and eventually make it into the model.
I’m curious if that kind of data is logged somewhere. It would be interesting to know which factor is more influential: species diversity or the number of iNatters in a region.
I don’t think what you’re asking for is even measurable.
I think it could be. If the staff were to release the list of all species in the CV model, you could download all the data from iNat for those species and create a heatmap showing which areas have the most observations of species included in the CV.
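The heatmap idea can be sketched roughly like this. This is only a minimal sketch, assuming you have already exported observation records for the CV species as (taxon ID, latitude, longitude) tuples. The function name `taxa_per_cell`, the one-degree cell size, and the sample records are all made up for illustration; none of it comes from iNat itself.

```python
from collections import defaultdict
import math

def taxa_per_cell(observations, cell_deg=1.0):
    """Count distinct taxa per lat/lon grid cell.

    observations: iterable of (taxon_id, latitude, longitude) tuples.
    Returns {(lat_index, lon_index): number_of_distinct_taxa}.
    """
    cells = defaultdict(set)
    for taxon_id, lat, lon in observations:
        # Floor division puts each point into a cell_deg x cell_deg square
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].add(taxon_id)
    return {key: len(taxa) for key, taxa in cells.items()}

# Made-up records, for illustration only
sample = [
    (101, 36.6, -121.9),
    (102, 36.2, -121.1),
    (101, 36.9, -121.5),
    (103, -3.4, -60.2),
]
counts = taxa_per_cell(sample)
```

Counting distinct taxa per cell (rather than raw observations) is one way to keep heavily observed regions from dominating the map purely through volume. Feeding in only the newly added taxa from one update would show where that update's species are concentrated.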
Maybe this could be tracked from the list of new species included in the CV that is published with each update? Not really sure how to make it work, though.
Now that we have the geomodel, surely it’s just a matter of combining the geomodel data for each new CV species to produce a sort of heat map.
Thanks, that looks interesting and when I click on the link, it takes me to my default search place. Can you explain a little more what your link is filtering by? For example, where did you get that string of taxon IDs?
edit: I think I answered the question myself. You’re clicking on the species links in the blog posts announcing a computer vision update, e.g., https://www.inaturalist.org/posts/75633-a-new-computer-vision-model-v2-1-including-1-770-new-taxa The link you provided in the previous post was for Fungi, and Fungi only.
I strongly suspect it’s the latter, particularly regions with specialists, since the ‘new’ species need expert identification to tell similar species apart, and someone has to document or identify species that have few or no observations on iNat.
I’m pretty certain of this based on my experience going through local floras in Monterey, CA: I identified and observed the species that were missing or had few observations, and within a few months they were added to the CV.
I’ve toyed with the idea of picking an under-identified area with a reliable plant list and learning to identify those species with few observations to get them on the map and eventually the CV model.
You’d really need those specialists for verification though, since winging it would do more harm than good to the CV model.
For each taxon, you can check whether it is pending or included.
We aim at 100 photos, so about 60 obs.
Sometimes it needs us to retrieve just a few more, to get it included in the next CV update.
If you pick a country from this project, any help with refining the IDs to the level where taxon specialists are already filtering for them (family, for example) will help.
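To make the "just a few more" point concrete, here is a small sketch that takes the 100-photo aim quoted above at face value and ranks taxa by how close they are to it. The function name `photos_short`, the taxon names, and the photo counts are hypothetical; the real inclusion criteria may involve more than a photo count.

```python
def photos_short(photo_counts, target=100):
    """Return {taxon: photos still needed} for taxa below the target."""
    return {
        taxon: target - n
        for taxon, n in photo_counts.items()
        if n < target
    }

# Hypothetical taxa and photo counts, for illustration only
counts = {"Taxon A": 93, "Taxon B": 140, "Taxon C": 58}
shortfall = photos_short(counts)

# Sort so the taxa closest to the target come first
closest_first = sorted(shortfall.items(), key=lambda item: item[1])
```

Taxa at the top of `closest_first` are the ones where a handful of extra identifications would plausibly matter most.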
I agree that this type of targeted observing/IDing could really increase the coverage of the CV.
I think there are really three basic limitations for the CV:
1. Observers making observations of the species (getting raw data)
2. The ability to ID the species from photographs (possibility of creating training data)
3. The presence of expert IDers who can do the IDing (creating accurate training data)
The second limitation is actually the “hardest”: for some species, IDing by photos really isn’t possible, though there are some species where exactly the right pictures (or pics made with special equipment) will allow an ID where others will not.
In many areas though, I suspect that there are plenty of species that are IDable from photos and for which IDers exist on iNat; it is just that the species is comparatively rare and/or rarely observed (not charismatic, small, drab, etc.). This might be particularly true for species with limited ranges (so the number of IDers in the area might be low).
One potential caveat is that, if the CV is trained on a dataset with low variation, like all observations from the same population or made by the same observer, it may not perform very well. A varied training set will generally lead to a better performing model.
Avoiding that is already plumbed into the CV. Many photographers, different cameras, varying photo quality and angles. They do try.
Given that plants are notoriously difficult to ID (iNat has a single icon for ALL the green stuff in Plantae), I was surprised to see that Flora of Africa is 2/3 RG!
Since I am buried in that other third.
I agree that the CV training data selection does a good job sampling from all available photos.
However, I was referring to the utility of targeting particular taxa to add observations so that they can be included in the CV. If just one user is doing this for a species with very few existing observations, their photos may make up the vast bulk of the training data set.
See an example of the issue here: https://forum.inaturalist.org/t/how-are-photos-selected-for-cv-training/42403/2
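One rough way to spot that situation is to check how much of a taxon's photo pool comes from its single biggest contributor. This is only a sketch: the function name `top_observer_share` and the observer names are invented, and it says nothing about how the actual training pipeline samples photos.

```python
from collections import Counter

def top_observer_share(photo_observers):
    """Return (observer, share) for the observer contributing the most photos.

    photo_observers: list with one observer name per photo.
    """
    if not photo_observers:
        raise ValueError("no photos given")
    counts = Counter(photo_observers)
    observer, n = counts.most_common(1)[0]
    return observer, n / len(photo_observers)

# Made-up photo pool: one observer supplied 80 of 100 photos
photos = ["user_a"] * 80 + ["user_b"] * 15 + ["user_c"] * 5
observer, share = top_observer_share(photos)
```

A high share would suggest the taxon could use photos from other observers, locations, or cameras before anyone leans on the CV's suggestions for it.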
I was thinking from the identify side. Looking at a broader taxon to retrieve a few more. Rather than go forth and take photos. (But I did add number 23 yesterday)