How many observations are needed before a taxon makes it into the Computer Vision training package?

annkatrinrose · September 14, 2020, 4:29pm

I’m trying to find this information but can’t seem to locate it: How many observations are needed before a taxon makes it into the Computer Vision training package? I’m asking because I think I’ve come across a species that keeps getting misidentified as something else, presumably because it is completely unknown to the algorithm. I wrote a journal post about it for those interested in the details.

Star3 · September 14, 2020, 4:40pm

I think it needs 50

bouteloua · September 14, 2020, 8:31pm

@annkatrinrose the current cut-off is:

Which taxa are included in the computer vision suggestions?

This has changed over time, but as of the model released in March 2020, taxa included in the training set must have at least 100 observations, at least 50 of which must have a community ID. As more observations are added and more identifications made, additional taxa can be added to the computer vision suggestions. This means your observations and IDs work to make better models!
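The cut-off quoted above boils down to two counts per taxon. As a rough illustration only, here is a minimal Python sketch of that check, assuming the March 2020 values (100 observations, 50 with a community ID). The `TaxonCounts` record and `meets_cv_threshold` function are hypothetical names made up for this example, not part of any iNaturalist API, and the real training pipeline may apply other criteria not described in the FAQ.

```python
from dataclasses import dataclass

# Thresholds quoted in the FAQ for the March 2020 model. They have changed
# over time, so treat these values as illustrative assumptions.
MIN_OBSERVATIONS = 100
MIN_WITH_COMMUNITY_ID = 50


@dataclass
class TaxonCounts:
    # Hypothetical summary of one taxon's observation counts.
    name: str
    observations: int
    with_community_id: int


def meets_cv_threshold(t: TaxonCounts) -> bool:
    """Return True if the taxon meets the quoted inclusion cut-off."""
    if t.with_community_id > t.observations:
        raise ValueError("community-ID count cannot exceed total observations")
    # Both conditions must hold: enough observations overall, and enough
    # of them carrying a community ID.
    return (t.observations >= MIN_OBSERVATIONS
            and t.with_community_id >= MIN_WITH_COMMUNITY_ID)


print(meets_cv_threshold(TaxonCounts("taxon A", 200, 100)))  # True
print(meets_cv_threshold(TaxonCounts("taxon B", 150, 40)))   # False
```

Meeting the cut-off only makes a taxon eligible for a future training run; it does not mean the model currently in use already knows about it.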

annkatrinrose · September 14, 2020, 9:37pm

Thanks! I knew the info was somewhere but I just couldn’t find it. It should be over the threshold now (200+ observations, at least half of which seem to be research grade), so hopefully it will pop up as a suggestion in the future.

amacnaughton · September 14, 2020, 9:41pm

The criterion of 50 is helpful to know. Other than using that figure as a general guide, is there a list of which taxa are included, or not included, in the computer vision suggestions? It would be helpful to know whether a taxon had been considered by the computer vision.

Also, is there a way to report taxa which are commonly misidentified by the computer vision?

sedgequeen · September 14, 2020, 10:48pm

There’s a Forum thread, I believe under General, called “Computer vision clean-up - wiki”.

It lists lots of problem taxa, with a plea to help clean them up.

rymcdaniel · September 24, 2020, 4:57pm

I’ve had a similar problem with Amphiachyris dracunculoides and Gutierrezia texana in Texas. The two can really only be distinguished by photos of the phyllaries, which most observers don’t take. The iNat community (and probably the naturalist community in general) is fairly unaware of Gutierrezia texana, so most initial sightings were identified as Amphiachyris, whether or not there was enough information to make the ID. Then Amphiachyris dracunculoides got included in the computer vision model while Gutierrezia texana did not, and it just snowballed from there. For a lot of users, the suggestion box seems to be their only awareness of possible species, because iNat is their first tool for learning about living things.

Anyway, I also wrote a journal post about the situation, with a guide to differentiating the two. I am patiently awaiting the next computer vision model release, when I hope Gutierrezia texana will finally be included and show up in the suggestion box. I don’t expect that to fix everything, but maybe it will restore some balance to the situation.

I suspect this is a fairly common problem, as I have seen it happen to a lesser extent in other genera as well, such as Callirhoe and Grindelia. The lesser-known or less widely ranging species just don’t have enough observations to make it into the model, and until they do, it’s hit or miss whether there is a volunteer out there to informally curate the situation.

BTW, does anyone know when a new computer vision model might be released? I know the last one was in March, and it seems like a new one should be upon us any day now. I can understand that those in control would want to keep this knowledge private in case they miss the announced date, but it would be nice at least to have a general time frame.

chrisangell · September 25, 2020, 3:15pm

I don’t know, but it took over 4 months to train the last version, since the number of photos and taxa included in the model keeps increasing: https://www.inaturalist.org/blog/31806-a-new-vision-model It seems like there could be a pretty substantial lag before new taxa and these kinds of manual corrections make it into the model, for instance if a taxon crosses the threshold while the next version is already being trained.

system · November 24, 2020, 3:15pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
