Recommendations on improving the AI algorithm?

someplant · March 23, 2025, 1:13am

I think the inclusion criteria for species could be reformed in some way. Right now, leaf taxa are included in the model if they have 100 or more observations (or something roughly along those lines). The rationale is to avoid including taxa with few images and thus little training data, and I broadly agree that this is reasonable. But the current criteria can occasionally lead to problems. Let’s look at Closterium, for example:

Closterium is a genus of freshwater, single-celled algae. It’s common, but somewhat difficult to identify to species. To get a species-level ID, you usually need the length, width, and curvature of the cell (not necessarily difficult to measure, but most people don’t do this), and you often need a close-up view of the center, apex, and cell walls. It also often helps to look at multiple individuals within a population, to get a sense of the variation between individuals. Of course, the other major problem is the difficulty of accessing the literature; in particular, the books are expensive and hard to find.

Which brings us to the issue with the CV model: because of the observation threshold, only two species (C. moniliferum and C. acerosum) are included in the model. These are probably two of the most common species, but if you find a Closterium on your microscope slide, there’s a very good chance it won’t be one of those two.
Using observation counts as a rough example: there are 1473 non-casual, species-level observations of Closterium, of which 665 are C. acerosum or C. moniliferum. That leaves 808 observations (1473 − 665) of species the model does not know. Assuming this is a representative sample of Closterium, ~55% of the time the correct species is not among the options, so the CV has no way to pick it!
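To make the arithmetic easy to check, here is a small Python sketch using only the two counts quoted above. The function and variable names are my own, made up for illustration; nothing here reflects how the actual CV model is built.

```python
# Counts quoted in the post (non-casual, species-level observations).
total_species_level = 1473  # all Closterium observations identified to species
covered_by_model = 665      # C. acerosum and C. moniliferum combined


def uncovered_fraction(total: int, covered: int) -> float:
    """Share of species-level observations whose species the model cannot suggest."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= covered <= total:
        raise ValueError("covered must be between 0 and total")
    return (total - covered) / total


frac = uncovered_fraction(total_species_level, covered_by_model)
print(f"{frac:.1%}")  # prints 54.9%
```

This only holds if the species-level observations are a fair sample of what people actually photograph, which is the same assumption made in the post.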

In this situation, I think it is probably a good idea to limit the CV so that it only suggests the genus Closterium. Sure, it will no longer suggest C. moniliferum or C. acerosum when there is a bona fide C. moniliferum or C. acerosum in the image. But because the other species make up such a large proportion of Closterium, I think limiting the CV in this case could actually lead to higher accuracy by reducing misidentifications.

In short, I think using wider categories (being conservative and suggesting the genus only, rather than one of a few species) could improve the AI. I don’t know how you would decide when to do this. You could look at the species and genus counts like I did with Closterium, or there could be some way to manually flag a taxon. Just throwing this idea out there.
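As a rough illustration of the count-based option, here is a minimal Python sketch of one possible rule: suggest the genus only when the species missing from the model account for more than some share of the species-level observations. The function name, the `max_uncovered` parameter, and the 0.5 cutoff are all hypothetical choices of mine, not anything the site actually uses.

```python
def suggest_genus_only(total_species_level: int,
                       covered_by_model: int,
                       max_uncovered: float = 0.5) -> bool:
    """Hypothetical rule: fall back to genus-level suggestions when the
    species absent from the model exceed `max_uncovered` of observations."""
    if total_species_level <= 0:
        # No species-level data at all, so genus is the safest suggestion.
        return True
    uncovered = (total_species_level - covered_by_model) / total_species_level
    return uncovered > max_uncovered


# Closterium counts from the post: 808 of 1473 observations are uncovered.
closterium_genus_only = suggest_genus_only(1473, 665)
print(closterium_genus_only)  # prints True
```

With the Closterium numbers this rule would roll suggestions up to genus. Where to put the cutoff is a judgment call, and a manual flag could override it for taxa where the counts are misleading.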
