iNat data quality in comparison to 'expert knowledge'


In that time, what percentage of the observations had one or more citizen scientists weigh in?

My impression is that accuracy is too heterogeneous for a summary of iNaturalist as a whole to be helpful. Some taxa are reliably identified. Some aren’t. What matters to someone using the data is probably, “What’s the accuracy of the data I want to use?” not “What’s the average accuracy of all of the data?” Sometimes, these will be very different things.

Also, there isn’t an unambiguously correct way to measure accuracy. Different kinds of accuracy will be meaningful to different users. For instance, researchers using iNaturalist to look at plant phenology are probably going to focus on a few widespread, common, easily identified species. They would, then, want a dataset that is large in numbers of individuals, but small in numbers of taxa. Coming from a more taxonomic background, my inclination is the opposite—my ideal dataset has high biodiversity but not necessarily large numbers of observations per species. The widespread, common, and easily identified species are generally the ones I find least interesting.

Now suppose we have a dataset that includes 10,000 observations of Alpha beta and 50 observations each of Alpha gamma and Alpha delta. Further, let’s suppose that 99% of the observations of Alpha beta are correctly identified, but the accuracy rate is only 33% on observations of Alpha gamma and Alpha delta. If we measure accuracy as the number of correctly identified observations over total observations, it’s about 98.3% (roughly 9,933 of 10,100), slightly below 99%. Our hypothetical phenology researcher is probably going to be quite happy with that. If we instead measure accuracy as the proportion of species that are reliably identified, it’s one in three, or 33%. A taxonomist focusing on biodiversity and less interested in common species is going to think the data are pretty awful.
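To make the arithmetic in that hypothetical concrete, here is a minimal Python sketch of the two measures. The function names and the 0.9 cutoff for calling a species "reliably identified" are my own assumptions for illustration; the counts and accuracy rates are the made-up ones from the example, not real iNaturalist figures.

```python
# Hypothetical dataset from the example above:
# species -> (number of observations, fraction correctly identified)
dataset = {
    "Alpha beta": (10_000, 0.99),
    "Alpha gamma": (50, 0.33),
    "Alpha delta": (50, 0.33),
}


def pooled_accuracy(data):
    """Correctly identified observations divided by all observations."""
    correct = sum(n * acc for n, acc in data.values())
    total = sum(n for n, _ in data.values())
    return correct / total


def share_of_reliable_species(data, threshold=0.9):
    """Fraction of species whose accuracy meets the threshold.

    The 0.9 threshold is an assumption made for this sketch.
    """
    reliable = sum(1 for _, acc in data.values() if acc >= threshold)
    return reliable / len(data)


print(f"Per-observation accuracy: {pooled_accuracy(dataset):.1%}")
print(f"Species reliably identified: {share_of_reliable_species(dataset):.1%}")
```

The same data give roughly 98% by the first measure and one species in three by the second, which is the gap between the phenology researcher’s view and the taxonomist’s.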

In other words, I don’t think accuracy in the abstract is very meaningful. Accuracy of which observations for which use?


(For what it’s worth, I think the general patterns in my hypothetical example are accurate—observations are heavily skewed towards a small set of common taxa, and accuracy is much higher for that small set of frequently observed taxa than for most taxa. The particular numbers, though, are exaggerated and not intended to be representative of anything.)


For you. I see new-to-me species on iNat every day.
And new (please add missing) species for Africa most days.

