The problem with blindly using biodiversity databases

I agree with this completely. I have worked with a lot of scientifically rigorous data that still had plenty of issues. How we choose to use data, and the assumptions we make, are an important part of using any dataset.

It is also important to point out that sometimes you just have to use data that you know has issues because, like it or not, it is the best available data. You just need to be clear about your assumptions and identify those issues and risks plainly. I go by the view that folks can disagree with the data used if they like, but they then need to step up and identify better data. If they cannot, then it's irrelevant, because you cannot always wait for perfect data to make decisions.


That paper itself appears to be deeply flawed. As noted by Rod Page, they count synonyms and specimens undetermined beyond genus as “incorrectly named”, don’t state clearly what “wrong” means beyond that, and most fundamentally, don’t provide the data so anyone else can check.

They also cite the revision of Aframomum that is the source of the correct names as being from 2014, when in fact it wasn’t published until three years after this paper, in 2018. Even then it was published in a small-run print-only format, so many herbaria probably still have not updated.

The paper provides zero evidence of widespread misidentification in museum collections. It certainly does happen, but as someone who works at one, it happens at nowhere near the level that it does on iNat. One of the advantages of this and similar sites is that they can maintain consistent nomenclature, which museums have a hard time doing. But having tree heliotrope listed under the name Tournefortia argentea instead of Heliotropium arborea is not “mistaken identity”.


Thanks! The discussion comments under Rod Page’s blogpost are interesting too. Still looking for an apples-to-apples comparison to some of the ID quality metrics that kueda posted back in 2019…

I saw some data from iNat that said the IDs were decent (70%?). I looked for it after reading this, but I could not find it. I remember using the link to counter someone who said iNat data were poor.


I’m not sure how you can do an apples-to-apples comparison. One concern is how to classify the digital-platform observations that may or may not be wrong, i.e. whose identity can’t conclusively be confirmed.

I’ve seen at least one study where all of those were counted as wrongly identified, and the study’s conclusion was thus that ID quality was poorer on digital platforms. I remember being provoked by a study claiming a high error rate for birds on iNat, which are one of the easiest groups to identify. But likewise I can’t find that thread here.


In the link posted above by @kiwifergus, kueda states:

“accuracy varies considerably by taxon, from 91% accurate in birds to 65% accurate in insects”.

Making the latter part bold, as there are 90,000 or so species of insect in North America but only 2,000 or so bird species, so arguably the 65% accuracy is the more relevant end of the benchmark. This is also in a North American context, I think(?), so accuracy is likely lower in most other locations.

Meanwhile I’ve seen the museum comment raised by @fffffffff and @dianastuder repeated many times elsewhere - but as far as I could tell when I last looked into this, it seemed to stem from an anecdotal comment/supposition… it’s not connected to any actual figures. In the link posted above by @tiwane, the only mention of museum quality seems to be an offhand comment by TonyRebelo.

I struggle to believe any respectable museum insect collection would have a comparable 65% accuracy.

Not that I think iNat is doing a bad job! It’s clear it’s come on leaps and bounds in UK obs this last year. But there does seem to be a bit of an echo chamber around some of these stats and statements on the forum… which is problematic.


Highly doubt birds have a high error rate - there are often 10 people checking each bird observation.

@sbushes well, links were posted before - personally I’m not good at saving them - though I thought it was all at least about RG observations; there are far fewer mistakes in those. If we could eliminate blind agreeing, it would be up to 90% true IDs (complex groups will always lead to mistakes, plus most iNat obs don’t have a specimen itself, unlike museums).


The Kueda link stats are with regard to RG data.

I’m sure I dug through the links to the museum/herbarium accuracy comments previously… it just ended up at a single anecdote that has been regurgitated over and over since then in a way similar to this thread. I don’t believe there was ever a solid source, though it would be great to see one / be proved wrong.

The idea that iNaturalist accuracy will compare to museum accuracy seems conceptually doomed from the outset to me. At least in the UK, something like the Natural History Museum has the largest collection - with the bulk of our type specimens and the bulk of our taxonomists. These are literally the people making the keys and setting out the information we use to make IDs here! It seems really counterintuitive to think that iNaturalist accuracy could be equivalent to museum collection accuracy in its current design.
In taxa where we have experts such as those from the NHM engaged on iNaturalist - like Tachinidae - we have similar accuracy, sure. In taxa like millipedes, where we have almost nobody active, we seem to have very little accuracy. It’s just inevitably going to be patchier.


If you want a weighted average, I think it would be better to weight by the number of posts of birds vs. insects rather than by the number of species on the continent. That would give you a measure of accuracy on iNaturalist, rather than a measure of how easily bird photos are identified compared to insects, since the insects photographed will be biased towards the big eye-catching ones, whereas most of the 90,000 species are inconspicuous flies, parasitic wasps and beetles that are unidentifiable without photos taken down a microscope.
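To make the arithmetic concrete, here is a minimal sketch of the two weighting schemes. The per-taxon accuracies are the kueda figures quoted above (91% birds, 65% insects) and the species counts are the rough North American numbers mentioned in this thread; the observation counts are purely illustrative placeholders, not real iNaturalist numbers.

```python
# Sketch: blended accuracy under two weighting schemes.
# Accuracies (0.91 birds, 0.65 insects) are kueda's figures quoted above.
# Species counts are the rough North American numbers cited in this thread.
# Observation counts are ILLUSTRATIVE PLACEHOLDERS, not real iNat data.

accuracy = {"birds": 0.91, "insects": 0.65}

species_counts = {"birds": 2_000, "insects": 90_000}               # rough species richness
observation_counts = {"birds": 10_000_000, "insects": 5_000_000}   # hypothetical post counts

def weighted_accuracy(acc, weights):
    """Average the per-group accuracies using the given weights."""
    total = sum(weights.values())
    return sum(acc[group] * weights[group] for group in acc) / total

print(f"species-weighted:     {weighted_accuracy(accuracy, species_counts):.3f}")
print(f"observation-weighted: {weighted_accuracy(accuracy, observation_counts):.3f}")
```

Weighting by species richness drags the blend down toward the insect figure (about 0.66 with these numbers), while weighting by (hypothetical) observation volume pulls it up toward the bird figure - which is exactly the difference being argued over here.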


That’s exactly the point. Why should we accept any part of the study as valid when demonstrably and easily disproven stats, like the claimed 10 percent error rate on birds, are included?
A study of this type is only valid if the observations used in it are weighted on a similar basis to the dataset as a whole.

The following are the insects used in the study:

Poanes - a notoriously difficult-to-separate group of dull brown and orange butterflies
Agraulis vanillae - a relatively distinctive butterfly
Disholcaspis cinerosa - a gall wasp which are most often submitted as egg cases and easy to get wrong
Aquarius - water strider genus
Belostomatidae - Giant water bugs
Enithares - Backswimmers
Lethocerus - water bug genus
Corixidae - water boatman family
Laccotrephes - water scorpion genus
Lethocerus griseus - water bug species

So you have a known difficult butterfly genus, a distinctive butterfly genus, a gall wasp and multiple aquatic insects as the dataset.

That’s not representative of the observations on the site in any way.

There are currently just over 5 million research-grade records of insects in North America. 25% of those are butterflies, 12% are odonates. These are small, well-studied, popular groups. Another 25% are moths, which while larger in species count are also well studied and popular. So over 60% of the records come from those three groups alone, where there is virtually no chance of a 35% error rate.

All this study ‘proves’ is that one or more aquatic insect experts have questions about the accuracy of a group of taxa that represent a small minority of the overall insects on the site. There is no way in the world it is statistically robust enough to claim a 35% error rate on insects, or to support any of the other statistical claims it makes.


Careful IDers were tested, whereas most of the bad data comes from new, short-term users (upload 20 pics with AI suggestions and delete the app); if we look at experts, their ID rate will be the same as the museums’, since the same people work with both datasets. And most of the weird IDs don’t reach RG. We see them a lot, but we see them because we separate them from the “normal” ones and remember them better than the thousands upon thousands of “good” observations.


Details of the taxa are interesting - hadn’t noticed that.
Kueda also stated this was a relatively ad hoc experiment.
And looking more closely, there are only 300 insect records taken into account in total(?)
If so, this anyhow seems like far too small a dataset to draw a solid % from as well.

I mean ultimately, it would be great to just see more quantitative analysis of this…
In the meantime, I just wish these stats and statements were used with a little more context, especially when some of the stats used seem biased towards more common and easier-to-identify taxa.

There might be high accuracy on common taxa, but museum collections don’t suffer from this sort of taxonomic bias - five million museum records would span species distributions more evenly. So at least with regard to museum-quality statements, this isn’t comparing apples with apples without weighting these percentages somehow(?), as @jhbratton mentions.

Indeed…and similarly, there is no way in the world it is statistically robust enough for anyone to claim comparison in accuracy to a museum collection.


At no point in this or any other thread on this topic have I suggested this to be the case. Comparing the accuracy of in-hand physical specimens vs. digital photographs is not possible.

I have no idea what the error rate is for museum specimen collections, and have never commented on it. I simply dispute the validity of the error rate being constantly stated here about iNat identification accuracy, based as it is on a very small, ad hoc, non-representative sample.


Sure. This wasn’t aimed at you specifically… more broadly at the statement which gets thrown around within the forum.

The museum statement is interesting though, as it would be helpful, in theory, to compare accuracy metrics against external entities, as @muir also mentions. Without a benchmark, it all becomes a bit meaningless.


Just pointing out that any comparison between iNat ID accuracy and curated collection ID accuracy is really not comparing apples with apples. The two data sources are very different beasts… generated in very different ways, for likely very different purposes. If anyone finds that iNat data can’t answer the questions they have, then they have other sources they can go to. Conversely, can a curated collection provide answers to questions about HOW the general public relates to the organisms in the environment?

To put an analogical spin on this… think of vehicle safety… there are labs that run highly controlled experiments with crash test dummies and complicated sensors and high speed cameras. They generate a very useful set of data for designing safe vehicles. Then there is a large amount of live accident data, and traffic flow data, collected by various organisations and authorities around the world. Is that accident data comparable to the lab crash test data? It’s a different kind of data! It (could potentially) answer a different set of questions!

Anybody who is used to looking at curated collection data, and then looks at iNat data and says it doesn’t measure up, is just kinda stating the obvious, and to reject iNat data completely because of that assessment is kinda dumb. When I need a hammer and I pick up a socket wrench, to compare its functionality to a hammer and then discard it as worthless because it doesn’t measure up at that task would be daft!

But what I find most daft is that experts seem to claim that iNat IDs not being accurate is a reason for withdrawing their participation. That is like refusing to wear a seat belt because seat belts cause injuries in accidents! iNat data is likely to be MORE accurate with their participation…


As someone who may have contributed to that echo chamber in the past, I’ll just speak to my own experience. I spent my undergraduate and graduate student years working and conducting research in two different, relatively well-curated herbaria, one with about 100,000 specimens, one with about 1,500,000 specimens. I was regularly finding and annotating misidentified specimens in both.

I would be hard-pressed to extrapolate that experience to an overall percentage in either collection, since I was mainly looking at specimens from taxa and geographic areas that I knew something about (much as I do now on iNaturalist).

This isn’t anything to do with the respectability of the institutions or their curators. With that many specimens coming in from all manner of collectors, there is no way even the best-staffed institution could keep up with all of the inevitable misidentified material. (And what museum ever has all of the curatorial funding they wish for?) Again, quite analogous to the situation on iNaturalist.

Analogous to iNaturalist data, the error rate for museum collections would be expected to vary widely depending on taxonomic groups, geography of the collections, and funding of the institutions. When pulled into world-wide aggregators such as GBIF, however, it wouldn’t surprise me at all if the overall error rates for iNaturalist data versus museum data were statistically comparable.

That said, this is of course just anecdotal opinion based on personal experiences in both arenas. The difficulty of doing meaningful statistical comparison has already been well pointed out. And in the end, I’m not sure if it’s worth belaboring since, as also well pointed out, there is no such thing as a flawless data set, and data users are ultimately responsible for how they vet and use the data, whatever the source. I would rather spend the effort getting the data of interest as close to flawless as possible.


The far side of this discussion is when an old herbarium record has a vague location, and the botanist goes out exploring - and ta-da, finds a plant that was officially extinct.

We have available data and we work out from that. It fascinates me to follow discussions, pencilled notes, can anyone read, could be, or maybe … and we get there in the end.


I absolutely agree. The point is that databases, especially if they are data-rich and georeferenced, are extremely convenient for research. Of course, databases are really useful if used for what they are: often something compiled a posteriori with limited or no possibility of verifying the original data.

Again I agree. But I think that this should concern the staff behind GBIF who, in turn, should consider the possibility of verifying what is uploaded from iNat.

iNat, as a citizen-science-based database, is even more useful than others because most of the observations can be verified as far as their ID or their wild/cultivated status is concerned. Anyway, it is not ready to use. In this regard, I am trying to make use of the observations posted from the region where I live, and I have had to put much effort into correcting many IDs as well as flagging tons of non-wild observations. For a relatively limited area this is a task that can be undertaken by one or a few people, but when you deal with observations at the country level the workload grows.

It’s an interesting possibility.



To get to a closer apples-to-apples comparison, this discussion suggests a few ways one would want to more credibly compare error rates between iNaturalist and more traditional biodiversity collections:

  1. Compare the same taxon groups from the same region (preferably taxon groups with a recent history of stable/uncontentious classification). Ideally, you would randomize the selection of taxon groups and regions. Next best would be to stratify to include some well-known and -studied taxon groups and geographies, and some less well-known/collected/studied taxa and places, so that you could see if there were differences across a spectrum.
  2. Use a typology of error categories that are clearly defined and applicable to both iNat and museums. kueda defined several key terms in his blogpost around accuracy and specificity/precision. It also seems like misidentifications could be parsed more finely (e.g., there are misidentifications due to identification error, and misidentifications due to out-of-date taxonomy). That would help address some of the flaws that @kmagnacca and others found in that tropical plant paper I posted.
  3. Identify the appropriate sample size a priori. I don’t think anyone has mentioned it yet, but the 65% insect accuracy stat is based on fewer than 200 expert identifications! (and as @cmcheatle pointed out, only a handful of taxa). I don’t have a great sense of what you would want in terms of sample size, but 200 seems far too few.
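On point 3, a quick back-of-the-envelope sketch of why ~200 identifications is thin: treating the 65% figure as a binomial proportion, a normal-approximation 95% confidence interval is already about ±7 percentage points wide. (The n = 200 here is just the rough sample size mentioned above, not an exact figure from the study; the code is illustrative arithmetic only.)

```python
import math

# Sketch: how uncertain is an accuracy estimate of 0.65 from ~200 expert IDs?
# Uses the normal approximation to the binomial; n is the rough sample size
# discussed in this thread, not an exact figure from the study.
p, n = 0.65, 200
se = math.sqrt(p * (1 - p) / n)          # standard error of the proportion
lo, hi = p - 1.96 * se, p + 1.96 * se    # approximate 95% confidence interval

print(f"65% accuracy on n={n}: 95% CI roughly [{lo:.2f}, {hi:.2f}]")
```

So even taking the sample at face value, the true insect accuracy could plausibly sit anywhere from the high 50s to the low 70s - before even worrying about the non-random taxon selection discussed above.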

It seems like the right approach is deep humility when it comes to comparing iNat data quality with museum collections until we know more about the differences.


So really, these are out-of-date identifications more so than misidentifications, right? I’m curious how curated collections handle this routinely, if at all. I guess I thought that game of catch-up was built into things like the citation histories of speciesfiles and the like. Do people actually go update the little labels sitting next to each bug in a collection?