The problem with blindly using biodiversity databases

This has been discussed before (e.g. Strengths and Weaknesses of iNaturalist Data).

While I don’t disagree with the general point that users of public domain data need to exercise care, I don’t see iNaturalist or other platforms collating observations by non-specialists as the problem. Crappy science, including the kind manifested by the failure to curate data from third parties, is a reality that predates iNaturalist by centuries. People with an axe to grind or those with a problematically relaxed approach to methods do not need iNat in order to corrupt the conversation. The large and growing collection of images, recordings and records on iNaturalist is a golden resource for those who use it appropriately.

There are people using public domain databases to do good science. Some of them post here and I am aware some of them are actively involved in the curation of the taxa of interest to them. Professor Ascher, to whose letter you linked, is a case in point.

Why would you do that? Statistical methods exist for subsampling almost any kind of dataset for QA/QC purposes. If your question is framed simply enough, you often don’t even need to go that far. The sweetgum case posted by @jharkness is actually an example of a simple way of doing QA/QC that doesn’t bother with formal statistical analysis, although I’m pretty sure that a randomized subsample would have produced the same answer in less time.
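To make the randomized-subsample idea concrete, here is a minimal sketch of one way it could be done; it is not a description of how anyone in this thread actually did it. It draws a simple random sample of record IDs for manual review, then puts an approximate 95% Wilson score interval around the error rate found in that sample. The function names, the 10,000-record dataset, the sample size of 200 and the count of 12 flagged records are all made up for illustration.

```python
import math
import random


def draw_review_sample(record_ids, sample_size, seed=0):
    """Return a simple random sample (without replacement) of record IDs
    to be checked by hand. The seed makes the draw reproducible."""
    rng = random.Random(seed)
    ids = list(record_ids)
    return rng.sample(ids, min(sample_size, len(ids)))


def wilson_interval(errors, n, z=1.96):
    """Approximate Wilson score interval for the proportion errors / n.
    z = 1.96 corresponds to roughly 95% confidence."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical example: 10,000 downloaded records, 200 reviewed by hand.
all_ids = range(10_000)
to_review = draw_review_sample(all_ids, 200)

# Suppose the reviewer flags 12 of the 200 as cultivated or misidentified
# (an invented number, purely for illustration).
low, high = wilson_interval(12, 200)
print(f"Estimated error rate: {12 / 200:.1%} (approx. 95% CI {low:.1%} to {high:.1%})")
```

If the interval is narrow enough for the question being asked, the review can stop there; if not, the sample can be enlarged. A simple random sample assumes errors are spread fairly evenly, so a dataset where problems cluster by region or observer might be better served by stratifying first.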

The misuse of data from platforms like iNaturalist is not a problem with the technology. It is a problem with the use and abuse of data and should not be framed otherwise. On the other hand, finding ways to enhance the quality of iNaturalist data that don’t interfere with the main mission of increasing awareness of biodiversity is a good thing. Maybe there needs to be a captive/cultivated project structured as a learning exercise or perhaps a captive/cultivated leaderboard :shushing_face:.
