The data double standard

muir · January 20, 2024, 5:22am

Interesting article, “The Data Double Standard,” from June 2023 by Allison Binley and Joe Bennett, both affiliated with Carleton University, Ottawa, Canada. https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.14110

The discussion broadly applies to iNat, other community science efforts and what the authors describe as “professionally collected data.” Worth reading in full, but the abstract and Figure 1 are copied and pasted below. It ties into past forum discussions about dataset quality, professional data and museum/herbarium records.

Abstract

Conservation planning requires extensive amounts of data, yet data collection is expensive, and there is often a trade-off between the quantity and quality of data that can be collected. Researchers are increasingly turning to community science programs to meet their biodiversity data needs, yet the reliability of such data sources is still a common source of debate.

Here, we argue that professionally collected data are subject to many of the limitations and biases present in community science datasets. We explore four common criticisms of community science data, and comparable issues that exist in data collected by experts: spatial biases, observer variability, taxonomic biases and the misapplication of data. We then outline solutions to these problems that have been developed to make better use of community science data, but can (and should) be equally applied to both kinds of data.

We highlight four main solutions based on research using community science data that can be applied across all biodiversity data collection and research. Statistical techniques that have been developed for processing community science data can equally help account for spatial biases and observer variation in professional datasets. Benchmarking or vetting one dataset against another can strengthen evidence and uncover unknown sources of biases. Professional and community science datasets can be used together to fill knowledge gaps that are unique to each. Careful study design that accounts for the collection of relevant and important covariate data can help statistically account for sources of bias.

Currently, a double standard exists in how researchers view data collected by professionals versus those collected by community scientists. Our aim is to ensure that valuable community science data are given the prominent place they deserve, and that data collected by experts are appropriately vetted and biases accounted for using all the tools at our disposal.

Figure 1 Sources of error and bias that are found equally in data collected through both community science and conventional (professional) monitoring.

tisli · January 20, 2024, 2:39pm

This is an excellent topic, and thank you for bringing this valuable paper to the attention of the community, @muir.

SQFP · January 20, 2024, 5:55pm

With great power comes great responsibility. (© Uncle Ben)
As noted, “experts” can be wrong too. Even if it is a harder goal to reach, I’d rather have 10 cheap experts keeping each other in check about a tricky observation… than one Godlike expert with 10 votes to spend. Democracy, for all its flaws (and its debatable use in scientific matters), is not necessarily worse than aristocracy or noocracy.

–

As regards the OP topic, an anecdotal remark: for vascular plants of a certain island, the data available on GBIF varies a lot in quality depending on its source… but not really in a community-vs-professional way. Notably, it features:

a regional “pro” source serving as a baseline, a reputable gold standard;
another “pro” source (countrywide public service) that is very often shamelessly wrong;
one “community” source which proves surprisingly reliable, or verifiable at the very least;
another “community” source too unreliable, and often unverifiable, to be worth considering.

Beyond the grim issue of a (once-revered) public service having lost part of its expertise - let’s hope data consumers realize this! - my guess is that both community-driven sources somehow manage to attract (stimulate? retain?) different kinds of participants, or in different amounts, while also operating differently wrt data.

sedgequeen · January 20, 2024, 6:05pm

Exasperated, here. How do you expect them to get corrected, then? Oh, it may not be worthwhile for you to search for observations to correct the ID, but I sincerely hope that at least you do add an accurate ID (confirming or correcting) to any observation that you open for any reason.

Note that you can also tag other people who know the taxon, to get more votes. And the “pre-mavericks” project helps with this, too.

JKT · January 20, 2024, 6:41pm

In my experience, one ID can make most of the old identifiers check and change their opinion, especially if the new ID is from someone who’s been around for a while. It should at least start a discussion. Though I admit that most of the IDs in Lepidoptera come from a relatively small group of people and not from random beginners. The latter would be less likely to respond.

