New paper offers suggestions for improving the value of citizen (community) science data

A new open access paper in PLOS critiques the haphazard sampling usually found in citizen (community) science databases (such as iNat).

There is some nice discussion here of the statistical limitations of projects such as iNat that I think echoes discussions we’ve had on the forum. In a way the paper is aimed more at the organizations running citizen (community) science projects than at users themselves, but I think everyone could learn something from considering their points.

The authors make a fairly non-specific recommendation to incentivize sampling in particular times and places rather than sampling (and detecting) specific species. This would improve coverage and give us better data on where organisms are NOT found, for instance. For example, current leaderboards in iNaturalist incentivize finding species and making observations rather than sampling per se. The authors don’t provide a lot of detail on how to build such a system, and I don’t know how their suggestions would fit into iNaturalist. But their paper is very relevant to the ultimate scientific value of iNaturalist.

I hope some of you find it of interest.


If you liked this one stay tuned for another one coming out soonish looking at the important future areas of citizen science :))


Hold a bioblitz if you want observations from a particular place.


But, what can individuals do on an ongoing basis?

I would love more of an emphasis on natural communities, and what makes up the community in a particular ‘patch’. There must be ways to break that down so that individuals can work on that.

I visit a limited number of natural areas, trying to delineate the ‘architecture’ of that area. But I don’t know if the resulting observations are of any use, beyond just presence of the individual organisms. It would be interesting to focus on how iNat could structure the observations in a way that meets the paper’s recommendation.


It goes a bit beyond just holding a bioblitz. The key is incentivisation, and a lot of people aren’t necessarily incentivised by a bioblitz. The benefits are not always immediately obvious to them.


Thanks for the suggestion, but just to clarify, the authors are addressing a slightly different issue. The problem is the scientific value of the observations, not the number of observations per se. These are not necessarily the same. For instance, knowing where organisms are not found and how hard the users looked for them are valuable data that are not collected during bioblitzes. I won’t elaborate but you get the picture.


I think it’s awesome to regularly visit the same local spots and record what’s there. I find that more rewarding than traveling to far-flung places, but that’s just my personality.

I’m not sure what you can do as an individual. Projects like iNaturalist kind of get their strength from having a whole lot of people involved all making the same kind of observations. In general, recording some notes about your route through the ecosystem of interest, how long you were there, and more detailed data on phenology and behavior of what you saw is always valuable. Keeping counts of the number of individuals of certain species is valuable (but very difficult for numerous things like common plants and insects).

A good model might be the journals of 19th-century naturalists such as Henry David Thoreau. Ecologists have gone back to his journals and used them extensively to compare the historical ecosystem with what we observe in the same place today. Here’s a paper from the journal Bioscience that summarizes some of that research (let me know if the link doesn’t work):

You can actually read his journals here which is pretty cool.

Lots of other naturalists who were not professional scientists also kept valuable journals like this that have made really important contributions.


Related is the Trips feature, which you can read more about and see some examples here:

This type/level of data would increase the value a lot.


“Related is the Trips feature, which you can read more about and see some examples here:”

Cool! So I could make up, say, a list of the dominant overstory trees in my region, and record presence/absence of them at all my sites. Which raises issues when I make repeated trips to the same small place… maybe do a tree inventory once a year, ditto an understory tree/bush inventory once a year, etc. And record bedrock and soil type, slope, water regime, etc. in the comments. If iNat develops a more systematic way to record this info, it would be easy to update.


long ago, i worked in marketing research. most of the time before we did a survey, we would look at past research, study trends, and do a focus group to figure out what would even be relevant to consider in a more structured survey. i think of stuff in iNaturalist as similar to that background and focus group research. i think it’s a mistake to try to make it out to be anything more than that in general.

sure, you can structure how some people are collecting data, but you will make that data more useful only for whoever has defined that process. you define the process according to what kind of question you’re trying to answer. ask a different question, and you’ll need a different process.


I read recently about an ongoing citizen-leveraging flooding survey that’s structured to make sure people go out and take pictures not only of the “impressive” spots that usually get photo-shared, but also known spots that might have flooded but didn’t this time. If you have people you can count on to make an observation routinely at a given place during each major rain event, then you get something more like the kind of robust data you need to do effort statistics.

Not something individuals can do, but perhaps something specific area or project managers could do locally — making sure your bioblitzes evenly cover the area rather than focusing on the high-diversity spots. Some of your volunteers, on some of the occasions, would have to sacrifice their likelihood of getting rare observations or lots of observations, but bird counts do this sometimes — somebody’s got to take the side of the park with all the pigeons, for the good of Science!

They are also data that are not collected by iNaturalist. That would require a totally different approach, and it would be difficult to get citizen scientists to collect them, IMO. Negative data are always problematic.

it would be neat for iNaturalist to have a feature to match people who want to do the footwork for science and people who have projects that need foot soldiers. i think that would definitely be in the spirit of the platform / community.


sounds like


yes, but done within the familiar ecosystem and community of iNaturalist.

I’ve hesitated to use iNat for “tracking-over-time” projects. As an example, I am involved in tracking the progress of an invasive plant in a specific (small-ish) area. iNat doesn’t seem to be the right software for this, but I can’t put my finger on why it isn’t.

Again not within iNat, but Adventure Scientists was established to matchmake adventurers and scientists. They’ve used iNat for some of their projects.


I didn’t read the paper thoroughly, I just scanned it. But I disagree with the premise of the analysis in this paper.
Our current scientific methodologies for sampling and statistical analysis were developed over centuries of parallel intellectual work by scientists and mathematicians. That’s why those methods work so well.
But the sampling methodology of Citizen Science doesn’t meet those rigorous standards because it isn’t designed to do so. It is a new type of sampling. The burden of responsibility shouldn’t fall on the Citizen Scientist who is contributing data, it should fall on the statisticians and professional scientists to figure out how to use the data that is contributed.
Incentivising more rigorous data collection policies has to be done very carefully. Citizen Scientists have established how they want to gather data. We shouldn’t discourage that in the name of a framework of scientific rigor developed for a different kind of data collection.


Well said!


I think it’s important to recognize that statistics are not magic. In most cases, we can’t just “invent” a new method that overcomes the limitations of a sampling design. Statistics can be clever about new ways to account for biases and variability in measured variables, but it generally can’t replace data that wasn’t collected.

This is a struggle even in the research community - I work with some particular types of population models and very often my first piece of advice to colleagues is “think about your analysis first, then design your data collection protocol”. This inevitably saves a lot of headaches later and helps ensure that their data are actually useful.

I wouldn’t suggest totally overhauling community science protocols to make them as robust as many used by researchers, but I think there is a lot of room to make small improvements to data collection that could vastly improve the utility of these data.

For example, the ability to easily record presence/absence information or effort (even if not standardized) opens up a lot more options for using the data.
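To make the presence/absence and effort point a bit more concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the visit records, the site and species names, and the `summarize` helper are assumptions, not an iNaturalist data format or feature. The idea is only that once each visit records how long someone searched and everything they found, a visit without the target species becomes a usable "looked but did not find" record.

```python
from collections import defaultdict

# Hypothetical visit records: (site, minutes searched, species detected).
# An empty set means the observer searched and found none of the listed species.
visits = [
    ("meadow", 30, {"red maple", "white oak"}),
    ("meadow", 60, {"white oak"}),
    ("ridge", 45, set()),
    ("ridge", 15, {"red maple"}),
    ("swamp", 90, set()),
]

def summarize(visits, species):
    """Per site: number of visits, detections of `species`, and total minutes searched."""
    summary = defaultdict(lambda: {"visits": 0, "detections": 0, "minutes": 0})
    for site, minutes, detected in visits:
        row = summary[site]
        row["visits"] += 1
        row["minutes"] += minutes
        row["detections"] += int(species in detected)
    return dict(summary)

result = summarize(visits, "red maple")
for site, row in result.items():
    rate = row["detections"] / row["visits"]
    print(f"{site}: detected on {row['detections']}/{row['visits']} visits "
          f"({rate:.0%}), {row['minutes']} min searched")
```

With presence-only records, the swamp site would simply not appear for this species. With visits and effort recorded, it shows up as 90 minutes of searching with no detection, which is a different and more useful statement, even though this naive rate does not correct for imperfect detection.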