Biases in iNat data

dlevitis · June 14, 2021, 7:57pm

I enjoyed reading through this old topic, which includes mention of several biases people perceive in iNat data. https://forum.inaturalist.org/t/not-an-unbiased-dataset/16800
Obviously iNat isn’t intended to be an unbiased sample of every living thing, or of anything really. Nevertheless, as a scientist who uses iNat data, I like to be clear on what the biases are and therefore how the data should and shouldn’t be used. I would appreciate hearing what biases you all consider important/interesting/overlooked in the larger iNat data.

A few examples, just to get us rolling:

1. Over-representation of whatever plants are flowering or fruiting.
2. Under-representation of whatever is hard to ID from photographs.
3. Over-representation of anything charismatic.
4. Under-representation of anything common.
5. Under-representation of anything that no one locally is IDing.
6. Over-representation of anything that lives near trails/roads.
7. Over-representation of anything that lives near universities.
8. Over-representation of anything that lives near socioeconomically advantaged populations.
9. Under-representation of most tropical taxa.

Please be as general or specific as you wish, but I’d appreciate hearing what you would add to this list. Thank you.

mpintar · June 14, 2021, 8:09pm

For insects, larger species are much more commonly observed than smaller ones. Aquatic beetle and heteropteran species that fly are more commonly observed than those that don’t or do so less frequently.

muir · June 14, 2021, 8:25pm

Underrepresentation of taxa that don’t attract ID attention on iNat. It’s a feedback loop that was brought up on someone’s journal post recently: observers can tend to favor taxa that attract the attention of someone on iNaturalist who is passionate and knowledgeable about the group. Conversely, for taxa that languish without comment or ID, or are only IDed to a high level, the observer may “pass them by” at the next opportunity.

graytreefrog · June 14, 2021, 8:34pm

Underrepresentation of species that frequently appear in photos meant to show another species, where the observation is identified only as that other species. For example, a photo may be posted of an insect and the observation may be IDed as that insect by the observer, but the insect may be walking over an uncommon species of moss, visible in the photo, that the observer didn’t even think of making a separate observation for.

mnharris · June 14, 2021, 8:38pm

Very true. I don’t photograph mosses because they are very rarely identified.

roomthily · June 14, 2021, 8:41pm

3 and 4 can conflict with each other: it feels like there is an over-representation of common non-native plants, or honey bees vs native bees.

might be a difference in coastal aquatic and terrestrial aquatics (other than birds). that might fall into the over-representation of large things and identifiable via photography things.

lichens!

optilete · June 14, 2021, 8:46pm

Guess it depends on the continent. With a reaction time of about three hours, I am very confident about the ID’ers.

DianaStuder · June 14, 2021, 8:49pm

and yet, the other day, someone new to me was on iNat. A bryologist at the University of Cape Town.

A bit, build it and they will come.

zakronia · June 14, 2021, 8:55pm

When a species is the only one present of a higher taxon, it is much more likely to reach Research Grade because you can more quickly rule out the possibility of it being something else. This can create an overrepresentation of these species, because a larger percentage of their observations reach Research Grade.

For example, the Purple Foxglove (Digitalis purpurea) is a very common plant in Washington state, is the only species with Research Grade observations in Tribe Digitalideae, and is fairly easy to identify anyway. This means more people are willing to ID it, creating more observations, and getting more Needs ID observations to Research Grade, faster.

The same thing happens with Yellow-spotted Millipede (Harpaphe haydeniana) in Washington.

Also, plants are more likely to be observed than anything else, because plants are much easier to photograph and to find, and you can move around the plant to get different angles without risking scaring it off. (Or at least I hope you can, otherwise I have some questions.)

mattparr · June 14, 2021, 9:24pm

Underrepresentation of whatever has an evolved preference to remain unobserved: by camouflage, mimicry, running away, hiding always or during part of the day, etc.

Underrepresentation of whatever is inaccessible to humans: deep sea vent creatures, etc.

muir · June 14, 2021, 9:26pm

I think this is a good list, @dlevitis. A couple years ago, I had journaled on something related: “What’s the world’s most observed insect genus? and more thoughts on iNat observability”

What affects the observability of a genus or other taxon? Whether on iNat or in general, I think the most important factor is habitat accessibility. If a taxon doesn’t occur on the road system, occurs in a habitat away from human population centers, and/or can’t be easily observed from dry land, then I suspect that the taxon is unlikely to ever be among the most observed on iNat.

Aside from accessibility, I think you probably need at least two of the following four factors, and the more the better, to boost both detectability and observability:

Common: the degree to which a taxon is present and abundant.

Charismatic: the degree to which a taxon appeals to people. Charisma is obviously somewhat in the eye of the beholder, but broadly, it appears to be a detectable influence on what humans care about in nature (e.g., paper: Human preferences for species conservation: Animal charisma trumps endangered status).

Conspicuous: the degree to which a taxon is notice-able and visible. For example, a taxon that is diurnal, brightly colored or highly contrasted, large-bodied, and/or perches in the open is more observable than a nocturnal, dull-colored, microscopic thing that resides in the soil or thick vegetation.

Camera-friendly: the degree to which a taxon is photograph-able. I’m not quite sure if/how this might differ from a taxon being conspicuous, but it has something to do with a taxon staying still in well-lit situations. To the extent that iNat observations are increasingly made via the app, this factor increasingly means smartphone camera-friendly.

dkaposi · June 14, 2021, 10:16pm

The bias of regional organizations also influences the iNat data. I live in Ontario and the provincial entomological society was an early proponent of using iNat to collect Lepidoptera observations. Ontario now has 67% of the Lepidoptera observations in the country, but only 39% of the national population. I don’t believe that there are more bugs per person in Ontario, but there appear to be more keen citizen scientists on this platform in the province.
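As a rough sketch of the arithmetic behind those two percentages (the function name is just illustrative, and only the 67% and 39% figures come from the post), dividing a region's share of observations by its share of population gives a simple over- or under-representation ratio:

```python
def representation_ratio(observation_share: float, population_share: float) -> float:
    """Ratio of a region's share of observations to its share of population.

    A value above 1 means the region contributes more observations than its
    population share alone would predict.
    """
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return observation_share / population_share


# Figures quoted above: 67% of Lepidoptera observations, 39% of the population.
ontario_ratio = representation_ratio(0.67, 0.39)
print(f"Ontario: {ontario_ratio:.2f}x its population share")  # Ontario: 1.72x its population share
```

A ratio like this only compares shares. It says nothing about why the gap exists, which here is probably observer effort rather than actual abundance.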

Having said that, #3 is still an issue, as the Monarch is the most observed species in the province, with almost as many observations as the second and third most popular leps combined (which happen to be moths). I’m not sure about #8 for this taxon, as so many observers travel to natural areas for better habitat. Underrepresentation is often related to accessibility due to the size of the province/country, as noted by @cmcheatle in the other thread:
Not an unbiased dataset:

Of course it is geographically biased. Even here in ‘First World’ Canada my home province is a million square kilometers with large sections a blank canvas on iNat, due to them being unreachable without a float plane, helicopter or weeks long canoe expedition.

jciv · June 14, 2021, 10:23pm

Over-representation of taxa a heavy iNat user focuses on. I can see my skewing of the numbers for a few spider species I am interested in. 2/3 of the observations were mine.

Over-representation of insects that are attracted to UV light.
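One way to check for the single-observer skew mentioned above is to compute what fraction of a taxon's observations come from its most prolific observer. This is a minimal sketch with a made-up list of observer logins. It does not call the iNaturalist API, and the function name is hypothetical:

```python
from collections import Counter


def top_observer_share(observer_logins):
    """Return (login, share) for the observer with the most records."""
    counts = Counter(observer_logins)
    if not counts:
        raise ValueError("no observations given")
    login, n = counts.most_common(1)[0]
    return login, n / sum(counts.values())


# Made-up example: one observer login per observation of a single species.
sample = ["user_a", "user_a", "user_b", "user_a", "user_c", "user_a"]
login, share = top_observer_share(sample)
print(login, round(share, 2))  # user_a 0.67
```

A share near 2/3, as in the spider example, would suggest the counts for that species reflect one person's interest more than the species' abundance.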

jdjohnson · June 14, 2021, 10:56pm

With the exception of some coral reef areas, under-representation of anything that lives underwater. It’s just harder to get good photos underwater.

novapatch · June 14, 2021, 11:41pm

Given the active Brood X periodical cicada emergence in the eastern US, I’ve noticed:

Over: Magicicada septendecim because you can ID from most photos
Under: M. cassinii & M. septendecula because you can only ID with an underside photo or audio recording

The Brood X NYC project has no species IDs because we only have six observations, all are either M. cassinii or M. septendecula, and none have the above identifiable evidence.

earthknight · June 14, 2021, 11:42pm

Reaction time of 3 hours?

I often wait for years to get an ID or confirming ID, and often that’s only to family or genus.

SE Asia doesn’t have a lot of people on iNat with the experience and knowledge of this region. It’s getting better, but it’s slow.

earthknight · June 14, 2021, 11:43pm

Same with microscopic life.

dlevitis · June 14, 2021, 11:48pm

That’s a well written and informative blog post. Thank you!

dlevitis · June 14, 2021, 11:50pm

I’m always impressed by the relatively few iNat users who get identifiable photos of microscopic life.

mamestraconfigurata · June 15, 2021, 12:32am

Point 2 is also related to another source of bias - the unavailability of online (or other) resources. I took a photo of a chironomid this morning, and doubt I could find a resource to identify it further. Or else it would involve features that a photo cannot provide.
