Biases in iNat data

I enjoyed reading through this old topic, which includes mention of several biases people perceive in iNat data. https://forum.inaturalist.org/t/not-an-unbiased-dataset/16800
Obviously iNat isn’t intended to be an unbiased sample of every living thing, or of anything really. Nevertheless, as a scientist who uses iNat data, I like to be clear on what the biases are and therefore how the data should and shouldn’t be used. I would appreciate hearing what biases you all consider important/interesting/overlooked in the larger iNat data.

A few examples, just to get us rolling:

  1. Over-representation of whatever plants are flowering or fruiting.
  2. Under-representation of whatever is hard to ID from photographs.
  3. Over-representation of anything charismatic.
  4. Under-representation of anything common.
  5. Under-representation of anything that no one locally is IDing.
  6. Over-representation of anything that lives near trails/roads.
  7. Over-representation of anything that lives near universities.
  8. Over-representation of anything that lives near socioeconomically advantaged populations.
  9. Under-representation of most tropical taxa.

Please be as general or specific as you wish, but I’d appreciate hearing what you would add to this list. Thank you.

17 Likes

For insects, larger species are much more commonly observed than smaller ones. Aquatic beetle and heteropteran species that fly are more commonly observed than those that don’t or do so less frequently.

6 Likes

Underrepresentation of taxa that don’t attract ID attention on iNat. It’s a feedback loop that was brought up in someone’s journal post recently: observers can tend toward taxa that attract the attention of someone on iNaturalist who is passionate and knowledgeable about the group. Conversely, for taxa that languish without comment or ID, or are only IDed to a high level, the observer may “pass them by” at the next opportunity.

24 Likes

Underrepresentation of species that often appear in photos of other species, where the observation is identified only as the intended subject. For example, a photo may be posted of an insect and IDed as that insect by the observer, but the insect may be walking over an uncommon species of moss, visible in the photo, that the observer didn’t even think of making a separate observation for.

10 Likes

Very true. I don’t photograph mosses because they are very rarely identified.

10 Likes

3 and 4 can conflict with each other: it feels like there is an over-representation of common non-native plants, or of honey bees vs native bees.

there might be a difference between coastal aquatics and terrestrial aquatics (other than birds). that might fall under the over-representation of large things and of things identifiable via photography.

lichens!

5 Likes

Guess it depends on the continent. With a reaction time of about three hours, I am very confident about the ID’ers.

3 Likes

and yet, the other day, someone new to me was on iNat. A bryologist at the University of Cape Town.

A bit of “build it and they will come.”

10 Likes

When a species is the only one present from a higher taxon, it is much more likely to reach Research Grade because you can more quickly rule out the possibility of it being something else. This can create an overrepresentation of these species, because a larger percentage of their observations reach Research Grade.

For example, Purple Foxglove (Digitalis purpurea) is a very common plant in Washington state, is the only species with Research Grade observations in Tribe Digitalideae, and is fairly easy to identify anyway. This means more people are willing to ID it, which encourages more observations and gets more Needs ID observations to Research Grade, faster.

The same thing happens with Yellow-spotted Millipede (Harpaphe haydeniana) in Washington.

Also, plants are more likely to be observed than anything else, because plants are much easier to find and to photograph, and you can move around the plant to get different angles without risking scaring it off (or at least I hope you can, otherwise I have some questions).

7 Likes

Underrepresentation of whatever has an evolved preference to remain unobserved: by camouflage, mimicry, running away, hiding always or during part of the day, etc.

Underrepresentation of whatever is inaccessible to humans: deep sea vent creatures, etc.

10 Likes

I think this is a good list, @dlevitis. A couple of years ago, I journaled on something related: “What’s the world’s most observed insect genus? and more thoughts on iNat observability”

What affects the observability of a genus or other taxon? Whether on iNat or in general, I think the most important factor is habitat accessibility. If a taxon doesn’t occur on the road system, occurs in a habitat away from human population centers, and/or can’t be easily observed from dry land, then I suspect that the taxon is unlikely to ever be among the most observed on iNat.

Aside from accessibility, I think you probably need at least two of the following four factors, and the more the better, to boost both detectability and observability:

Common: the degree to which a taxon is present and abundant.

Charismatic: the degree to which a taxon appeals to people. Charisma is obviously somewhat in the eye of the beholder, but broadly, it appears to be a detectable influence on what humans care about in nature (e.g., paper: Human preferences for species conservation: Animal charisma trumps endangered status).

Conspicuous: the degree to which a taxon is noticeable and visible. For example, a taxon that is diurnal, brightly colored or highly contrasted, large-bodied, and/or perches in the open is more observable than a nocturnal, dull-colored, microscopic thing that resides in the soil or thick vegetation.

Camera-friendly: the degree to which a taxon is photographable. I’m not quite sure if/how this might differ from a taxon being conspicuous, but it has something to do with a taxon staying still in well-lit situations. To the extent that iNat observations are increasingly made via the app, this factor increasingly means smartphone camera-friendly.

12 Likes

The bias of regional organizations also influences the iNat data. I live in Ontario and the provincial entomological society was an early proponent of using iNat to collect Lepidoptera observations. Ontario now has 67% of the Lepidoptera observations in the country, but only 39% of the national population. I don’t believe that there are more bugs per person in Ontario, but there appear to be more keen citizen scientists on this platform in the province.
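
To put a rough number on that gap, here is a minimal Python sketch using only the two percentages quoted above. The function names are hypothetical, and it assumes the two shares are directly comparable (it ignores, for example, visitors who observe in Ontario but live elsewhere).

```python
# Figures quoted in the post: Ontario's share of Canada's Lepidoptera
# observations on iNat, and Ontario's share of the national population.
OBSERVATION_SHARE = 0.67
POPULATION_SHARE = 0.39


def representation_ratio(obs_share: float, pop_share: float) -> float:
    """Observation share divided by population share.

    A value above 1 means the region has more observations than its
    population share alone would predict.
    """
    if not 0 < pop_share < 1 or not 0 < obs_share < 1:
        raise ValueError("shares must be strictly between 0 and 1")
    return obs_share / pop_share


def per_capita_rate_vs_rest(obs_share: float, pop_share: float) -> float:
    """Per-capita observation rate in the region relative to everywhere else."""
    inside = representation_ratio(obs_share, pop_share)
    outside = representation_ratio(1 - obs_share, 1 - pop_share)
    return inside / outside


ratio = representation_ratio(OBSERVATION_SHARE, POPULATION_SHARE)
relative = per_capita_rate_vs_rest(OBSERVATION_SHARE, POPULATION_SHARE)

print(f"Ontario representation ratio: {ratio:.2f}")          # 1.72
print(f"Per-capita rate vs rest of Canada: {relative:.1f}x")  # 3.2x
```

Under those assumptions, Ontario has about 1.7 times the observations its population share would predict, which works out to roughly three times the per-capita rate of the rest of the country.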

Having said that, #3 is still an issue, as the Monarch is the most observed species in the province, with almost as many observations as the second and third most popular leps combined (which happen to be moths). I’m not sure about #8 for this taxon, as so many observers travel to natural areas for better habitat. Underrepresentation is often related to accessibility due to the size of the province/country, as noted by @cmcheatle in the other thread:
Not an unbiased dataset:

Of course it is geographically biased. Even here in ‘First World’ Canada my home province is a million square kilometers with large sections a blank canvas on iNat, due to them being unreachable without a float plane, helicopter or weeks-long canoe expedition.

6 Likes

Over-representation of taxa a heavy iNat user focuses on. I can see my skewing of the numbers for a few spider species I am interested in. 2/3 of the observations were mine.

Over-representation of insects that are attracted to UV light.

8 Likes

With the exception of some coral reef areas, under-representation of anything that lives underwater. It’s just harder to get good photos underwater.

10 Likes

Given the active Brood X periodical cicada emergence in the eastern US, I’ve noticed:

The Brood X NYC project has no species IDs because we only have six observations, all are either M. cassinii or M. septendecula, and none have the above identifiable evidence.

6 Likes

Reaction time of 3 hours?

I often wait for years to get an ID or confirming ID, and often that’s only to family or genus.

SE Asia doesn’t have a lot of people on iNat with the experience and knowledge of this region. It’s getting better, but it’s slow.

10 Likes

Same with microscopic life.

5 Likes

That’s a well written and informative blog post. Thank you!

1 Like

I’m always impressed by the relatively few iNat users who get identifiable photos of microscopic life.

2 Likes

Point 2 is also related to another source of bias - the unavailability of online (or other) resources. I took a photo of a chironomid this morning, and doubt I could find a resource to identify it further. Or else it would involve features that a photo cannot provide.

7 Likes