Interesting paper on which butterflies are over/under-represented in RG observations in USA + Canada

Here is the paper. It should be open access, so anyone can read it without a paywall. https://esajournals.onlinelibrary.wiley.com/doi/10.1002/fee.2783

They compare butterflies reported on iNaturalist with those reported on eButterfly. They are only able to compare the coasts of the mainland US + Canada because there are too few lists posted on eButterfly from the interior of the US and Canada.

Here is the figure showing the over- and under-represented butterflies on iNaturalist:
[image: figure from the paper showing over- and under-represented butterflies on iNaturalist]

The abstract:

The volume of and interest in unstructured participatory science data has increased dramatically in recent years. However, unstructured participatory science data contain taxonomic biases—encounters with some species are more likely to be reported than encounters with others. Taxonomic biases are driven by human preferences for different species and by logistical factors that make observing certain species challenging. We investigated taxonomic bias in reports of butterflies by characterizing differences between a dedicated participatory semi-structured dataset, eButterfly, and a popular unstructured dataset, iNaturalist, in spatiotemporally explicit models. Across 194 butterfly species, we found that 53 species were overreported and 34 species were underreported in opportunistic data. Ease of identification and feature diversity were significantly associated with overreporting in opportunistic sampling, and strong patterns in overreporting by family were also detected. Quantifying taxonomic biases not only helps us understand how humans engage with nature but also is necessary to generate robust inference from unstructured participatory data.

Very interesting stuff! I wonder if Pieris rapae is left out of iNaturalist more because people recognize it as a common introduced species, so people who are interested in posting “cool” butterflies/making life lists aren’t inclined to post every single one they see. I was kind of shocked not to see Monarchs on the over-represented list for either coast!

Looks like there’s a real need for some guide resources on some of these more confusing taxa, both on “how to document” and how to identify.

10 Likes

I always find Pieris rapae hard to take a picture of because it’s always flying about, never landing for more than a second, and the same goes for all the yellow butterflies. If they sat on a flower for more than a few seconds, I’d definitely have more observations of them.

15 Likes

These results largely conform to my own butterfly observations. However, I upload most butterflies I manage to photograph. In my experience, under-represented species are more difficult to photograph (both generally and for specific diagnostic features). Consequently, I have fewer photos of these species and they tend to reach research grade less often. I see Pieridae butterflies all the time, but the lack of observations is logistical rather than preference-oriented.

5 Likes

Oh that’s really interesting! I would expect the brown ones at the top of the list to be less well-represented, because they are not as easy to spot. (At least, I have not found them as easy to spot.) But maybe that’s why they are over-represented? If they are less commonly seen, then people might be more likely to photograph every single one they see.

Personally, I am much more likely to bother to photograph a butterfly if it’s one I haven’t photographed as much. After half a dozen or so observations, my interest tends to drop off. :sweat_smile: I tend to do this regardless of how common they are.

I expect that the Colias ones are underrepresented in part because people are not confident they can tell them apart. Taking a photo of something you are pretty sure you can’t identify is not as much fun.

2 Likes

I think that is where the human bias may come in, more specifically the “human preferences for different species,” as they called it. I guess brightly coloured butterflies are easier to remember, so users who want to find as many different species as possible and don’t care much about repeats might recognise that they already have that species more easily than they would with a less memorable one.
I think that could cause underrepresentation especially for common species.

For myself, I don’t upload into iNat most of the photos I take of the more common butterflies in my area. Once I’ve documented a species in the time/place/activity that matters to ME, then I tend to view additional observation uploads as “redundant” and mostly skip doing them. Sometimes in iNat I might use the Quantity annotation to note when I’ve seen several of a species, but I don’t do that consistently: does any researcher really use the Quantity annotation when compiling iNat data? In eButterfly on the other hand, I do reliably use the Quantity box when I submit observations.

1 Like

The more I look at the Eastern Region list (the one I live in and occasionally post butterflies from), the more confused I am. Like the authors of the paper, I would have assumed that iNaturalist observations would be highly biased towards eye-catching species: ones that are larger, more colorful, and brightly patterned. I’d also predict that ones found in places with high human population density and that feed on garden plants will be posted more, because for pretty much any taxon that can stand humans, the iNaturalist heatmap of observations is pretty much just a population density map. However, maybe this last “bias” is equally present in eButterfly, and therefore goes unexamined.

This is why the eastern list is so baffling. Just going through my local observations, I’d have predicted that big swallowtail species and monarchs would be overrepresented, and pretty much everything else underrepresented.

To the contrary, the species they identify as being the most overrepresented is a dull butterfly the size of a human thumbnail that only reveals its beauty if you’re lucky enough to be right next to it or wielding a telephoto lens. And then the next two most overrepresented are not necessarily easy to identify (in the sense that there are fairly similar species in their range). Then the Baltimore Checkerspot being the 4th most overrepresented made me go “HUH???” because they’re essentially a northern Appalachian+foothills species that does not occur at great density and does not occur in great numbers in any major east coast urban area. So the vast majority of iNaturalist users in the region would have to travel somewhere to observe it, which is just crazy to me for an overrepresented species! I guess the dedicated butterfly posters are outweighing the new users who are posting a swallowtail they found on their zinnias this morning? (I don’t mean this negatively. These are the types of butterfly posts I upload, not the rarities.) I would have NEVER guessed that to be the case?? The weird thing is that these specific butterflies being overrepresented on iNat compared to eButterfly would mean that the iNat posters seem (by proxy of likely having to travel to find them) to be more “dedicated” butterfly seekers or “life listers” than eButterfly users are, which I would have definitely guessed to be the other way around.

3 Likes

I always find Pieris rapae hard to take a picture of because it’s always flying about,

Yeah I also definitely lean towards wanting to take/post nice photos but there 100% are some users who will post a white blur with a black dot in it and ID it Pieris rapae. And depending on the region that may be enough to ID it.

I have 100% noticed that dragonflies and birds that land only very rarely are underrepresented on iNat (both by myself and in general), and I guess I never thought about it for butterflies, but makes sense they’d be the same way.

2 Likes

I haven’t looked at the paper, but I am guessing that “over-represented” is based on the number of observations compared to the population size. So, if there are a ton of observations of Monarchs, but also there are a ton of Monarchs, that would not be over-represented. But if the population is very small, then it would not take many observations for the species to be over-represented.

In birding, we expect rare species to be over-represented, because people are more likely to go out of their way to report those. For example, iNaturalist has 9 observations of Roseate Terns in Ohio, but they are all of the same bird (population size = 1). There are 6,872 observations of Red-winged Blackbirds, which is only a small fraction of the population size (many hundreds of thousands).
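
As a rough sketch of that arithmetic: the observation counts below are the ones quoted above, but the blackbird population of 500,000 is only an assumed stand-in for "many hundreds of thousands", and the function name is made up for illustration. This is the simple observations-per-individual idea from this post, not the method used in the paper.

```python
def observations_per_individual(observations: int, population: int) -> float:
    """Reported observations divided by an estimated population size."""
    if population <= 0:
        raise ValueError("population must be positive")
    return observations / population


# Observation counts as quoted in the post; population of 1 for the single tern.
tern = observations_per_individual(9, 1)

# 500_000 is an ASSUMED placeholder for "many hundreds of thousands".
blackbird = observations_per_individual(6_872, 500_000)

print(f"Roseate Tern: {tern:.3f} observations per bird")
print(f"Red-winged Blackbird: {blackbird:.5f} observations per bird")
print(f"The tern is reported about {tern / blackbird:,.0f}x more per individual")
```

Whatever population figure you plug in for the blackbirds, the per-individual reporting rate for the lone tern comes out orders of magnitude higher, which is the sense of "over-represented" I am guessing at here.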

(That reminds me, I never uploaded any of my photos from the second day. Make it 10 observations of Roseate Terns in Ohio. :sweat_smile:)

5 Likes

Exactly, and what is the population size? The study compares iNat to eButterfly, but eButterfly isn’t a perfectly unbiased and comprehensive count of all butterfly populations. With bird populations I think we have a better idea, but in this case I think we’re comparing two sources, each with unknown accuracy…

3 Likes

seems like this kind of comparison should reveal:

  1. relatively low iNat observations for species that are hard to photograph and relatively high iNat observations for species that are easy to photograph (because it’s more common for iNat observations vs eButterfly observations to be associated with a photo)
  2. relatively low iNat observations for species that are common in an area (since an observer might make only one observation even if there are many individuals in an area on any given day, and might not spend the time to photograph butterflies that they see all the time)
  3. relatively high iNat observations for species that are hard to identify to species in the field (since iNat has other identifiers to help with identification)… although maybe this is offset by different user bases (maybe eButterfly users would be more expert butterfly identifiers to begin with vs iNat users)
4 Likes

I rarely photograph Cabbage Whites because they are so abundant. Monarchs are rare, so I pursue them to get a photo. Tiger Swallowtails are quite common but very flighty and very difficult to photograph unless I am in a particular location where there are oaks.

3 Likes

100% both sources will have biases. Even a survey of butterfly abundance by professional lepidopterist-ecologists will have biases.

Throughout, the paper essentially assumed that having an explicit listing protocol with people marking that they recorded every butterfly they saw would be unbiased, but we know that’s not true. Due to the limits of human vision you might easily miss species that are small or not moving, or flying high overhead, among other things. The thing about a listing protocol, though, is that users should be biased in similar ways each time, whereas with iNaturalist there’s no protocol and people are using it however they would like. Some people log in once to post a Zebra Swallowtail and then log out forever. Other people are avid life listers posting each new species they see once, or maybe a couple of times if they get a better photograph or see a new different-looking subspecies or so forth. So that would automatically bias it towards rarities, I suppose, even if there are some iNat users using iNat with the same protocol as eButterfly. (Even then they would have to be a remarkably quick-on-the-draw photographer to photograph every individual of every species they see in an observation period!)

Lastly, I admit I do have a bunch of Cabbage White photos I haven’t uploaded. I haven’t found them to be particularly flighty. Maybe the ones around here are lazy, haha. But I didn’t really feel like spamming iNat with more Cabbage White pics would be of much value. Likewise with dragonflies, I am a bit more spammy but still have not posted every individual Blue Dasher I have seen. So I’m definitely picking and choosing slightly more interesting or rare things (to me) and it makes sense the data in aggregate has a similar bias. I guess I’m just overall surprised the volume of “dedicated” posters (e.g. ones who are most likely to post a Banded Hairstreak or Baltimore Checkerspot) outweighs the volume of “new” posters who are posting big charismatic species that show up on their patio. I think my vision of what gets posted on iNat is skewed by spending too much time giving initial, broad IDs to things with no taxon/unknown assigned, a place where new users tend to congregate due to being unfamiliar with either the UI or the etiquette (that even if you don’t know what you’re looking at at all, it’s best to give some broad ID like ‘insect’ so the insect folks can help you figure out what you’re looking at).

3 Likes

I don’t know the eButterfly protocols, but on eBird we talk about how we are really measuring detectability, not abundance. You mark a checklist complete if you reported all the species that you observed, not all the species that were present. You almost always know that there are birds around which you could not detect! There are ways you can still use the data to say at least something about abundance, but like a lot of science, you need to do some more work to get that from the raw numbers.

That said, it seems to me that comparing data from iNaturalist and eButterfly is a great way to learn about the differences between the two platforms, but less useful for understanding the organisms being studied. They have different protocols and different users! I suppose you could control for some of that by only looking at observations from people who use both, but of course that would also introduce additional factors.

4 Likes

You almost always know that there are birds around which you could not detect!

Exactly. On eBird this becomes quite evident when looking at migratory species in US/Canada: the bar charts (which I believe are calculated as a % of lists in a certain area that had the species) are usually a lot higher in spring vs. fall, even though logically we know that there are actually more birds migrating in fall than in spring, since the populations are larger immediately after breeding. But the birds are less salient without their breeding plumage and typically not singing either.
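
To make the "% of lists" idea concrete, here is a minimal sketch. It assumes the bar charts really are computed that way (which, as I said, is only my belief), and the checklists and function name are invented toy examples, not real eBird data or an eBird API.

```python
def checklist_frequency(checklists: list[set[str]], species: str) -> float:
    """Percent of checklists that report the species at least once."""
    if not checklists:
        return 0.0
    hits = sum(1 for checklist in checklists if species in checklist)
    return 100.0 * hits / len(checklists)


# Invented toy checklists: each set is the species reported on one list.
spring = [
    {"Blackpoll Warbler", "American Robin"},
    {"Blackpoll Warbler"},
    {"American Robin"},
    {"Blackpoll Warbler"},
]
fall = [
    {"American Robin"},
    {"Blackpoll Warbler", "American Robin"},
    {"American Robin"},
    {"American Robin"},
]

spring_freq = checklist_frequency(spring, "Blackpoll Warbler")
fall_freq = checklist_frequency(fall, "Blackpoll Warbler")
print(f"spring: {spring_freq:.0f}% of lists, fall: {fall_freq:.0f}% of lists")
```

A measure like this only counts whether a species made it onto a list, so it tracks how detectable the bird was to observers and says nothing directly about how many birds were actually passing through.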

I think it’s fine to compare different platforms with different user sets; the variable in common was more just the geographic area, and I think the question they were trying to get at was more like, if you are planning to use iNat vs eButterfly data, how would that impact your research? What biases are different between them? And lastly, likely most interesting for discussion here, how can we improve the quality of data on iNat? (They seemed to focus on iNat more for that since the type of protocol that eButterfly uses is already more researched as to how to use that data.)

1 Like

It would also be interesting to consider those moths that people are prone to mistake for butterflies. In the Pacific Northwest, for instance, “the butterfly that isn’t in the field guide” is usually a day-flying, black-and-white Geometrid.

1 Like

The paper’s fine, but it’s led to some terrible, alarmist clickbait headlines like “When it comes to butterflies, people prefer pretty ones: That’s a problem for scientists” and “Community science has a personal bias problem” (I’m purposely not linking to the articles).

Yes, like any data set iNat data has its biases, and I think it’s great people are exploring them as a way to improve or at least understand them. It’s unfortunate that the findings are being publicized in a way that would make someone who’s skimming headlines dismiss iNat and similar data out of hand.

4 Likes

seems like more of a journalism problem. this week, the trending weird science news topic seemed to be related to cocaine in sharks off of Brazil, and the title for the NYT’s version of the article was “Not Afraid of Sharks? Well, Now They’re on Cocaine”.

1 Like

Yes, it’s not a problem with the authors of the paper and I wasn’t blaming them. As I said, I think it’s great that people are working to understand biases in iNat and other participatory science data.

It’s just frustrating that the headlines are so alarmist. Obviously nuance and headlines are not often compatible, but blowing up the scale of the issue is not necessary.

2 Likes
