They compare butterflies reported on iNaturalist with those reported on eButterfly. They are only able to compare the coasts of the mainland US + Canada because there are too few lists posted on eButterfly from the interior of the US and Canada.
Here is the figure showing the over- and under-represented butterflies on iNaturalist:
The abstract:
The volume of and interest in unstructured participatory science data has increased dramatically in recent years. However, unstructured participatory science data contain taxonomic biases: encounters with some species are more likely to be reported than encounters with others. Taxonomic biases are driven by human preferences for different species and by logistical factors that make observing certain species challenging. We investigated taxonomic bias in reports of butterflies by characterizing differences between a dedicated participatory semi-structured dataset, eButterfly, and a popular unstructured dataset, iNaturalist, in spatiotemporally explicit models. Across 194 butterfly species, we found that 53 species were overreported and 34 species were underreported in opportunistic data. Ease of identification and feature diversity were significantly associated with overreporting in opportunistic sampling, and strong patterns in overreporting by family were also detected. Quantifying taxonomic biases not only helps us understand how humans engage with nature but also is necessary to generate robust inference from unstructured participatory data.
Very interesting stuff! I wonder if Pieris rapae is left out of iNaturalist more because people recognize it as a common introduced species, so people who are interested in posting "cool" butterflies/making life lists aren't inclined to post every single one they see. I was kind of shocked not to see Monarchs on the over-represented list for either coast!
Looks like there's a real need for some guide resources on some of these more confusing taxa, both on "how to document" and how to identify.
I always find Pieris rapae hard to take a picture of because it's always flying about, never landing for more than a second, and the same goes for all the yellow butterflies. If they sat on a flower for more than a few seconds, I'd definitely have more observations of them.
These results largely conform to my own butterfly observations. However, I upload most butterflies I manage to photograph. In my experience, under-represented species are more difficult to photograph (both generally and for specific diagnostic features). Consequently, I have fewer photos of these species and they tend to reach research grade less often. I see Pieridae butterflies all the time, but the lack of observations is logistical rather than preference-oriented.
Oh that's really interesting! I would expect the brown ones at the top of the list to be less well-represented, because they are not as easy to spot. (At least, I have not found them as easy to spot.) But maybe that's why they are over-represented? If they are less commonly seen, then people might be more likely to photograph every single one they see.
Personally, I am much more likely to bother to photograph a butterfly if it's one I haven't photographed as much. After half a dozen or so observations, my interest tends to drop off. I tend to do this regardless of how common they are.
I expect that the Colias ones are underrepresented in part because people are not confident they can tell them apart. Taking a photo of something you are pretty sure you can't identify is not as much fun.
I think that is where the human bias may come in, more specifically the "human preferences for different species" as they called it. I guess brightly coloured butterflies are easier to remember, so users looking to find the most different species and not caring much about repeats might recognise more easily that they already got that species than with a less memorable one.
I think that could cause underrepresentation especially for common species.
For myself, I don't upload into iNat most of the photos I take of the more common butterflies in my area. Once I've documented a species in the time/place/activity that matters to ME, then I tend to view additional observation uploads as "redundant" and mostly skip doing them. Sometimes in iNat I might use the Quantity annotation to note when I've seen several of a species, but I don't do that consistently: does any researcher really use the Quantity annotation when compiling iNat data? In eButterfly on the other hand, I do reliably use the Quantity box when I submit observations.
The more I look at the Eastern Region list (the one I live in and occasionally post butterflies from), the more confused I am. Like the authors of the paper, I would have assumed that iNaturalist observations would be highly biased towards eye-catching species: ones that are larger, more colorful, and brightly patterned. I'd also add that I'd predict that ones that are found in places with high human population density and that feed on garden plants will be posted more, because for pretty much any taxa that can stand humans, the iNaturalist heatmap of observations is pretty much just a population density map. However, maybe this last "bias" is also equally present in eButterfly, and therefore goes unexamined.
This is why the eastern list is so baffling. Just going through my local observations, I'd have predicted that big swallowtail species and monarchs would be overrepresented, and pretty much everything else underrepresented.
To the contrary, the species they identify as being the most overrepresented is a dull butterfly the size of a human thumbnail that only reveals its beauty if you're lucky enough to be right next to it or wielding a telephoto lens. And then the next two most overrepresented are not necessarily easy to identify (in the sense that there are fairly similar species in their range). Then the Baltimore Checkerspot being the 4th most overrepresented made me go "HUH???" because they're essentially a northern Appalachian+foothills species that does not occur at great density and does not occur in great numbers in any major east coast urban area. So for the vast majority of iNaturalist users in the region, they'd have to travel somewhere to observe it, which is just crazy to me in an overrepresented species! I guess the dedicated butterfly posters are outweighing the new users who are posting a swallowtail they found on their zinnias this morning? (I don't mean this negatively. These are the types of butterfly posts I upload, not the rarities.) I would have NEVER guessed that to be the case. The weird thing about these specific butterflies being overrepresented on iNat compared to eButterfly is that it would mean the iNat posters seem (by proxy of likely having to travel to find it) to be more "dedicated" butterfly seekers or "life listers" than eButterfly users are, which I would have definitely guessed to be the other way around.
I always find Pieris rapae hard to take a picture of because it's always flying about,
Yeah I also definitely lean towards wanting to take/post nice photos but there 100% are some users who will post a white blur with a black dot in it and ID it Pieris rapae. And depending on the region that may be enough to ID it.
I have 100% noticed that dragonflies and birds that land only very rarely are underrepresented on iNat (both by myself and in general), and I guess I never thought about it for butterflies, but makes sense they'd be the same way.
I haven't looked at the paper, but I am guessing that "over-represented" is based on the number of observations compared to the population size. So, if there are a ton of observations of Monarchs, but also there are a ton of Monarchs, that would not be over-represented. But if the population is very small, then it would not take many observations for them to be over-representing the species.
In birding, we expect rare species to be over-represented, because people are more likely to go out of their way to report those. For example, iNaturalist has 9 observations of Roseate Terns in Ohio, but they are all of the same bird (population size = 1). There are 6,872 observations of Red-winged Blackbirds, which is only a small fraction of the population size (many hundreds of thousands).
(That reminds me, I never uploaded any of my photos from the second day. Make it 10 observations of Roseate Terns in Ohio.)
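The arithmetic behind the tern vs. blackbird comparison above can be sketched roughly as below. This is only an illustration of the "observations compared to population size" guess, not the method used in the paper. The function name is made up for the example, and the blackbird population of 500,000 is an assumed stand-in for "many hundreds of thousands".

```python
def observations_per_individual(n_observations: int, population_size: int) -> float:
    """Return how many observations exist per individual in the population."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return n_observations / population_size


# Roseate Tern in Ohio: 9 observations of a single bird (figures from the post above).
tern_rate = observations_per_individual(9, 1)

# Red-winged Blackbird: 6,872 observations; the population figure is an ASSUMPTION
# standing in for "many hundreds of thousands".
blackbird_rate = observations_per_individual(6872, 500_000)

# The rare bird has far more observations per individual than the common one.
print(f"tern: {tern_rate:.3f} observations per individual")
print(f"blackbird: {blackbird_rate:.3f} observations per individual")
```

Under these assumed numbers the tern is reported thousands of times more often per individual than the blackbird, which is the sense in which a rarity could be "over-represented" even with only a handful of observations.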
Exactly, and what is the population size? The study compares iNat to eButterfly, but eButterfly isn't a perfectly unbiased and comprehensive count of all butterfly populations. With bird populations I think we have a better idea, but in this case I think we're comparing two sources, each with unknown accuracy...
relatively low iNat observations for species that are hard to photograph and relatively high iNat observations for species that are easy to photograph (because it's more common for iNat observations vs eButterfly observations to be associated with a photo)
relatively low iNat observations for species that are common in an area (since an observer might make only one observation even if there are many individuals in an area on any given day, and might not spend the time to photograph butterflies that they see all the time)
relatively high iNat observations for species that are hard to identify to species in the field (since iNat has other identifiers to help with identification)... although maybe this is offset by different user bases (maybe eButterfly users would be more expert butterfly identifiers to begin with vs iNat users)
I rarely photograph Cabbage Whites because they are so abundant. Monarchs are rare, so I pursue them to get a photo. Tiger Swallowtails are quite common but very flighty and very difficult to photograph unless I am in a particular location where there are oaks.
100% both sources will have biases. Even a survey of butterfly abundance by professional lepidopterist-ecologists will have biases.
Throughout, the paper essentially assumed that having an explicit listing protocol with people marking that they recorded every butterfly they saw would be unbiased, but we know that's not true. Due to the limits of human vision you might easily miss species that are small or not moving, or flying high overhead, among other things. But the thing about a listing protocol is that users should be biased in similar ways each time, but with iNaturalist there's no protocol and people are using it however they would like. Some people log in once to post a Zebra Swallowtail and then log out forever. Other people are avid life listers posting each new species they see once, or maybe a couple times if they get a better photograph or see a new different-looking subspecies or so forth. So that would automatically bias it towards rarities, I suppose, even if there are some iNat users using iNat with the same protocol as eButterfly. (Even then they would have to be a remarkably quick on the draw photographer to photo every individual of every species they see in an observation period!)
Lastly, I admit I do have a bunch of Cabbage White photos I haven't uploaded. I haven't found them to be particularly flighty. Maybe the ones around here are lazy, haha. But I didn't really feel like spamming iNat with more Cabbage White pics would be of much value. Likewise with dragonflies, I am a bit more spammy but still have not posted every individual Blue Dasher I have seen. So I'm definitely picking and choosing slightly more interesting or rare things (to me) and it makes sense the data in aggregate has a similar bias. I guess I'm just overall surprised the volume of "dedicated" posters (e.g. ones who are most likely to post a Banded Hairstreak or Baltimore Checkerspot) outweighs the volume of "new" posters who are posting big charismatic species that show up on their patio. I think my vision of what gets posted on iNat is skewed by spending too much time giving initial, broad IDs to things with no taxon/unknown assigned, a place where new users tend to congregate due to being unfamiliar with either the UI or the etiquette (that even if you don't know what you're looking at at all, it's best to give some broad ID like "insect" so the insect folks can help you figure out what you're looking at).
I don't know the eButterfly protocols, but on eBird we talk about how we are really measuring detectability, not abundance. You mark a checklist complete if you reported all the species that you observed, not all the species that were present. You almost always know that there are birds around which you could not detect! There are ways you can still use the data to say at least something about abundance, but like a lot of science, you need to do some more work to get that from the raw numbers.
That said, it seems to me that comparing data from iNaturalist and eButterfly is a great way to learn about the differences between the two platforms, but less useful for understanding the organisms being studied. They have different protocols and different users! I suppose you could control for some of that by only looking at observations from people who use both, but of course that would also introduce additional factors.
You almost always know that there are birds around which you could not detect!
Exactly, on eBird this becomes quite evident when looking at migratory species in US/Canada, the bar charts (which I believe are calculated as a % of lists in a certain area that had the species) are usually a lot higher in spring vs. fall, even though logically we know that there are actually more birds migrating in fall than in spring since the populations are larger immediately after breeding. But the birds are less salient without their breeding plumage and typically not singing either.
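As a rough sketch of the "% of lists in a certain area that had the species" idea: the snippet below only illustrates that calculation as described above, and is not eBird's actual implementation. The function name, the species names, and the checklists are all made up for the example.

```python
def checklist_frequency(checklists: list[set[str]], species: str) -> float:
    """Percentage of checklists that report the given species at least once."""
    if not checklists:
        return 0.0
    hits = sum(1 for checklist in checklists if species in checklist)
    return 100.0 * hits / len(checklists)


# HYPOTHETICAL checklists: each set is the species reported on one complete list.
spring_lists = [{"Warbler", "Robin"}, {"Warbler"}, {"Robin"}, {"Warbler", "Sparrow"}]
fall_lists = [{"Robin"}, {"Warbler"}, {"Robin", "Sparrow"}, {"Sparrow"}]

spring_freq = checklist_frequency(spring_lists, "Warbler")
fall_freq = checklist_frequency(fall_lists, "Warbler")

print(f"spring: {spring_freq:.0f}% of lists, fall: {fall_freq:.0f}% of lists")
```

A number like this only counts lists where the species was detected, so a drop from spring to fall can reflect birds being quieter and duller rather than there being fewer of them.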
I think it's fine to compare different platforms with different user sets; the variable in common was more just the geographic area, and I think the question they were trying to get at was more like, if you are planning to use iNat vs eButterfly data, how would that impact your research? What biases are different between them? And lastly, likely most interesting for discussion here, how can we improve the quality of data on iNat? (They seemed to focus on iNat more for that since the type of protocol that eButterfly uses is already more researched as to how to use that data.)
It would also be interesting to consider those moths that people are prone to mistake for butterflies. In the Pacific Northwest, for instance, "the butterfly that isn't in the field guide" is usually a day-flying, black-and-white Geometrid.
The paper's fine, but it's led to some terrible, alarmist clickbait headlines like "When it comes to butterflies, people prefer pretty ones: That's a problem for scientists" and "Community science has a personal bias problem" (I'm purposely not linking to the articles).
Yes, like any data set iNat data has its biases, and I think it's great people are exploring them as a way to improve or at least understand them. It's unfortunate that the findings are being publicized in a way that would make someone who's skimming headlines dismiss iNat and similar data out of hand.
seems like more of a journalism problem. this week, the trending weird science news topic seemed to be related to cocaine in sharks off of Brazil, and the title for the NYT's version of the article was "Not Afraid of Sharks? Well, Now They're on Cocaine".
Yes, it's not a problem with the authors of the paper and I wasn't blaming them. As I said, I think it's great that people are working to understand biases in iNat and other participatory science data.
It's just frustrating that the headlines are so alarmist. Obviously nuance and headlines are not often compatible, but blowing up the scale of the issue is not necessary.