iNaturalist Identification Accuracy

A pretty belated update, but a few years ago I posted about what things get misidentified as large milkweed bugs.

Our related research was published recently and is open access:

We reviewed more than 10,000 records of large milkweed bugs, checked the IDs, scored the images for life stage and number of individuals, and used those data to understand how the life history and phenology of this species vary regionally.

One of our most interesting (to me!) findings was that approximately 98% of the records we evaluated were correctly identified to species level, and when they were incorrect, the incorrect IDs were largely of closely related or visually similar species.

Happy to talk more about any of it, and we are super grateful to the whole iNaturalist community for contributing photos and identifications, and to the forum for early discussions about this work!


Great stuff, @alexis18!


You listed “iNaturalist Citizen Scientists” as an author on a journal publication? That is very cool and sets a good precedent.


Yes! We got a lot of support in figuring out how we could give appropriate group attribution. Norms are still evolving, but as an author team, we felt it was worth elevating to authorship because the work would have been impossible without the iNaturalist community contributions.


SCGP on the map = South central Great Plains? Weird name for the Chihuahuan and Sonoran deserts (I jest). I love seeing the colorful Lygaeus kalmii and trying to take worthy pictures of them.


Nice! Added here:


We really did struggle with the best way to delineate ecoregions, and I think one of our take-home messages is that choosing the right level of aggregation for a continentally distributed species can be challenging, but it matters for understanding local or regional drivers.

We defined our ecoregions based on Level I and Level II ecoregions used by the United States Environmental Protection Agency (USEPA) and based on the Monarch Watch program waystations, accounting for the number and density of observations to ensure a relatively even sample size across regions. Our SCGP includes the Level I ecoregions North American Desert, Southern Semiarid Highlands, Temperate Sierras, Tropical dry forests, and the Great Plains, so it really is covering a LOT of ecoregion ground.

But it does line up roughly with which milkweed species would be the host:

This region is roughly where A. asperula and A. linaria would be the dominant species. Unfortunately, there wasn't quite a sufficient sample size in Mexico to break out A. linaria on its own, which would be warranted in follow-ups with more data from that region. Another concern, though, is that as you go south, there is some risk of capturing hybrids with more southern Oncopeltus species. But maybe in a follow-up!


To follow up on this comment (sorry, I haven't read the paper yet): how did you define the dominant milkweed? If I'm not mistaken, this region has the highest species diversity on your map. My personal experience suggests that A. subverticillata is similarly widespread and more abundant over much of the region, though maybe not all of it.


Thanks for sharing this. Incredibly fascinating work!


Very cool, kudos!!
I think a lot of people don't realize there are folks on iNat who work really hard to maintain an accurate data set for particular species groups.


Just to clarify, the milkweed figure is from a different publication, not ours; we didn't do anything new on the host plant side in this work. The host plant side is also where I start to get a bit out of my depth, since I am more on the data and informatics side, so I'm checking with the coauthor who is more familiar with the milkweed species for a more specific answer.

From the data side, the US EPA ecoregions were the primary input for the derived regions we used, and they were merged based on the Monarch Watch regions and the milkweed distributions. Another factor is that we didn't have sufficient data depth to model across the whole region we defined, so this was the portion of that range where we were ultimately able to make predictions: