What things are misidentified as large milkweed bug?

alexis18 · May 20, 2020, 2:09am

It’s pretty preliminary, but I’m super excited about this figure!
I’m using images of large milkweed bugs (Oncopeltus fasciatus) in an ecology research project, and our first step was to go through images to confirm IDs.

We’ve processed 9,000+ images and found that very few (<1%) research-grade observations were incorrectly IDd (which is amazing!). I made this cool visualization to show: of these incorrectly IDd observations, what were their ultimate identifications? The corrected IDs are on the right.

I thought it was a cool graphic, and that some of you might think it was cool too!

astra_the_dragon · May 20, 2020, 4:37am

very cool! thank you for sharing this information. :)

pisum · May 20, 2020, 5:06am

interesting… did you create this visualization as a one-off, or did you hook some code into the API similar species endpoint (http://api.inaturalist.org/v1/docs/#!/Identifications/get_identifications_similar_species) so that a similar visualization could be generated for any taxon?

alexis18 · May 20, 2020, 3:23pm

It’s not the greatest code I’ve ever written, and it’s in R, but it works in its current form for any species-level question just by changing the taxon_name and the taxon_id at the top of the script. It only groups species into genera; there’s no grouping by higher taxonomic levels. It would probably be pretty straightforward to adapt it to other levels though. The large milkweed bug one will be ~marginally~ different because it’s supplemented by our specific corrections.

Tulip poplar:

Honey bee:

#install.packages(c("tidyverse", "devtools", "networkD3"))
#library(devtools)
#devtools::install_github("hrbrmstr/curlconverter")
library(curlconverter)
library(tidyverse)
library(networkD3)
taxon_id <- 47219
taxon_name <- "Apis mellifera"

####API Call####
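# Build the curl command as one string with no whitespace inside the URL;
# format() keeps large taxon IDs from being pasted in scientific notation
api <- paste("curl -X GET --header 'Accept: application/json' ",
 "'http://api.inaturalist.org/v1/identifications/similar_species?is_change=false&current=true&order=desc&order_by=created_at&taxon_id=", format(taxon_id, scientific = FALSE), "'", sep="")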
my_ip <- straighten(api) %>% 
  make_req()

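# content() is from httr; calling it as httr::content() works whether or not httr is attached
dat <- httr::content(my_ip[[1]](), as="parsed")
# Size the table by the results actually returned, which may be fewer than total_results
n_results <- length(dat$results)
links <- data.frame(target = rep(NA_character_, n_results), value = rep(NA_real_, n_results))
for(i in seq_len(n_results)){
  links$target[i] <- dat$results[[i]]$taxon$name
  links$value[i] <- dat$results[[i]]$count
}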

####Formatting####
links %>% 
  separate(target, c("genus", "species")) %>% 
  group_by(genus) %>%
  summarise(n = n(), value = sum(value)) %>%
  filter(n > 1) %>% 
  mutate(target = genus)%>%
  select(-n, -genus) -> genera
genera$source <- rep(taxon_name, nrow(genera))
links$source <- rep(taxon_name, nrow(links))

links %>% 
  separate(target, c("genus", "species"), remove = FALSE) -> links
links$source[links$genus %in% genera$target] <- links$genus[links$genus %in% genera$target]
links <- select(links, -genus, -species)
links <- bind_rows(links, genera)

####SANKEY####
nodes <- data.frame(
  name=c(as.character(links$source), 
         as.character(links$target)) %>% unique()
)

# networkD3 needs each connection given as zero-based node indices, not the names used in the links data frame, so we convert them here.
links$IDsource <- match(links$source, nodes$name)-1 
links$IDtarget <- match(links$target, nodes$name)-1

# Add a 'group' column to each node. Here I decide to put all of them in the same group to make them grey
nodes$group <- as.factor(c("my_unique_group"))

# Make the Network
p <- sankeyNetwork(Links = links,
                   Nodes = nodes,
                   Source = "IDsource",
                   Target = "IDtarget",
                   Value = "value",
                   NodeID = "name", 
                   sinksRight = FALSE,
                   fontSize = 20,
                   iterations = 23,
                   nodeWidth = 5)
p
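
For anyone trying to follow the Formatting and SANKEY sections, here is a minimal sketch of the same idea on toy data, written in base R rather than the tidyverse. The species counts are made up for illustration, and `all_links`, `shared`, and `genus_totals` are names introduced here; they are not part of the script above. It assumes every target name is a two-word binomial.

```r
# Toy data standing in for the parsed API results (counts are made up).
links <- data.frame(
  target = c("Bombus impatiens", "Bombus pensylvanicus", "Eristalis tenax"),
  value  = c(5, 3, 2),
  stringsAsFactors = FALSE
)
taxon_name <- "Apis mellifera"

# Genus is the first word of each binomial.
links$genus <- sub(" .*$", "", links$target)

# Genera represented by more than one species get their own intermediate node.
species_per_genus <- table(links$genus)
shared <- names(species_per_genus)[species_per_genus > 1]

# Species in a shared genus hang off the genus node; the rest attach
# directly to the focal taxon.
links$source <- ifelse(links$genus %in% shared, links$genus, taxon_name)

# One extra link per shared genus, carrying the summed counts of its species.
genus_totals <- tapply(links$value, links$genus, sum)[shared]
genera <- data.frame(
  source = rep(taxon_name, length(shared)),
  target = shared,
  value  = as.numeric(genus_totals),
  stringsAsFactors = FALSE
)

all_links <- rbind(links[, c("source", "target", "value")], genera)

# networkD3 expects zero-based node indices rather than names.
nodes <- data.frame(
  name = unique(c(all_links$source, all_links$target)),
  stringsAsFactors = FALSE
)
all_links$IDsource <- match(all_links$source, nodes$name) - 1
all_links$IDtarget <- match(all_links$target, nodes$name) - 1

print(all_links)
```

The two Bombus species end up behind a single "Bombus" node whose link from the focal taxon carries their combined count, while the lone Eristalis species links to the focal taxon directly.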

pisum · May 20, 2020, 4:32pm

the fact that this is coded at all makes this great code, as far as i’m concerned. thanks for sharing!

jlayman · May 20, 2020, 5:37pm

This is fantastic! I had no idea these kinds of charts could be generated in R!

scharf · May 20, 2020, 8:57pm

Interesting that Liriodendron (“tulip tree,” “tulip poplar”) is getting misidentified a certain percentage of the time as actual tulip (Tulipa gesneriana). I wonder how many of those are misclicks or upload problems as opposed to incorrect identifications.

Edited to add that at least the other misidentifications of this species are all trees, except for Hunnemannia.

alexis18 · May 21, 2020, 2:33am

Yeah that’s really interesting. I checked some sea urchin species for that, because I’ve accidentally ID’d one as a porcupine in the past: typing ‘urchin’ into the suggest ID bar offers porcupines as an ID, since one of their common names is ‘urchin’. But no dice.

nycbirder · May 21, 2020, 2:46am

@alexis18 Those are amazing visualizations! Nicely done … question: how does one execute an R script against the iNat database?

pisum · May 21, 2020, 6:05am

i’ve never used R before, but here’s what i did to run alexis18’s script and generate the figure:

download and install R from https://cran.r-project.org/. (i use 64-bit Windows. so i downloaded the Windows installer from https://cran.r-project.org/bin/windows/base/ and installed just the core files and 64-bit files.)
download and install RStudio Desktop from https://rstudio.com/products/rstudio/download/.
open RStudio, and go to File > New File > R Script… copy alexis18’s code above and paste it into the new R script window.
change the taxon_id and taxon_name in lines 7 and 8, if you want to run the script for something other than Apis mellifera.
note that the first 3 lines of the code are commented out. (they begin with a #.) these commands only need to be run once, since they’re basically installing some packages that the rest of the script uses. (if you have the packages installed already, you don’t need to run these lines.)
run the first line by highlighting everything on that line except the #, and then click the run button. this will install a bunch of packages and dependencies. it may take a while to download and install everything. so just wait until the stop button in the console window below the script window goes away. there may be some prompts that pop up along the way that you may have to respond to in the console window, too. (as an alternative to running that first line, you can also go to Tools > Install Packages… and install the packages from there.)
once those 3 packages and dependencies are fully installed, highlight everything in the second line except the #, and run that.
then highlight everything in the third line of the script except the #, and run that.
from there, just continue to click the run button to run through the rest of the script.
when you’ve run the last line, the resulting visualization should pop up in a Viewer window in the bottom-right corner of RStudio. you can zoom in on the result to view it in more detail, or you can export it as an image for use outside of RStudio.
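
as a possible shortcut for the install steps above, here is a minimal sketch that installs only the packages that are missing. `missing_packages` and `cran_packages` are names made up for this example, and the install calls only run in an interactive session such as RStudio.

```r
# Packages the script needs from CRAN; curlconverter comes from GitHub.
cran_packages <- c("tidyverse", "devtools", "networkD3")

# Return the entries of `required` that are not in `installed`.
# (Hypothetical helper, not part of the original script.)
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# Only download anything when run by hand, for example from RStudio.
if (interactive()) {
  to_install <- missing_packages(cran_packages)
  if (length(to_install) > 0) install.packages(to_install)
  if (length(missing_packages("curlconverter")) > 0) {
    devtools::install_github("hrbrmstr/curlconverter")
  }
}
```

running this a second time should do nothing, since everything will already be installed.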

the specific part of alexis18’s code that actually gets the data from iNaturalist (via its API) is in the section labeled “API Call”. generally, there appear to be many ways in R to get data from an API. you could do an internet search to learn more, or i’ve also noted some resources (from my own quick internet search) here: https://forum.inaturalist.org/t/inaturalist-user-trends-with-r/12538/15.
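
as one example of those other ways, here is a rough sketch that skips the curl command and reads the JSON straight from the URL. it assumes the jsonlite package is installed (it normally comes along with the tidyverse) and that the API also answers over https. `similar_species_url` and `fetch_similar_species` are names invented for this sketch, not anything from alexis18's script or the API itself.

```r
# Build the similar_species request URL for a taxon, using the same query
# parameters as the "API Call" section of the script.
similar_species_url <- function(taxon_id) {
  paste0(
    "https://api.inaturalist.org/v1/identifications/similar_species",
    "?is_change=false&current=true&order=desc&order_by=created_at",
    # sprintf("%d") avoids scientific notation for IDs such as 100000
    "&taxon_id=", sprintf("%d", as.integer(taxon_id))
  )
}

# Fetch the results and flatten them into the target/value columns
# that the rest of the script works with.
fetch_similar_species <- function(taxon_id) {
  dat <- jsonlite::fromJSON(similar_species_url(taxon_id),
                            simplifyVector = FALSE)
  data.frame(
    target = vapply(dat$results, function(r) r$taxon$name, character(1)),
    value  = vapply(dat$results, function(r) as.numeric(r$count), numeric(1)),
    stringsAsFactors = FALSE
  )
}

# Only hit the API when run by hand, for example from RStudio.
if (interactive()) {
  links <- fetch_similar_species(47219)  # 47219 is Apis mellifera in the script
  print(head(links))
}
```

the resulting `links` data frame has the same two columns the script builds in its loop, so the Formatting and SANKEY sections should be able to pick up from there.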

alexis18 · May 21, 2020, 2:12pm

Thanks for that guide!

You can also make the changes to lines 7 & 8, remove the comments (#) on the first three lines, and then press the ‘Source’ button at the top right of the script window (it runs all the code in the script). Then type the letter p in the console and press enter to get the graph to appear in the Viewer pane (bottom right in the default layout).

alexis18 · May 21, 2020, 2:17pm

You can learn more about how to access APIs in R here: https://rpubs.com/plantagenet/481658 (The way I did it is kind of lazy)

And you can learn more about the API itself here: https://www.inaturalist.org/pages/api+reference

jpsilva · May 21, 2020, 8:13pm

I wonder if there’s a correlation between the use of common names and the number of misidentifications?

scharf · May 21, 2020, 9:46pm

Probably lots. See this thread.

jeanphilippeb · May 22, 2020, 9:47pm

Many Laburnum anagyroides are misidentified as Cassia fistula.

For instance:
https://www.inaturalist.org/observations/46823243

Golden shower tree ; Golden rain tree
Cassia fistula

Golden chain ; Golden rain
Laburnum anagyroides

jeanphilippeb · May 22, 2020, 9:50pm

I have also seen a few Erythrostemon gilliesii (“Yellow bird-of-paradise shrub”) misidentified as Strelitzia (“bird-of-paradise flower”).

Besides their common names, these plants have nothing in common visually. So the common name confusion is the only explanation for this misidentification.

alexis18 · May 23, 2020, 12:39am

It looks like the most frequent misidentification

alexis18 · May 23, 2020, 12:43am

Interesting!

jeanphilippeb · May 23, 2020, 4:14pm

Thanks a lot for this great analysis!

jeanphilippeb · May 23, 2020, 4:19pm

One of the common names of Caesalpinia pulcherrima is also “red bird of paradise”, which adds to the confusion with Erythrostemon gilliesii, alongside Strelitzia. But this may not be the only explanation: (1) Caesalpinia pulcherrima and Erythrostemon gilliesii are in the same tribe, and (2) there is a yellow subspecies of Caesalpinia pulcherrima, which makes it visually closer to Erythrostemon gilliesii.
