What gets misidentified as the large milkweed bug?

It’s pretty preliminary, but I’m super excited about this figure!
I’m using images of large milkweed bugs (Oncopeltus fasciatus) in an ecology research project, and our first step was to go through images to confirm IDs.

We’ve processed 9,000+ images and found that very few (<1%) research-grade observations have been incorrectly ID’d (which is amazing!). I made this cool visualization to show ‘Of these incorrectly ID’d observations, what were their ultimate identifications?’ The corrected IDs are on the right.

I thought it was a cool graphic, and that some of you might think it was cool too!

very cool! thank you for sharing this information. :)

interesting… did you create this visualization as a one-off, or did you hook some code into the API’s similar-species endpoint (http://api.inaturalist.org/v1/docs/#!/Identifications/get_identifications_similar_species) so that a similar visualization could be generated for any taxon?

It’s not the greatest code I’ve ever written, and it’s in R, but in its current form it works for any species-level question: just change taxon_name and taxon_id at the top of the script. It only groups similar species by genus (species that share a genus are routed through a genus node); there’s no grouping at higher taxonomic levels, though it would probably be pretty straightforward to adapt it to other levels. The large milkweed bug figure will be marginally different from what the script produces, because it’s supplemented by our own corrections.

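Tulip poplar: [figure not preserved]

Honey bee: [figure not preserved]

Here’s the script (currently set up for the honey bee, Apis mellifera):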

```r
#install.packages(c("tidyverse", "devtools", "networkD3"))
#library(devtools)
#devtools::install_github("hrbrmstr/curlconverter")
library(curlconverter)
library(tidyverse)
library(networkD3)
taxon_id <- 47219
taxon_name <- "Apis mellifera"

####API Call####
library(httr) # provides content(); install.packages("httr") if it's missing
api <- paste0("curl -X GET --header 'Accept: application/json' ",
              "'https://api.inaturalist.org/v1/identifications/similar_species?",
              "is_change=false&current=true&order=desc&order_by=created_at&taxon_id=",
              taxon_id, "'")
my_ip <- straighten(api) %>%
  make_req()

dat <- content(my_ip[[1]](), as = "parsed")

# One row per similar taxon: its name and its count from the API
n_results <- length(dat$results)
links <- data.frame(target = rep(NA_character_, n_results),
                    value = rep(NA_real_, n_results))
for (i in seq_len(n_results)) {
  links$target[i] <- dat$results[[i]]$taxon$name
  links$value[i] <- dat$results[[i]]$count
}

####Formatting####
# Genera with more than one similar taxon get their own node
# (value = summed counts), linked from the focal taxon
links %>%
  separate(target, c("genus", "species"), extra = "drop", fill = "right") %>%
  group_by(genus) %>%
  summarise(n = n(), value = sum(value)) %>%
  filter(n > 1) %>%
  mutate(target = genus) %>%
  select(-n, -genus) -> genera
genera$source <- rep(taxon_name, nrow(genera))
links$source <- rep(taxon_name, nrow(links))

# Species in those genera are routed through the genus node instead
links %>%
  separate(target, c("genus", "species"), remove = FALSE,
           extra = "drop", fill = "right") -> links
links$source[links$genus %in% genera$target] <- links$genus[links$genus %in% genera$target]
links <- select(links, -genus, -species)
links <- bind_rows(links, genera)

####SANKEY####
nodes <- data.frame(
  name = c(as.character(links$source),
           as.character(links$target)) %>% unique()
)

# networkD3 needs connections given as zero-based node indices, not the names
# used in the links data frame, so we convert them here.
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1

# Add a 'group' column to each node, all in the same group. (sankeyNetwork()
# only uses it for colouring if you also pass NodeGroup = "group".)
nodes$group <- as.factor(c("my_unique_group"))

# Make the network
p <- sankeyNetwork(Links = links,
                   Nodes = nodes,
                   Source = "IDsource",
                   Target = "IDtarget",
                   Value = "value",
                   NodeID = "name",
                   sinksRight = FALSE,
                   fontSize = 20,
                   iterations = 23,
                   nodeWidth = 5)
p
```
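To make the genus grouping in the Formatting section easier to follow, here is a base-R sketch of the same idea on a few made-up counts (not real iNaturalist data). It is an illustration, not alexis18’s code: `build_sankey_links()` is a hypothetical helper, and it assumes the first word of each name is the genus, as the script’s `separate()` call does.

```r
# Base-R sketch of the script's genus grouping (hypothetical helper).
# focal:  name of the focal taxon, e.g. "Apis mellifera"
# target: names of the similar taxa; first word assumed to be the genus
# value:  count for each similar taxon
build_sankey_links <- function(focal, target, value) {
  genus <- sub(" .*", "", target)  # first word of each name
  genus_sizes <- table(genus)
  shared <- genus %in% names(genus_sizes)[genus_sizes > 1]

  # Taxa whose genus appears more than once are routed through a genus node;
  # the rest link straight to the focal taxon
  species_links <- data.frame(
    source = ifelse(shared, genus, focal),
    target = target,
    value = value,
    stringsAsFactors = FALSE
  )
  if (!any(shared)) return(species_links)

  # Focal taxon -> genus links, weighted by the summed counts in that genus
  sums <- tapply(value[shared], genus[shared], sum)
  genus_links <- data.frame(
    source = focal,
    target = names(sums),
    value = as.numeric(sums),
    stringsAsFactors = FALSE
  )
  rbind(species_links, genus_links)
}

# Made-up example: two Bombus species end up under a single "Bombus" node
links <- build_sankey_links(
  "Apis mellifera",
  c("Bombus impatiens", "Bombus griseocollis", "Eristalis tenax"),
  c(4, 3, 5)
)
links
```

Here the two Bombus rows get "Bombus" as their source, and an extra Apis mellifera → Bombus row carries their combined count, which is what produces the middle column of genus nodes in the Sankey diagram.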

the fact that this is coded at all makes this great code, as far as i’m concerned. thanks for sharing!

This is fantastic! I had no idea these kinds of charts could be generated in R!

Interesting that Liriodendron (“tulip tree,” “tulip poplar”) is sometimes misidentified as an actual tulip (Tulipa gesneriana). I wonder how many of those are misclicks or upload problems as opposed to incorrect identifications.

Edited to add that at least the other misidentifications of this species are all trees, except for Hunnemannia.

Yeah, that’s really interesting. I checked some sea urchin species for that, since I’ve accidentally ID’d things as porcupines in the past: typing ‘urchin’ into the Suggest ID bar offers porcupines, because one of their common names is ‘urchin’. But no dice.

@alexis18 Those are amazing visualizations! Nicely done … question: how does one execute an R script against the iNat database?

i’ve never used R before, but here’s what i did to run alexis18’s script and generate the figure:

  1. download and install R from https://cran.r-project.org/. (i use 64-bit Windows. so i downloaded the Windows installer from https://cran.r-project.org/bin/windows/base/ and installed just the core files and 64-bit files.)
  2. download and install RStudio Desktop from https://rstudio.com/products/rstudio/download/.
  3. open RStudio, and go to File > New File > R Script… copy alexis18’s code above and paste it into the new R script window.
  4. change the taxon_id and taxon_name in lines 7 and 8, if you want to run the script for something other than Apis mellifera.
  5. note that the first 3 lines of the code are commented out. (they begin with a #.) these commands only need to be run once, since they’re basically installing some packages that the rest of the script uses. (if you have the packages installed already, you don’t need to run these lines.)
  6. run the first line by highlighting everything on that line except the #, and then click the run button. this will install a bunch of packages and dependencies. it may take a while to download and install everything. so just wait until the stop button in the console window below the script window goes away. there may be some prompts that pop up along the way that you may have to respond to in the console window, too. (as an alternative to running that first line, you can also go to Tools > Install Packages… and install the packages from there.)
  7. once those 3 packages and dependencies are fully installed, highlight everything in the second line except the #, and run that.
  8. then highlight everything in the third line of the script except the #, and run that.
  9. from there, just continue to click the run button to run through the rest of the script.
  10. when you’ve run the last line, the resulting visualization should pop up in a Viewer window in the bottom-right corner of RStudio. you can zoom in on the result to view it in more detail, or you can export it as an image for use outside of RStudio.
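if you’d rather not uncomment and re-run the install lines by hand, a small helper can check which packages are missing first and install only those. this is a sketch using base R’s `requireNamespace()` and `install.packages()`; the helper names `missing_packages()` and `install_missing()` are made up for this example.

```r
# Packages the script installs in its first line
needed <- c("tidyverse", "devtools", "networkD3")

# Return the subset of `pkgs` that can't be loaded on this machine
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only what's missing; returns the installed names invisibly
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)
}

# Usage (uncomment to run):
# install_missing(needed)
# curlconverter comes from GitHub, so it still goes through devtools:
# if (length(missing_packages("curlconverter")) > 0)
#   devtools::install_github("hrbrmstr/curlconverter")
```

`requireNamespace()` checks whether a package can actually be loaded, which is a bit stricter than just looking it up in the installed-packages list.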

the specific part of alexis18’s code that actually gets the data from iNaturalist (via its API) is in the section labeled “API Call”. generally, there appear to be many ways in R to get data from an API. you could do an internet search to learn more, or i’ve also noted some resources (from my own quick internet search) here: https://forum.inaturalist.org/t/inaturalist-user-trends-with-r/12538/15.
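as one example of those other ways, the API Call section could call the endpoint directly with httr instead of converting a curl command with curlconverter. this is a sketch, not alexis18’s approach: it assumes httr is installed, reuses the query parameters from the script’s curl string, and `fetch_similar_species()` / `results_to_links()` are hypothetical helper names.

```r
# iNaturalist similar-species endpoint (same one the script's curl string uses)
similar_species_endpoint <- "https://api.inaturalist.org/v1/identifications/similar_species"

# Fetch and parse the JSON for one taxon (requires httr and internet access)
fetch_similar_species <- function(taxon_id) {
  resp <- httr::GET(
    similar_species_endpoint,
    query = list(is_change = "false", current = "true", order = "desc",
                 order_by = "created_at", taxon_id = taxon_id)
  )
  httr::stop_for_status(resp)         # stop on HTTP errors
  httr::content(resp, as = "parsed")  # JSON -> nested R lists
}

# Flatten the parsed response into the target/value table the script builds
results_to_links <- function(dat) {
  data.frame(
    target = vapply(dat$results, function(r) r$taxon$name, character(1)),
    value = vapply(dat$results, function(r) as.numeric(r$count), numeric(1)),
    stringsAsFactors = FALSE
  )
}

# Usage (needs internet access):
# links <- results_to_links(fetch_similar_species(47219))
```

passing the parameters as a `query` list lets httr build and URL-encode the request, so there’s no curl string to keep in sync when you change taxon_id.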

Thanks for that guide!

You can also make the changes to lines 7 & 8, remove the comments (#) from the first three lines, and then press the ‘Source’ button at the top of the script editor (it runs all the code in the script). Then type the letter p in the console and press Enter to get the graph to appear in the Viewer pane.

You can learn more about how to access APIs in R here: https://rpubs.com/plantagenet/481658 (The way I did it is kind of lazy)

And you can learn more about the API itself here: https://www.inaturalist.org/pages/api+reference

I wonder if there’s a correlation between the use of common names and the number of misidentifications?

Probably lots. See this thread.

Many Laburnum anagyroides are misidentified as Cassia fistula.

For instance:
https://www.inaturalist.org/observations/46823243

Cassia fistula: Golden shower tree; Golden rain tree

Laburnum anagyroides: Golden chain; Golden rain

I have also seen a few Erythrostemon gilliesii (“Yellow bird-of-paradise shrub”) misidentified as Strelitzia (“bird-of-paradise flower”).

Besides their common names, these plants have nothing in common visually, so common-name confusion is the only explanation for these misidentifications.

It looks like the most frequent misidentification

Interesting!

Thanks a lot for this great analysis!

One of the common names of Caesalpinia pulcherrima is also “red bird of paradise”, which also contributes to its confusion with Erythrostemon gilliesii, alongside Strelitzia. But this may not be the only explanation: (1) Caesalpinia pulcherrima and Erythrostemon gilliesii are in the same tribe, and (2) there is a yellow subspecies of Caesalpinia pulcherrima, which makes it visually closer to Erythrostemon gilliesii.