Are you able to filter tags (e.g. Caterpillars) in R packages for iNaturalist

I am currently using rinat to get population data for Lasiocampid caterpillars. Is there any way to filter the results to exclude moth observations? If it is not possible for rinat, are there any other R packages that would let me do this?

Thanks :)

1 Like

It looks like rinat::get_inat_obs() only exposes a subset of the request parameters provided by the Observations API. You can use httr to create your own requests and search for only larva observations using the observation ‘Life Stage’ annotation. I’m looking over the docs to see what such a request would look like…

1 Like

A request for what you want using httr should look something like this:

library(httr)

# taxon 56584 = Lasiocampidae; term 1 = 'Life Stage'; value 6 = larva
r <- GET("https://api.inaturalist.org/v1/observations?taxon_id=56584&term_id=1&term_value_id=6")

Parameters:

  • Taxon ID 56584 is the family Lasiocampidae
  • Term ID 1 is ‘Life Stage’
  • Term value ID 6 is the larval stage

And here’s what that same query looks like in the web UI: https://www.inaturalist.org/observations?order_by=created_at&place_id=any&subview=grid&taxon_id=56584&term_id=1&term_value_id=6
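If you then want those results in a data frame, here’s a minimal sketch using jsonlite to parse the response above (the fromJSON call and flatten argument are my additions, not part of the original request):

library(jsonlite)

# parse the JSON body of the request above into nested data frames
obs <- fromJSON(content(r, as = "text", encoding = "UTF-8"), flatten = TRUE)
nrow(obs$results)  # observation rows on this page (30 by default, the API default)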

3 Likes

It is possible to do this using the API:

term_id=1&term_value_id=6
term ID 1 is life stage, and value 6 is larva

term_id=1&term_value_id=5
value 5 is nymph

term_id=1&term_value_id=4
value 4 is pupa

term_id=1&term_value_id=8
value 8 is juvenile
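If you’re pulling several of these, a tiny helper keeps the URLs consistent (my own sketch; the function name and the Lasiocampidae default are just for illustration):

life_stage_url <- function(value_id, taxon_id = 56584) {
  paste0("https://api.inaturalist.org/v1/observations",
         "?taxon_id=", taxon_id,
         "&term_id=1&term_value_id=", value_id)
}

life_stage_url(6)  # larva
life_stage_url(4)  # pupa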

As far as I’m able to tell, these annotations aren’t passed to GBIF, so you couldn’t get them there or from rgbif.

I usually do this in R by querying the API using curlconverter, getting the total number of results, then making separate calls to get what I’m interested in. Here’s one for getting the JSON results for the first 200 observations annotated as pupa.

library(curlconverter)
library(httr)       # content()
library(tidyverse)  # %>%

# curl call copied from the API docs: first 200 observations annotated as pupa
api <- "curl -X GET --header 'Accept: application/json' 'https://api.inaturalist.org/v1/observations?term_id=1&term_value_id=4&per_page=200'"

# convert the curl call into an httr request function, then run it
my_ip <- straighten(api) %>%
  make_req()
dat <- content(my_ip[[1]](), as = "parsed")

Since the API results are capped at 200 per page, I usually take the total number of results and use a for loop to page through them (I start at 2 because the first call already got me the first 200):

for (i in 2:ceiling(dat$total_results / 200)) {
  # same query as above, with the page number appended
  api <- paste0("curl -X GET --header 'Accept: application/json' ",
                "'https://api.inaturalist.org/v1/observations?term_id=1&term_value_id=4&per_page=200",
                "&page=", i, "'")

  my_ip <- straighten(api) %>%
    make_req()
  results <- content(my_ip[[1]](), as = "parsed")

  # <insert here the stuff you want to do with the API results>

  Sys.sleep(1)  # pause between requests
}

Then there’s some finagling with the JSON to get what you’re interested in, but that depends on your project goals. With this style of script, though, you can paste in the curl call from the API docs (https://api.inaturalist.org/v1/docs/) for whatever search terms you’re looking for.
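For example, here’s a sketch (my own, not from the post above) of one way to flatten a few fields from the parsed list into a data frame; obs_df is a hypothetical name, and the fields follow the v1 API response schema:

library(purrr)  # part of the tidyverse loaded above; provides map_dfr and %||%

obs_df <- map_dfr(dat$results, function(o) {
  tibble::tibble(
    id          = o$id,
    observed_on = o$observed_on %||% NA_character_,  # may be absent
    taxon_name  = o$taxon$name %||% NA_character_    # taxon may be absent
  )
})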

3 Likes

Just a note: you could technically instead exclude anything annotated as adult with the without_term_id= and without_term_value_id= parameters. I would discourage this for analysis, though, since the results would also include adults that just don’t have that annotation yet, and annotation coverage for most taxa is sparse, so you’d need to confirm manually if you wanted to be confident the set contained only caterpillars.
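For reference, that exclusion query might look something like this with httr (a sketch; I believe the Adult value ID is 2, but verify it against the docs):

library(httr)

# exclude observations annotated with Life Stage (term 1) = Adult (value 2, assumed)
r <- GET("https://api.inaturalist.org/v1/observations",
         query = list(taxon_id = 56584,
                      without_term_id = 1,
                      without_term_value_id = 2))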

2 Likes

And is there any way to get these results exported to Excel, like in rinat?

I use the fromJSON function in the jsonlite R package to achieve the same thing as @jcook with httr and @alexis18 with curlconverter. I’m not sure about the relative merits of each package. They all do the job of accessing the iNat API.
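For example, a minimal jsonlite sketch for the same larva query discussed above (the flatten argument is my addition; it unnests the taxon and user columns):

library(jsonlite)

dat <- fromJSON(
  "https://api.inaturalist.org/v1/observations?taxon_id=56584&term_id=1&term_value_id=6",
  flatten = TRUE
)
str(dat$results, max.level = 1)  # dat$results is now a regular data frame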

I use loops like @alexis18 mentions to pull in all of the results, but I’m careful not to loop through a massive set of results. iNat warns that the API is meant to support app development, not data scraping. I tend to add an if(){} guard to my code (sketched below) to prevent the loop from running if my search returns more than 10 pages. That forces me to think twice about whether I really need to download all of that, or whether there is a smarter way of getting just what I need.
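Something like this, as a sketch (max_pages is my own arbitrary cutoff, and dat is assumed to hold a parsed response with total_results):

max_pages <- 10
n_pages <- ceiling(dat$total_results / 200)
if (n_pages > max_pages) {
  stop("Query returns ", n_pages, " pages; narrow the search before downloading.")
}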

write.csv will output a spreadsheet that can be opened in Excel. However, there’s a problem: some of the columns from the API will contain lists (e.g., one observation, and so one row of data, will contain a list of all of that observation’s photos). You’ll need to decide what to do with those to flatten everything down into one spreadsheet. The simplest thing is to exclude the columns with lists (like identifications, comments, and photos) and use write.csv on the rest. When I need everything, I typically expand the lists out into their own spreadsheets for export, linked to each other by the observation id.
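For the simple case, a minimal sketch (assuming iNatObs is the flattened results data frame; the file name is just an example):

# drop the list-columns (photos, identifications, comments, ...) and export the rest
list_cols <- sapply(iNatObs, is.list)
write.csv(iNatObs[, !list_cols], "lasiocampidae_obs.csv", row.names = FALSE)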

Here’s an example of how I expand out comments.

library(plyr)  # for rbind.fill

# sort so the obs with the most comments come first
iNatObs <- iNatObs[order(iNatObs$comments_count, decreasing = TRUE), ]

# loop through, extracting all comments from each observation
all_comments <- cbind(data.frame(obs.id = iNatObs$id[1]),
                      as.data.frame(iNatObs$comments[1]))
for (i in 2:nrow(iNatObs)) {
  if (!is.null(unlist(iNatObs$comments[i]))) {
    all_comments <- rbind.fill(all_comments,
                               cbind(data.frame(obs.id = iNatObs$id[i]),
                                     as.data.frame(iNatObs$comments[i])))
  }
}

(That code uses the plyr package for the rbind.fill function.)
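Each expanded table can then be written out separately and joined back up by the observation id, e.g. (file name is just an example):

write.csv(all_comments, "inat_comments.csv", row.names = FALSE)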

4 Likes

Yes, @jon_sullivan makes a good point, and it’s also an important reason to include the Sys.sleep() call in the loop: it makes sure you’re not sending too many requests per minute with big calls. I typically know what I’m looking for when I use this method, and I use it only when I’ve already made sure I can’t get the data from GBIF (e.g., when I need annotations like these that aren’t available there, or when I need needs-ID or casual observations).

2 Likes

Meanwhile, there are about 27K un-annotated Lasiocampidae records currently, a chunk of which are larvae. @marpoulin, are you looking globally or is there a specific region of interest? I have been doing some lep life stage annotation lately and can go through a more specific set if you are interested, so that you will get more hits on the caterpillars.

2 Likes

I’m looking at global data. I’m trying to get all of the global Lasiocampidae observations into an Excel file for analysis. That would be lovely, thanks.

Also see this post and the related thread: https://forum.inaturalist.org/t/using-r-to-extract-observations-of-a-specific-phenological-state/7007/6

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.