R scripts to help analyse iNat content

Is there a forum or other place for iNat programmers to share R scripts?
A bioblitz was recently held locally, and I was asked whether there was a way to track volunteer effort.

My R skills are rusty, but I do know that it is possible to access the iNat database directly (as opposed to downloading all the records associated with the project).

Does anyone have an existing script, or one that could be modified, or is anyone interested in writing and sharing a new script?

STEPS
Get the list of iNat member IDs associated with a project.
For each member, determine the earliest and latest DateObserved of their observations.
For each member, determine the earliest and latest DateAdded of their observations.

Once the basic script works, it could be revised to account for people who collected observations on different days or uploaded them in spurts.
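
A minimal base-R sketch of the three steps follows. It assumes the project's observations are already in a data frame (for example, from one of the packages mentioned in this thread). The column names `user_login`, `observed_at` and `created_at` and the function name `member_effort()` are hypothetical, so rename them to match your download. API timestamps in ISO 8601 form (with a `T` and a UTC offset) may need an explicit `format` argument to `as.POSIXct()`.

```r
# Sketch of the STEPS above. Assumes `obs` has one row per observation with
# hypothetical columns user_login, observed_at and created_at holding
# "YYYY-MM-DD HH:MM:SS" strings (rename to match your own download).
member_effort <- function(obs, tz = "UTC") {
  obs$observed_at <- as.POSIXct(obs$observed_at, tz = tz)
  obs$created_at  <- as.POSIXct(obs$created_at, tz = tz)
  # One summary row per member: first/last DateObserved and DateAdded
  rows <- lapply(split(obs, obs$user_login), function(d) {
    data.frame(
      user_login      = as.character(d$user_login[1]),
      n_obs           = nrow(d),
      first_observed  = min(d$observed_at),
      last_observed   = max(d$observed_at),
      observed_span_h = as.numeric(difftime(max(d$observed_at),
                                            min(d$observed_at),
                                            units = "hours")),
      first_added     = min(d$created_at),
      last_added      = max(d$created_at),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Usage with made-up data:
obs <- data.frame(
  user_login  = c("mary", "mary", "sam"),
  observed_at = c("2023-05-06 09:00:00", "2023-05-06 11:30:00", "2023-05-07 08:15:00"),
  created_at  = c("2023-05-06 20:00:00", "2023-05-08 10:00:00", "2023-05-07 09:00:00")
)
member_effort(obs)
```

Because this uses only the overall first and last times, a member who observed on two separate days gets one long span. That is the refinement the paragraph above anticipates.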

A second part could look at effort by identifiers.

Creating such a script would be useful to many groups in the iNat community.

Thanks
Mary

No, but at one point I suggested creating a wiki to centralize tools, code, etc. (not just R), and since no one else took it on, here it is:

https://forum.inaturalist.org/t/wiki-external-code-tools-etc-for-working-with-inat/15906/2

I don't know that they are gathered in one spot yet, but here are a number of topics that discuss using R scripts.

https://forum.inaturalist.org/search?q=R%20script

In the meantime, perhaps you could message one of the posters and discuss your interests?

Since the iNat code itself is stored on GitHub, why not a GitHub repository? Code can be forked easily and improved by others in the community.

Welcome to the Forum, @mkeaveney

What would be nice is a CRAN package for porting iNat data into R. Analogous packages exist for various other online databases, e.g. federal census data, USGS land cover data. That would be a worthwhile project for somebody to take on.

I thought there was a CRAN option, similar to those for eBird, GBIF and OBIS, but I might be out of date.

I have an R package on GitHub that can port iNat data into R. I haven't put it on CRAN yet because I'm still finalizing more functions. The function and its API arguments are documented here, with some examples at the end: https://rdrr.io/github/pjhanly/iNatTools/man/iNat.html

This function will run an API query for observations (up to the 10,000 limit) and put them into a data frame in R. I have a number of other draft functions if there is something more specific that someone requires.

Installation:
install.packages("remotes")
library(remotes)
install_github("pjhanly/iNatTools")
library(iNatTools)

Examples:
Fetch all bird observations within a kilometer of the Empire State Building:
df <- iNat(taxon_id = 3, lat = 40.748424, lng = -73.985698, radius = 1)

Fetch all observations of Gastropoda (47114) for a user_id (473359):
df <- iNat(user_id = 473359, taxon_id = 47114)

Fetch all observations of flowering dicots for a project:
df <- iNat(project = "golden-ears-provincial-park", taxon_id = 47124, term_id = 12, term_value_id = 13)

Fetch deer (Cervidae) observed on 2019-06-19:
df <- iNat(taxon_id = 42158, observed_on = "2019-06-19")

Fetch all observations created for a project for the first 3 days of May 2019:
df <- iNat(project = "bowerbird", created_d1 = "2019-05-01", created_d2 = "2019-05-03")

Fetch all observations of Felidae that have IUCN endangered status and are Research Grade:
df <- iNat(csi = "EN", taxon_id = 41944, quality_grade = "research")

Fetch all monarch butterfly observations within a specific bounding box of latitude and longitude:
df <- iNat(taxon_id = 48662, nelat = 40, nelng = -94, swlat = 39, swlng = -95)

Another function I put on GitHub is one to sum the effort of observers within a data frame (e.g., a project): https://rdrr.io/github/pjhanly/iNatTools/man/sampling_effort.html

This function takes the data frame from above and sums the total time spent observing and the distance traveled per observer for each of their observing events. One argument sets the maximum time gap between observations for them to count as the same observing event (default = 30 min). Others set the maximum distance jump and the maximum distance traveled per minute, to limit false distance effort from GPS inaccuracy or from something like someone hopping in a car. There are certainly plenty of corner cases where this won't be perfect, but it is a reasonable start at estimating observer effort.
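
Here is a hypothetical base-R sketch of the time part of that idea. It is not the iNatTools implementation and ignores distance entirely. It sorts one observer's timestamps, starts a new event whenever the gap exceeds `max_gap_min`, and sums each event's duration. The function names are made up for this example.

```r
# Hypothetical sketch, NOT the iNatTools sampling_effort() code: split one
# observer's timestamps into "observing events" wherever the gap between
# consecutive observations exceeds max_gap_min, then measure each event.
observing_events <- function(times, max_gap_min = 30, tz = "UTC") {
  times <- sort(as.POSIXct(times, tz = tz))
  gaps  <- as.numeric(diff(times), units = "mins")
  event <- cumsum(c(TRUE, gaps > max_gap_min))   # new event after a long gap
  secs  <- as.numeric(times)
  data.frame(
    event   = sort(unique(event)),
    n_obs   = as.vector(table(event)),
    minutes = as.vector(tapply(secs, event, max) - tapply(secs, event, min)) / 60
  )
}

# Total observing minutes per observer (time only; distance is not handled).
effort_by_observer <- function(user, times, max_gap_min = 30) {
  vapply(split(times, user), function(t) {
    sum(observing_events(t, max_gap_min)$minutes)
  }, numeric(1))
}

# Usage with made-up data:
user  <- c("a", "a", "a", "a", "a", "b")
times <- c("2023-05-06 09:00:00", "2023-05-06 09:10:00", "2023-05-06 09:25:00",
           "2023-05-06 11:00:00", "2023-05-06 11:20:00", "2023-05-06 10:00:00")
effort_by_observer(user, times)
```

In this sketch an event with a single observation counts as zero minutes, which is one of the corner cases a fuller implementation would have to decide how to handle.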

It also tallies observations by major taxon per observer, in case you want to account for bias from observers preferentially recording certain taxa such as birds or plants.
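
Here is a similarly hedged sketch of a per-observer taxon tally. `table()` counts observations per observer and iconic taxon, and `prop.table()` turns each observer's row into proportions. The parallel vectors `user` and `iconic_taxon` are assumptions about how your data frame is laid out, and `taxon_share()` is a hypothetical name.

```r
# Share of each observer's observations in each major (iconic) taxon.
# `user` and `iconic_taxon` are assumed parallel vectors from your data frame.
taxon_share <- function(user, iconic_taxon) {
  counts <- table(observer = user, taxon = iconic_taxon)
  prop.table(counts, margin = 1)   # each observer's row sums to 1
}

# Usage with made-up data:
taxon_share(c("a", "a", "a", "b"), c("Aves", "Aves", "Plantae", "Plantae"))
```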

See related discussion here: https://forum.inaturalist.org/t/using-r-to-extract-observations-of-a-specific-phenological-state/7007

Awesome work, Patrick! Do you foresee increasing the 10K download limit? Another feature that I could imagine being useful is some sort of randomized subsampling approach that would allow you to use broad query parameters without ending up with an unwieldy amount of data. For example, I might have a question that pertains to all flowering plant observations in North America, but maybe downloading 5% of the observations at random would be sufficient to detect the patterns of interest without crashing my laptop. Just a thought. Keep up the good work!

This is a limit imposed by iNaturalist. It's possible to work around it creatively, but…

Depending on what patterns you're looking for, you might be able to find them using aggregated data from iNaturalist rather than individual observations.
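
Here is a hedged sketch of that approach. API v1 has aggregate endpoints under `/v1/observations/` (for example `species_counts`, `observers`, `identifiers` and `histogram`) that return counts instead of individual records. The `observers` and `identifiers` endpoints may also be a starting point for the observer and identifier effort question at the top of the thread. The helper below only builds the request URL. Its name is hypothetical, and the parameter names in the usage lines should be checked against the API documentation.

```r
# Build a request URL for an iNaturalist API v1 aggregate endpoint.
# inat_aggregate_url() is a hypothetical helper; it does not call the API.
inat_aggregate_url <- function(endpoint = c("species_counts", "observers",
                                            "identifiers", "histogram"), ...) {
  endpoint <- match.arg(endpoint)
  params <- list(...)
  url <- paste0("https://api.inaturalist.org/v1/observations/", endpoint)
  if (length(params) == 0) return(url)
  # Percent-encode each value so spaces, commas, etc. are safe in the query
  values <- vapply(params, function(v) URLencode(as.character(v), reserved = TRUE),
                   character(1))
  paste0(url, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

# Usage: identifier counts for a project, and a monthly histogram for a taxon
inat_aggregate_url("identifiers", project_id = "bowerbird")
inat_aggregate_url("histogram", taxon_id = 47124, date_field = "observed",
                   interval = "month")
```

If the jsonlite package is installed, `jsonlite::fromJSON()` accepts such a URL directly and returns the parsed response.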

Hello.
In my thread https://forum.inaturalist.org/t/wild-boar-alert-project-for-visually-impaired-persons/15774 I describe a wild boar alert system for visually impaired persons that uses iNaturalist data. Unfortunately, I am not a programmer myself, but I like to find and play with open-source software as building blocks for useful projects.
Would it be possible to have a function that fetches records from the iNaturalist API into data frames along a route from point A to point B, using an OpenStreetMap routing engine such as https://openrouteservice.org or https://github.com/graphhopper/graphhopper? This data could be used as an additional layer on the map, but that would be a topic for another thread or forum.
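
Here is one possible building block, sketched under assumptions. Suppose the routing engine's output has been reduced to vectors of route latitudes and longitudes. A padded bounding box can then feed the `nelat`/`nelng`/`swlat`/`swlng` parameters from the monarch example, and the returned observations can be filtered by great-circle (haversine) distance to the route. The helper names are hypothetical. The distance check only uses route vertices, so the route needs to be densely sampled, and the box does not handle routes crossing the 180° meridian.

```r
# Hypothetical helpers; the route is assumed to be vectors of lat/lng points.
route_bbox <- function(route_lat, route_lng, pad_deg = 0.01) {
  # Padded bounding box, named after the nelat/nelng/swlat/swlng query params
  c(nelat = max(route_lat) + pad_deg, nelng = max(route_lng) + pad_deg,
    swlat = min(route_lat) - pad_deg, swlng = min(route_lng) - pad_deg)
}

haversine_km <- function(lat1, lng1, lat2, lng2) {
  rad  <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlng <- (lng2 - lng1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlng / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))   # mean Earth radius ~6371 km
}

# TRUE for each observation within max_km of at least one route vertex
near_route <- function(obs_lat, obs_lng, route_lat, route_lng, max_km = 0.2) {
  vapply(seq_along(obs_lat), function(i) {
    min(haversine_km(obs_lat[i], obs_lng[i], route_lat, route_lng)) <= max_km
  }, logical(1))
}

# Usage with a made-up three-point route:
route_lat <- c(40, 40, 40)
route_lng <- c(-95, -94.99, -94.98)
route_bbox(route_lat, route_lng)
near_route(c(40.0005, 40.1), c(-94.99, -94.99), route_lat, route_lng)
```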

Sorry, I missed the responses since I hadn't logged into the forum.

As pisum mentioned, the 10K download limit is imposed by the API. It applies per call, so you can break a larger query into smaller chunks if there is a logical way to do so (e.g., if each location has fewer than 10K observations). I also have some code that automatically handles larger downloads (>10K observations, not requests) by indexing on the observation ID. Something like all flowering plants in North America would far exceed the API's daily limits, however.
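
For the chunking approach, here is a small base-R sketch that splits a created-date range into consecutive windows, each queried separately with `created_d1`/`created_d2`. It assumes both bounds are inclusive, as the "first 3 days of May 2019" example earlier in the thread suggests. `date_chunks()` is a hypothetical name. Any window that still returns more than 10K observations would need splitting further (e.g. `by = "week"`).

```r
# Split a created-date range into consecutive windows for created_d1/created_d2.
# Assumes both bounds are inclusive; date_chunks() is a hypothetical helper.
date_chunks <- function(d1, d2, by = "month") {
  starts <- seq(as.Date(d1), as.Date(d2), by = by)
  ends   <- c(starts[-1] - 1, as.Date(d2))   # each window ends the day before the next
  data.frame(created_d1 = format(starts), created_d2 = format(ends),
             stringsAsFactors = FALSE)
}

# Usage: one row per query window
date_chunks("2019-05-01", "2019-07-15")
```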

For that type of question you may be more interested in using the GBIF data set of Research Grade observations: https://www.gbif.org/dataset/50c9509d-22c7-4a22-a47d-8c48425ef4a7

Whether that works for you depends on whether you need the non-RG observations or other observation data that isn't exported to GBIF. The random-sampling download is an interesting idea.

I'd like to see that. I have been working on code that calls the API, but I've gotten stuck on how to have it change the endpoint automatically as it prepares the next call.

The API Recommended Practices describes the following method in the Pagination section:

One way to use the API and fetch more than 10k records is to sort by id ascending (e.g. &order_by=id&order=asc) and use the id_above parameter set to the ID of the record in the last batch. id_above is a search parameter, so this is essentially the same recommendation as above to change search parameters, but by using a parameter that doesn’t affect the nature of the results
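
Here is a minimal sketch of that loop in base R. `fetch_page` is a hypothetical, caller-supplied function. It requests one batch sorted by id ascending, with `id_above` set to its argument, and returns a data frame with an `id` column. Keeping the HTTP call separate makes the pagination logic easy to test offline. The one-second default pause is an assumption meant to keep the request rate modest.

```r
# Sketch of id_above pagination. `fetch_page(id_above)` is a hypothetical,
# caller-supplied function returning one batch (sorted by id ascending) as a
# data frame with an `id` column, or zero rows once nothing is left.
fetch_all_by_id <- function(fetch_page, start_id = 0, pause_sec = 1,
                            max_batches = Inf) {
  batches <- list()
  last_id <- start_id
  while (length(batches) < max_batches) {
    batch <- fetch_page(last_id)
    if (NROW(batch) == 0) break          # also covers NULL or an empty list
    batches[[length(batches) + 1]] <- batch
    last_id <- max(batch$id)             # next call starts above the last ID seen
    Sys.sleep(pause_sec)                 # throttle requests between batches
  }
  if (length(batches) == 0) return(data.frame())
  do.call(rbind, batches)
}

# Usage with a fake fetcher standing in for the API (450 records, 200 per batch):
all_ids <- 1:450
fake_fetch <- function(id_above) data.frame(id = head(all_ids[all_ids > id_above], 200))
res <- fetch_all_by_id(fake_fetch, pause_sec = 0)
nrow(res)
```

With the jsonlite package, a real `fetch_page` could be something like `function(id_above) jsonlite::fromJSON(paste0(base_url, "&id_above=", id_above))$results`. Here `base_url` (hypothetical) already contains `order_by=id&order=asc`. Nested columns in the results may need flattening before `rbind()` can combine the batches.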