What would you like to learn about getting data from the system?

For reference, this is my draft outline of the video series that I had in my head (I’m going to try to make this a Wiki so that others can edit it, if they like):

1. Overview of series

  • iNat is primarily for connecting people to nature and creating a community of folks interested in nature, but data is a nice byproduct.
  • Show some interesting / meaningful examples of how people have used the data from iNaturalist
  • Talk about the goals of the series
    • Aim for breadth rather than depth. (It’s simply not possible to cover every way to interact with data.)
    • When talking about tools, focus on ones that are easy to use and/or provide others easy access to things made in those platforms
  • (Overview of the rest of the series)

2. Getting the most out of the web UI

3. Basic ways to download iNaturalist data

  • Sometimes the screens just don’t provide data in the format that you need. But the system also provides some ways to download data so that you can use it in your own creations.
  • Observation CSV download
    • Show how to use
    • Considerations:
      • up to 200,000 observations per request
      • handles a wide variety of filter input parameters and output field options, but it’s not always easy to reproduce an exact setup later (ex. after changing filters or output fields)
      • can get unobscured coordinates (if you have the right privileges)
      • observation field output (need to have used the observation field before it becomes available)
      • does not handle one-to-many data relationships well (ex. multiple photos per observation)
      • project page variant
  • List taxon downloads
  • Place KML downloads

4. Quick detour: Getting data from GBIF

  • A lot of iNaturalist data is pushed to GBIF, which has some functionality that iNaturalist doesn’t.
  • What gets pushed to GBIF (research grade + properly licensed), and how often is it updated?
  • Extra features of GBIF
    • Has other (non-iNaturalist) data, too
    • Has additional ways to filter data (ex. polygon geographic filtering)
    • Can export more observation-level data
    • Can get a DOI (Digital Object Identifier) for citations
    • More map visualizations (API)
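
As a rough sketch of the polygon filtering mentioned above, here is one way to build a GBIF occurrence search request in Python. The `datasetKey`, `geometry`, and `limit` parameters belong to GBIF's occurrence search API; the helper names are hypothetical, and the dataset key is left as an argument because it should be looked up on the iNaturalist dataset page in GBIF rather than taken from here.

```python
import urllib.parse

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def bbox_to_wkt(west, south, east, north):
    """Return a WKT polygon for a bounding box, wound counter-clockwise."""
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    coords = ",".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


def gbif_search_url(dataset_key, polygon_wkt, limit=20):
    """Build (but do not send) an occurrence search URL for one dataset and polygon."""
    query = urllib.parse.urlencode(
        {"datasetKey": dataset_key, "geometry": polygon_wkt, "limit": limit}
    )
    return f"{GBIF_SEARCH}?{query}"


if __name__ == "__main__":
    # "YOUR-DATASET-KEY" is a placeholder, not a real key.
    print(gbif_search_url("YOUR-DATASET-KEY", bbox_to_wkt(-1, 50, 1, 52)))
```

The box is wound counter-clockwise because GBIF expects that ordering for polygon filters.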

5. Let’s Demystify the API

  • Generally, what is an API?
    • Quick definition
    • For iNaturalist, the most important thing is that it standardizes the way the website, Android app, and iOS app interface with stuff on the back end.
    • Primarily it’s there for app development, but it can be used to get data. (Much of what apps do is just to display data.)
  • Pros and cons of API vs CSV downloads
  • Introduce some terms (ex. request, response, endpoint, throttling), but don’t get too technical
  • iNat has 3 APIs - an old Ruby-based API (mostly deprecated), the current (v1) Node-based API, and a (v2) Node-based API under development
    • Generally, we will talk about the (v1) Node-based API because it’s the current one
  • What kind of data is available in the iNat API?
    • Start by showing https://api.inaturalist.org/v1/docs/
      • lists most endpoints (excludes things like computer vision)
      • lists most parameters for each endpoint
      • provides an interface for constructing a request, making a request, and viewing the response
    • Since we’re generally concerned about getting data, we’re mostly interested in GET endpoints
    • Most of the GET endpoints can be accessed anonymously (and should be accessed anonymously)
      • (The GET endpoints that might provide different results if authenticated are Observation endpoints that can provide private coordinates.)
    • Generally, these endpoints will return data in the form of a json file or an image file (map tiles). (The deprecated API also provided a few other output file formats.)
    • Use https://api.inaturalist.org/v1/docs/ to make some API requests and view the results
      • Observation search query (returns json)
        • explain the general structure of most of these json files (includes total_results, results in various formats depending on endpoint, and usually per_page and page)
        • talk about API request limits
          • rate (around 1 request per second, except for autocomplete endpoints)
          • records (usually around 200 per page, up to top 10000 records for a given set of parameters)
      • Observation map query (returns image file)
      • (show the special case of UTFGrids?)
      • Show how to make an authenticated request on the Observation Search endpoint via https://api.inaturalist.org/v1/docs/
    • Go over to the old API docs just to talk about it, and show an example of how it can output stuff to, say, CSV.
  • Point to the Recommended Practices doc (https://www.inaturalist.org/pages/api+recommended+practices) for more guidance.
  • The rest of the series will cover how to use specific tools / platforms to access the API.
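
To make the request limits above concrete, here is a minimal Python sketch of paging through a GET endpoint anonymously. It assumes the response shape described in the outline (`total_results`, `results`, `page`, `per_page`), a page size of 200, a ceiling of 10,000 records, and a pause of about one second between requests. The function names are hypothetical, and `taxon_id=3` in the usage line is just an arbitrary example filter.

```python
import json
import time
import urllib.parse
import urllib.request

API_BASE = "https://api.inaturalist.org/v1"
MAX_PER_PAGE = 200     # page size mentioned in the outline
MAX_RECORDS = 10_000   # record ceiling mentioned in the outline


def build_url(endpoint, **params):
    """Return a full request URL for an endpoint and its query parameters."""
    query = urllib.parse.urlencode(sorted(params.items()))
    return f"{API_BASE}/{endpoint.strip('/')}?{query}"


def fetch_json(url):
    """Make one anonymous GET request and decode the json response."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def fetch_all(endpoint, fetch=fetch_json, pause=1.0, **params):
    """Collect results page by page, pausing between requests."""
    results = []
    page = 1
    while len(results) < MAX_RECORDS:
        url = build_url(endpoint, page=page, per_page=MAX_PER_PAGE, **params)
        data = fetch(url)
        results.extend(data["results"])
        # Stop on an empty page or once everything reported has been collected.
        if not data["results"] or len(results) >= data["total_results"]:
            break
        page += 1
        time.sleep(pause)  # stay near 1 request per second
    return results[:MAX_RECORDS]


if __name__ == "__main__":
    observations = fetch_all("observations", taxon_id=3)
    print(len(observations))
```

The `fetch` argument is injectable so the paging logic can be tried out without touching the network.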

6. Getting json data from the API into Excel using Power Query

  • Excel is one of the most popular and mature tools for working with data
  • Power Query (part of Office 365 and available in Excel versions 2010 and up) provides a quick way to import json data.
  • Go through the process
    • Construct an API request
    • Show how to format the data into a table
    • Show how to refresh the data
    • Show how to change the query
  • Make something interesting with the data (or show something interesting that has been made)
  • Usage notes:
    • this is good for aggregated data (ex. histogram data) and small sets of records that don’t span multiple pages of requests (ex. up to 200 observations)
    • it’s possible to have Excel pull in multiple pages, but this is usually better accomplished in other tools. (So we will not cover this here.)
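
For anyone curious what the "format the data into a table" step amounts to outside Excel, here is a small Python sketch that flattens nested json records into rows. The sample record in the usage lines is made up for illustration and is not real API output, and the function names are hypothetical.

```python
import csv
import io


def flatten(record, prefix=""):
    """Flatten nested dicts into one row with dotted column names."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # One-to-many data does not fit in a single cell; keep only a count.
            row[name] = len(value)
        else:
            row[name] = value
    return row


def to_csv(records):
    """Return CSV text with one row per record and one column per flattened key."""
    rows = [flatten(record) for record in records]
    columns = sorted({column for row in rows for column in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Illustrative record only; real responses carry many more fields.
    sample = [{"id": 1, "taxon": {"name": "Aves"}, "photos": [{}, {}]}]
    print(to_csv(sample))
```

Reducing lists to a count is one simple way around the one-to-many problem (ex. multiple photos per observation); expanding them into separate rows is another.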

7. Getting json data from the API into data.world

  • data.world is a promising (online) platform for sharing data and connecting data from various sources.
  • (A free account provides a limited number of data sets and projects, and offers access to a broad catalog of community-created data)
  • Go through the process
    • Construct an API request
    • Show how to format the data into a table file
    • Show how to refresh the data (or set for automatic refresh)
    • Show how to change the query
  • Make something interesting with the data (or show something interesting that has been made)
  • Usage notes:
    • this is good for aggregated data (ex. histogram data) and small sets of records that don’t span multiple pages of requests (ex. up to 200 observations)
    • data can be shared with other data.world users

8. Getting map tiles from the API into ArcGIS Online (AGOL)

  • ArcGIS Online is the cloud variant of the popular ArcGIS platform (think maps).
  • AGOL can be used in a limited way even without an account, but a free Public Account allows you to save and share your creations with the public, provides greater access to a huge catalog of data/layers available in AGOL, and offers limited access to some features available only in the online platform (ex. creating Story Maps).
  • First, quick detour: Provide a basic overview of how tiled maps work
    • why deliver data as tiles?
    • introduce some basic concepts like zoom level, x, y
  • Give a quick overview of the tiles that iNat offers
    • Observation tiles (pins, density grid, heatmap, circles)
    • Place tiles
    • Taxon range tiles
    • Taxon place tiles
    • Note: observation tiles and taxon range tiles have color options
  • Go through the process of getting data from iNaturalist
    • Construct an API request (AGOL uses {level}/{col}/{row})
    • Show how to add data to map
    • Show how to limit layers to certain zoom levels
  • Make something interesting with the data (or show something interesting that has been made)
  • Usage notes:
    • Data automatically refreshes
    • Observation tiles are not interactive by themselves. Making them interactive requires some additional programming using UTFGrids (which will not be covered here).
  • AGOL can also map out data given GPS coordinates or WKT points, but that will not be covered here, since that’s relatively easy to figure out.
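
To illustrate the zoom level / x / y concepts above, here is a minimal Python sketch of the standard Web Mercator tile math, plus a helper that builds a tile URL. The `grid` layer name and the `.png` suffix are my reading of the v1 docs and should be checked there; `taxon_id=3` is just an arbitrary example filter, and the function names are hypothetical.

```python
import math
import urllib.parse

API_BASE = "https://api.inaturalist.org/v1"


def lat_lon_to_tile(lat, lon, zoom):
    """Return the (x, y) tile containing a point at the given zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge (or near the poles) stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(layer, zoom, x, y, **params):
    """Build a tile URL; the layer name is an assumption to verify in the docs."""
    url = f"{API_BASE}/{layer}/{zoom}/{x}/{y}.png"
    if params:
        url += "?" + urllib.parse.urlencode(sorted(params.items()))
    return url


if __name__ == "__main__":
    x, y = lat_lon_to_tile(0.0, 0.0, 2)
    print(tile_url("grid", 2, x, y, taxon_id=3))
```

Each extra zoom level doubles the number of tiles along each axis, which is why x and y only make sense together with a zoom level.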

9. Getting map tiles from the API into QGIS

  • QGIS is a free and powerful GIS application with lots of community-created extensions
  • Show how to add some basemaps
  • Go through the process of adding iNat data
    • Construct an API request (QGIS uses {z}/{x}/{y})
    • Show how to add data to map
    • Show how to limit layers to certain zoom levels
    • Show how to change the XYZ tileset setup
    • Make something interesting with the data (or show something interesting that has been made)
  • Go through and add some data from GBIF while we’re at it.
    • Construct an API request, etc.
  • Usage notes:
    • Data automatically refreshes
    • Observation tiles are not interactive by themselves. Making them interactive requires some additional programming using UTFGrids (which will not be covered here).
  • QGIS can also map out data given GPS coordinates, but that will not be covered here, since that’s relatively easy to figure out.
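
Since QGIS and AGOL expect different placeholder names in a tile URL ({z}/{x}/{y} versus {level}/{col}/{row}), here is a small Python sketch that converts a template from one style to the other. It assumes the placeholders correspond position by position (z to level, x to col, y to row). The example URL, including the `grid` layer name, is an assumption to check against the v1 docs, and the function names are hypothetical.

```python
# Placeholder pairs, assuming z/x/y line up with level/col/row in that order.
QGIS_TO_AGOL = {"{z}": "{level}", "{x}": "{col}", "{y}": "{row}"}


def to_agol(template):
    """Rewrite a QGIS-style XYZ template with AGOL-style placeholders."""
    for qgis_name, agol_name in QGIS_TO_AGOL.items():
        template = template.replace(qgis_name, agol_name)
    return template


def to_qgis(template):
    """Rewrite an AGOL-style tile template with QGIS-style placeholders."""
    for qgis_name, agol_name in QGIS_TO_AGOL.items():
        template = template.replace(agol_name, qgis_name)
    return template


if __name__ == "__main__":
    qgis_template = "https://api.inaturalist.org/v1/grid/{z}/{x}/{y}.png?taxon_id=3"
    print(to_agol(qgis_template))
```

Keeping one template and converting it avoids retyping the query parameters for each tool.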

10. Getting json data from the API for use with R via RStudio (desktop, probably)

11. Getting json data from the API for use with Python (via TBD, probably Jupyter)

12. Getting json data from the API into Observable (sort of JavaScript)
