What would you like to learn about getting data from the system?

pisum · January 4, 2021, 12:49am

A while back, I had contemplated a series a videos to help people learn how to work with data in the system (https://forum.inaturalist.org/t/wiki-external-code-tools-etc-for-working-with-inat/15906/13).

A week or two ago, I started working on a more detailed overall outline of what I thought would be useful to cover in such a series. My plan was to first create some tutorials in the Forum for each of the topics, and then use those tutorials as the basis for companion video tutorials.

What I’m realizing now is that this is potentially a lot of work, and more importantly, what I think might be useful to cover might not be what folks are actually interested in learning. So I’d like to hear what you all are actually interested in learning about when it comes to getting data from the system.

Some questions:

What are some examples of data from the system that you consume? How do you currently consume it? Are there data that you can’t figure out how to get out of the system?
Are there any particular tools / languages that you’d like to learn to use to work with iNaturalist data?
Do you prefer to learn this kind of information via text & pictures, sample projects + code, and/or videos? or some other format / method?
Do you have any particular knowledge or sample tutorials, projects, code to share that might be useful for others to learn from?
etc…

pisum · January 4, 2021, 12:49am

For reference, this is my draft outline of the video series that I had in my head (I’m going to try to make this a Wiki so that others can edit it, if they like):

1. Overview of series

iNat is primarily for connecting people to nature and creating a community of folks interested in nature, but data is a nice byproduct.
Show some interesting / meaningful examples of how people have used the data from iNaturalist
Talk about the goals of the series
- Aim for breadth rather than depth. (It’s simply not possible to cover every way to interact with data.)
- When talking about tools, focus on ones that are easy to use and/or provide others easy access to things made in those platforms
(Overview of the rest of the series)

2. Getting the most out of the web UI

For a lot of people, the website provides everything they need to get all the data they will ever need.
Goal 1: Share tips for using the screens better (with emphasis on Explore screen, whose basic functions are covered well in https://forum.inaturalist.org/t/the-inaturalist-explore-page-an-introduction/12966)
- Pin types and colors
- Notes on obscured and private data
- What kind of data is excluded by default from the Explore screen? (casual & spam, plus non-mappable in the maps)
- Location vs. place filtering, filtering by bounding box, filtering by radius
- Enhanced filter interface in the Identify screen
- Filtering using parameters in the URL (https://forum.inaturalist.org/t/how-to-use-inaturalists-search-urls-wiki/63, https://forum.inaturalist.org/t/how-to-use-inaturalists-search-urls-wiki-part-2/18792), including:
  - exclusion filters
  - filter by annotations
  - filter by observation fields
Goal 2: Provide a broad overview of other data-related stuff available in the system
- Taxon screen stuff
  - seasonality, phenology, and related graphs
  - taxonomy
  - enhanced image viewer with filters
  - enhanced map (which can be viewed in a standalone version - ex. https://www.inaturalist.org/taxa/map?taxa=3,4#2/0/0)
- Heat map (https://www.inaturalist.org/observations/map)
- Experimental Compare Tool (https://www.inaturalist.org/observations/compare)
- Life list, plus old lists & filtering by lists
- Project stats
- Year in Review

3. Basic ways to download iNaturalist data

Sometimes the screens just don’t provide data in the format that you need. But the system also provides some ways to download data so that you can use it in your own creations.
Observation CSV download
- Show how to use
- Considerations:
  - up to 200,000 observations per request
  - handles a wide variety of filter input parameters and output field options, but not always easy to replicate exact setups (if changing these)
  - can get unobscured coordinates (if you have the right privileges)
  - observation field output (need to have used the observation field before it becomes available)
  - does not handle one-to-many data relationships well (ex. multiple photos per observation)
  - project page variant
List taxon downloads
Place KML downloads

4. Quick detour: Getting data from GBIF

A lot of iNaturalist data is pushed to GBIF, which has some functionality that iNaturalist doesn’t.
What gets pushed to GBIF (research grade + properly licensed), and how often is it updated?
Extra features of GBIF
- Has other (non-iNaturalist) data, too
- Has additional ways to filter data (ex. polygon geographic filtering)
- Can export more observation-level data
- Can get a DOI (Digital Object Identifier) for citations
- More map visualizations (API)

5. Let’s Demystify the API

Generally what is an API?
- Quick definition
- For iNaturalist, the most important thing is that it standardizes the way the website, Android app, and iOS app interface with stuff on the back end.
- Primarily it’s there for app development, but it can be used to get data. (Much of what apps do is just to display data.)
Pros and cons of API vs CSV downloads
Introduce some terms (ex. request, response, endpoint, throttling), but don’t get too technical
iNat has 3 APIs - an old Ruby-based API (mostly deprecated), the current (v1) Node-based API, and a (v2) Node-based API under development
- Generally, we will talk about the (v1) Node-based API because it’s the current one
What kind of data is available in the iNat API?
- Start by showing https://api.inaturalist.org/v1/docs/
  - lists most endpoints (excludes things like computer vision)
  - lists most parameters for each endpoint
  - provides an interface for constructing a request, making a request, and viewing the response
- Since we’re generally concerned about getting data, we’re mostly interested in GET endpoints
- Most of the GET endpoints can be accessed anonymously (and should be accessed anonymously)
  - (The GET endpoints that might provide different results if authenticated are Observation endpoints that can provide private coordinates.)
- Generally, these endpoints will return data in the form of a json file or an image file (map tiles). (The deprecated endpoint also provided a few other output file formats.)
- Use https://api.inaturalist.org/v1/docs/ to make some API requests and view the results
  - Observation search query (returns json)
    - explain the general structure of most of these json files (includes total_results, results in various formats depending on endpoint, and usually per_page and page)
    - talk about API request limits
      - rate (around 1 request per second, except for autocomplete endpoints)
      - records (usually around 200 per page, up to top 10000 records for a given set of parameters)
  - Observation map query (returns image file)
  - (show the special case of UTFGrids?)
  - Show how to make an authenticated request on the Observation Search endpoint via https://api.inaturalist.org/v1/docs/
- Go over to the old API docs just to talk about it, and show an example of how it can output stuff to, say, CSV.
Point to the Recommended Practices doc (https://www.inaturalist.org/pages/api+recommended+practices) for more guidance.
The rest of the series will cover how to use specific tools / platforms to access the API.

6. Getting json data from the API into Excel using Power Query

Excel is one of the most popular and mature tools for working with data
Power Query (part of Office 365 and available in versions 2010 and up) provides a quick way to import data json data.
Go through the process
- Construct an API request
- Show how to format the data into a table
- Show how to refresh the data
- Show how to change the query
Make something interesting with the data (or show something interesting that has been made)
Usage notes:
- this is good for aggregated data (ex. histogram data) and small sets of records that don’t span multiple pages of requests (ex. up to 200 observations)
- it’s possible to have Excel pull in multiple pages, but this is usually better accomplished in other tools. (So we will not cover this here.)

7. Getting json data from the API into data.world

data.world is a promising (online) platform for sharing data and connecting data from various sources.
(A free account provides a limited number of data sets and projects, and offers access to a broad catalog of community-created data)
Go through the process
- Construct an API request
- Show how to format the data into a table file
- Show how to refresh the data (or set for automatic refresh)
- Show how to change the query
Make something interesting with the data (or show something interesting that has been made)
Usage notes:
- this is good for aggregated data (ex. histogram data) and small sets of records that don’t span multiple pages of requests (ex. up to 200 observations)
- data can be shared with other data.world users

8. Getting map tiles from the API into ArcGIS Online (AGOL)

ArcGIS Online is the cloud variant of the popular ArcGIS platform (think maps).
AGOL can be used in a limited way even without an account, but a free Public Account allows you to save and share your creations with the public, provides greater access to a huge catalog of data/layers available in AGOL, and offers limited access to some features available only in the online platform (ex. creating Story Maps).
First, quick detour: Provide a basic overview of how tiled maps work
- why deliver data as tiles?
- introduce some basic concepts like zoom level, x, y
Give a quick overview of the tiles that iNat offers
- Observation tiles (pins, density grid, heatmap, circles)
- Place tiles
- Taxon range tiles
- Taxon place tiles
- Note: observation tiles and taxon range tiles have color options
Go through the process of getting data from iNaturalist
- Construct an API request (AGOL uses {level}/{col}/{row})
- Show how to add data to map
- Show how to limit layers to certain zoom levels
Make something interesting with the data (or show something interesting that has been made)
Usage notes:
- Data automatically refreshes
- Observation tiles are not interactive by themselves. Making them interactive requires some additional programming using UTFGrids (which will not be covered here).
AGOL can also map out data given GPS coordinates or WKT points, but that will not be covered here, since that’s relatively easy to figure out.

9. Getting map tiles from API into QGIS

QGIS is a free and powerful GIS application with lots of community-created extensions
Show how to add some basemaps
Go through the process of adding iNat data
- Construct an API request (QGIS uses {z}/{x}/{y})
- Show how to add data to map
- Show how to limit layers to certain zoom levels
- Show how to change the change XYZ tileset setup
- Make something interesting with the data (or show something interesting that has been made)
Go through and add some data from GBIF while we’re at it.
- Construct an API request, etc.
Usage notes:
- Data automatically refreshes
- Observation tiles are not interactive by themselves. Making them interactive requires some additional programming using UTFGrids (which will not be covered here).
QGIS can also map out data given GPS coordinates, but that will not be covered here, since that’s relatively easy to figure out.

10. Getting json data from API for use with R via RStudio (desktop, probably)

R is a powerful platform for data analysis and programming, used by lots of data scientists, with lots of packages for doing all sorts of things
Talk about different ways to fetch data
Mention rinaturalist (which is based on the deprecated API) and hanly’s iNatTools package (which is based on the v1 API)
Go through the process of fetching iNat data
- Construct an API request
- Show how to put the data into a data frame
- Show how to construct a loop to get multiple pages of information
Make something interesting with the data (or show something interesting that has been made). possible examples:
- https://forum.inaturalist.org/t/what-things-are-misidentified-as-large-milkweed-bug/12571
- https://forum.inaturalist.org/t/inaturalist-visualization-what-introduced-species-are-in-my-place/12889

11. Getting json data from API for use with Python (via TBD, probably Jupyter)

Python is one of the most popular programming languages in the world
Go through the process of fetching iNat data
- Construct a basic API request
- Talk about PyiNaturalist (https://github.com/niconoe/pyinaturalist, which can hit both Ruby and Node APIs)
- Show how to get data (probably via PyiNaturalist, to simplify things), and view the contents of the results
Make something interesting with the data (or show something interesting that has been made). Possible examples:
- https://forum.inaturalist.org/t/tool-for-exporting-inaturalist-data-to-irecord/19160

12. Getting json data from API into Observable (sort of Javascript)

Observable is an interesting online notebook-style platform for coding(?) that is focused on modularity and sharability (is that a word?)
Go through the process of fetching iNat data
- Construct a basic API request
- Show how to view the results
- Construct a loop to get multiple pages of data
Make something interesting with the data (or show something interesting that has been made). Possible examples:
Usage notes:
- Observable generally updates instantly with each interaction. So must be careful about setting up input fields for typing in API request fields (ex. https://observablehq.com/@kueda/inat-observations-histogram-to-csv), since a request could be made with each keystroke.
Discuss other ways to work with Javascript (including a plug for inaturalistjs)

astra_the_dragon · January 4, 2021, 1:10am

I would absolutely be interested in something like this. I don’t currently use iNat in a professional workplace, but I would like to further my abilities to use its data.

I’m a beginner when it comes to data consumption. I know computers fairly well, but only a small amount of code.
The language that would probably be most useful to me is SQL. While I am fine parroting someone else’s actions/code, I’d prefer to take a dedicated beginners’ course before actually attempting to use it, or any programming language.
As for tools, I don’t even know what there is. the API, GIS. and… Microsoft Excel??? lol
You introduced me to data.world, which is great, but I’ve been getting help from a friend to work with it (she’s an actual programmer, unlike yours truly)

I think it is important to note which classes are intended for what audiences. There are levels of technical competence and experience, as well as, independently, levels of familiarity with iNat.
Also, would someone have to watch all earlier lessons to make sense of later ones? Some? Or would each be independent?

pisum · January 4, 2021, 2:03am

i’ll address some of your points after others have weighed in, but for now, i’d just like to address one thing.

we don’t actually have direct access to run queries against iNaturalist’s database as far as i know, but if you get the data into a database or something like data.world, then you can definitely run SQL queries against that data.

different implementations of SQL may vary slightly in their syntax and capabilities, but data.world’s flavor of SQL does cover most standard query functionality with relatively standard syntax. if you want to learn SQL the data.world way, they actually have a pretty good tutorial for this. Go to https://docs.data.world/tutorials/, and select the SQL tutorial to start learning.

sbushes · January 4, 2021, 2:22am

Ooh, Observable looks nice! I hadn’t come across this yet. They look much slicker than Jupyter Notebooks. I’d probably watch a tutorial on that, if I don’t already investigate under my own steam.

Videos would be nice I think as a format - easier to digest, jump around, watch at speed of your choosing. Though it could also be good to do a Jupyter Notebook or Observable which is a basic walkthrough and interactive tutorial in and of itself. Personally, I rarely find passive learning sticks or engages me as much as actively doing something and making mistakes, etc…

I was thinking of doing a video to accompany the Python notebook I made - but if I did it would be likely aimed at UK users, so might be more niche.

muir · January 4, 2021, 2:47am

Thanks pisum, I would really appreciate something like this, especially:

5. Let’s Demystify the API

I have an interest in working with iNat data and have done some exploratory analyses (see below), but feel like an ignoramus when it comes to the basics of API and coding programs, and end up doing a lot of inefficient, time-consuming things because of my own limitations. What you’re proposing (if I could keep up!) seems like it would really be helpful. I’ve asked for help before, but I’m such a beginner when it comes to Ruby, Python, etc, that it’s a challenge to ask people for such basic assistance.

In terms of “What are some examples of data from the system that you consume? How do you currently consume it?”

I’ve manually copied data into Excel using the observation search tool, filtering by place, taxon, date observed, date added. One of the metrics I like to look at it is # observations/newly recorded unique taxon. I’ve done that for

my own account (The hunt for 5,000 taxa before 20,000 observations)
the state of Alaska (Alaska iNat passes 100k observations & 5k unique taxa)
the DC metro area (DC Metro iNat data: 2017 and observations per new species)

Those journal posts capture some of my other interests – forecasting predictions, comparing iNat records to external species lists, regression analysis to explain differences between places. I would like to find a more dynamic way to have and update this data, and learn how to avoid the hours of manual copy and paste that I’ve been doing. There’s more re “Are there data that you can’t figure out how to get out of the system?” but I’ll stop there.

bouteloua · January 4, 2021, 2:59am

I like written tutorials with step-by-step instructions and graphics. When a tutorial is only a recorded video, I get frustrated by the pause/play/pause/play and difficulty jumping around or searching for the exact snippet I need. Videos can be nice for intro/basic explanatory background on a subject though.

I typically work through tutorials with the data I’m actually interested in, so I don’t tend to use example projects/sample datasets.

pisum · January 4, 2021, 5:59am

this is good information, with good examples of use cases.

i’ll respond to more of what you talked about in your post at some point in the future, but i just want to provide you some quick information that may help speed up your data gathering process sooner than later.

in general, it looks like you’re gathering 3 kinds of data:

counts of observations
counts of leaf taxa
counts of unique observers

because of the nature of your data, you actually have 3 methods available to you to get some or all of this data.

get counts from the system for each time period, one time period at a time. this is what you’re doing manually, and there are ways to use the API – even within Excel – to speed up this process.
download observations (with observation id and observer fields) into CSV, and use Excel’s PowerPivot functionality (available in 2013 and up, plus 365) to do a count of observations by time period, plus a distinct count of observers by time period. this method is only available in cases where the total number of observation for all periods that you’re interested in totals <= 200000 (the CSV export limit), or else you’ll have to do multiple CSV downloads and merge them together. that method starts to get unwieldy for really large sets of data, and it doesn’t handle counts of leaf taxa, since that’s a complicated kind of aggregation that i don’t Excel handles without jumping through a lot of hoops.
for the counts of observations, you can actually get a whole set of counts for sequential time periods in one request, using the observation histogram API endpoint. (that endpoint is what provides the data for the seasonality, phenology, etc. graphs on the taxon pages, and this method is the most efficient for getting counts of observations over time.)

so here’s a quick summary of the data types and methods available to gather that data:

count of	method 1	method 2	method 3
observations	Y	Y	Y
leaf taxa	Y	n	n
observers	Y	Y	n

i’ve sort of described method 2 in all the necessary detail above. i’ll cover method 1 in more detail later when i have more time, since it requires some screenshotting and such, but let me just provide a few quick notes about method 3…

if you go over to https://api.inaturalist.org/v1/docs/#!/Observations/get_observations_histogram, you’ll find an interface that provides information about the Observation Histogram API endpoint. you can see what parameters are available to the endpoint – more or less all the parameters available in the observation search (Explore page), plus additional parameters to specify which time intervals you want (day, month, year, etc.), and what kind of date you want to use as your interval (create date or submit date). you can fill in parameters values and use the interface to run and view the results.

so for example, if i wanted to recreate the first chart in your Alaska journal example, i could fill in place_id=6 (Alaska), d1=2009-01-01 (start at 2009), interval=year, date_field=observed (i think that’s what you used), and verifiable=true. then when i click the “Try it out!” button, the interface there will give me a request string like this: https://api.inaturalist.org/v1/observations/histogram?verifiable=true&place_id=6&d1=2009-01-01&date_field=observed&interval=year, and it’ll also show me the results in a nice format:

{
  "total_results": 13,
  "page": 1,
  "per_page": 13,
  "results": {
    "year": {
      "2009-01-01": 1344,
      "2010-01-01": 1964,
      "2011-01-01": 1380,
      "2012-01-01": 4580,
      "2013-01-01": 2079,
      "2014-01-01": 3385,
      "2015-01-01": 4963,
      "2016-01-01": 8345,
      "2017-01-01": 21374,
      "2018-01-01": 25520,
      "2019-01-01": 36373,
      "2020-01-01": 39384,
      "2021-01-01": 193
    }
  }
}

alternatively, you could use something that i made back in the day to get the same results with a few extras (a bar graph, plus ability to export a CSV, etc.): https://jumear.github.io/stirfry/iNatAPIv1_observation_histogram.html?verifiable=true&place_id=6&d1=2009-01-01&date_field=observed&interval=year. (notice how this page uses the same parameters – everything after the ? in the URL – as the request string for the API.)

in the future, i’ll describe how you could make the same observation histogram API request in Excel using Power Query, but i won’t get into that yet, since these other options are probably easy enough to use for now. (if you wanted to do a little internet research on your own, you could probably find existing tutorials that would tell you how to make the API request above from within Excel + Power Query.)

but i do intend to come back sooner (within the next few days) and describe how to use Excel + API to accomplish method 1. so stay tuned for that…

kiwifergus · January 4, 2021, 6:27am

I would like to register a semi-interest. 2 years ago I would have been champing at the bit for an introduction like this but life events since have diminished the need. I would be interested purely out of a curiosity at what might have been possible.

krancmm · January 4, 2021, 6:46am

@pisum What a generous offer you’ve made. Thanks for even considering doing this. Sign me up!

My data use is primarily to incorporate a bunch of fields from observations of Canada & US moths that gets fed into the Moth Photographers Group mapping component along with data from other sources. Darwin Core doesn’t contain all that I need so GBIF is a no-go. Neither the creaky old CVS Export nor the convoluted URL work-around searches are of use for that purpose either.

I now use the API exclusively to download to Excel, GETs of observations and identifiers. I would be interested in learning more about the API, and non-Excel API options as keeping track of where I am when appending all those 200 record blocks in Excel can be error prone. I’m totally clueless about the mapping part of the API so especially would like instruction on API mapping and its use in ArcGIS or QGIS.

Although your 10, 11, 12 items sound like Martian to me right now, they may be the answer to issues I didn’t know I had…perhaps the recent requests I’ve had from 2 state-level wildlife managers and the US-wide Invasive Mapping Network to collaborate might be best served by data.world or Observable.

How I’d like the instruction is any or all of the possibilities listed. When I teach myself something I use all available resources so whatever is easiest for you would suit me, although I admit to a preference for sample projects + code or videos (with necessary text and picture backup).

Sorry if I seem to want it all but your offer is very exciting to me.

williampaulwhite17 · January 4, 2021, 8:33am

I’m speaking as someone using this mostly for personal reasons, who likes getting out in nature…but isn’t really interested in learning programming at all.

I would like to learn a simple way to get observations into Excel.

mrtnlowr · January 4, 2021, 3:07pm

Wow! That’s an ambitious series of tutorials and just what I need. Count me in!

I’m currently using QGIS and RStudio for mapping distributions and modelling the effects of climate change. As a supplement to my own observations INat is a impressive source of presence records for this kind of work. I’d be particularly interested in sections 5, 9 and 10 and, like bouteloua, would find them easiest to follow as " written tutorials with step-by-step instructions and graphics".

Thank you.

michaelpirrello · January 4, 2021, 3:58pm

Oh, so much wonderful about this post.

My data contribution/consumption has been varied, I’ll try to organize it the way you did:

2
I regularly use the Identify filter interface to focus on just observations without life stage annotations. Once I annotate, the Life Stage tab on the About page becomes useful (several examples in Lygaeinae and Serinethinae ought to reflect this), and it allows me to focus Needs ID efforts first on adults, then on nymphs without hopping back and forth. (My hope is that eventually the CV will be able to use data like this to suggest not only an ID, but also Life Stage.)

I use the compare tool all the time to create maps of adults and nymphs of the same species. For example: https://www.inaturalist.org/observations/compare?s=eyJxdWVyaWVzIjpbeyJuYW1lIjoiRWFzdGVybiBhZHVsdCIsInBhcmFtcyI6InBsYWNlX2lkPTk3Mzk0JnRheG9uX2lkPTUzMjI3JnRlcm1faWQ9MSZ0ZXJtX3ZhbHVlX2lkPTIiLCJjb2xvciI6IiMxZjc3YjQifSx7Im5hbWUiOiJXZXN0ZXJuIGFkdWx0IiwicGFyYW1zIjoicGxhY2VfaWQ9OTczOTQmdGF4b25faWQ9NTMyMjYmdGVybV9pZD0xJnRlcm1fdmFsdWVfaWQ9MiIsImNvbG9yIjoiI2ZmN2YwZSJ9LHsibmFtZSI6IkVhc3Rlcm4gbnltcGgiLCJwYXJhbXMiOiJwbGFjZV9pZD05NzM5NCZ0YXhvbl9pZD01MzIyNyZ0ZXJtX2lkPTEmdGVybV92YWx1ZV9pZD01IiwiY29sb3IiOiIjMmNhMDJjIn0seyJuYW1lIjoiV2VzdGVybiBueW1waCIsInBhcmFtcyI6InBsYWNlX2lkPTk3Mzk0JnRheG9uX2lkPTUzMjI2JnRlcm1faWQ9MSZ0ZXJtX3ZhbHVlX2lkPTUiLCJjb2xvciI6IiNkNjI3MjgifV0sInRhYiI6Im1hcCIsInRheG9uRmlsdGVyIjoibm90X2luX2NvbW1vbiIsInRheG9uRnJlcXVlbmNpZXNTb3J0SW5kZXgiOjIsInRheG9uRnJlcXVlbmNpZXNTb3J0T3JkZXIiOiJkZXNjIiwibWFwTGF5b3V0IjoiY29tYmluZWQiLCJoaXN0b3J5TGF5b3V0IjoiaG9yaXpvbnRhbCIsImhpc3RvcnlJbnRlcnZhbCI6IndlZWsiLCJjb2xvclNjaGVtZSI6ImNhdGVnb3JpY2FsIn0%3D (I have many other more extensive bookmarked examples, but these URL’s are so long…)

Fabienpiednoir and I also focused on Lygaeus kalmii subspecies for a bit, and the compare tool allowed us to define the current distribution and compare against what had been previously published in the 1920’s. I think it may show the shift of the effective 100th meridian.

The search URLs wiki is a key reference. I just consulted it to find that I could use introduced=false as a way to focus on potential native plants for landscaping (and with a quick shift in focus, to identify plants that still need an establishment means added).

4
GBIF query: I wanted to get a sense of what moths I could expect to see on Cape Island, NJ so I ran a GBIF query on a polygon I created as part of my pre-trip preparation and then used the list as a checklist and to note first observations of species that weren’t already on the list (https://doi.org/10.15468/dl.a9qak9 and https://www.gbif.org/occurrence/download/0023232-200613084148143). We should definitely encourage everyone to license their data so it can go to GBIF.

If you’re into arthropods, https://scan-bugs.org/portal/imagelib/search.php queries multiple databases and will return both iNat and non-iNat data. I usually ask it to search everything but iNat.

10
Alexis18’s Sankey and Network diagrams are a great way for me to decide what genera/species to learn next. By learning the aposematic insects that are frequently mis-identified as each other, it becomes a knowledge-depth feedback loop.

And that’s just three of the sections in your outline. I’m eager to learn about the stuff in all the other sections!

jcook · January 4, 2021, 6:17pm

This sounds great! This would be a much-needed source of info for both new and experienced users.

I’m also interested in making these data more accessible, mainly via pyinaturalist, and I would be happy to help with the python-specific details. For example, if you want to describe some use cases I could put together some Jupyter notebooks, and could also add them to the pyinaturalist repo so they can get updated to reflect any future code changes. Also, I put together some data visualization examples here, and plan to add more in the future.

As an aside, one common denominator between Observable and python (besides the similarity to Jupyter) is data visualizations using Vega-Lite. Python has the Altair library, which is based on Vega-Lite. I think it would be worthwhile to show an example with Jupyter + Altair, and a similar example in Observable that accomplishes the same thing, in order to highlight the differences between the two platforms. This would be helpful for anyone who knows both python and JS, or anyone who knows neither language but wants to pick one to learn.

One minor correction: the “latest” docs you linked are for the pre-release version (dev branch); the “stable” docs are for the current stable release (master branch) and are better starting point, unless there’s a specific unreleased feature you want to test out.

pisum · January 4, 2021, 9:14pm

muir earlier specifically mentioned Python as one of the languages he was trying to learn, and i think showing how to create something like the observation + taxon + observer count line graph in muir’s second example use case:

… could cover many bases:

the observation counts could be retrieved from /observations/histogram (which i suspect is probably the most useful endpoint for the majority of people who want to make time series charts)
the taxon and observer counts could be retrieved by iterating through each month to get the total_results value from /observations/species_counts and /observations/observers, respectively.
- you could show that such a request with per_page=0 is a way to get just the record count
- you could also put a delay between requests just to reinforce iNat’s recommendation to try to limit request rates to about 1/sec.
connecting it to a visualization package would be good, too.

something like sbushes’s export would cover additional bases:

hitting /observations is probably another very common activity
highlights PyiNaturalist’s ability (i think) to simplify multi-page requests and formatting / parsing of all the different kinds of values in the results.
CSV exports (in cases where the standard download won’t work for some reason) seem to be a commonly requested thing.

sounds reasonable. the Observable notebooks i highlighted as possible examples above do use Vega-Lite for some charts, and if we wanted the Jupyter and Observable examples for this effort to be exactly the same, i think it would be relatively straightforward to make a new Observable example that matches your Jupyter example.

i’m not sure which link you’re referencing. i did make a link to the Github Repo for PyiNaturalist, and the link there to docs seems to take you to the docs for the stable version. let me know if you still see an issue, or feel free to make a change to the wiki-style draft outline above.

alexis18 · January 4, 2021, 9:44pm

I think it might be useful for new data users something like “What do these licenses mean?” and/or “being an ethical data user” introducing people to citing data from GBIF, how to make sure they’re not violating usage licenses, and other considerations with data?

jcook · January 4, 2021, 10:31pm

i’m not sure which link you’re referencing.

Whoops, that was from an entirely different thread. Too many tabs open!

astra_the_dragon · January 4, 2021, 11:06pm

no such thing exists

jcook · January 4, 2021, 11:11pm

you could also put a delay between requests just to reinforce iNat’s recommendation to try to limit request rates to about 1/sec.

Sure, this is already the behavior for functions that request multiple pages (like get_all_observations()), which is where you’re most likely to exceed the suggested rate limit. But we could also show that explicitly in an example that makes several subsequent requests to different endpoints.

Along those lines, it would be worth mentioning that for especially large datasets (say, 10000+ observations), it’s better to use the export tool, as mentioned in the API Recommended Practices.

CSV exports (in cases where the standard download won’t work for some reason) seem to be a commonly requested thing.

GET /observations from the older Rails-based API actually has an option for CSV, among other data formats. See python wrapper here. I think that’s one of the few useful features remaining in the old API that aren’t yet available in the new one.

Would also it be useful to show the more general case of converting a JSON response into CSV? The only trick there is that if you want nested data (for example, taxon records within an observation response), those need to be flattened first. To handle that, pandas.json_normalize() is pretty convenient.

the taxon and observer counts could be retrieved by iterating through each month to get the total_results value from /observations/species_counts and /observations/observers, respectively

Since you mentioned the /observations/observers endpoint and have experience with the API in general, do you have any insight into what’s going on with this issue on GitHub?

@willkuhn and I ran into that recently while working on adding that endpoint to pyinaturalist. It’s possible that the limit of 500 results (regardless of per_page) is the intended behavior, but the other issues described there currently make that endpoint difficult to use.

pisum · January 4, 2021, 11:17pm

this is a continuation from the earlier post (in reply to @muir) but i’m going to change the example a bit. here, i’m going to get incremental (new) “species” (leaf taxa) for Alaska in each month in 2020. to get there, here’s my thought process:

on the Explore page in iNaturalist, the Species tab data + count come from the /observations/species_count API endpoint (https://api.inaturalist.org/v1/observations/species_counts)
in order to get incremental counts for each month, i need to start by getting a series of cumulative counts for each month, plus an extra one from the end of the previous year
i want to filter for Alaska (place_id=6) and verifiable=true and get cumulative counts based on submit date (create_d2={last day of month}). also, i just need a total count, not any details (per_page=0). so my API request would be: https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2={last day of month}
in Excel 2013 and up, there is a =WEBSERVICE(url) formula that i can use to retrieve the results returned from a URL
the raw results returned by iNaturalist for this kind of request will always follow a specific format. so i can use SUBSTITUTE to eliminate the extra stuff i don’t need

so i would set up a spreadsheet with the following structure:

column A: months 2019-12-01 to 2020-12-01
column B: last days of the months from column A
column C: a WEBSERVICE request to the appropriate URL, varying the formula in each row by last day of month
column D: get just the number from column C
column E: subtract the cumulative count from the previous row’s cumulative count

so assuming “Month” is in cell A1, this is what that looks like:

`(A)` Month	`(B)` Last Day	`(C)` Raw JSON	`(D)` Cumulative Species	`(E)` Incremental Species
2019-12-01	2019-12-31	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B2)`	`=SUBSTITUTE(SUBSTITUTE(C2,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`
2020-01-01	2020-01-31	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B3)`	`=SUBSTITUTE(SUBSTITUTE(C3,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`	`=D3-D2`
2020-02-01	2020-02-29	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B4)`	`=SUBSTITUTE(SUBSTITUTE(C4,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`	`=D4-D3`
2020-03-01	2020-03-31	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B5)`	`=SUBSTITUTE(SUBSTITUTE(C5,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`	`=D5-D4`
2020-04-01	2020-04-30	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B6)`	`=SUBSTITUTE(SUBSTITUTE(C6,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`	`=D6-D5`
2020-05-01	2020-05-31	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B7)`	`=SUBSTITUTE(SUBSTITUTE(C7,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`	`=D7-D6`
2020-06-01	2020-06-30	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B8)`	`=SUBSTITUTE(SUBSTITUTE(C8,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`	`=D8-D7`
2020-07-01	2020-07-31	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B9)`	`=SUBSTITUTE(SUBSTITUTE(C9,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`	`=D9-D8`
2020-08-01	2020-08-31	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B10)`	`=SUBSTITUTE(SUBSTITUTE(C10,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`	`=D10-D9`
2020-09-01	2020-09-30	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B11)`	`=SUBSTITUTE(SUBSTITUTE(C11,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`	`=D11-D10`
2020-10-01	2020-10-31	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B12)`	`=SUBSTITUTE(SUBSTITUTE(C12,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`	`=D12-D11`
2020-11-01	2020-11-30	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B13)`	`=SUBSTITUTE(SUBSTITUTE(C13,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`	`=D13-D12`
2020-12-01	2020-12-31	`=WEBSERVICE("https://api.inaturalist.org/v1/observations/species_counts?per_page=0&verifiable=true&place_id=6&created_d2="&B14)`	`=SUBSTITUTE(SUBSTITUTE(C14,"{""total_results"":",""),",""page"":1,""per_page"":0,""results"":[]}","")`	`=D14-D13`

… and here’s what the final result looks like:

`(A)` Month	`(B)` Last Day	`(C)` Raw JSON	`(D)` Cumulative Species	`(E)` Incremental Species
2019-12-01	2019-12-31	`{"total_results":4961,"page":1,"per_page":0,"results":[]}`	4961
2020-01-01	2020-01-31	`{"total_results":4975,"page":1,"per_page":0,"results":[]}`	4975	14
2020-02-01	2020-02-29	`{"total_results":5002,"page":1,"per_page":0,"results":[]}`	5002	27
2020-03-01	2020-03-31	`{"total_results":5022,"page":1,"per_page":0,"results":[]}`	5022	20
2020-04-01	2020-04-30	`{"total_results":5095,"page":1,"per_page":0,"results":[]}`	5095	73
2020-05-01	2020-05-31	`{"total_results":5212,"page":1,"per_page":0,"results":[]}`	5212	117
2020-06-01	2020-06-30	`{"total_results":5319,"page":1,"per_page":0,"results":[]}`	5319	107
2020-07-01	2020-07-31	`{"total_results":5446,"page":1,"per_page":0,"results":[]}`	5446	127
2020-08-01	2020-08-31	`{"total_results":5571,"page":1,"per_page":0,"results":[]}`	5571	125
2020-09-01	2020-09-30	`{"total_results":5691,"page":1,"per_page":0,"results":[]}`	5691	120
2020-10-01	2020-10-31	`{"total_results":5723,"page":1,"per_page":0,"results":[]}`	5723	32
2020-11-01	2020-11-30	`{"total_results":5761,"page":1,"per_page":0,"results":[]}`	5761	38
2020-12-01	2020-12-31	`{"total_results":5810,"page":1,"per_page":0,"results":[]}`	5810	49

a few final notes:

when the formula in column C refers to the date in column B, i need to make sure it’s formatted as text in a format that iNaturalist will understand. (if it’s not text, then the formula in C will need to convert it to the appropriate data type and format.)
iNaturalist doesn’t like you to hit it with too many API requests at the same time. so in this case, you’ll probably want to go down row by row when copying down the WEBSERVICE formula in column C. (if you see it returning an invalid value, just go into the formula input box for that cell, and hit enter to trigger a refresh of that cell.) so this is a little manual, but maybe it’s still faster and less error-prone than copy-and-pasting numbers from the Explore screen.
this example hits the /observations/species_counts endpoint, but you can hit other endpoints in the same way, too. the observations count comes from /obervations, and the observers count comes from /observations/observers. if you ever need to see what endpoints are available or what their parameters are, you can go to https://api.inaturalist.org/v1/docs/ to look up that information.

ok… that’s it for now. if you have any questions, let me know. otherwise, i probably won’t provide any additional Excel + API guidance until i actually write up an actual tutorial for that…