Using Excel + API

Overview

Excel is probably a lot of people’s go-to tool for keeping track of metrics, crunching numbers, and visualizing data in charts and graphs.

Most people using Excel to keep track of iNaturalist data probably use either of 2 workflows:

  1. Filter for observations in the Explore page, and copy the count of observations, species, or observers from the screen to Excel. To get counts by group (ex. observation count by month), vary the filter criteria in the Explore page for each member of the group (ex. for month, January, February, March, etc.), and collect each variation’s count(s) in Excel.
  2. Export observation details as a CSV and then aggregate the data in Excel (using the COUNT/COUNTIF function, PivotTables, etc.) to get either a total count (of observations, species, or observers) or a count by group (ex. obs count by observer, obs count by species, etc.)

The iNaturalist API can be used to improve these workflows by providing:

  1. more automation in the data import process
  2. efficient pre-aggregated data delivery (ex. the observation histogram endpoint provides observation counts for a series of time intervals, the observers endpoint provides observation counts for each observer, etc.)
  3. more kinds of data (ex. identifications, identifiers, identified species, etc.)

Excel can access the API in 2 main ways that are analogous to the 2 workflows above:

  1. The WEBSERVICE function can return the results from a web service (like the iNaturalist API). Since many of the most useful iNat API endpoints provide a standard total_results value as part of the response, parsing the total_results value from a WEBSERVICE call response can be a quick way to get a count of observations, species, etc. for a given set of parameters.
  2. The Power Query module can transform the results of a JSON file (which is the response from many of these iNat API endpoints) into a table. This provides a fast way to get counts by group in some common use cases (ex. observations by observer, species by observer, observations by species, etc.).

In this tutorial, we’ll over some of the API endpoints that might be useful when paired with Excel, and then we’ll walk through how to make an Excel workbook that leverages the API for a hypothetical use case.

Assumptions / Prerequisites

Throughout this tutorial, we’ll assume that you have:

  1. Excel (either 365 or a version that is 2013 or later).
    • The screenshots and workflow in this tutorial will reflect 365, but they should be similar for other versions
    • To use Power Query on older versions, you may need to explicitly install the module
  2. A Windows machine. (The WEBSERVICE function relies on Windows OS features. Power Query may have limited functionality in Mac versions of Excel.)
  3. A basic understanding of how iNaturalist works
  4. (helpful) A good understanding of how to find data using the website’s Explore page, including how to filter by modifying parameters in the URL.

Useful API Endpoints

iNaturalist actually offers 3 APIs right now (as of early 2021). The table below summarizes them and includes documentation links that provide detailed information about each API, including lists of endpoints and associated parameters. Unless otherwise noted, for the rest of this tutorial, we’ll discuss only the v1 API, since it’s the current one.

Description Status Documentation Link
Original Ruby-based Deprecated https://www.inaturalist.org/pages/api+reference
v1 Node-based Current https://api.inaturalist.org/v1/docs
v2 Node-based In Development (as of early 2021) https://api.inaturalist.org/v2/docs

Below is a selected group of endpoints that may be particularly useful when paired with Excel. Endpoints that provide a total_results value can be used effectively with the WEBSERVICE function in Excel. Those providing aggregated data can be used with Power Query to get counts by group, up to the per_page maximum limit of the endpoint. (For example, ID Identifiers can return counts of Identifications made by up to 100 Identifiers.) Going above the per_page limit requires fetching multiple pages, involving more complicated coding. At that point, it probably makes more sense to use tools other than Excel to get data from the API, and that may be covered in future tutorials. It may also be possible to fall back to Workflow 2, involving CSV exports.

Description Route Includes total_results value Aggregated Data Default Results per_page Max Results per_page
Observations GET /observations Y N 30 200
Obs Species GET /observations/species_counts Y Y 500 500
Obs Observers GET /observations/observers Y Y 500 500
Obs Histogram GET /observations_histogram Y1 Y N/A1 N/A1
Identifications GET /identifications Y N 30 200
ID Categories GET /identifications/categories Y2 Y 42 42
ID Species GET /identifications/species_counts Y Y 500 500
ID Identifiers GET /identifications/identifiers Y Y 30 100

Notes:

  1. The obs histogram response is not paged, but it may be limited to a certain number of intervals, depending on the interval type and other parameters specified
  2. There are only 4 identification categories (leading, improving, supporting, maverick)

The base URL for the API is https://api.inaturalist.org/v1/. So then the request URL for a given endpoint above will be the base URL + the route. For example, the request URL for ID Species will be https://api.inaturalist.org/v1/identifications/species_counts.

Example Use Case

General Idea for an Assignment

Suppose I’m a professor teaching ecology, and want my class of 20 students to use iNaturalist as part of an assignment to document biodiversity in the area. I want to break the assignment into 2 weeks, starting 19 April 2021 (Monday) and ending 2 May 2021 (Sun).

During week 1, I want my students to just to experiment a little with the mechanics of observing in iNaturalist. Perhaps more importantly, though, I want them to spend some time identifying in iNaturalist to familiarize themselves with:

  1. the identification process
  2. what kinds of organisms are being observed in the area
  3. what kinds of features are important to capture in observations to get a proper identification
  4. the community nature of iNaturalist

Then in week 2, I’ll set them loose to document the organisms in the area.

Translating the General Idea to Metrics

From a metrics perspective, I’ll ask each student to make at least 20 identifications for others and at least 5 verifiable observations during week 1. Then for week 2, I’ll ask them to each make an additional 20 verifiable observations (including 10 that reach research grade), identify each of their own observations, and then make an additional 10 identifications for others.

Workbook Walkthrough

We can use Excel with the API to help us track our students’ progress.

Step 1: Initial Prep

1a: Set up a Workbook

Open up new workbook in Excel, and create a general layout for your main spreadsheet. Below, I’ve created a table with student names and iNaturalist logins on the left, and then additional columns to capture identification and observation statistics for each student. For simplicity, I’m not breaking out week 1 stats and week 2 stats. Instead, I’ll just capture week 1 stats for now, and then adapt a copy of the same workbook for week 2.

(As we go through this tutorial, I may refer to specific cells from this example layout. For example, the first student’s login would be in cell B4.)

(It’s a good idea to save at this point.)

1b: Get a Comma-Separated List of Logins

For many of the API requests I make, I’ll need to specify a comma-separated list of iNaturalist user ids (numbers) or logins (strings). Here’s one easy way to translate the logins from above to a comma-separated list:

  1. Highlight the cells containing the logins (B4:B23), and copy.
  2. Open up a blank Word document. (I’ll assume you have Word because you have Excel.)
  3. Right-click in the blank document, and select Paste > Keep Text Only
  4. Find and replace (Ctrl+H) all paragraph breaks (^p) with commas (,).
  5. Eliminate any trailing commas. Then save the resulting list of comma-separated logins somewhere, for use later. (I’m going to just save it in an empty cell in my workbook.)

Step 2: Get Observation Statistics by User

As noted earlier, the observation-related metrics that I want my each of my students to target are:

  • 5 verifiable observations in week 1
  • 20 verifiable observations in week 2
    • including 10 that reach research grade

That means I need to track verifiable observations by student, along with research grade observations by student. In addition, I’m going to get all observations by student just so I can have some visibility into possible issues along the way (ex. if a student is making a lot of non-verifiable observations).

We can get these (observation count by observer) stats by making requests to the API’s Observations Observers endpoint. This is the same endpoint used by the website to get the information shown in the Observers section of the Explore page.

The request URL for this endpoint is https://api.inaturalist.org/v1/observations/observers. We can get different sets of results by passing along different sets of parameters. (Parameters are just added to the end of the request URL.) In this case, we actually need to make 3 variants of this request, as summarized below. Remember that we’ll start off by getting data for just week 1 (19 April - 25 April 2021).

Description URL Parameters*
All observations ?d1=2021-04-19&d2=2021-04-25&user_id=[student list]
Verifiable obs ?d1=2021-04-19&d2=2021-04-25&verifiable=true&user_id=[student list]
Research grade obs ?d1=2021-04-19&d2=2021-04-25&quality_grade=research&user_id=[student list]

* replace [student list] with the actual comma-separated list of students that we created earlier in step 1

2a: Get all observations

First, let’s walk through how to get all observations for these students in week 1. Stringing together the request URL + parameters, this is the full URL that we’ll need for this request:
https://api.inaturalist.org/v1/observations/observers?d1=2021-04-19&d2=2021-04-25&user_id=arboretum_amy,sedgequeen,trh_blue,carrieseltzer,bouteloua,cmcheatle,dianastuder,jwidness,jdmore,kueda,melodi_96,kiwifergus,pleary,loarie,pisum,sbushes,lotteryd,cthawley,tiwane,zdanko

We’ll be using the workflow that involves Power Query. So it’s important that the API request here returns at least one record back to let Excel know how the records will be structured. You can open up the above URL in your web browser just to see if total_results > 1. (If there are no records returned, you can change the parameters slightly for now so that you will have some data to use when setting up your query. Just remember to change the parameters back to the appropriate values later.)

Now go to Excel and pull up the workbook that we made earlier. Go to Data > Get Data > From Web (circled in red below).
image

When prompted for a URL, copy the URL from just above, and paste into the input box. Then click OK.

For the next prompt, use Anonymous mode, and apply these settings to /observations/observers. Then click Connect.

In the next screen, you’ll see the top level of the API response. This gives you a record count at the top, and a results list at the bottom. Click List, under results.

Now you’ll see the records in the list. In this case, there are 5 records. At this point, click the Convert > To Table button in the top-left corner of the window.

When asked, leave the delimiter selection as None, and click OK.

At this point, the result won’t look much like a table yet. There are the 5 records from before, and the header is labeled as Column1. It’s probably a good idea to change Column1 to something more meaningful, like Observers. Then click the Expand button circled in red below.

At this point, we can select which attributes we want to get from each record. For our example, we really just need the observation_count and user. Deselect the other items, and click OK.

Now we have a table with 2 columns: observation count and user. Note that user can be further expanded. So we’ll click the Expand button, select just login, and click OK.

At this point, we have the data we need, but it’s sort of odd having the logins on the right and the counts on the left. So just click and drag the login column to the left of the count column. Also, in the right side of the screen under Query Settings > Properties > Name, change the name to something more readable, like “Obs (All)”.

Everything looks good at this point. So just click the Close & Load button in the top-left corner of the screen.

This creates a a new tab named Obs (All), with user logins and observation counts in a table. On the right side of the screen, we also have a new Power Query named Obs (All). Right click the Query item, and select duplicate.

2b: Get Verifiable Observations & Get Research Grade Observations

Now we’re in the Power Query editing screen for a new query called Obs (All) (2). Note that in the right side of the screen, there’s an Applied Steps list that shows all the same actions we took previously to go from request URL to final table. Since we want to make a slightly different API request this time, we’ll change the source of our query. Click the gear next to Source.

When prompted, update the URL with the URL that we need for verified observations (just add a parameter &verifiable=true). Then click OK.

Under Properties on the right side of the screen, give this query a new name, like Obs (Verifiable). In the Applied Steps, click the last step to see the final table. Then click Close & Load.

We now have a new Obs (Verifiable) tab and query. Duplicate this, and create another tab and query for Obs (RG) using the same kind of process. (Hint: For the URL, add a parameter &quality_grade=research.)

Once you’ve created all 3 Queries and corresponding worksheets, you can reorder worksheets as needed. You may notice that the data seems out of sync. For example, in my example, I have 5 records in Obs (All) but 6 records in Obs (Verifiable), even though verifiable should be a subset of all. This is just due to timing. (Observations were made in between the time we created Obs (All) and Obs (Verifiable).) Later, we’ll refresh these all at the same time, and that will sync the data.

(It’s a good idea to save at this point.)

Step 3: Get Identification Statistics by User

As noted earlier, the identification statistics I want my students to target are:

  • 20 identifications for others in week 1
  • 10 identifications for others in week 2
  • identify each observation made in week 2

This means that I’ll want to get counts of identifications made by each student for others and for themselves, and also for themselves for observations during a specific date range. (The last set of data will allow us to make sure observations are all identified by the observer.) We can get these stats by making 3 variations of requests to the API’s Identifications Identifiers endpoint, which has a base request URL https://api.inaturalist.org/v1/identifications/identifiers/.

To apply the proper date ranges, we’ll need to understand that identification dates are recorded at UTC. So we’ll need to adjust the identification datetimes accordingly to account for potential time zone differences. My time zone is Central Daylight (-5:00). So I’ll have to add 5 hours to my local date ranges to translate to UTC. (I do not need to offset the observed date because they are generally recorded at local time, assuming the user has the proper time zone settings in their profile.)

Description URL Parameters*
IDs For Others ?own_observation=false&d1=2021-04-19T05:00&d2=2021-04-26T04:59:59&user_id=[student list]
IDs For Self ?own_observation=true&d1=2021-04-19T05:00&d2=2021-04-26T04:59:59&verifiable=true&user_id=[student list]
IDs For Self (Selected) ?own_observation=true&d1=2021-04-19T05:00&d2=2021-04-26T04:59:59&observed_d1=2021-04-19&observed_d2=2021-04-25&verifiable=true&user_id=[student list]

* replace [student list] with the actual comma-separated list of students that we created earlier in step 1

Just for reference, this will be the full URL needed for IDs For Others:
https://api.inaturalist.org/v1/identifications/identifiers?own_observation=false&d1=2021-04-19T05:00&d2=2021-04-26T04:59:59&user_id=arboretum_amy,sedgequeen,trh_blue,carrieseltzer,bouteloua,cmcheatle,dianastuder,jwidness,jdmore,kueda,melodi_96,kiwifergus,pleary,loarie,pisum,sbushes,lotteryd,cthawley,tiwane,zdanko

The steps to get ID stats will be very similar to those for the Observer stats. We’ll use the workflow that involves Power Query again. So, again, it’s important that the API request here returns at least one record to let Excel know how records will be structured. Modify parameters temporarily, if needed, to get back at least one record.

We’ll start with IDs for Others:

  1. Go to Data > Get Data > From Web
    a. When prompted, provide the URL from above
    b. Apply Anonymous settings to /identifications/identifiers
  2. Click the List option under results
  3. Click To Table in the top-left corner of the window
    a. Select None as the delimiter
  4. (optional) Change header label from “Column1” to “Identifiers”
  5. Expand the column
    a. Select just count and user as columns
  6. Expand the user column
    a. select just the login column
  7. Move the Login column to the left of the count column
  8. Rename the query to IDs (For Others)
  9. Close & Load

Now, IDs for Self:

  1. Duplicate the IDs (for Others) query
  2. Under Applied Steps, modify the Source
    a. the URL will need to reflect own_observations=true (instead of =false)
  3. Change the name of the query to IDs (for Self)
  4. Click the last Applied Step
  5. Close & Load

Now, IDs for Self (Selected):

  1. Duplicate the IDs (for Self) query
  2. Under Applied Steps, modify the Source
    a. the URL will need to reflect a specific observed date range &observed_d1=2021-04-19&observed_d2=2021-04-25
  3. Change the name of the query to IDs (for Self) (Selected)
  4. Click the last Applied Step
  5. Close & Load

Rearrange the worksheets in the workbook, as needed, and save.

Step 4: Tie Everything Together

At this point, we have our Main worksheet, with 6 worksheets that contain the results from 6 corresponding Power Queries.

Assuming the structure from the screenshot above, add the following formulas in the cells noted below:

Cell Formula
C4 =D4+E4
D4 =IFERROR(VLOOKUP(B4,IDs__For_Others[#All],2,FALSE),0)
E4 =IFERROR(VLOOKUP(B4,IDs__For_Self[#All],2,FALSE),0)
F4 =IFERROR(VLOOKUP(B4,IDs__For_Self___Selected[#All],2,FALSE),0)
G4 =IFERROR(INDEX(Obs__All[[#All],[Observers.observation_count]],MATCH(B4,Obs__All[[#All],[Observers.user.login]],0)),0)
H4 =IFERROR(INDEX(Obs__Verifiable[[#All],[Observers.observation_count]],MATCH(B4,Obs__Verifiable[[#All],[Observers.user.login]],0)),0)
I4 =G4-H4
J4 =IFERROR(INDEX(Obs__RG[[#All],[Observers.observation_count]],MATCH(B4,Obs__RG[[#All],[Observers.user.login]],0)),0)

Basically, these formulas will look up the login from column B and return the corresponding value from the appropriate Power Query to the appropriate column in the main worksheet. Note that I used VLOOKUP for IDs and INDEX/MATCH for Obs, returning exact matches. (Functionally, VLOOKUP and INDEX/MATCH are equivalent, but I wanted to provide examples of both just for reference.) I’ve also used IFERROR to set the value to 0 if there is no match found. (I won’t explain the formulas in much more detail, but you can look at the Excel help documentation or other references online for more information.)

Once all the formulas have been added to row 4, copy them down for all the rows. Finally, get the latest data by going to Data > Queries and Connections > Refresh All:

Save the workbook.

In the future, to get the latest data, just Refresh All. (If prompted that External Data Connections have been disabled, just click the appropriate button to Enable Content.)

Step 5 (optional): Column Chart of Observations by Day

(needs additional work)

We’ll continue the example by getting data for a column chart of observations for the entire class for each day during the week. We’ll get the data from the Observations Histogram endpoint. This endpoint is used by the website in the Taxon page to produce the seasonality, history, and phenology graphs.

This endpoint is a little different from others in that it formats its response a little differently, which means that the Power Query workflow will be slightly different. Also, the response is unpaged, and you can also specify the aggregation basis by setting interval and date_field parameters. So there will be many aggregation options available in this single endpoint.

For our example, we we’ll set interval=day (since we want daily counts) and ignore date_field, since the default is observed date (which is what we want). As usual, we’ll want to make sure our request URL actually returns at least one record.
https://api.inaturalist.org/v1/observations/histogram?interval=day&d1=2021-04-19&d2=2021-04-25&user_id=arboretum_amy,sedgequeen,trh_blue,carrieseltzer,bouteloua,cmcheatle,dianastuder,jwidness,jdmore,kueda,melodi_96,kiwifergus,pleary,loarie,pisum,sbushes,lotteryd,cthawley,tiwane,zdanko

Go to Data > Get Data > From Web, and use the URL noted above. When prompted, use Anonymous mode for /observations/histogram. When you see the top level of the response, click Record under results.

Then click Record for day.

At this point, you’ll see a list of dates with counts. It’s close but not quite a table. So click Convert > Into Table.

Now you have data in a true table structure. Change the header names from Name and Value to something more meaningful, like Day and Count, respectively. Then change the Query Name to something more meaningful like Obs by Day. Then Close & Load

At this point, a new worksheet will be created with a table, and you’ll also have a new power query.

Highlight the entire table, and select Insert > Charts > Column or Bar Chart > Simple 2D Column Chart

This will add a column chart to the worksheet.

Step 6 (optional): Count of Identifications for Any Given Date Range

(needs additional work)

This final example uses the WEBSERVICE function. The key to successfully using this approach is to use an endpoint that provides a total_results value, and to pass in a parameter per_page=0 when making the API request. (We use per_page=0 because we don’t actually need record details, just the total_results count.)

Create a new worksheet called ID Count, with the basic structure shown below:

Then add a couple of formulas:

  • in C2: =WEBSERVICE("https://api.inaturalist.org/v1/identifications?per_page=0&d1="&A2&"&d2="&B2&"&user_id=arboretum_amy,sedgequeen,trh_blue,carrieseltzer,bouteloua,cmcheatle,dianastuder,jwidness,jdmore,kueda,melodi_96,kiwifergus,pleary,loarie,pisum,sbushes,lotteryd,cthawley,tiwane,zdanko")
  • in D2: =MID(C2,FIND(":",C2)+1,FIND(",",C2)-FIND(":",C2)-1)

If successful, you should get something like this:

At this point, you could do a few additional things:

  • Hide column C so that you’ll just see your date ranges and count
  • Adapt the formula in column C a bit to handle the time zone conversion for you automatically
  • Add additional rows to show counts for multiple date ranges. Just be aware that you shouldn’t try to add too many rows, since each row would represent 1 API request each. iNaturalist recommends that you keep API requests to about 1 request/sec, and if you try to send too many at one time, you’ll get incomplete results in some of those rows.

Step 7: Create a copy of the workbook for Week 2

For week 2, make a copy of this workbook. Then in the copy, go to Data > Queries & Connections > Queries & Connections to open up the list of Power Queries. Edit each of the Queries, modifying the source to reflect the appropriate date ranges. Then save the new workbook.

Additional Notes

There are a few quirks when working with API parameters:

  1. Login changes are not currently handled well. This example may not have worked well if any of the students changed their iNaturalist logins during the tracking period.
  2. We touched on the need to do a timezone conversion when specifying values for the identification date parameter. This will be necessary for any date parameter, other than observed date.

I’m putting this tutorial out there initially as a rough draft, and I’ll clean it up over time. Feel free to make edits, as needed. This post is a Wiki.

24 Likes

Ah, I’m in this class?! I’m definitely not keeping up with the reading ;)

5 Likes

Lol, and the field work starts in a week!

2 Likes

Incredible. I often tought it should be possible to do a bit more with the data.

1 Like

Amazing! Thanks for pulling this together. I’ve always been interested in getting beyond using the filter to perform the task properly (doesn’t always work the way it did) and then work with the CSV in XL.

2 Likes

Thank you so much for taking the time to put this together. I was thinking that my goal for 2023 would be to learn how go beyond the GUI tools to answer questions and then I discovered this “course!” Very grateful!!

1 Like

Good I found this topic before field work starts!

Note that kind of requirement is discouraged.