this is good information, with good examples of use cases.
i’ll respond to more of what you talked about in your post at some point in the future, but i just want to provide you some quick information that may help speed up your data gathering process sooner than later.
in general, it looks like you’re gathering 3 kinds of data:
- counts of observations
- counts of leaf taxa
- counts of unique observers
because of the nature of your data, you actually have 3 methods available to you to get some or all of this data.
- get counts from the system for each time period, one time period at a time. this is what you’re doing manually, and there are ways to use the API – even within Excel – to speed up this process.
- download observations (with observation id and observer fields) into CSV, and use Excel’s PowerPivot functionality (available in 2013 and up, plus 365) to do a count of observations by time period, plus a distinct count of observers by time period. this method is only available in cases where the total number of observation for all periods that you’re interested in totals <= 200000 (the CSV export limit), or else you’ll have to do multiple CSV downloads and merge them together. that method starts to get unwieldy for really large sets of data, and it doesn’t handle counts of leaf taxa, since that’s a complicated kind of aggregation that i don’t Excel handles without jumping through a lot of hoops.
- for the counts of observations, you can actually get a whole set of counts for sequential time periods in one request, using the observation histogram API endpoint. (that endpoint is what provides the data for the seasonality, phenology, etc. graphs on the taxon pages, and this method is the most efficient for getting counts of observations over time.)
so here’s a quick summary of the data types and methods available to gather that data:
count of |
method 1 |
method 2 |
method 3 |
observations |
Y |
Y |
Y |
leaf taxa |
Y |
n |
n |
observers |
Y |
Y |
n |
i’ve sort of described method 2 in all the necessary detail above. i’ll cover method 1 in more detail later when i have more time, since it requires some screenshotting and such, but let me just provide a few quick notes about method 3…
if you go over to https://api.inaturalist.org/v1/docs/#!/Observations/get_observations_histogram, you’ll find an interface that provides information about the Observation Histogram API endpoint. you can see what parameters are available to the endpoint – more or less all the parameters available in the observation search (Explore page), plus additional parameters to specify which time intervals you want (day, month, year, etc.), and what kind of date you want to use as your interval (create date or submit date). you can fill in parameters values and use the interface to run and view the results.
so for example, if i wanted to recreate the first chart in your Alaska journal example, i could fill in place_id=6 (Alaska), d1=2009-01-01 (start at 2009), interval=year, date_field=observed (i think that’s what you used), and verifiable=true. then when i click the “Try it out!” button, the interface there will give me a request string like this: https://api.inaturalist.org/v1/observations/histogram?verifiable=true&place_id=6&d1=2009-01-01&date_field=observed&interval=year, and it’ll also show me the results in a nice format:
{
"total_results": 13,
"page": 1,
"per_page": 13,
"results": {
"year": {
"2009-01-01": 1344,
"2010-01-01": 1964,
"2011-01-01": 1380,
"2012-01-01": 4580,
"2013-01-01": 2079,
"2014-01-01": 3385,
"2015-01-01": 4963,
"2016-01-01": 8345,
"2017-01-01": 21374,
"2018-01-01": 25520,
"2019-01-01": 36373,
"2020-01-01": 39384,
"2021-01-01": 193
}
}
}
alternatively, you could use something that i made back in the day to get the same results with a few extras (a bar graph, plus ability to export a CSV, etc.): https://jumear.github.io/stirfry/iNatAPIv1_observation_histogram.html?verifiable=true&place_id=6&d1=2009-01-01&date_field=observed&interval=year. (notice how this page uses the same parameters – everything after the ? in the URL – as the request string for the API.)
in the future, i’ll describe how you could make the same observation histogram API request in Excel using Power Query, but i won’t get into that yet, since these other options are probably easy enough to use for now. (if you wanted to do a little internet research on your own, you could probably find existing tutorials that would tell you how to make the API request above from within Excel + Power Query.)
but i do intend to come back sooner (within the next few days) and describe how to use Excel + API to accomplish method 1. so stay tuned for that…