Select observations to batch-download from list of observation IDs

I periodically need a CSV export of a specific set of records that cannot be easily filtered based on any combination of attributes, and for which I have a list of the observation IDs. Is there a way to generate a CSV export for a batch of observations based on a comma-separated (or otherwise-formatted) list of observation IDs? In the past I have either:

  1. Exported a larger set than I need, and then used a lookup against the exported CSV to select only the target observations. This works but means requesting larger downloads than I need, which is slow and I assume uses more computing resources on the iNat end (although I do try to keep the size of the exports as small as I can).
  2. Used the API, which worked, but it was a pretty laborious process.

I suppose I could create a specific project or tag and then use that to filter for export, but then I would have to go through and remove hundreds of observations from the project (or the tag from hundreds of observations) after each batch.


although you can get a specific set of observations by ID (up to about 700 observations last time i checked – ex. https://www.inaturalist.org/observations?id=261373008,261373007), it doesn’t look like this carries over to the standard CSV export screen. i’m not sure whether that’s an oversight, intentional, a bug, or whether i just don’t know how to use that screen properly.

i typically find that with the API, the first time is always the hardest. but once you’ve got it set up, it should be easy to repeat. if you need help with this, just describe what you’re trying to do, and i can show you how i would approach this.
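
for example, here’s a minimal sketch of what a repeatable script might look like in Python (the IDs, column choices, and chunk size below are just placeholders; the v1 API caps per_page at 200, if i remember right):

# minimal sketch: fetch a fixed list of observation IDs from the iNat v1 API
# and write a few columns to a CSV. the IDs and columns are placeholders.
import csv
import requests

obs_ids = [261373008, 261373007]  # your own list of observation IDs

rows = []
for start in range(0, len(obs_ids), 200):  # request in chunks of up to 200 IDs
    chunk = obs_ids[start:start + 200]
    resp = requests.get(
        'https://api.inaturalist.org/v1/observations',
        params={'id': ','.join(str(i) for i in chunk), 'per_page': 200},
    )
    resp.raise_for_status()
    for o in resp.json()['results']:
        rows.append({
            'id': o['id'],
            'observed_on': o.get('observed_on'),
            'taxon': (o.get('taxon') or {}).get('name'),
        })

with open('observations.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['id', 'observed_on', 'taxon'])
    writer.writeheader()
    writer.writerows(rows)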

Thanks. Too bad that doesn’t carry over to the export screen; sure would make it easy to do what I need. Perhaps I will go back to the API and try to figure out a simpler, more-repeatable process. I’ll let you know if I run into questions.

@pisum, I’m working on figuring out an R script to do this through the API with the rinat library, but immediately remembered a complicating issue: a number of the observations in question have obscured locations, but the user has granted me permission to view the actual locations of their obscured observations. When I’m logged into my iNat account and batch-export affected observations, I get the “private_latitude” and “private_longitude” in addition to the obscured values. Is there a way to pass my credentials to the API so that I can access these values with an API call?

Looks like this R package from @hanly probably does a lot of what I need, if I can get it to install…

as far as i know, the rinat library hits the deprecated API. so i personally would not develop against that. instead, i would hit the current version of the API (currently, v1) directly or else use some module that hits the current API, such as spocc.

it depends on the capabilities of the module you’re using. if you’re hitting the API directly, definitely yes.

for me, usually the easiest thing to do when hitting the API directly is to go to https://www.inaturalist.org/users/api_token while you’re logged into inaturalist.org. you’ll see a bit of text that looks something like {"api_token": "api.token.value"}. this is a JSON web token (JWT). i’ll copy the api.token.value and use this as my bearer token in my Authorization header when i make my API requests. the JWT will expire every 24 hours. so you’ll just need to get a new one if you want to do the same thing after it has expired. you don’t want to share the JWT with anyone else or store it in a way that is insecure because anyone who has access to it will have full access to your account until the token expires.
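
here’s roughly what that looks like in Python (requests isn’t required, it’s just what i’d reach for here; with authorization, observations whose owners have trusted you should come back with their true coordinates, e.g. a private_location value in the v1 results):

# sketch: pass the JWT from inaturalist.org/users/api_token as a bearer token.
# authorized requests include extra fields (ex. private_location) on observations
# whose owners have trusted you with their hidden coordinates.
import requests

jwt = 'api.token.value'  # paste your own token here; it expires after ~24 hours

resp = requests.get(
    'https://api.inaturalist.org/v1/observations',
    params={'id': '261373008,261373007'},
    headers={'Authorization': f'Bearer {jwt}'},
)
resp.raise_for_status()
for o in resp.json()['results']:
    print(o['id'], o.get('private_location'))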

if you want to have a more automated or secure workflow, it’s a more complicated setup. you’d have to go through the process of registering an iNaturalist application, then set up a flow to get an OAuth token, and then get a JWT using the OAuth token. (technically it’s possible to use the OAuth token in place of the JWT for some requests, but it’s not something i would do because i believe iNat doesn’t expire the OAuth token until you request another one, so losing control of it unknowingly could be a bigger deal than losing control of a JWT.)
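
if you do go that route, the resource owner password credentials flow that iNat documents looks something like this (a sketch, not battle-tested code; the client id and secret come from your registered application, and all of these values should be treated as secrets):

# hedged sketch of the documented resource owner password credentials flow:
# exchange app credentials plus your login for an OAuth access token, then use
# that token to request a fresh JWT whenever you need one.
import requests

payload = {
    'client_id': 'YOUR_APP_ID',        # from your registered iNat application
    'client_secret': 'YOUR_APP_SECRET',
    'grant_type': 'password',
    'username': 'your_inat_username',
    'password': 'your_inat_password',
}
oauth = requests.post('https://www.inaturalist.org/oauth/token', data=payload)
oauth.raise_for_status()
access_token = oauth.json()['access_token']

jwt_resp = requests.get(
    'https://www.inaturalist.org/users/api_token',
    headers={'Authorization': f'Bearer {access_token}'},
)
jwt_resp.raise_for_status()
jwt = jwt_resp.json()['api_token']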

…

it’s not R, but if you want to avoid having to write something from scratch, you’re welcome to use my JupyterLite workbook to get observation data from the API. it’s written in Python and runs via JupyterLite directly in your browser, which can save you the work of setting up a special environment just to run scripts.

When running from JupyterLite, all your requests happen from your machine, and your changes and results get saved in your browser’s storage, which can then be downloaded to your machine, if you like.

in its simplest usage that includes user authorization, you would:

  1. open the workbook via JupyterLite in your browser.
  2. in the line that begins jwt = , set this equal to your JWT value (ex. jwt = 'api.token.value').
  3. in the line that begins req_params_string = , set the value equal to whatever parameters you would like to apply in your request. if you’re trying to specify a list of obs IDs, it would be something like req_params_string = 'id=10000,10001,10002'
  4. in the line that begins with obs = await, you would set get_all_pages=True if you expect to get more than 30 observations, and use_authorization=True
  5. in the section that begins with parse_fields = [, define which fields you want to get in your results. i’ve pre-defined a few fields that i thought might be generally useful. so you might be able to just uncomment or comment the fields you’re interested in or not interested in. in cases where a field might have multiple values per observation, i’ve defined most of these to flatten the values (string multiple values together into one string) or else provide some aggregate result such as a count. you’re welcome to adapt these definitions in case you need something that hasn’t been pre-defined. (a rough illustration of how these label/ref definitions work follows this list.)
  6. then run each cell from the top, or run all. in your case, you want the results in a CSV. so you don’t have to run anything past the “Write Data to CSV” section. the resulting CSV file will be stored in browser storage, and you’ll be able to see it in the JupyterLite file navigation tree on the left side of the page. you can click on it to view it in the JupyterLite interface, or you can right-click and download the file to your machine so that you can work with it in your own applications on your machine.
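
here’s a rough, self-contained illustration of the parse_fields idea from step 5. the real definitions in the workbook are more capable (flattening, aggregates, etc.), but the basic mapping of an output label to a dotted path into the observation JSON is the same:

# toy version of parse_fields: each entry maps an output column label to a
# (possibly dotted) path into an observation's JSON. values are illustrative.
parse_fields = [
    {'label': 'id', 'ref': 'id'},
    {'label': 'taxon_name', 'ref': 'taxon.name'},
]

def extract(obs, ref):
    # walk a dotted path like 'taxon.name' through nested dicts
    val = obs
    for key in ref.split('.'):
        val = val.get(key) if isinstance(val, dict) else None
    return val

obs = {'id': 261373008, 'taxon': {'name': 'Danaus plexippus'}}
row = {f['label']: extract(obs, f['ref']) for f in parse_fields}
print(row)  # {'id': 261373008, 'taxon_name': 'Danaus plexippus'}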

any changes to the workbook you make should get automatically saved along the way in browser storage as you run things or as you click the save button. if you want to preserve the original workbook so that you can start over easily, it’s cleanest if, before step 1 above, you find the file in the file navigation tree and save a new version, and then work in that new version. your work will be preserved across browsing sessions as long as you don’t clear your storage. or if you do all your work in a private browsing session, everything will always reset back whenever you start a new private browsing session. just remember that you’ll want to download any workbooks to your machine if you want to preserve them long-term.


Thanks for all the information! I’ll give some of those options a try. Not opposed to using Python instead of R.

This is fantastic, and doing most of what I need out of the box.
There are a few fields that I would like to add: place_state_name, place_county_name, and the observation field host_plant_id. I tried adding a line
{'ref': 'place_state_name'},
but it just made a blank field, so clearly I need to formulate that differently or collapse it down from a list or something. Haven’t figured out how to call specific observation fields. I poked around in some iNat API references, but couldn’t figure it out.

just uncomment the parse_fields line #{'ref': 'place_ids'}. this will get automatically replaced by the add_std_places function. you can modify the admin_level_xref dictionary in that function to return just the standard place types you care about.
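
the idea is just a lookup from iNat’s standard place admin levels to output column names, something along these lines (illustrative only; check the actual dictionary in the function, but as far as i recall 10 is state and 20 is county):

# illustrative only: an admin_level -> column name mapping, so trimming the
# mapping trims the standard-place columns you get back.
admin_level_xref = {
    10: 'place_state_name',
    20: 'place_county_name',
}

# ex. given a place record, pick the column it should fill:
place = {'name': 'Travis', 'admin_level': 20}
print(admin_level_xref.get(place['admin_level']), place['name'])  # place_county_name Travis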

look for the line that begins with #{'label': 'obs_field_13_eating'. you can adapt that parse_field line (or a copy of it) to suit your own need. if you’re interested in https://www.inaturalist.org/observation_fields/6586, then you would just set [{'ref': 'field_id', 'value': 6586}] instead of [{'ref': 'field_id', 'value': 13}] and then set an appropriate label for that parse_field.
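
matching on the field id works because each observation in the /v1/observations results carries its observation field values in an ofvs list, roughly like this (keys trimmed, name and value illustrative):

# abbreviated shape of one observation's observation field values
{'ofvs': [
    {'field_id': 6586, 'name': 'Host Plant ID', 'value': '54321'},
]}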

Perfect, thank you! That did what I needed, and now I have a reproducible workflow.


just remember to download your version of the workbook and save that to your machine (rather than just your browser storage) so that it won’t be lost in case you ever decide to clear your browser history.

you can also port that workbook file over to any other platform that runs Jupyter workbooks (and runs a recent enough version of Python that handles async stuff), or share it with others. if you have a GitHub account, you can also save a version in your own GitHub repo, and you can reference the version in your repo from most platforms.

Yeah, I downloaded a copy and should be able to run as a Notebook in ArcGIS Pro.
Last question for now: easy way to add the “iconic_taxon_name” field? I could figure out how to calculate it based on kingdom/order, but I assume it already exists in the system somewhere and can be pulled down with the observation. I poked around in some of the commented-out lines, but didn’t see it.

add a line to the 'parse_fields' list:
{'label': 'iconic_taxon_name', 'ref': 'taxon.iconic_taxon_name'},

this is based on what the structure of the /v1/observations results looks like, where iconic_taxon_name is an attribute of the taxon object in each observation object:
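
(abbreviated, with most keys omitted and illustrative values:)

# trimmed /v1/observations result showing where iconic_taxon_name lives
{'results': [
    {'id': 261373008,
     'taxon': {'name': 'Danaus plexippus',
               'iconic_taxon_name': 'Insecta'}},
]}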

That did it. But now I’m getting an error when running the execution block - perhaps I’m violating some limit on API calls, and need to wait until tomorrow? Was trying to get details for 104 observations.


[screenshot of the error traceback, ending in OSError]

are you running this in ArcGIS Pro or in JupyterLite? the only error i see in your screenshot is OSError. so maybe if you’re running this in ArcGIS Pro, it’s not able to handle this version of Python? (the notebook requires a relatively recent version of Python >= 3.11 because it uses async functions.) i’m not sure.

At this point I’m still running it in a browser. I didn’t paste the whole error message - here it is (slightly different instance; same error):

i’m still not sure. you can try closing the browser and trying again. if that doesn’t resolve the issue, and the only thing that was different between the last successful run and the new run is the line of code, then i would assume the code is just bad at that line. make sure it’s indented correctly, etc.

or if you did close your browser since the last run, make sure you run it from the top. the first section contains the modules that need to be loaded in each session for the rest of it to work.

Seems to have been an error in my observation ID request string - I got it working. Thanks again for all your help!


you’re welcome. glad to help.

based on https://support.esri.com/en-us/knowledge-base/faq-what-version-of-python-is-used-in-arcgis-000013224, it looks like you would need ArcGIS Pro Enterprise 11.3+ to run Python 3.11+.

i also had been waiting for Google Colab to support 3.11+, and it looks like they finally updated their base Python version in November 2024 (7 months behind schedule). to run the notebook in Colab instead of my instance of JupyterLite, you can go to https://colab.research.google.com/github/jumear/stirpy/blob/main/content/iNat_APIv1_get_observations.ipynb. it’ll run a little differently in Colab because JupyterLite runs the pyiodide kernel and makes requests via pyfetch, which is async by default, but i used TaskGroups in asyncio + urllib3 to mimic the same sort of behavior outside of pyiodide.