Downloading a CSV of all observations of a species with python

I am a master’s student learning Python, and I am studying an invasive bee (Anthidium florentinum). I would like to download all occurrences of this species in America and then use the geographic coordinates of those occurrences to draw a map.

I started trying to understand how Pyinaturalist works, but I can’t find a simple way to download, from Python, the .ZIP containing the CSV of all occurrences of Anthidium florentinum, the way I can with this link:
https://www.inaturalist.org/observations/export?verifiable=true&page=1&spam=false&taxon_id=459050&place_id=97394&user_id=&project_id=&swlng=&swlat=&nelng=&nelat=&lat=&lng=

This is my “Search query”:
quality_grade=any&identifications=any&place_id=97394&taxon_id=459050&verifiable=true

And this is the CSV I want the python script to download:

Do you know which function in Pyinaturalist (or any other library) I could use to download this CSV to my hard drive?
Or, alternatively, how could I use the query "quality_grade=any&identifications=any&place_id=97394&taxon_id=459050&verifiable=true" directly to produce an array containing all the information?

Thanks :)

Why not just download all the data you need in advance manually and then import the CSV?

I’m definitely not the best person to answer, but in case someone more knowledgeable doesn’t show up I’ll take a stab at it. Personally, I’ve only looked into the API as a way to get data out of iNaturalist that can’t be accessed through the web interface at observations/export.

I notice that observations/export is not identified as an API endpoint here or here, nor in the pyinaturalist documentation here. Nor does my naïve attempt to poke the API in a web browser yield anything for observations/export. So the short answer is probably: you can’t do that.

You could presumably end up at the result you want–a csv with that formatting and information–by some data wrangling with the output from pyinaturalist’s get_observations(). I don’t have any useful advice here except that if you were to go down this route, I think you will probably wish you hadn’t.
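If you do go down the get_observations() route anyway, the wrangling step might look roughly like this minimal sketch. It assumes each observation is a dict shaped like the entries in the API v1 `results` list (e.g. a nested `taxon` dict with a `name` key); the column list, the dotted-path convention and the helper names are hypothetical choices for illustration, not part of pyinaturalist.

```python
import csv
import io

# Hypothetical column choice; dotted paths reach into nested dicts.
FIELDS = ["id", "observed_on", "quality_grade", "taxon.name", "location", "uri"]


def get_path(record, dotted):
    """Follow a dotted path like 'taxon.name' through nested dicts; None if missing."""
    value = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def flatten_observation(obs, fields=FIELDS):
    """Turn one nested observation dict into a flat {column: value} row."""
    return {field: get_path(obs, field) for field in fields}


def write_observations_csv(observations, out, fields=FIELDS):
    """Write observation dicts as CSV rows to an open text file-like object."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for obs in observations:
        writer.writerow(flatten_observation(obs, fields))  # None becomes ""


def observations_to_csv_text(observations, fields=FIELDS):
    """Return the CSV as a string (handy for checks, or for writing to disk)."""
    buffer = io.StringIO()
    write_observations_csv(observations, buffer, fields)
    return buffer.getvalue()
```

To save it, you would write the returned text to a file opened with `newline=""`. Nested lists (identifications, photos, annotations) don't map cleanly onto flat columns, which is a large part of why this route gets tedious.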

I don’t know how to help with that package (sorry!) but just a related point:
If you’re planning to use iNat data for a publication of some sort, the preferred route of access is through GBIF, which will give you a citable DOI for the dataset.

Sometimes this isn’t an option if the dataset that you want contains records that aren’t on GBIF (like casual observations, which it looks like you may be wanting), but thought it worth a mention.

Because it needs to be automated and updated every time there’s a new observation.
I’m building a script that is supposed to work with any invasive species.
It’s for a geomatics automation class.

Thank you very much, I will look into this!
I was planning to use only research-grade observations, so you may have given me a very useful tip!

Thanks a lot for your input.
So you’re confirming that there’s no simple function for extracting a .CSV that I missed.

I normally use the API through R, so I’m not very familiar with pyinaturalist, but it looks like you can use the get_observations function with your search criteria [https://github.com/niconoe/pyinaturalist/blob/main/README.md#features]. It looks like this should work, and the nice thing about taking the API approach is that your work will be more reproducible.
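Since the Export page’s "Search query" uses the same parameter names as the API, one way to reuse it is to parse it into keyword arguments. A minimal sketch using the standard library; it assumes pyinaturalist’s get_observations() accepts these API parameter names as keyword arguments (string values included), and `query_to_params` / `fetch_with_pyinaturalist` are hypothetical helper names.

```python
from urllib.parse import parse_qsl

# The "Search query" string copied from the Export page (from the original post).
EXPORT_QUERY = (
    "quality_grade=any&identifications=any&place_id=97394"
    "&taxon_id=459050&verifiable=true"
)


def query_to_params(query):
    """Convert an Export-page query string into a dict of API parameters.

    Empty values (e.g. 'user_id=') are dropped; all values stay strings.
    """
    return dict(parse_qsl(query, keep_blank_values=False))


def fetch_with_pyinaturalist(query, per_page=200):
    """Hedged sketch: pass the parsed parameters to pyinaturalist.

    Assumes pyinaturalist is installed and that get_observations() takes
    these API parameter names as keyword arguments.
    """
    from pyinaturalist import get_observations  # imported lazily on purpose

    params = query_to_params(query)
    params["per_page"] = per_page
    return get_observations(**params)
```

The import sits inside the function so the parsing part still works where pyinaturalist isn’t installed.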

if you’re simply visualizing things in a map, it seems to me like it would be really inefficient to download all observations first. it would be more efficient to just get the observation map tiles, plus associated UTF grids, if needed. (this is especially true if you’re trying to map millions of observations.)

then you can quickly map whatever you like. for example: https://jumear.github.io/stirfry/iNat_map.html?taxon_id=459050&verifiable=true&place_id=97394

it’s also possible to map using just the UTF grids, though that’s a little harder to code. example: https://jumear.github.io/stirfry/iNat_UTFgrid_based_density_map_for_Leaflet.html?defaultstyle=gradient&place_id=97394&taxon_id=459050&scale_factor=5

generally, if you don’t need observation-level detail, it’s more efficient to get aggregated data instead.
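The tile approach relies on standard Web Mercator ("slippy map") tile numbering. A minimal sketch of that arithmetic follows; the `points/{z}/{x}/{y}.png` and `.grid.json` URL pattern is an assumption about the v1 API’s map-tile endpoints and should be checked against the API docs, and the helper names are hypothetical.

```python
import math
from urllib.parse import urlencode

# Assumed URL pattern for the iNaturalist v1 map-tile endpoints (verify in the API docs).
TILE_URL = "https://api.inaturalist.org/v1/points/{z}/{x}/{y}.{ext}"


def lat_lon_to_tile(lat, lon, zoom):
    """Return the (x, y) Web Mercator tile containing a WGS84 point at a zoom level."""
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges (lon = 180, extreme latitudes) stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(lat, lon, zoom, params, ext="png"):
    """Build a tile URL; ext='png' for the image, 'grid.json' for the (assumed) UTF grid."""
    x, y = lat_lon_to_tile(lat, lon, zoom)
    return TILE_URL.format(z=zoom, x=x, y=y, ext=ext) + "?" + urlencode(params)
```

For a whole map you would loop over the x/y range covering your bounding box at one zoom level, rather than computing a tile per point.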

i also don’t understand why you necessarily need CSV output if you decide you must get observation details. although the old API does allow you to return observations in CSV format (ex. https://www.inaturalist.org/observations.csv?place_id=97394&taxon_id=459050), why is that necessary, as opposed to getting data in JSON?

For some reason, it hadn’t occurred to me to look at the csv output from the old API. That’s handy.

For what it’s worth, csv is desirable to me for ease of use. A table is much more plug-and-play with any kind of downstream analysis than is a more complicated data structure.

With regard to mapping, if one merely wanted to look at a map the path of least resistance would be to use the web interface. For my own usage, the reason to download data and visualize it locally is to incorporate it with other GIS data–surface ownership, other occurrence data, elevation, soil or vegetation mapping, etc.

it works within the limits of the API – meaning that each CSV file returned will be limited to however many records can be returned per page (above that limit, you would have to combine pages/results), and the total number of records will be limited by the system’s general record limit (10,000 records).
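Combining pages could be sketched as below: request observations.csv with increasing page numbers, keep the header from the first page only, and stop when a page comes back short. The per_page=200 value comes from later in this thread; the stopping rule, the 50-page cap (50 × 200 = 10,000 records) and the helper names are assumptions.

```python
import csv
import io
import urllib.request
from urllib.parse import urlencode

CSV_URL = "https://www.inaturalist.org/observations.csv"


def parse_csv_page(text):
    """Split one CSV response into (header, data_rows)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    return rows[0], rows[1:]


def merge_csv_pages(page_texts):
    """Merge several CSV pages into one list of rows with a single header."""
    header, merged = [], []
    for text in page_texts:
        page_header, rows = parse_csv_page(text)
        if not header:
            header = page_header
        merged.extend(rows)
    return [header] + merged if header else []


def fetch_csv_pages(params, per_page=200, max_pages=50):
    """Download pages until one has fewer than per_page rows (needs network access)."""
    texts = []
    for page in range(1, max_pages + 1):  # 50 * 200 = the 10,000-record limit
        query = urlencode({**params, "per_page": per_page, "page": page})
        with urllib.request.urlopen(f"{CSV_URL}?{query}") as response:
            text = response.read().decode("utf-8")
        texts.append(text)
        if len(parse_csv_page(text)[1]) < per_page:
            break
    return texts
```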

this kind of visualization can be done without downloading all observations. just overlay the observation map tiles on top of whatever other layers you want to use. for example, here are mountain goat observations in the US over a topo map: https://jumear.github.io/stirfry/iNat_map.html?view=elevation&taxon_id=42414&place_id=1. if you click on the markers, you will get the elevation, according to USGS. see https://forum.inaturalist.org/t/in-pursuit-of-mappiness-part-1/21864 for other examples.

I can’t use pre-made map elements since it’s for a Python programming class where I need to build every function myself.

I need all the observation details I can get, loaded into a Python / GeoPandas array.
I tried using the get_observations() function of Pyinaturalist, but it doesn’t return the latitude and longitude coordinates.
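In the API v1 JSON, coordinates are normally present, just not in columns named latitude/longitude: there is a `location` string of the form "lat,lng" and a `geojson` point whose `coordinates` are ordered [longitude, latitude]. A hedged sketch that pulls (lon, lat) out of either field and skips observations without coordinates; pyinaturalist may convert `location` into a [lat, lng] list, which is handled too, but treat the exact shapes and the helper names as assumptions.

```python
def extract_lon_lat(obs):
    """Return (lon, lat) floats for one observation dict, or None if unavailable."""
    geojson = obs.get("geojson") or {}
    coords = geojson.get("coordinates")
    if coords and len(coords) == 2:
        lon, lat = coords  # GeoJSON order is [longitude, latitude]
        return float(lon), float(lat)
    location = obs.get("location")
    if isinstance(location, str) and "," in location:
        lat, lon = location.split(",", 1)  # 'location' string order is "lat,lng"
        return float(lon), float(lat)
    if isinstance(location, (list, tuple)) and len(location) == 2:
        lat, lon = location  # assumed [lat, lng] list after client-side conversion
        return float(lon), float(lat)
    return None


def coordinates_table(observations):
    """Collect ids and coordinates into parallel lists, skipping records without a point."""
    ids, lons, lats = [], [], []
    for obs in observations:
        point = extract_lon_lat(obs)
        if point is None:
            continue
        ids.append(obs.get("id"))
        lons.append(point[0])
        lats.append(point[1])
    return ids, lons, lats
```

With GeoPandas installed, `geopandas.GeoDataFrame({"id": ids}, geometry=geopandas.points_from_xy(lons, lats), crs="EPSG:4326")` should turn those lists into a point layer. Note that obscured observations carry a deliberately randomized point.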

Mmm, that’s weird: I usually get 70+ research-grade observations for Anthidium florentinum in North America, but with the old API CSV I only get 30…

You are right about the JSON; CSV just seems simpler to me, but I guess it may be a source of errors. I’ve never worked much with JSON compared to CSV, but I should learn.

I found this blog post, which works with Python and the JSON returned from the API, but it’s not very clear because it doesn’t show the output: Downloading biodiversity records from iNaturalist with Python | by biodiversityDS. | Medium.
Thanks a lot, I will check that out!

True, I noticed that!

In this blog post, how did they figure out which term does what in the api.inaturalist.org URL?
Downloading biodiversity records from iNaturalist with Python | by biodiversityDS. | Medium

Is there documentation on that or a way to find out how to generate this URL?

Is there a way to plug THIS into the api.inaturalist.org/…/ URL that I extract the JSON from?
Is it the same thing?

Given my likely use cases and existing GIS skills, I guess this just seems like an unfamiliar and convoluted way of doing something that I can do more easily by other means. And one that I would expect to be less flexible for other uses of the data I might have downstream…

Of course, many of the variables here are going to be person-specific. To you, presumably using the API is “the easy way” and downloading a csv is “the hard way”, for me it’s the opposite.

https://api.inaturalist.org/v1/docs/#!/Observations/get_observations

See also here for more details on some of the parameters.

Many, but not all, of the API parameters are implemented on the Export page. So if you build a query using the API docs, you may get something different when you plug it into the Export page. On the other hand, if you copy the parameters from the box on the Export page into an API call, you should get the same results.

Compare https://api.inaturalist.org/v1/observations?quality_grade=any&identifications=any&place_id=97394&taxon_id=459050&verifiable=true to your Export page screenshot.
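That comparison can also be scripted. A minimal standard-library sketch that builds the v1 observations URL from a parameter dict and downloads one page; `build_api_url` and `fetch_observation_page` are hypothetical names, and the `total_results` / `results` keys are assumed from the v1 response format.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_URL = "https://api.inaturalist.org/v1/observations"


def build_api_url(params, page=1, per_page=200):
    """Return a v1 observations URL for the given search parameters and page."""
    query = {**params, "page": page, "per_page": per_page}
    return f"{API_URL}?{urlencode(query)}"


def fetch_observation_page(params, page=1, per_page=200):
    """Download one page of JSON and return (total_results, results). Needs network access."""
    with urllib.request.urlopen(build_api_url(params, page, per_page)) as response:
        payload = json.load(response)
    return payload.get("total_results", 0), payload.get("results", [])
```

If `total_results` is larger than the number of records in `results`, the remaining records are on later pages.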

You’re hitting the page limit. For small numbers of pages, the path of least resistance might be adding “&page=2” and so on. From an earlier post by pisum:

i don’t see this. this endpoint is limited to a maximum of 200 records per page, and i don’t see the CSV response being defaulted to a limit of 30 records. i’m getting 100+ records. if it’s not working for you, you could specify per_page=200 and see what happens.
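The page arithmetic is simple: with, say, 75 matching records and 30 per page you need three requests, and per_page=200 reduces that to one. A hedged sketch of a pagination loop; the fetch function is passed in (any callable returning `(total_results, results)` for a page) so the logic can be checked offline, and the 10,000 cap reflects the general record limit mentioned earlier in the thread. The helper names are hypothetical.

```python
import math


def pages_needed(total_results, per_page=200, max_records=10_000):
    """Number of pages to request, capped by the API's record limit."""
    reachable = min(total_results, max_records)
    return math.ceil(reachable / per_page)


def collect_all(fetch_page, per_page=200, max_records=10_000):
    """Call fetch_page(page, per_page) -> (total_results, results) until done.

    fetch_page is any function you supply, e.g. one that requests
    /v1/observations with page and per_page added to the query.
    """
    total, results = fetch_page(1, per_page)
    collected = list(results)
    for page in range(2, pages_needed(total, per_page, max_records) + 1):
        _, results = fetch_page(page, per_page)
        if not results:  # stop early if the server returns an empty page
            break
        collected.extend(results)
    return collected
```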

if you need to build things from scratch, you probably shouldn’t use PyiNaturalist. you should probably hit the API directly. you also probably shouldn’t use a mapping package either. you should probably build one yourself, right?

i still don’t like the idea of practicing how to do things the inefficient way for a class, but i guess it’s up to you to learn what you want to learn.

I think 30 records per page is the default in other contexts. So I’m assuming the same here, though of course I could be wrong.

My Jupyter Notebook offers an example that uses the API to output a CSV including lat/lon:
https://forum.inaturalist.org/t/tool-for-exporting-inaturalist-data-to-irecord-or-elsewhere/19160

Also, the PyiNaturalist examples now include a more detailed tutorial, if you haven’t already seen it:
https://github.com/niconoe/pyinaturalist/blob/main/examples/Tutorial_1_Observations.ipynb