Downloading a CSV of all observations of a species with python

I don’t know how to help with that package (sorry!) but just a related point:
If you’re planning to use iNat data for a publication of some sort, the preferred route of access is through GBIF, which will give you a citable DOI for the dataset.

Sometimes this isn’t an option if the dataset that you want contains records that aren’t on GBIF (like casual observations, which it looks like you may want), but I thought it worth a mention.


Because it needs to be automated and updated every time there’s a new observation.
I’m building a script that is supposed to work with every invasive species.
It’s for a geomatics automation class.

Thank you very much, I will look into this!
I was going for research grade only, so you may have given me a very useful tip!

Thanks a lot for your input.
So you’re confirming that there’s no simple function to export a .CSV that I missed.

I normally use the API from R, so I’m not very familiar with pyinaturalist, but it looks like you can use the get_observations function with your search criteria [https://github.com/niconoe/pyinaturalist/blob/main/README.md#features]. This should work, and the nice thing about taking the API approach is that your work will be more reproducible.

if you’re simply visualizing things in a map, it seems to me like it would be really inefficient to download all observations first. it would be more efficient to just get the observation map tiles, plus associated UTF grids, if needed. (this is especially true if you’re trying to map millions of observations.)

then you can quickly map whatever you like. for example: https://jumear.github.io/stirfry/iNat_map.html?taxon_id=459050&verifiable=true&place_id=97394

it’s also possible to map using just the UTF grids, though that’s a little harder to code. example: https://jumear.github.io/stirfry/iNat_UTFgrid_based_density_map_for_Leaflet.html?defaultstyle=gradient&place_id=97394&taxon_id=459050&scale_factor=5
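If you do want to fetch tiles yourself, the first step is working out which tile covers a given point. A minimal sketch using the standard Web Mercator (XYZ "slippy map") formula; the `TILE_URL` pattern is an assumption to check against the API docs, not something confirmed in this thread:

```python
import math
from urllib.parse import urlencode

# Assumed tile URL pattern (e.g. kind="points", ext="png", or ext="grid.json"
# for UTF grids); verify the exact paths against the iNaturalist API docs.
TILE_URL = "https://api.inaturalist.org/v1/{kind}/{z}/{x}/{y}.{ext}"


def latlon_to_tile(lat: float, lon: float, zoom: int) -> tuple:
    """Return the (x, y) index of the Web Mercator (XYZ) tile containing a point."""
    n = 2 ** zoom  # number of tiles per side at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 and near-polar latitudes stay on the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(lat: float, lon: float, zoom: int, kind: str = "points",
             ext: str = "png", **params) -> str:
    """Build the URL of the tile containing (lat, lon), with query filters appended."""
    x, y = latlon_to_tile(lat, lon, zoom)
    url = TILE_URL.format(kind=kind, z=zoom, x=x, y=y, ext=ext)
    return f"{url}?{urlencode(params)}" if params else url
```

For example, `tile_url(45.5, -73.6, 10, taxon_id=459050, place_id=97394)` gives one tile URL with the same filters as the map links above.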

generally, if you don’t need observation-level detail, it’s more efficient to get aggregated data instead.

i also don’t understand why you necessarily need CSV output if you decide you must get observation details. although the old API does allow you to return observations in CSV format (ex. https://www.inaturalist.org/observations.csv?place_id=97394&taxon_id=459050), why is that necessary, as opposed to getting data in JSON?
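For reference, a minimal standard-library sketch of building that old-API CSV URL from a parameter dict and reading a response body into rows. The fetch itself is left out, and the column names in the test data are made up for illustration, not a statement of what the CSV actually contains:

```python
import csv
import io
from urllib.parse import urlencode

OLD_API_CSV = "https://www.inaturalist.org/observations.csv"


def csv_url(**params) -> str:
    """Build an old-API CSV URL, e.g. csv_url(place_id=97394, taxon_id=459050)."""
    return f"{OLD_API_CSV}?{urlencode(params)}"


def parse_csv(text: str) -> list:
    """Parse a CSV body (header row followed by data rows) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))
```

Note that every value comes back as a string, so numeric columns still need converting.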


For some reason, it hadn’t occurred to me to look at the csv output from the old API. That’s handy.

For what it’s worth, csv is desirable to me for ease of use. A table is much more plug-and-play with any kind of downstream analysis than is a more complicated data structure.
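The gap between the two formats can be narrowed by flattening each JSON record into a single table row. A minimal sketch (this is one possible approach, not how any particular tool does it); pandas’ `json_normalize` does something similar if you are already using pandas:

```python
def flatten(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into one level, joining keys with `sep`.

    Lists are kept as-is; a real export needs its own rule for them
    (e.g. join the values, or keep only the first element).
    """
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested dicts
        else:
            flat[name] = value
    return flat
```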

With regard to mapping, if one merely wanted to look at a map the path of least resistance would be to use the web interface. For my own usage, the reason to download data and visualize it locally is to incorporate it with other GIS data–surface ownership, other occurrence data, elevation, soil or vegetation mapping, etc.

it works within the limits of the API – meaning that each CSV file returned will be limited to however many records can be returned per page (above that limit, you would have to combine pages/results), and the total number of records will be limited by the system’s general record limit (10,000 records).
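To see how many requests that implies, here is a small sketch using the 200-per-page and 10,000-record limits mentioned in this thread as defaults (worth double-checking against the current API docs):

```python
import math


def pages_needed(total_results: int, per_page: int = 200, max_records: int = 10_000) -> int:
    """Number of page requests needed to collect everything the API will return."""
    reachable = min(total_results, max_records)  # records beyond the cap can't be paged to
    return math.ceil(reachable / per_page)
```

So 450 matching records take 3 requests at 200 per page, while anything above 10,000 stops at 50 requests.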

this kind of visualization can be done without downloading all observations. just overlay the observation map tiles on top of whatever other layers you want to use. for example, here are mountain goat observations in the US over a topo map: https://jumear.github.io/stirfry/iNat_map.html?view=elevation&taxon_id=42414&place_id=1. if you click on the markers, you will get the elevation, according to USGS. see https://forum.inaturalist.org/t/in-pursuit-of-mappiness-part-1/21864 for other examples.

I can’t use map elements that have already been created, since it’s for a Python programming class where I need to build every function myself.

I need all the observation details I can get, loaded into a Python / GeoPandas array.
I tried using the get_observations() function of pyinaturalist, but it doesn’t return the latitude and longitude coordinates.
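The coordinates are usually in the JSON, just not as separate latitude/longitude keys. A minimal sketch, assuming the v1 response shape where `location` is a `"lat,lon"` string (pyinaturalist may already convert it to a `[lat, lon]` list) and `geojson` holds `[lon, lat]` coordinates:

```python
def obs_latlon(obs: dict):
    """Return (lat, lon) floats for one observation dict, or None if absent.

    Assumes the v1 JSON shape: "location" as a "lat,lon" string or a
    [lat, lon] list, with "geojson" coordinates ordered [lon, lat].
    """
    loc = obs.get("location")
    if isinstance(loc, str) and "," in loc:
        lat, lon = (float(v) for v in loc.split(","))
        return lat, lon
    if isinstance(loc, (list, tuple)) and len(loc) == 2:
        return float(loc[0]), float(loc[1])
    geo = obs.get("geojson") or {}
    coords = geo.get("coordinates")
    if coords and len(coords) == 2:
        return float(coords[1]), float(coords[0])  # GeoJSON order is [lon, lat]
    return None  # no coordinates available for this observation
```

Once you have lists of latitudes and longitudes, `geopandas.points_from_xy(lons, lats)` builds the point geometries (note that x is longitude).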

Hmm, that’s weird: I usually get 70+ research-grade observations for Anthidium florentinum in North America, and with the old API CSV I only get 30…

You’re right about JSON. CSV just seems simpler to me, though I guess it may be a source of errors. I haven’t worked with JSON much compared to CSV, but I should learn.

I found this blog post that works with Python and the JSON returned from the API, but it’s not very clear because it doesn’t show the output: Downloading biodiversity records from iNaturalist with Python | by biodiversityDS. | Medium.
Thanks a lot, I will check that out!

True, I noticed that!

In this blog post (Downloading biodiversity records from iNaturalist with Python | by biodiversityDS. | Medium), I wonder how they figured out which term does what in the api.inaturalist.org URL.

Is there documentation on that, or a way to find out how to generate this URL?

Is there a way to plug THIS into the api.inaturalist.org/…/ url where I extract the JSON from?
Is this the same thing?

Given my likely use cases and existing GIS skills, I guess this just seems like an unfamiliar and convoluted way of doing something that I can do more easily by other means. And one that I would expect to be less flexible for other uses of the data I might have downstream…

Of course, many of the variables here are going to be person-specific. To you, presumably using the API is “the easy way” and downloading a csv is “the hard way”, for me it’s the opposite.

https://api.inaturalist.org/v1/docs/#!/Observations/get_observations

See also here for more details on some of the parameters.

Many, but not all, of the API parameters are implemented on the Export page. So if you build a query using the API docs, you may get something different when you plug it into the Export page. On the other hand, if you copy the parameters from the box on the Export page to an API call, you should get the same results.

Compare https://api.inaturalist.org/v1/observations?quality_grade=any&identifications=any&place_id=97394&taxon_id=459050&verifiable=true to your Export page screenshot.
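Copying the parameters across can be scripted too. A minimal sketch that takes the query string from the Export page box and reuses it against the v1 observations endpoint, optionally adding extra parameters such as `per_page`:

```python
from urllib.parse import parse_qsl, urlencode

API_OBS = "https://api.inaturalist.org/v1/observations"


def export_query_to_api_url(query: str, **extra) -> str:
    """Reuse a query string copied from the Export page in a v1 API URL.

    Blank values are dropped and repeated keys keep their last value.
    """
    params = dict(parse_qsl(query.lstrip("?")))
    params.update({k: str(v) for k, v in extra.items()})
    return f"{API_OBS}?{urlencode(params)}"
```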


You’re hitting the page limit. For small numbers of pages, the path of least resistance might be adding “&page=2” and so on. From an earlier post by pisum:

i don’t see this. this endpoint is limited to a maximum of 200 records per page, and i don’t see the CSV response being defaulted to a limit of 30 records. i’m getting 100+ records. if it’s not working for you, you could specify per_page=200 and see what happens.
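Combining pages can be scripted instead of adding `&page=2` by hand. A minimal sketch, assuming only that each request returns a list of results and that a page shorter than `per_page` is the last one. `fetch_page` is a hypothetical stand-in for whatever function actually performs the request, which also lets the loop be tested offline. The 200 and 10,000 defaults are the limits mentioned in this thread:

```python
from typing import Callable


def fetch_all(fetch_page: Callable[[int], list], per_page: int = 200,
              max_records: int = 10_000) -> list:
    """Collect results page by page until a short page or the record cap.

    fetch_page(page) must return the list of results for a 1-based page
    number (e.g. the "results" list of a JSON response).
    """
    results = []
    page = 1
    while len(results) < max_records:
        batch = fetch_page(page)
        results.extend(batch)
        if len(batch) < per_page:
            break  # a short (or empty) page means there is nothing more
        page += 1
    return results[:max_records]
```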

if you need to build things from scratch, you probably shouldn’t use PyiNaturalist. you should probably hit the API directly. you also probably shouldn’t use a mapping package either. you should probably build one yourself, right?
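Hitting the API directly doesn’t need any third-party package. A minimal standard-library sketch that requests one page of v1 observations; the `User-Agent` string is a hypothetical placeholder, and only the documented `page`/`per_page` parameters plus whatever filters you pass are assumed:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_OBS = "https://api.inaturalist.org/v1/observations"


def build_request(page: int = 1, per_page: int = 200, **params) -> Request:
    """Build a GET request for one page of v1 observations."""
    query = urlencode({**params, "page": page, "per_page": per_page})
    # A descriptive User-Agent is good manners for automated scripts;
    # this value is only a placeholder.
    return Request(f"{API_OBS}?{query}", headers={"User-Agent": "geomatics-class-script"})


def get_results(page: int = 1, per_page: int = 200, **params) -> list:
    """Fetch one page and return its "results" list (makes a network call)."""
    with urlopen(build_request(page, per_page, **params), timeout=30) as resp:
        return json.load(resp)["results"]
```

Usage would be something like `get_results(1, taxon_id=459050, place_id=97394, quality_grade="research")`. If you loop over many pages, pause between requests; the API docs ask scripts to keep their request rate modest.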

i still don’t like the idea of practicing how to do things the inefficient way for a class, but i guess it’s up to you to learn what you want to learn.

I think 30 records per page is the default in other contexts. So I’m assuming the same here, though of course I could be wrong.

My Jupyter Notebook offers an example that uses the API to output a CSV including lat/lon:
https://forum.inaturalist.org/t/tool-for-exporting-inaturalist-data-to-irecord-or-elsewhere/19160
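This is not that notebook’s code, just a minimal standard-library sketch of the final step: writing flat observation dicts (for example, one per observation with lat/lon already extracted) to a CSV whose header is the union of all keys:

```python
import csv


def rows_to_csv(rows: list, fileobj) -> None:
    """Write flat dicts to CSV, using the union of keys (in first-seen order) as the header."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, restval="")  # missing keys become ""
    writer.writeheader()
    writer.writerows(rows)
```

To write to disk, open the file with `open("observations.csv", "w", newline="", encoding="utf-8")` so the csv module controls line endings.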

Also, the pyinaturalist examples now include a more detailed tutorial, if you didn’t already see it:
https://github.com/niconoe/pyinaturalist/blob/main/examples/Tutorial_1_Observations.ipynb

Thank you very much for the links!
I am currently fiddling with your code. I removed the “user-id” so that it takes all observations at the location.

I wonder why this first Date says None when there’s in fact a date on the observation?

Also, I would like to try modifying the code so that it adds a “user-id” column to the final dataframe.
Can you tell me quickly which part you would modify so that it would add that column? (Only the general steps). This is an introductory class to python so I think I will try understanding your code as a practice.

I added this part, but…

… it returns this. Is it a list or a dictionary?

I will try to refer to the index for the name of the account inside of obs['user'].

EDIT: I managed to make it work. It’s a dictionary inside a dictionary:


And it returned this:
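For anyone following along, the nested lookup looks roughly like this. It is a minimal sketch assuming the v1 user object carries `id` and `login` keys; `.get()` avoids a KeyError if a record lacks them:

```python
def user_columns(obs: dict) -> dict:
    """Pull user fields out of the nested obs["user"] dictionary."""
    user = obs.get("user") or {}
    return {"user_id": user.get("id"), "user_login": user.get("login")}
```

Each row can then be extended with `row.update(user_columns(obs))` before building the dataframe.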

I still have a question about this part:
Why 11 and 10? Are these the values reserved for “male” and “female” on iNaturalist?


Yes
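As a sketch of reading that annotation from the JSON: the thread confirms 10 and 11 are the sex values, but the specific mapping (10 → female, 11 → male) and the attribute id 9 for “Sex” below are assumptions to verify against https://api.inaturalist.org/v1/controlled_terms before relying on them:

```python
SEX_ATTRIBUTE = 9                         # assumed controlled_attribute_id for "Sex"
SEX_VALUES = {10: "female", 11: "male"}  # assumed mapping of controlled_value_id


def obs_sex(obs: dict):
    """Return "female"/"male" from an observation's annotations, else None."""
    for ann in obs.get("annotations") or []:
        if ann.get("controlled_attribute_id") == SEX_ATTRIBUTE:
            return SEX_VALUES.get(ann.get("controlled_value_id"))
    return None
```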

If this is a bug, it’s not one I’ve noticed before.
Is it the same if you run with original values or only something triggered by your edits?
