I am a master's student learning Python, studying an invasive bee (Anthidium florentinum). I would like to download all occurrences of one species in America and then use the geographical coordinates of all occurrences to draw a map.
Do you know which function I could use in pyinaturalist, or any other library, to download this .CSV to my hard drive?
Or could I just use the query "quality_grade=any&identifications=any&place_id=97394&taxon_id=459050&verifiable=true" to produce an array containing all the information?
I’m definitely not the best person to answer, but in case someone more knowledgeable doesn’t show up, I’ll take a stab at it. Personally, I’ve only looked into using the API as a means of getting data out of iNaturalist that can’t be accessed through the web interface at observations/export.
I notice that observations/export is not identified as an API endpoint here or here, nor in the pyinaturalist documentation here. Nor does my naïve attempt to poke the API in a web browser yield anything for observations/export. So the short answer is probably: you can’t do that.
You could presumably end up at the result you want–a csv with that formatting and information–by some data wrangling with the output from pyinaturalist’s get_observations(). I don’t have any useful advice here except that if you were to go down this route, I think you will probably wish you hadn’t.
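For the curious, here is a minimal sketch of what that wrangling might look like, assuming a recent pyinaturalist where get_observations() returns the raw v1 JSON (a dict with a 'results' list) and accepts page='all'; the field names follow the v1 response, so check them against what you actually get back:

```python
# Sketch only: flatten pyinaturalist results into a flat table and save as CSV.
# Assumes get_observations() returns the raw v1 API JSON (a dict with a 'results' list)
# and that page='all' is supported for fetching every page; adjust to your version.
import pandas as pd
from pyinaturalist import get_observations

response = get_observations(taxon_id=459050, place_id=97394, page='all')

rows = []
for obs in response['results']:
    # geojson coordinates are [longitude, latitude] and may be missing/obscured
    coords = (obs.get('geojson') or {}).get('coordinates') or [None, None]
    rows.append({
        'id': obs['id'],
        'observed_on': obs.get('observed_on'),
        'taxon': (obs.get('taxon') or {}).get('name'),
        'quality_grade': obs.get('quality_grade'),
        'longitude': coords[0],
        'latitude': coords[1],
    })

pd.DataFrame(rows).to_csv('anthidium_florentinum.csv', index=False)
```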
I don’t know how to help with that package (sorry!) but just a related point:
If you’re planning to use iNat data for a publication of some sort, the preferred route of access is through GBIF, which will give you a citable DOI for the dataset.
Sometimes this isn’t an option if the dataset you want contains records that aren’t on GBIF (like casual observations, which it looks like you may be wanting), but I thought it worth a mention.
Because it needs to be automated and updated every time there’s a new observation.
I’m building a script that is supposed to work with every invasive species.
It’s for a geomatics automation class.
I normally use the API through R, so I’m not very familiar with pyinaturalist, but it looks like you can use the get_observations function with your search criteria [https://github.com/niconoe/pyinaturalist/blob/main/README.md#features]. It looks like this should work, and the nice thing about taking the API approach is that your work will be more reproducible.
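For reference, the query string from the first post should map more or less directly onto keyword arguments, something like the untested sketch below (quality_grade=any and identifications=any look like the web defaults, so I have simply left them out):

```python
from pyinaturalist import get_observations

# Rough equivalent of the query string in the first post:
# quality_grade=any&identifications=any&place_id=97394&taxon_id=459050&verifiable=true
# ('any' appears to be the default for quality_grade/identifications, so they are omitted)
response = get_observations(
    taxon_id=459050,
    place_id=97394,
    verifiable=True,
    per_page=200,
)
print(response['total_results'])
```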
if you’re simply visualizing things on a map, it seems to me like it would be really inefficient to download all observations first. it would be more efficient to just get the observation map tiles, plus associated UTF grids, if needed. (this is especially true if you’re trying to map millions of observations.)
generally, if you don’t need observation-level detail, it’s more efficient to get aggregated data instead.
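for example, pulling a single points tile (plus its UTFGrid) from the v1 API looks roughly like the sketch below (endpoint paths as i understand them from the published API docs, so double-check before relying on this):

```python
import requests

# sketch: fetch one observation "points" tile (PNG) and its UTFGrid from the v1 API;
# endpoint paths follow the published API docs as I understand them; verify before relying on this
params = {'taxon_id': 459050, 'place_id': 97394}
zoom, x, y = 2, 0, 1  # standard slippy-map tile coordinates

png = requests.get(f'https://api.inaturalist.org/v1/points/{zoom}/{x}/{y}.png', params=params)
png.raise_for_status()
with open(f'points_{zoom}_{x}_{y}.png', 'wb') as f:
    f.write(png.content)

grid = requests.get(f'https://api.inaturalist.org/v1/points/{zoom}/{x}/{y}.grid.json', params=params)
grid.raise_for_status()
print(list(grid.json().keys()))  # UTFGrid payload keys such as 'grid', 'keys', 'data'
```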
…
i also don’t understand why you necessarily need CSV output if you decide you must get observation details. although the old API does allow you to return observations in CSV format (ex. https://www.inaturalist.org/observations.csv?place_id=97394&taxon_id=459050), why is that necessary, as opposed to getting data in JSON?
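(if you do go the CSV route, pandas can read that URL straight into a table; sketch below, keeping in mind the old API is deprecated, so no promises it keeps working.)

```python
import pandas as pd

# read the old-API CSV response straight into a DataFrame
# (deprecated endpoint, so treat this as a convenience rather than something to build on long-term)
url = ('https://www.inaturalist.org/observations.csv'
       '?place_id=97394&taxon_id=459050&per_page=200')
df = pd.read_csv(url)
print(len(df), list(df.columns))
```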
For some reason, it hadn’t occurred to me to look at the csv output from the old API. That’s handy.
For what it’s worth, csv is desirable to me for ease of use. A table is much more plug-and-play with any kind of downstream analysis than is a more complicated data structure.
With regard to mapping, if one merely wanted to look at a map, the path of least resistance would be to use the web interface. For my own usage, the reason to download data and visualize it locally is to incorporate it with other GIS data: surface ownership, other occurrence data, elevation, soil or vegetation mapping, etc.
it works within the limits of the API – meaning that each CSV file returned will be limited to however many records can be returned per page (above that limit, you would have to combine pages/results), and the total number of records will be limited by the system’s general record limit (10,000 records).
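so a script that wants more than one page has to loop over pages and combine them itself, roughly like this sketch (per_page/page behaving as described above):

```python
import pandas as pd

# combine paged CSV results from the old API
# (max 200 records per page, ~10,000 records total, per the limits described above)
base = 'https://www.inaturalist.org/observations.csv?place_id=97394&taxon_id=459050&per_page=200'

pages = []
page = 1
while True:
    try:
        chunk = pd.read_csv(f'{base}&page={page}')
    except pd.errors.EmptyDataError:  # empty response once past the last page
        break
    if chunk.empty:
        break
    pages.append(chunk)
    page += 1

df = pd.concat(pages, ignore_index=True)
print(len(df))
```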
I can’t use map elements that are already created, since it’s for a Python programming class where I need to build every function myself.
I need all the observation details I can get, and to load them into a Python / geopandas array.
I tried using the get_observations() function of pyinaturalist, but it doesn’t return the latitude and longitude coordinates.
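(For what it’s worth, the coordinates do seem to come back from get_observations(); they are just nested inside each observation’s geojson/location fields rather than exposed as top-level latitude/longitude columns. Below is a rough sketch of pulling them out into a geopandas GeoDataFrame, with field names as in the v1 response, so check them against your actual output.)

```python
import geopandas as gpd
import pandas as pd
from pyinaturalist import get_observations

response = get_observations(taxon_id=459050, place_id=97394, per_page=200)

records = []
for obs in response['results']:
    # coordinates sit in obs['geojson']['coordinates'] as [longitude, latitude];
    # they can be missing for obscured or coordinate-less observations
    coords = (obs.get('geojson') or {}).get('coordinates') or [None, None]
    records.append({'id': obs['id'], 'longitude': coords[0], 'latitude': coords[1]})

df = pd.DataFrame(records).dropna(subset=['longitude', 'latitude'])
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df['longitude'], df['latitude']),
    crs='EPSG:4326',
)
print(gdf.head())
```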
Mmmm, that’s weird: I usually get 70+ research-grade observations for Anthidium florentinum in North America, and with the old API CSV I only get 30…
You are right about the JSON; it’s just that CSV seems simpler to me, though I guess it may be a source of errors. I just never worked with JSON much, as opposed to CSV, but I should learn.
Given my likely use cases and existing GIS skills, I guess this just seems like an unfamiliar and convoluted way of doing something that I can do more easily by other means. And one that I would expect to be less flexible for other uses of the data I might have downstream…
Of course, many of the variables here are going to be person-specific. To you, presumably, using the API is “the easy way” and downloading a csv is “the hard way”; for me it’s the opposite.
See also here for more details on some of the parameters.
Many, but not all, of the API parameters are implemented on the Export page. So if you build a query using the API docs, you may get something different when you plug it into the Export page. On the other hand, if you copy the parameters from the box on the Export page to an API call, you should get the same results.
You’re hitting the page limit. For small numbers of pages, the path of least resistance might be adding “&page=2” and so on. From an earlier post by pisum:
i don’t see this. this endpoint is limited to a maximum of 200 records per page, and i don’t see the CSV response being defaulted to a limit of 30 records. i’m getting 100+ records. if it’s not working for you, you could specify per_page=200 and see what happens.
if you need to build things from scratch, you probably shouldn’t use PyiNaturalist. you should probably hit the API directly. you also probably shouldn’t use a mapping package either. you should probably build one yourself, right?
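as a sketch, hitting the v1 observations endpoint directly with requests might look something like this (same filters as elsewhere in the thread):

```python
import requests

# sketch: query the v1 observations endpoint directly, without a wrapper library
params = {
    'taxon_id': 459050,
    'place_id': 97394,
    'verifiable': 'true',
    'per_page': 200,
    'page': 1,
}
resp = requests.get('https://api.inaturalist.org/v1/observations', params=params)
resp.raise_for_status()
data = resp.json()
print(data['total_results'], len(data['results']))
```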
i still don’t like the idea of practicing how to do things the inefficient way for a class, but i guess it’s up to you to learn what you want to learn.