Hi,
I’d like to download records for a selected taxonomic group as a JSON file via the URL https://api.inaturalist.org.
I created a URL with selected parameters (https://api.inaturalist.org/v1/observations?taxon_id=49083&verifiable=true) which generates a JSON with the correct total number of observations (~120k) but contains only one page of results. Why is there only one page, and can I get all results with this approach?
I might be fundamentally misunderstanding the interface, so just pointing me towards a topic or guideline where this is addressed would be much appreciated.
Or, get the research grade ones from GBIF as suggested at the top of the page.
Even if you wanted to use the API directly, that’s a rather large API request, so you’d want to check first whether you actually need anything returned there that isn’t in the regular download.
you should read through all of these pages. they answer your question and also provide suggestions for other ways to get observation data.
after reading these pages, if you still think you need to get information from the API and still aren’t sure how to get what you need, please describe why you need to go through the API, what you’re trying to accomplish, and which language / tool you’re using to interact with the API.
Thanks for the hints and all the suggestions! I was not aware of pagination - i.e., that my URL returns only page 1 of the results, equivalent to https://api.inaturalist.org/v1/observations?taxon_id=49083&verifiable=true&page=1.
I want all photos associated with an observation, so the web query, which returns only the first photo, is not usable for me.
Research-grade records alone would limit the number of observations - perhaps not for this taxon, but for some other taxa I’d like to work with, which generally reach research grade much less frequently.
Thanks for the hints.
I believe I need the API, since it seems to be the only option to get all photos from observations, not just the first one as I would get from the observation export web interface. My strategy is to get the JSONs, then extract the photo URLs, then download the photos. My preferred language is bash.
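For the extraction and download steps, something like this is what I have in mind (a rough sketch, assuming jq is installed; the /photos/<id>/<size>.<ext> URL structure, the “medium” size name, and the .jpg extension are my assumptions from eyeballing a few responses):

```bash
# Extract all photo URLs from a previously downloaded batch of
# observations (v1 responses list them under .results[].photos[].url).
jq -r '.results[].photos[].url' batch.json > photo_urls.txt

# The API returns "square" thumbnail URLs; swap in a larger size,
# then download each photo, naming files by the photo id in the URL.
sed 's/square/medium/' photo_urls.txt | while read -r url; do
  id=$(echo "$url" | sed -E 's|.*/photos/([0-9]+)/.*|\1|')
  curl -s -o "photo_${id}.jpg" "$url"
  sleep 1   # stay well under the recommended request rate
done
```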
I tested curl commands to download the JSONs (my approach for downloading JSONs for more than 10k observations would be a simple script that downloads them all in small batches using the “&order_by=id&order=asc and id_above” strategy described in https://www.inaturalist.org/pages/api+recommended+practices), but it does not give me the expected output. E.g., I’d expect the following to generate a JSON with 200 observations, sorted, starting with the 201st observation: curl -o page1_200obs.json https://api.inaturalist.org/v1/observations?taxon_id=118903&per_page=200&page=2&verifiable=true&order_by=id&order=asc
However, it seems that all URL arguments except the taxon ID are ignored: I get only 30 observations with this download, they are not sorted by id in ascending order, and the option selecting the 2nd results page (page=2) also has no effect. Is curling the API URL not a valid procedure?
Note that I ultimately plan to download photos for at most ~80k observations in many batches over a few weeks - I plan to work with a batch of photos for a few thousand observations first, and only once I’ve tested everything on those will I download all photos for my selected taxon.
It seems that getting a single photo per observation via the observation export web interface is much easier, and I will go that way if I can’t figure out how to get all photos. My motivation for using the API URLs was to fully utilize the available data for my tests, but also to learn something along the way.
API requests can be made with curl. Are you sure you don’t need quotes around the URL? You might also prefer to use API v2, which lets you specify return fields for much smaller responses if you only need some of the data. Depending on how the photos will be used, please consider fetching photo licenses and attribution, as the photo license dictates how a photo may be used and how it must be attributed, e.g. curl -o page2_200obs.json "https://api.inaturalist.org/v2/observations?taxon_id=118903&per_page=200&page=2&verifiable=true&order_by=id&order=asc&fields=id,taxon.id,photos.url,photos.license_code,photos.attribution".
As you note in your response, if you are planning to fetch more than 10k observations, you won’t want to use the page parameter at all; rather, you’ll want to use the id_above parameter, e.g. curl -o page2_200obs.json "https://api.inaturalist.org/v2/observations?taxon_id=118903&id_above=137890&order_by=id&order=asc&verifiable=true&fields=id,taxon.id,photos.url,photos.license_code,photos.attribution"
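Putting that together, a minimal loop might look something like this (an untested sketch, assuming jq is available to pull the last id out of each response; the field list mirrors the example above):

```bash
#!/usr/bin/env bash
# Page through all matching observations via id_above, per the
# recommended-practices page; stop when a response comes back empty.
base="https://api.inaturalist.org/v2/observations?taxon_id=118903&per_page=200&order_by=id&order=asc&verifiable=true&fields=id,taxon.id,photos.url,photos.license_code,photos.attribution"
last_id=0
batch=1
while : ; do
  curl -s -o "batch_${batch}.json" "${base}&id_above=${last_id}"
  [ "$(jq '.results | length' "batch_${batch}.json")" -eq 0 ] && break
  last_id=$(jq '.results[-1].id' "batch_${batch}.json")   # highest id in this batch
  batch=$((batch + 1))
  sleep 1   # keep roughly under one request per second
done
```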
that’s not entirely true. the developer’s page noted above links to the AWS Open Data Set, which might be iNat’s preferred avenue for folks to get a ton of photos. it is limited to licensed photos only, but that might be appropriate for your use case.
(the same page also mentions the GBIF Darwin Core archive file, which should also give you all photos, although it’ll be a more limited set of observations than the AWS Open Data set.)
i’m not saying these are the best options in your case, but they are options.
if you are downloading unlicensed photos (photos not included in the AWS Open Data set), then the API is your only option as far as i know, but just keep in mind that you will need to observe the media download limits mentioned on the developer’s page. depending on the size of the photos you plan to download (ex. small, medium, etc.) and how many in your set are unlicensed, you may need to be careful not to exceed the limits.
yup. bash treats an unquoted & as a command separator (everything before it gets run in the background), so the URL needs quotes - i believe single quotes are the safe default in bash.
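for example (a quick illustration; double quotes would work here too, since the problem is just the unquoted & characters):

```bash
# unquoted: bash splits the line at each & and backgrounds the pieces,
# so curl only ever sees the URL up to the first &
curl -o out.json https://api.inaturalist.org/v1/observations?taxon_id=118903&per_page=200&page=2

# quoted: the whole URL reaches curl intact
curl -o out.json 'https://api.inaturalist.org/v1/observations?taxon_id=118903&per_page=200&page=2'
```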
i think it’s probably easier to go this way, but using both paging and id_above could allow the data to be extracted faster (by making parallel page requests with incrementally delayed starts, as in the sketch below), although depending on how long it takes you to write the more complicated code to set that up, it may not be worth the effort.
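something like this is the rough shape of that idea (hypothetical and untested; seeds.txt would hold precomputed id_above values, one per line):

```bash
# fetch several id_above batches concurrently, each start delayed a
# little so the requests don't all hit the API at the same instant.
base="https://api.inaturalist.org/v2/observations?taxon_id=118903&per_page=200&order_by=id&order=asc&verifiable=true&fields=id,photos.url,photos.license_code,photos.attribution"
i=0
while read -r seed; do
  ( sleep $((i * 2)); curl -s -o "obs_above_${seed}.json" "${base}&id_above=${seed}" ) &
  i=$((i + 1))
done < seeds.txt
wait   # block until all background fetches finish
```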
@pisum @pleary
Thanks for the suggestions and for correcting my bash syntax :)
I was able to get the JSONs for my focal taxonomic group in a series of ~400 requests with 200 observations each. As the source of the id_above values, I used a list of all observation ids exported via the web export interface and extracted every 200th id.
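In case it helps others, the seeding and download steps looked roughly like this (a simplified sketch; ids.txt stands for the observation id column from the web export, and the field list follows the example above):

```bash
# Take every 200th observation id as an id_above seed; prepending 0
# makes the first batch start from the lowest id.
sort -n ids.txt | awk 'NR % 200 == 0' > seeds.txt

base="https://api.inaturalist.org/v2/observations?taxon_id=118903&per_page=200&order_by=id&order=asc&verifiable=true&fields=id,photos.url,photos.license_code,photos.attribution"
{ echo 0; cat seeds.txt; } | while read -r seed; do
  curl -s -o "obs_above_${seed}.json" "${base}&id_above=${seed}"
  sleep 1   # one request per second, per the recommended practices
done
```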
Thanks also for pointing out the licensing - I plan to train a model to classify these observations based on several criteria and then apply it to all observations, and if everything works and I gather some notable insights into the biology or ecology of the organisms, I aim to publish them in an open-access scientific journal. I’m aware of the debate concerning the legality - or morality - of using copyrighted material to train computer vision or other models. This is the first time I’ll be performing this type of work. Do you have any recommendations - i.e., if the photos I use for the above-described purpose include “all rights reserved” photos, would you suggest avoiding them? All-rights-reserved photos are ~25% of the photos available for the taxon.