API get observations > only selected fields

As far as I can see, getting observations through the API returns all fields of each observation. To reduce data transfer, it would be useful to be able to select only the fields that are needed.

are you talking about what’s returned by the observation search (ex. https://api.inaturalist.org/v1/observations?id=13311978,1741463) vs. the observation detail (ex. https://api.inaturalist.org/v1/observations/13311978,1741463)?

they do look very similar, but the detail seems to return additional information at least about annotations, community ID opt-out at the user level, and community taxon (relevant if user has rejected or opted out of the community ID).

i sort of agree that the observation search probably doesn’t need to return all that information, unless it’s doing that to support apps that have already been built to assume it returns all that information. if backward compatibility is the reason it returns so much, i think it would make more sense to have it return everything that the observation detail returns.

the challenge right now is that if i want to get, say, annotations for all observations in a project, i think i’d have to do one of two things. either i do the observation search and then retrieve the observation detail for each observation that has annotations, which seems sort of inefficient. or maybe i could use the old API to get a more limited observation search result and then loop through and get the observation detail for each observation using the new API, and i’d rather not use the old API.
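a rough sketch of the first option, only to show the extra round trips involved. it relies on the detail endpoint accepting comma-separated IDs, as in the example URL above. the helper name `detail_urls` and the batch size of 30 are made up for illustration; i don't know what the real per-request limit on IDs is.

```python
DETAIL_BASE = "https://api.inaturalist.org/v1/observations/"

def detail_urls(ids, batch_size=30):
    """Build one observation detail URL per batch of IDs.

    batch_size is an arbitrary illustrative value, not a documented limit.
    """
    ids = list(ids)
    return [
        DETAIL_BASE + ",".join(str(i) for i in ids[start:start + batch_size])
        for start in range(0, len(ids), batch_size)
    ]

# IDs collected from an earlier observation search (here, the two from the example above)
for url in detail_urls([13311978, 1741463]):
    print(url)
```

every URL in that list is one more request on top of the search itself, which is the inefficiency i mean.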

if the observation search is going to return everything it’s already returning, maybe just add the extra little bit of information, and then i won’t have to pull in additional observation details at all?

i think it would also be helpful if, when i pull back multiple observations from the observation detail, they could be returned sorted. right now, they seem to be unsorted, and i have to apply my own sort on them.
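for what it's worth, the client-side sort is small. a minimal sketch, assuming each item in the response's `results` array is a dict with an `id` key; the function name and the sample data are made up.

```python
def sort_by_requested_order(results, requested_ids):
    """Return observation dicts in the same order the IDs were requested."""
    position = {obs_id: i for i, obs_id in enumerate(requested_ids)}
    # any observation whose ID was not in the request is pushed to the end
    return sorted(results, key=lambda obs: position.get(obs["id"], len(position)))

# made-up sample, shaped like the "results" array of a detail response
sample = [{"id": 1741463}, {"id": 13311978}]
ordered = sort_by_requested_order(sample, [13311978, 1741463])
print([obs["id"] for obs in ordered])
```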

i was looking at the observation search parameters, and there’s an option to return only the ID. so that partially addresses what i was talking about before, though i think it still would make more sense for the observation search to return everything the observation detail does if the only_id option is false.
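here is roughly what a search request with that option looks like. this only builds the URL and doesn't send anything. `my-project` is a placeholder project, and `search_url` is just a helper name i made up.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://api.inaturalist.org/v1/observations"

def search_url(**params):
    """Build an observation search URL from query parameters."""
    return SEARCH_BASE + "?" + urlencode(params)

# "my-project" is a placeholder; only_id asks the search to return just the IDs
url = search_url(project_id="my-project", only_id="true", per_page=200)
print(url)
```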

I think the original post was about using the API to download data for use in personal maps or projects, or to complement a data set for a region, for example.

iNaturalist, however, limits the number of records one can download via the API, because all the data for each record is downloaded every time, which amounts to a large volume of data even for a relatively small number of records, like 10,000 observations.

The idea would be to predefine the set of fields returned by the API (maybe only a very few fields are needed for certain projects), thus reducing the data volume and making it possible to download larger data sets without putting too much load on the iNaturalist system.

Or is there another approach to automatically download a large amount of observations?
(The normal download tool cannot be automated, and the GBIF data set is not updated daily and does not include all the observations that may be desired.)

i think the way they limit the total number of records returned by the API is based mainly on number of requests, as far as i can tell. it seems to start refusing requests if you do more than 1 request per sec on average over the last 60 seconds, or something like that. even if all you pull back is a single ID with each request, you’ll still be limited by the number of requests.

since you can only get 200 observations per observation search request, if you pull back more than 60 pages (12,000 observations) all at once, you’ll hit that limit. you could wait 1s between each request or 1m after 60 pages, and that might help you avoid hitting the limit (though it’s unpredictable due to asynchronous requests). (but if you request page numbers > 100 you’ll have to do that via an authenticated request, according to the API notes.)
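the arithmetic and the pacing above can be sketched like this. the 200-per-page figure is from the paragraph above; the 1 second gap is just my guess at a safe pace, not an official number. `fetch` is whatever function you use to request one page (hypothetical here), and `sleep` and `clock` are parameters only so the pacing can be tried without really waiting.

```python
import math
import time

PER_PAGE = 200      # max observations per search request, as noted above
MIN_INTERVAL = 1.0  # seconds between requests; an assumed safe pace, not an official limit

def pages_needed(total_observations, per_page=PER_PAGE):
    """Number of search requests needed to page through a result set."""
    return math.ceil(total_observations / per_page)

def throttled(pages, fetch, min_interval=MIN_INTERVAL,
              sleep=time.sleep, clock=time.monotonic):
    """Call fetch(page) for each page, starting calls at least min_interval apart."""
    results = []
    last_start = None
    for page in pages:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(fetch(page))
    return results

print(pages_needed(12000))  # 12,000 observations at 200 per page
```

with a real `fetch`, something like `throttled(range(1, pages_needed(total) + 1), fetch)` would walk the pages at that pace, though it does nothing about the authenticated-request requirement for pages past 100.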

i’m not sure what your mapping use case is, but they do serve up marker+UTFGrid tiles for observations. so if you just want to see where the observations are on a map, you don’t need to pull down, say, 10,000 observations just to plot them on a map. if you’re doing more advanced number crunching for fancier plotting, then the map tiles might not be useful for that.

it looks like this is on the roadmap for the upcoming v2 API. from https://github.com/inaturalist/inaturalist/issues/2679:

Regarding reducing response sizes (which are, admittedly and to our own detriment, enormous), that’s something we’re working on in v2 of the API, which will require clients to specify what fields they want in the response.

i wonder if this feature request can be closed, since it looks like it’s been adopted, even if it’s not implemented yet?

We are adopting this and it is partially implemented, just not released (yet; working toward alpha). My experience so far is that it puts a lot of burden on the client to both understand what can be returned and make large requests when specifying what it wants. It’s probably a net win, but definitely a tradeoff.


any update?

you can see what’s been developed so far for the v2 API here: https://api.inaturalist.org/v2/docs/. the first section on that page, starting with “By default, all endpoints will return a very minimal response” explains how the new API will handle what’s being discussed in this thread.
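to give a feel for it, here is a sketch of a request that names the fields it wants. i'm assuming the query parameter is called `fields` and takes a comma-separated list, and that the endpoint lives at `/v2/observations`; check the docs page above before relying on either. the field names are only examples. this just builds the URL and doesn't send it.

```python
from urllib.parse import urlencode

V2_BASE = "https://api.inaturalist.org/v2/observations"  # assumed endpoint path

def v2_search_url(fields, **params):
    """Build a v2 search URL that asks only for the named fields.

    The "fields" parameter name and its comma-separated format are assumptions.
    """
    query = dict(params, fields=",".join(fields))
    return V2_BASE + "?" + urlencode(query)

# example field names only; the real list is in the v2 docs
url = v2_search_url(["id", "observed_on"], per_page=200)
print(url)
```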

Only marginal progress, though I guess it’s a bit further along than it was in June. It’s a very big task and I’ve been dividing my limited feature development time between that and providing a way for curators of collection projects to view the hidden coordinates of project members who provide consent, which is also complicated but less sweeping than API v2. So far I’ve been focusing API v2 development on supporting obs detail functionality, and it’s mostly there.
