Currently, when retrieving observation or identification data through at least some iNaturalist API interfaces, one runs into the restriction that a single search can return at most 10,000 records per run (50 pages x 200 observations or IDs).
Yet in many cases it is difficult to predict whether a search will stay within this limit, for instance when data are retrieved on a daily basis and the count fluctuates above and below 10,000 from one day to the next. In addition, some searches will necessarily match more records, and it is a pity to abandon them simply because of this technical cap or any other reason.
URL : https://api.inaturalist.org/v1/observations?per_page=200&created_d1=2021-05-01&created_d2=2021-05-01&hrank=species&place_id=97391&taxon_id=1
In this example (animals of Europe created on a single day), the search matches 17,059 records, which exceeds the 10,000 limit.
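For anyone who wants to reproduce this, here is a minimal Python sketch (using the requests library) that runs the example query and compares the number of matching records with the number that the 50 x 200 paging cap actually makes retrievable. It assumes the standard `total_results` field of the /v1/observations response.

```python
import requests

# Example query from the URL above: Animalia (taxon_id=1), Europe (place_id=97391),
# observations created on 2021-05-01, at species rank or lower.
BASE = "https://api.inaturalist.org/v1/observations"
params = {
    "per_page": 200,
    "created_d1": "2021-05-01",
    "created_d2": "2021-05-01",
    "hrank": "species",
    "place_id": 97391,
    "taxon_id": 1,
}

resp = requests.get(BASE, params=params)
resp.raise_for_status()
data = resp.json()

total = data["total_results"]
retrievable = min(total, 10_000)  # 50 pages x 200 records per page
print(f"matching records: {total}, retrievable via paging: {retrievable}")
```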
Feature request : when a search exceeds the 10,000 output limit (“50 pages x 200 hits”), make it possible to retrieve a random sample of 10,000 records from the full result set, as opposed to providing only the most recent 10,000 records.
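To make the request concrete, here is a sketch of what such a call could look like from the client side. The `order_by=random` and `sample_size` parameters shown here do not exist in the current API ; they are purely hypothetical and only illustrate the requested behaviour.

```python
import requests

BASE = "https://api.inaturalist.org/v1/observations"

# Hypothetical call: "order_by=random" and "sample_size" are NOT current API
# parameters -- they only sketch what the proposed feature could look like.
params = {
    "created_d1": "2021-05-01",
    "created_d2": "2021-05-01",
    "hrank": "species",
    "place_id": 97391,
    "taxon_id": 1,
    "per_page": 200,
    "order_by": "random",    # hypothetical: serve pages from a random permutation
    "sample_size": 10_000,   # hypothetical: cap the random sample at 10,000 records
}

# Only build and print the URL; no request is sent since the parameters are hypothetical.
url = requests.Request("GET", BASE, params=params).prepare().url
print(url)
```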
The issue is illustrated by the graphic below : the top barplot shows the current situation, where a search matching, say, 50,000 observations only allows the 10,000 most recent ones to be retrieved (black bars closest to x = 1, the time at which the search is performed), thereby leaving an unexplored blank area that may be very large.
The middle and bottom barplots illustrate two possible outputs of randomly sampling 10,000 observations or IDs from the same 50,000-record result set (blue bars). Statistical coverage is complete ; the samples are representative of the whole time span.
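The coverage difference can also be checked with a small simulation, sketched below in plain Python under assumed, made-up timestamps : the 10,000 most recent records cover only the tail end of the period, whereas a random sample of the same size spans it entirely.

```python
import random

# Simulate 50,000 observation "timestamps" spread over one year (day 0 to 365).
random.seed(0)
timestamps = sorted(random.uniform(0, 365) for _ in range(50_000))

# Current behaviour: only the 10,000 most recent records are reachable.
most_recent = timestamps[-10_000:]

# Requested behaviour: a 10,000-record random sample of the full result set.
random_sample = random.sample(timestamps, 10_000)

print(f"most recent 10,000 span days {min(most_recent):.0f}-{max(most_recent):.0f}")
print(f"random sample spans days {min(random_sample):.0f}-{max(random_sample):.0f}")
```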
Is this right ? Is this how the API currently works ? Can it be modified as per this feature request ?