Platform(s), such as mobile, website, API, other: Website
URLs (aka web addresses) of any pages, if relevant:
Description of need:
Exports are capped at 200K observations. Even that amount is slow to download, which is acceptable. However, for showing robust trends (e.g. number of observations over time), it is not very important to get every single observation. A random subsample of all the observations would be enough. It does need to be truly random, though, to avoid spurious correlations introduced by filtering on a variable that turns out to be meaningful.
Feature request details:
There should be a tick-box asking whether I need all the data or just a random subsample, and if I choose the second option, there should be a field for giving the number of observations I want.
If you really want to get a random sample of observations, it’s possible to do that via the API. You can simply specify the parameter order_by=random when retrieving data. If you’re interested, I can describe in more detail how to do that.
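As a rough sketch of what such a request could look like, the Python snippet below only builds the request URL and does not send it. The endpoint address, the per_page parameter, and the taxon_id filter are assumptions for illustration and should be checked against the API documentation; only order_by=random comes from the description above.

```python
from urllib.parse import urlencode

# Assumed endpoint for the observations search; verify against the API docs.
BASE_URL = "https://api.inaturalist.org/v1/observations"


def random_sample_url(per_page=200, **filters):
    """Build a request URL that asks for observations in random order.

    `per_page` and any extra `filters` are illustrative assumptions;
    `order_by=random` is the parameter mentioned in the post above.
    """
    params = {**filters, "order_by": "random", "per_page": per_page}
    return f"{BASE_URL}?{urlencode(params)}"


# Hypothetical filter value, used only to show how filters are appended.
url = random_sample_url(per_page=50, taxon_id=3)
print(url)
```

Each such request would return only one page of results, so a larger sample would take many requests.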
depending on what you’re trying to achieve, there are ways to get more observations, such as via GBIF, the AWS Open Dataset, etc…
Thanks. At this point, I don’t want to dig into API programming. I tried it once through R, but it was too slow for large datasets (>500K observations). The ideal solution would be to have a random sample downloaded with just one click.
There are different types of random. For serious analysis that needs random observations, it would be better for the user to define what type of random to use instead of depending on an unknown third-party random algorithm. If you need 200,000 random observations, you can download more than 200,000 observations, then use a random algorithm that fits your analysis to select a subset of 200,000 observations.
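A minimal sketch of that last step in Python, assuming the observations have already been downloaded as a CSV file. It draws a uniform sample without replacement, which is only one possible "type of random"; the function name, the column names, and the in-memory demo data are made up for illustration.

```python
import csv
import io
import random


def sample_observations(csv_file, k, seed=None):
    """Return k rows drawn uniformly at random, without replacement.

    Passing a seed makes the draw reproducible, so the same subset can
    be regenerated later for the same input.
    """
    rows = list(csv.DictReader(csv_file))
    if k > len(rows):
        raise ValueError(f"asked for {k} rows but only {len(rows)} are available")
    rng = random.Random(seed)
    return rng.sample(rows, k)


# Small in-memory stand-in for an exported CSV (hypothetical columns).
demo_csv = io.StringIO(
    "id,observed_on\n"
    + "\n".join(f"{i},2020-01-{i % 28 + 1:02d}" for i in range(1, 1001))
)

subset = sample_observations(demo_csv, 200, seed=42)
print(len(subset))
```

For a real export, the file would be opened with open(path, newline="") and passed in the same way. This version loads every row into memory, so very large exports may need a streaming approach instead.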
just based on what you’ve said so far, i would say that this is not true. as i noted earlier, you can get aggregated data instead. that would allow you to include all your observations in your analysis, and you could get that much faster than by downloading individual observations. you haven’t described any cases that would require, or benefit from, observation-level data for your analysis.