The title captures the question. Basically, for Who Eats Whom we are currently pinging the iNaturalist API every time we want to do a search. This is probably adding unnecessary load to the iNat servers and slows down searches on Who Eats Whom. So we were wondering whether there was documentation about whether/under what conditions we could just store a small, periodically refreshed (maybe once a day or week or something) server-side cache of observation data instead, on a server owned by my university. This would include both the observation data itself (taxon info, spatiotemporal info) and photos, since we use both on the site. And it would only be for the observations submitted to our Who Eats Whom project (currently ~15k observations). Obviously, if any observations are “all rights reserved” etc. we wouldn’t port that data over (similar to what GBIF does, presumably?). But just trying to touch base about whether this is ok.
Are you wanting to include photos, or just the media-less data? I can imagine that some photos shouldn’t be stored on and distributed from your own server, but I’d be very surprised if non-media observation data aren’t fully open source.
No specific technical knowledge about this, but I would assume that any dataset that would be hosted on GBIF would also be allowed on your own server (provided it meets the license requirements, which seems likely). GBIF datasets can be redistributed following the license terms.
I think your issue might be that some observations in your project will not have licenses compatible with GBIF/sharing (all rights reserved on either photos or observation, though whether all rights reserved on observation data actually means anything legally enforceable is up for debate). If you can live with not including those observations, I think you’re probably fine.
Thanks @cthawley. That’s along the lines of my thinking too. It wouldn’t be a big deal to exclude observations where the media or observation are “all rights reserved” and then tell contributors that their observations won’t appear on the site unless they are under a CC license. Maybe we will look more closely at exactly what data GBIF uses/doesn’t use since, if they are storing this data on their own server (I assume so?), then we should be allowed to as well.
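To make the exclusion rule above concrete, here is a minimal sketch of a license filter. It assumes each observation is a dict with a `license_code` field and a `photos` list whose items also carry `license_code`, with a missing or null value meaning “all rights reserved”. Those field names and the `keep_for_cache` helper are assumptions for illustration and should be checked against the actual API responses.

```python
from typing import Any, Dict, Optional


def is_cc_licensed(license_code: Optional[str]) -> bool:
    # Assumption: CC licence codes are strings starting with "cc"
    # (e.g. "cc0", "cc-by-nc"); None or "" means all rights reserved.
    return bool(license_code) and license_code.lower().startswith("cc")


def keep_for_cache(obs: Dict[str, Any]) -> bool:
    """Keep an observation only if it and every attached photo are CC-licensed."""
    if not is_cc_licensed(obs.get("license_code")):
        return False
    return all(is_cc_licensed(p.get("license_code")) for p in obs.get("photos", []))


if __name__ == "__main__":
    sample = [
        {"id": 1, "license_code": "cc-by-nc", "photos": [{"license_code": "cc-by"}]},
        {"id": 2, "license_code": None, "photos": []},
        {"id": 3, "license_code": "cc0", "photos": [{"license_code": None}]},
    ]
    print([o["id"] for o in sample if keep_for_cache(o)])
```

This is the strict reading (drop the whole observation if any photo is unlicensed). A looser variant could keep the observation and drop only the unlicensed photos.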
To be “safe”, I guess the only iNat material you should host on your local server should come with licences highlighted with a green “Good choice for sharing with scientists” label on the link above, both for observations and pictures?
Although it seems that GBIF will publish these observations (with a green label) even if the photos come with a more restrictive licence…
Why is this necessary? All of the licensed photos are available via Amazon’s Open Data Sponsorship Program, so why not just use the relevant URLs (e.g. https://inaturalist-open-data.s3.amazonaws.com/photos/570497374/original.jpg)? It seems unlikely that your university’s server can do this any more efficiently than Amazon’s.
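As a rough illustration of linking to the open-data bucket instead of copying files, the URL can be built from the photo ID. The pattern below is inferred from the single example URL above. The `open_data_photo_url` helper, the `size` and `ext` parameters, and any size name other than `original` are assumptions, and the file extension may differ per photo.

```python
OPEN_DATA_BASE = "https://inaturalist-open-data.s3.amazonaws.com/photos"


def open_data_photo_url(photo_id: int, size: str = "original", ext: str = "jpg") -> str:
    """Build a photo URL following the pattern of the example in this thread.

    Assumption: other sizes and extensions follow the same
    <base>/<photo_id>/<size>.<ext> layout; verify before relying on it.
    """
    return f"{OPEN_DATA_BASE}/{photo_id}/{size}.{ext}"


if __name__ == "__main__":
    print(open_data_photo_url(570497374))
```

With this approach the site stores only photo IDs (and perhaps extensions), not the image files themselves.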
If you haven’t already done so, I would recommend reading this page: API Recommended Practices. If you follow all the advice given there, I highly doubt your site will ever come anywhere near the stated limits. iNaturalist is specifically designed for sharing scientific data via its public APIs, and you appear to be using them exactly as intended (including respecting the relevant conditions of the licences). Having said that, I think your site could display its indebtedness to iNaturalist a little more prominently. The site’s about page makes this reasonably clear, but I would expect there to be a “Powered by iNaturalist” logo or some such on the home page, rather than just that little link.
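One simple way to stay under a request limit is to enforce a minimum gap between API calls on the client side. The sketch below is a generic throttle, not anything provided by iNaturalist. The `Throttle` class is hypothetical and the 1-second default is a placeholder, so take the real figure from the Recommended Practices page.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Block until at least `min_interval` seconds have passed since the last call."""

    def __init__(
        self,
        min_interval: float = 1.0,  # placeholder value, not an official limit
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        # Record when this call was allowed through.
        self._last = now


if __name__ == "__main__":
    throttle = Throttle(min_interval=0.2)
    for i in range(3):
        throttle.wait()  # call this right before each API request
        print("request", i)
```

The clock and sleep functions are injectable only so the behaviour can be tested without real delays.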
GBIF does not directly store iNaturalist observations. GBIF gets Darwin Core Archives of iNaturalist observations that meet certain criteria. iNaturalist observations have a lot of fields, and only a subset of fields are exported to Darwin Core Archives. Darwin Core Archives also use different field names than iNaturalist observations.
Another issue with storing iNat observations on your own server: if an observation is updated or deleted on iNat, will that change be reflected on your server?
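One possible way to handle this at each refresh is to compare the cached copy against a fresh listing of the project’s observations and work out what to re-fetch and what to drop. The sketch below assumes both sides can be reduced to a mapping of observation ID to an `updated_at` timestamp string, and that the remote listing is complete for the project. The `plan_sync` helper and that data shape are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def plan_sync(local: Dict[int, str], remote: Dict[int, str]) -> Tuple[List[int], List[int]]:
    """Return (ids to fetch or refresh, ids to delete from the cache).

    `local` and `remote` map observation id -> updated_at string.
    Assumption: `remote` lists every observation currently in the project.
    """
    # New on the remote side, or timestamp differs from the cached copy.
    to_upsert = sorted(i for i, ts in remote.items() if local.get(i) != ts)
    # Cached locally but no longer present remotely (deleted or removed).
    to_delete = sorted(i for i in local if i not in remote)
    return to_upsert, to_delete


if __name__ == "__main__":
    cached = {1: "2024-01-01T00:00:00Z", 2: "2024-01-02T00:00:00Z", 3: "2024-01-03T00:00:00Z"}
    fresh = {1: "2024-01-01T00:00:00Z", 2: "2024-02-10T00:00:00Z", 4: "2024-02-11T00:00:00Z"}
    print(plan_sync(cached, fresh))
```

Changes made between refreshes would still be missed until the next run, so the refresh interval sets how stale the cache can get.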