How confident can we be that iNaturalist data will be preserved?

#1

This is a very general big picture question I have mainly for the iNaturalist staff and founders but I hope it’s of interest to other users here. I apologize if some version of this has been asked before.

How safe is the iNaturalist data in the long run (multi-decades)? I realize that the images are stored on some server somewhere (and backed up too). But running servers and storing electronic data takes money. And having money today is no guarantee that you’ll have money tomorrow. And as data storage formats change over time, data can be lost or corrupted.

Moreover, iNaturalist is still new. From what I can tell, the founders of iNaturalist are still running it. Because the dedicated users here have put thousands and thousands of volunteer hours into submitting observations, providing feedback, and identifying observations, they should have a real interest in knowing that their efforts will be preserved.

How confident can we be that data will not disappear if the project runs out of money?

What happens if someone new takes over and changes the direction of the project?

What’s the long-term plan to make sure that the iNaturalist data and images can be maintained and will endure in perpetuity?

I’m sure these are things the team has thought about but I’m curious.

Andy

#2

Hi Andy,

I’m not an expert on this topic, but the iNaturalist data is being synced with the Global Biodiversity Information Facility (GBIF), which is funded by the world’s governments (Check here: https://www.gbif.org/dataset/50c9509d-22c7-4a22-a47d-8c48425ef4a7 and here: https://www.gbif.org/what-is-gbif).
So even if iNaturalist isn’t available in the long run (which I doubt, as it has been around for a decade), the data will be available through GBIF.

Hope I could help.

Cheers,
Charlotte

#3

I think this is a very legitimate question. For 7 years I was a volunteer with the North American Amphibian Monitoring Program (NAAMP). This was a massive database of frog-call monitoring surveys on standardized routes. There was a nationwide (US, I’m not sure if Canada and Mexico were involved) network of trained and certified volunteers, and an awesome website that mapped all the data so that different frog species could be shown by season and location. This of course had the capability to show where imperilled frog populations were winking out.

In 2016 we were told that the program was discontinued. One of the program people said that it had been de-funded, though they could have kept it going for $10K a year. Ten thousand dollars! They have carried through on their promise to keep the 21 years of data available (https://www.sciencebase.gov/catalog/item/583dc314e4b0d1899f9dea8d), but I don’t think the mapping function is there, just the raw data. And of course we’ve lost the ongoing data collection and the network of trained volunteers. What a tragedy.

Oh, what fly-by-night outfit ran the NAAMP? The US Geological Survey.

#4

Having it on GBIF is definitely reassuring. Do you know if GBIF is backing up the images and the identification records? From what I can tell, GBIF links to the iNaturalist photos, but maybe they are saving the photos as well.

Thanks!

Andy

#5

I’m afraid I can’t give you a definite answer on that. From what I see, they do present the record with pictures, but I am not certain whether the pictures are just linked or backed up.

Additionally (I forgot to mention this earlier), because GBIF is a scientific data infrastructure, only research-grade observations are imported into its database.

#6

I’m sorry to hear about that Janet. It’s all too common for these projects to get abandoned. I’m glad they have preserved the data for now at least.

#7

I believe that GBIF is data only; photos are not archived by them. Importantly, iNat sends only some observation data to GBIF. That means that, currently, things like Annotations (e.g. Insect Life Stage), and even fields that could map directly to GBIF (e.g. “Count”), do not go to GBIF.

#8

I think GBIF gets only the Research Grade observations, and if an observation is obscured, GBIF shows the fake obscured coordinates.

#9

Yeah, and the obscuring has been especially concerning as more things are being auto-obscured for various reasons, in some cases without a clear reason.

#10

GBIF does not appear to get updated IDs (or it just takes a very long time to do so), and would only reflect a fraction of most people’s data. I suspect the iNat staff will refrain from commenting here for obvious reasons: no one knows if/when funding may run out, and what would happen in that eventuality.

I hope that in that unfortunate scenario, we’d at least be able to get a file that represents all our sightings, their photos, tags, comments, and so on, so that if the API ever resurfaces, we can just import our iNat data completely.

#11

Good points. It seems like there should be a contingency plan for the community as well as one for the staff. Having people preserve their own observations seems like a good way to preserve and then reassemble the data.

#12

there’s nothing that prevents you from being able to download your data and photos now, if you want to.
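For anyone who wants to script this, here’s a minimal Python sketch, not an official tool. It assumes the public iNaturalist API v1 endpoint (`https://api.inaturalist.org/v1/observations`), its `user_login`, `per_page`, `order_by`, `order`, and `id_above` query parameters, and a `results` list in the JSON response; none of that comes from this thread, so check the current API documentation before relying on it. It saves your observation records (which should include photo URLs) to one JSON file, but it does not download the image files themselves. For a plain CSV, the export page on the iNaturalist website is the simpler route.

```python
"""Minimal sketch: back up your own iNaturalist observations as JSON.

Assumed (not from the thread): the API v1 observations endpoint and its
user_login / per_page / order_by / order / id_above parameters, and the
"results" key in the response. The function names here are hypothetical.
"""
import json
import time
import urllib.parse
import urllib.request
from typing import Optional

API_URL = "https://api.inaturalist.org/v1/observations"  # assumed endpoint


def build_page_url(user_login: str, id_above: int = 0, per_page: int = 200) -> str:
    """Build the URL for one batch of a user's observations, in ascending id order."""
    params = {
        "user_login": user_login,
        "per_page": per_page,
        "order_by": "id",
        "order": "asc",
        "id_above": id_above,  # resume after the last id we already saved
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def next_cursor(results: list) -> Optional[int]:
    """Return the highest id in a batch, or None when the batch is empty (done)."""
    if not results:
        return None
    return max(obs["id"] for obs in results)


def backup(user_login: str, out_path: str, pause_s: float = 1.0) -> int:
    """Fetch every batch of observations and write them all to one JSON file."""
    all_obs, cursor = [], 0
    while True:
        with urllib.request.urlopen(build_page_url(user_login, cursor)) as resp:
            results = json.load(resp)["results"]
        new_cursor = next_cursor(results)
        if new_cursor is None:
            break
        all_obs.extend(results)
        cursor = new_cursor
        time.sleep(pause_s)  # pause between requests to be gentle on the server
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(all_obs, f, indent=2)
    return len(all_obs)


if __name__ == "__main__":
    # "your_username" is a placeholder; replace it with your own iNat login.
    print(backup("your_username", "my_inat_observations.json"))
```

Paging by `id_above` instead of page numbers is a design choice meant to keep each request cheap for accounts with many observations; plain `page` numbers would work for small accounts too.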

#13

Just like I don’t rely on Facebook to store my family pics in perpetuity, I definitely don’t rely on iNaturalist for photo storage. They’re compressed versions anyway. People should be using other cloud and physical back-ups for photo storage.

FWIW, similar questions have been asked and responded to by staff in the past, e.g. here’s what Scott said in August 2017 (which, as such, may now be outdated):

@loarie: iNat’s assets are currently stored on Amazon Web Services, the database is stored at Rackspace, and we have backups at Datapipe.

iNat is owned by the California Academy of Sciences, a museum that has been responsible for maintaining one of the world’s largest natural history collections for over 100 years. iNaturalist has been online since 2008 and we certainly expect to be around for another 10 years. The iNat program is a ‘core’ part of the Museum budget (i.e. not soft money), but I’d be lying if I said program focus in the non-profit world isn’t volatile. However, we have 3 years of funding ‘in hand’, which is about as good as anyone can expect in the non-profit world. Furthermore, much of our past, current, and future work is to ensure the long-term sustainability of iNat through fund-raising and partnerships.

And, as of June 2017 iNat is “jointly supported” by CAS and National Geographic Society. There is also a handy donate button at the bottom of each page on the website and apps settings pages. ;)

#14

Is there an iNat store yet? You know…shirts, mugs, stickers…must be worth something funding-wise. Or maybe that would somehow violate the funding guidelines that iNat already has (being a “non-profit” system).

#15

No, but they’re working on it.

#16

@andy71 – I had the same concern as you when I first started using iNat and hesitated for a few years in putting too much effort into posting records because I wanted to see if it would really last. I’m still not certain of its likelihood of being around long-term although I obviously hope it is. But I will continue to post records occasionally. Nothing lasts forever and we have yet to see how viable such internet database systems are over decades or longer.

#17

Yeah, I download my own observations every year or two so I can put them on an ArcGIS map. That functions as a backup. I have also kept all my photos, though more because it’s easy than because I am worried.

#18

I appreciate this question being asked. I have been thinking about this for the past few months in light of recent events (Myspace, Flickr, Google+, etc.). I was going to post a similar question but found this in a quick search before posting.

Users are entrusting enormous amounts of precious and valuable data to this platform, and I am very curious to know what assurances are in place to keep this data secure in the short and long term. Yes, it is prudent for users to back up their own data, but I’d love to know what measures the platform is taking to ensure the security and longevity of its content.

Larry
#19

Thanks for all the great responses to the questions I asked above.

Nothing lasts forever, and I don’t expect iNaturalist data or any museum collections to last forever. However, museums make a commitment to preserve and protect their physical collections in perpetuity, and I believe they write up plans in case their collections need to be transferred, moved, etc.

As a researcher, I have used the Data Dryad repository to preserve data from research publications. I dug a little deeper into Dryad to get a sense of its preservation policies:

https://datadryad.org/pages/policies#preservation

Dryad follows the Open Archival Information System (OAIS) reference model (ISO 14721), a standard framework for preserving digital data. I don’t know if iNaturalist has a long-term preservation policy, but replicating what Dryad has committed to would essentially be the kind of commitment or assurance that I’d like to see from iNaturalist.
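One concrete piece of OAIS-style preservation is “fixity information”: checksums that let an archive prove a stored file hasn’t silently changed. As a rough illustration of that idea, not Dryad’s or iNaturalist’s actual procedure, here is a minimal Python sketch that an individual could also run on a personal backup of their observations and photos. It records a SHA-256 checksum for every file in a folder and later re-checks the folder to flag files that went missing or changed. The function names are hypothetical.

```python
"""Minimal fixity-checking sketch (illustrative; not any archive's real tooling)."""
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large photos need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict) -> dict:
    """Compare the folder's current state to a saved manifest."""
    current = build_manifest(root)
    missing = sorted(set(manifest) - set(current))
    changed = sorted(k for k in manifest if k in current and current[k] != manifest[k])
    return {"missing": missing, "changed": changed}
```

In practice you would save the manifest (for example with `json.dump`) outside the folder being checked, since a manifest stored inside it would show up as a new file on the next scan.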

I think it would give us users peace of mind if the Cal Academy could explicitly commit to providing the same level of preservation for the iNaturalist photos, identifications and metadata as they do for their physical specimens. Perhaps they already have, I’m not sure.

I don’t want to presume to speak for other users here, so I’ll leave it as a question: what kind of institutional data preservation commitment would you like to see going forward? Is this a reasonable thing to ask for?

#20

The best kind of institutional commitment comes in the form of an endowment dedicated to whatever purpose you want to keep going. A $1MM fund with a 4% per year target withdrawal rate should last indefinitely if managed reasonably, and throw off at least $40,000 per year in most years.
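To put rough numbers on that, here’s a back-of-the-envelope Python sketch. The assumptions are mine, not the poster’s: each year the fund withdraws 4% of its current balance, then earns a fixed nominal return on what’s left (6% in the example). Real returns swing from year to year and inflation erodes purchasing power, so this only illustrates the arithmetic.

```python
"""Back-of-the-envelope endowment sketch (assumed fixed return, no inflation)."""


def simulate(principal: float, withdraw_rate: float, annual_return: float, years: int):
    """Return a list of (year, withdrawal, ending balance) tuples."""
    balance = principal
    rows = []
    for year in range(1, years + 1):
        withdrawal = balance * withdraw_rate  # spend a fixed share of the current fund
        balance = (balance - withdrawal) * (1 + annual_return)  # grow what remains
        rows.append((year, withdrawal, balance))
    return rows


for year, w, b in simulate(1_000_000, 0.04, 0.06, 5):
    print(f"year {year}: withdraw ${w:,.0f}, balance ${b:,.0f}")
```

The first year’s withdrawal is exactly $40,000. Under these assumptions the balance holds steady in nominal terms only if the return is at least 0.04 / 0.96, about 4.17% a year; in years below that, both the fund and the next withdrawal shrink, which fits the “most years” caveat.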
