Is there a recommended way to back up all my iNat data?

I have been spending more than 2/3 of my free time on iNaturalist, and I have been quite worried about data loss. I assume that all my iNat data could be lost if, for example, an iNat server physically failed, and that would be a huge blow to my entire life. To prevent this, I have been looking for a way to regularly back up or export all my iNat data (especially observations) to somewhere else or onto a hard disk. I’d appreciate any advice on this. Thanks!

5 Likes

I’m not aware of an automated way of downloading all your photos, but you can get your observation data out as a spreadsheet here:

https://www.inaturalist.org/observations/export?user_id=Glycymeris
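Once you have downloaded a CSV from that page, a few lines of Python are enough to check what it contains before you archive it. This is only a minimal sketch: the column names (`id`, `observed_on`, `scientific_name`, and so on) are assumptions about the export's header row, and the sample rows are made up for illustration.

```python
import csv
import io

# Assumed column names; check them against the header row of your own export.
WANTED = ("id", "observed_on", "scientific_name", "latitude", "longitude", "quality_grade")


def load_observations(handle):
    """Read an export CSV and keep only the columns listed in WANTED."""
    reader = csv.DictReader(handle)
    return [{name: row.get(name) or "" for name in WANTED} for row in reader]


def count_by_year(observations):
    """Count observations per year, using the first four characters of observed_on."""
    counts = {}
    for obs in observations:
        year = obs["observed_on"][:4] or "unknown"
        counts[year] = counts.get(year, 0) + 1
    return counts


# Made-up sample standing in for a real export file.
sample = io.StringIO(
    "id,observed_on,scientific_name,latitude,longitude,quality_grade\n"
    "1,2021-05-02,Hypochaeris radicata,45.1,-122.9,research\n"
    "2,2022-07-14,Bellis perennis,45.2,-122.8,needs_id\n"
)
observations = load_observations(sample)
print(count_by_year(observations))
```

For a real file you would pass `open("observations.csv", newline="", encoding="utf-8")` instead of the in-memory sample; the file name is hypothetical.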

5 Likes

I trust the safety and longevity of what’s stored on iNaturalist infinitely more than what’s stored on my computer. But if you’re worried, you need to keep your georeferenced photos (and any corresponding notes) on your own computer and back them up automatically to the cloud. iNat isn’t intended to be a place to store your data; its purpose is to share those data with others.

7 Likes

If you want to store your work elsewhere and want a much, much higher likelihood of it being around in 5 or 10 years, then you would be far better off using a cloud-based storage service rather than your own hard disk.

A decent cloud service’s hardware, software and maintenance processes are going to be far superior, more reliable, secure, scalable and available.

You will need to pay for such a service, but the upside is that you get the service you pay for (if you choose the right one!). If your work is worth that much to you, then it’s worth paying for something decent.

4 Likes

I worry about this too… I keep pretty much all my photos on iNat, and while I trust that they won’t somehow vanish due to some online calamity, this situation has crossed my mind before.

If I kept the files on my computer or on a flash drive, it wouldn’t necessarily be more secure. Computers and their files can become corrupted, and flash drives can be lost or broken. But if that data is kept safe, I think it’s a good backup.

It’s like scanning physical herbarium specimens onto the web: the online herbarium specimens aren’t the same (for example, you can’t take DNA/tissue samples from them), but if one or the other is lost, the information from those specimens is still preserved. Note that for a herbarium, you can’t turn online records into actual plant specimens. With an online backup, if another website like iNaturalist were to appear (assuming the worst has occurred), that data can still be translated into something in more or less the same form as an observation on iNat.

Of course, I trust and hope that iNaturalist and the data it provides will stay secure in the future. But if you are really worried about it, I don’t think it’s a worthless endeavor to have a backup on a flash drive or in some other form.

2 Likes

I’m old school and just do manual backups (of my photos, which are all geotagged). If this is just about the observation data other than the photos, I don’t have a backup of that at all, I suppose.

For the photos, I have a laptop (about 5 years old, with 1 TB of storage) that I copy my phone to regularly, and an external hard drive (about 7 years old, with 2 TB of storage) that I copy my hard drive to regularly. My new phone has 256 GB. For now that’s plenty to keep all my pictures double backed up. I hate the cloud since I don’t trust big companies like Google/Amazon/Apple/… at all, so I feel much more comfortable this way, even if it’s less convenient. I’m not sure about the cost, but I wouldn’t really want to give $100/year to a cloud provider; I give it to iNaturalist instead :)

I just checked how much data I have, since I know I ran out of the 128 GB on my previous phone: my entire lifetime’s worth of pictures is a bit over 200 GB and 52k pictures. Of those, 12k so far are from this year and 14k from 2021, then only 6k from 2020 and 3k from 2019, then fewer than 1k per year from before I started using iNaturalist…

3 Likes

It’d be nice to have a personal backup of not just photos, but observation data and photos together. I don’t worry too much because GBIF is still there and has most of your observations. But I don’t know what would happen if, for example, iNat deleted your account by mistake while you were offline: in a week or two the data would be lost from GBIF too, and I doubt iNat would keep it for long. So you have to be online very often to catch any mistakes made by the system.

2 Likes

FWIW, we explicitly say that iNat is not a good place for backing up photos. Photos are resized and compressed, and their metadata are stripped, when uploaded to iNat. To echo other responses, I would use both a separate physical drive and a cloud service to back up your photos and data. Personally I use Backblaze, but there are many services. You want at least two independent copies so that a single failure won’t take you down.

The odds of data on iNat being totally wiped out are very, very small, as cloud data services have redundancies, but obviously no one can guarantee there will never be a catastrophic failure. Keep in mind that in this last incident it was ephemeral indices that were lost, not observations, photos (which are kept on entirely different servers), comments, etc.

10 Likes

At some point I’m planning on making a script which takes an export of your iNat data and a folder full of the original images, associates them based on date, and stores the inat data (taxon, research grade status, geolocation if not present in the original image file, etc) in the image files themselves as EXIF data. That way it’d be possible to keep both together in a compact, easy to archive format, which would also enable crude offline searches based on taxon.
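As a rough illustration of the association step only, here is a minimal sketch that pairs photos with observations recorded on the same calendar day. The function name, the input shapes, and the sample values are all hypothetical, and writing the result into the image files is left to an EXIF tool, since the Python standard library cannot write EXIF.

```python
from collections import defaultdict
from datetime import date, datetime


def associate_by_date(observations, photos):
    """Map each photo path to the observation ids recorded on the same day.

    observations: iterable of (observation_id, observed_on), observed_on a date
    photos: iterable of (path, captured_at), captured_at a datetime
    """
    ids_by_day = defaultdict(list)
    for obs_id, observed_on in observations:
        ids_by_day[observed_on].append(obs_id)
    return {
        path: list(ids_by_day.get(captured_at.date(), []))
        for path, captured_at in photos
    }


# Made-up sample data for illustration.
observations = [
    (101, date(2022, 6, 1)),
    (102, date(2022, 6, 1)),
    (103, date(2022, 6, 3)),
]
photos = [
    ("IMG_0001.jpg", datetime(2022, 6, 1, 9, 30)),
    ("IMG_0002.jpg", datetime(2022, 6, 3, 14, 5)),
    ("IMG_0003.jpg", datetime(2022, 6, 9, 8, 0)),
]
for path, ids in associate_by_date(observations, photos).items():
    print(path, ids)
```

A photo with exactly one candidate can be tagged automatically; one with several candidates (two observations on the same day) or none would need a finer rule or a manual check.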

Backblaze gets another vote from me. They’re priced very well and will take care of much more than just your iNat data. I use Backblaze in addition to multiple backup hard drives, rotating a new one in every year or so to avoid them all failing at the same time.

5 Likes

if by “original images”, you’re talking about the true source images saved on your machine, as opposed to the “original” images downloaded from iNat to your machine, i don’t think there’s an efficient way to do this. (although iNat does provide the original filename on photo pages, i don’t think there’s an easy way to get this information short of scraping, and even then, filenames change. you could do some sort of image matching, but that would involve downloading a bunch of images.)

there are already existing repos in GitHub that offer various kinds of code to download data and images from iNat, but i would be hesitant to use them except in extreme situations because:

you may be interested in: https://forum.inaturalist.org/t/naturtag-organize-your-photo-collection-with-inat-metadata/33959.

4 Likes

@barnabywalters Is there a particular language or tool(s) you had in mind for that? This is an interesting problem I’ve been thinking about for a while, and am gradually working toward in naturtag. It’s not specifically meant as a backup tool, but that would definitely be doable. Offline search is also something I had in mind.

Right now it only lets you manually associate observations or taxa with local images, but eventually I want it to be fully automated. Matching observation and image timestamps might give you a reasonable guess, but I think structural image matching is going to be the most reliable method to do that. That does require (temporarily) downloading a bunch of images, but medium size should be sufficient (roughly 30-50KB per image). If your photos are CC-licensed and stored in the iNaturalist Open Dataset, that can be done fairly efficiently.
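To put the "roughly 30-50KB per image" figure in perspective, here is a back-of-the-envelope sketch of the total temporary download. The function name is hypothetical, the image count is a made-up example, and 1 MB is taken as 1000 KB.

```python
def estimated_download_mb(image_count, kb_low=30, kb_high=50):
    """Return a (low, high) estimate in megabytes for medium-size images."""
    return (image_count * kb_low / 1000, image_count * kb_high / 1000)


low, high = estimated_download_mb(10_000)
print(f"10,000 medium images: about {low:.0f} to {high:.0f} MB")
```

So even a five-figure photo collection stays well under a gigabyte at medium size.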

That may or may not be similar enough to what you have in mind, but I’d be happy to work with you on something like that, or at least chat about some ideas for it.

3 Likes

Oh, good to know that something similar already exists! I will take a look at it. Good to see that it’s written in Python; that would have been my first choice too.

My plan was to try out matching based on timestamp and/or geolocation, and see how effective it is. I anticipate it working well in most cases, and would use some sort of CV image comparison as a fallback if there were too many unmatchable images to sort manually.
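For the geolocation half of that plan, a minimal sketch might look like the following. It uses the haversine great-circle distance and accepts a match only when exactly one observation lies within a threshold; everything else goes to a leftover list for the fallback step. The function names, the 0.2 km threshold, and the sample coordinates are assumptions for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def split_matches(photos, observations, max_km=0.2):
    """Match photos to observations by location.

    photos: iterable of (path, lat, lon)
    observations: list of (observation_id, lat, lon)
    Returns (matched, unmatched): a dict of path -> id for photos with exactly
    one observation within max_km, and a list of the remaining paths.
    """
    matched, unmatched = {}, []
    for path, lat, lon in photos:
        nearby = [
            obs_id
            for obs_id, obs_lat, obs_lon in observations
            if haversine_km(lat, lon, obs_lat, obs_lon) <= max_km
        ]
        if len(nearby) == 1:
            matched[path] = nearby[0]
        else:
            unmatched.append(path)  # none or several candidates: needs a fallback
    return matched, unmatched


# Made-up sample data for illustration.
observations = [(1, 45.0005, -122.0), (2, 46.0, -122.0)]
photos = [("a.jpg", 45.0, -122.0), ("b.jpg", 10.0, 10.0)]
matched, unmatched = split_matches(photos, observations)
print(matched, unmatched)
```

The size of the `unmatched` list is a quick way to judge whether an image-comparison fallback is worth building at all.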

5 Likes

This is what I do, basically. I add species names, tags and observation notes (where I make/made them), etc to the actual photos by adding all that to the EXIF of the photo. I also keep a spreadsheet with all the same data but if I lost that spreadsheet I could regenerate it either from GBIF or from the photos themselves because it’s in the EXIF/metadata. I’ve never actually tried to manually reconstruct any of my spreadsheets from the photo metadata, but I could do it the slow and tedious way (looking at the EXIF manually for each photo) or use something like exiftool to batch process the photos. I have all my files backed up on an internal drive in my desktop computer and on two external drives (one that lives at home and the other offsite)
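For the batch route, one possible sketch is to let exiftool dump the metadata as JSON and turn that into spreadsheet rows. The `-json`, `-n`, and `-r` options are exiftool's JSON output, numeric values, and recursive scan; which tags actually hold the species names and notes depends on how they were written, so the tag list below (in particular `Keywords`) is an assumption.

```python
import csv
import json
import subprocess

# Assumed tags; adjust to wherever your names and notes are actually stored.
FIELDS = ("SourceFile", "DateTimeOriginal", "Keywords", "GPSLatitude", "GPSLongitude")


def read_metadata(photo_dir):
    """Run exiftool over photo_dir and return its JSON output as text.

    Requires exiftool on the PATH; raises CalledProcessError if it reports an error.
    """
    command = ["exiftool", "-json", "-n", "-r"]
    command += [f"-{name}" for name in FIELDS if name != "SourceFile"]
    command.append(str(photo_dir))
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout


def rows_from_json(text):
    """Convert exiftool's JSON array into flat rows, one per photo."""
    rows = []
    for entry in json.loads(text):
        row = {name: entry.get(name, "") for name in FIELDS}
        if isinstance(row["Keywords"], list):  # several keywords arrive as a list
            row["Keywords"] = "; ".join(str(word) for word in row["Keywords"])
        rows.append(row)
    return rows


def write_spreadsheet(rows, handle):
    """Write the rows as CSV to an open text file."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

Typical use would be `rows = rows_from_json(read_metadata("photos"))` followed by `write_spreadsheet(rows, open("rebuilt.csv", "w", newline=""))`, where both paths are hypothetical.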

Edit: Also, after thinking about it a little bit, if iNat ever went down and the exiftool approach didn’t work I suppose I could clone the iNaturalist source from GitHub and set up my own private iNat. Probably overkill but I can’t see why that wouldn’t work :)

Edit2: I’m also going to look into that nifty-looking tool by @jcook. That looks handy.

2 Likes

Was this thread inspired by the meltdown a few days ago?

1 Like

you can clone the structure of the system, but you wouldn’t have the data, as far as i know. you’d have to build something to import the data into your own version of the system, and if you’re going to do that, it’d be easier probably to just set up your own simplified database / system rather than trying to set up an iNat clone.

i’m not trying to make anyone worry, since iNat’s processes for backing things up are probably already plenty, but i don’t think GBIF has copies of the images / sounds. i think those live only over on iNat.

some considerations:

  1. observation timestamps often have hour and minutes but lack seconds. the time zones can be unreliable. if multiple photos are attached to a given observation, those photos may not share the same underlying timestamps.
  2. photo page metadata may contain the original photo timestamps and coordinates, but as noted before, there’s not really a great interface to access that data.
  3. observation timestamps and locations can be manually set. sometimes the photo metadata may not accurately reflect the time the image was taken, such as if the photo is extracted manually from a video.
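To make the first consideration concrete, here is a minimal sketch of a timestamp comparison that tolerates missing seconds and allows several photos per observation. It assumes both timestamps are timezone-aware (EXIF times often are not, so that conversion is on you), and the function names and the two-minute tolerance are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_minute(moment):
    """Convert an aware datetime to UTC and drop seconds and microseconds."""
    return moment.astimezone(timezone.utc).replace(second=0, microsecond=0)


def within_tolerance(observed_at, captured_at, tolerance=timedelta(minutes=2)):
    """Compare two aware datetimes at minute granularity."""
    return abs(to_minute(observed_at) - to_minute(captured_at)) <= tolerance


def photos_for_observation(observed_at, photos, tolerance=timedelta(minutes=2)):
    """Return every photo path whose capture time falls inside the tolerance.

    photos: iterable of (path, captured_at); one observation may match several.
    """
    return [
        path
        for path, captured_at in photos
        if within_tolerance(observed_at, captured_at, tolerance)
    ]


# Made-up sample data for illustration.
observed_at = datetime(2022, 6, 1, 9, 30, tzinfo=timezone.utc)
photos = [
    ("a.jpg", datetime(2022, 6, 1, 9, 30, 45, tzinfo=timezone.utc)),
    ("b.jpg", datetime(2022, 6, 1, 9, 31, 10, tzinfo=timezone.utc)),
    ("c.jpg", datetime(2022, 6, 1, 10, 15, 0, tzinfo=timezone.utc)),
    ("d.jpg", datetime(2022, 6, 1, 11, 29, 50, tzinfo=timezone(timedelta(hours=2)))),
]
print(photos_for_observation(observed_at, photos))
```

This does nothing for the third consideration: if the observation time was set by hand, or the photo was pulled from a video, no tolerance will rescue the match.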

in the absence of a good way to get photo metadata from the system, i think this is probably true. but it may be worth thinking about what happens if a user has multiple similar versions of an image. you could probably use the information about the original image size (which is available via the API) to differentiate between differently-sized variants of an image (ex. an image sized for the web vs an image for print), but you might not be able to distinguish between very subtly different images such as a series of burst or bracketed shots.
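A minimal sketch of that size check, assuming you already have the original width and height for the iNat photo and the pixel dimensions of each local candidate (how you obtain either is outside this sketch, and the function name and sample values are hypothetical):

```python
def classify_by_dimensions(candidates, width, height):
    """Narrow local candidates to those with the given pixel dimensions.

    candidates: iterable of (path, width, height) for local files
    Returns (status, paths) where status is "unique", "ambiguous", or "none".
    """
    same_size = [path for path, w, h in candidates if (w, h) == (width, height)]
    if len(same_size) == 1:
        return "unique", same_size
    if same_size:
        return "ambiguous", same_size
    return "none", []


# Made-up sample: a web-sized copy, a print-sized copy, and two burst shots.
candidates = [
    ("hawk_web.jpg", 2048, 1365),
    ("hawk_print.jpg", 6000, 4000),
    ("burst_01.jpg", 4000, 3000),
    ("burst_02.jpg", 4000, 3000),
]
print(classify_by_dimensions(candidates, 6000, 4000))
print(classify_by_dimensions(candidates, 4000, 3000))
```

The second call comes back "ambiguous", which is exactly the burst or bracketed case where size alone cannot separate the files.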

2 Likes

Yeah, I agree. Before iNat I used QGIS, ArcGIS, PostgreSQL, SQLite and a hodgepodge of other stuff. Since my comment above I’ve been looking at the iNaturalist source code, and I’d have to modify things to automatically create taxa (without them being in any kind of hierarchy, which is doable based on my brief perusal but seems “hacky”) to avoid a lot of work, or I could import from one of my old databases to at least have them arranged by family. Probably not worth it, but I do like the iNat web interface.

1 Like

Yes, they don’t import media, just observation data, and they link to the media that’s hosted on iNaturalist.

2 Likes

Back to the original post, does this thread so far mostly answer your question @glycymeri? To summarize:

  • As bad as the recent Azure outage was, the amount of data redundancy it has means that permanent loss of iNat’s observation data is just about the least likely thing that could go wrong.
  • To back up your own data, you can use the iNat export tool and upload that to a cloud storage provider along with the original photos on your computer.
  • If you wanted to automate that, it would take a little effort, but I or others here could help.
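As a starting point for the automation, here is a minimal sketch that copies a folder holding the export and the original photos to a backup location (an external drive, or a folder that a cloud client syncs) and verifies each copy with a SHA-256 checksum. The function names and folder layout are hypothetical, and it does not handle deletions or renames.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir, backup_dir):
    """Copy new or changed files from source_dir to backup_dir.

    Returns the list of destination paths that were written. Raises OSError
    if a copied file does not match its source after copying.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        dest = backup_dir / src.relative_to(source_dir)
        if dest.exists() and sha256_of(dest) == sha256_of(src):
            continue  # identical copy already present
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        if sha256_of(dest) != sha256_of(src):
            raise OSError(f"verification failed for {dest}")
        copied.append(dest)
    return copied


# Example call with hypothetical paths:
# back_up("inat_archive", "/mnt/external/inat_archive")
```

Running it again after a new export only copies what changed, so it is cheap to schedule regularly.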

1 Like

At this point, I look at this very differently. I post thousands of photos on iNaturalist. I then delete most of the photos. I don’t need them and if I change my mind, a copy will be available on iNaturalist, poorer quality than the original but that’s the risk I take when I free up space by deleting photos. I keep on my computer and a back-up disc those photos that I actually want – ones I consider important or particularly good, or ones I just like.

I can hear you say, “But . . . but . . . don’t delete the photos – you might need them!” Oh, yeah? At the moment (but not for long) I have 75 photos of the common weed Hypochaeris radicata (Cat’s Ear, Western Dandelion). I cannot imagine a scenario in which I will need 75 photos of this plant. And if I do, well, I’ll get copies from iNaturalist.

2 Likes

I don’t have much knowledge about technical things, so I really appreciate the comments posted here.
I cannot afford to keep paying for a cloud storage provider every year, so probably what I should do first is get a hard disk and export my data. I am looking at the other methods suggested right now and look forward to seeing more opinions; all are quite interesting. Thank you!

1 Like