Image files stored on media server don't seem to be getting deleted when observations and photo records are getting deleted

Image files stored on media server don’t seem to be getting deleted when observations and photo records are getting deleted.

For example, here’s a new test observation I just created (https://www.inaturalist.org/observations/138825559):

Here’s the associated photo record (https://www.inaturalist.org/photos/237340171):

And here’s the actual “medium” version of the file on the media server (https://inaturalist-open-data.s3.amazonaws.com/photos/237340171/medium.png):

After taking the screenshots above, I deleted the observation. If you click on the links above now, the observation and photo record links will take you to a “no longer exists” page like so:

… but the link to the actual file still works. This means the file has not been deleted from the media server.
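For anyone who wants to repeat this check without clicking links, here is a minimal Python sketch. The URL pattern is taken from the links in this post; the function names, the list of sizes, and the choice to treat only an HTTP 200 response as "still hosted" are my own assumptions, not anything documented by iNaturalist.

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

# Base URL copied from the links in this post.
OPEN_DATA_BASE = "https://inaturalist-open-data.s3.amazonaws.com/photos"

# Only "medium" and "original" appear in this post; other sizes may exist.
SIZES = ("original", "medium")


def photo_urls(photo_id, ext="png", base=OPEN_DATA_BASE, sizes=SIZES):
    """Build the file URL for each size of a photo (hypothetical helper)."""
    return {size: f"{base}/{photo_id}/{size}.{ext}" for size in sizes}


def http_status(url, timeout=10):
    """Send a HEAD request and return the status code, or None if unreachable."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # 403/404 and similar arrive as exceptions; report the code.
        return err.code
    except URLError:
        return None


def still_hosted(photo_id, ext="png", status_fn=http_status):
    """Map each size to True if its file URL still answers with HTTP 200."""
    return {
        size: status_fn(url) == 200
        for size, url in photo_urls(photo_id, ext).items()
    }
```

Calling `still_hosted(237340171)` after deleting the observation would show, per size, whether the file is still being served. The `status_fn` parameter exists only so the check can be exercised without network access.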

I’m also fairly sure there isn’t a background process cleaning up files that are no longer associated with photo records. I kept a link to a similar file that I tried to delete back in January 2022 (https://inaturalist-open-data.s3.amazonaws.com/photos/176798241/original.png), and I can still see the image when I go to that link. It should correspond to this photo record, which was deleted long ago and no longer exists: https://www.inaturalist.org/photos/176798241.

Just for completeness, besides deleting an observation, I also tried deleting the photo record directly on both the website and the Android app (followed by a sync in the case of the Android app). Both resulted in deleted photo records with the image files remaining on the media server.

EDIT: It may also be worth noting that I noticed this behavior because I was keeping track of an observation that I had flagged as inappropriate some time ago, to make sure the photos actually got deleted from the system at some point.

When I flagged the photo as inappropriate, the observation was still viewable, and it took a curator flagging it as spam to actually remove the photo from view on the observation and photo page. But the underlying photo files were still on the static.inaturalist.org media server. I thought at the time that that was probably okay since the photo files would probably go away when the photo record was finally deleted by staff. But when the photo record was finally deleted, the photo file remained, and actually it’s still out there today.

This probably isn’t great, since it effectively means that:

  1. files aren’t getting deleted from the server even when people think they’re deleting them.
  2. those files no longer have a record of who loaded the file and how the file was licensed.
  3. those files are probably taking up some unknown extra space out there on the media server.
  4. a bad actor could exploit this behavior to have iNaturalist host whatever images they load, and these images could remain undetected by the community.

Another EDIT: somewhat related, it looks like if you start with a photo that is licensed and then switch to all rights reserved, that photo file will be copied to the “static” server from the AWS server, but the image is not deleted from the AWS server. i waited at least an hour to see if there would be some process that would come along and delete the AWS copy, but it’s still there. here’s my test case:

i did not try the opposite test case, but i would expect that soon after a file is copied from one server to the other, the file should be deleted from the first server.
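to make that expectation concrete, here is a rough Python sketch of a "move" as copy, verify, then delete. the two dicts are just stand-ins for the AWS and "static" servers, and `move_object` is a made-up name; this is an illustration of the expected behavior, not how iNaturalist actually transfers files.

```python
def move_object(key, source, destination):
    """Copy `key` from source to destination, then delete the source copy.

    `source` and `destination` are plain dicts standing in for two storage
    servers (an assumption made purely for illustration).
    """
    data = source[key]
    destination[key] = data

    # Only remove the original once the copy is confirmed to match.
    if destination.get(key) != data:
        raise RuntimeError(f"copy of {key} could not be verified")
    del source[key]


# Example: a license change moves the file from one store to the other.
aws_store = {"photos/1/medium.png": b"image-bytes"}
static_store = {}
move_object("photos/1/medium.png", aws_store, static_store)
```

the behavior described above looks like the copy step happening without the final delete.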

Media files of free licenses are hosted by Amazon for free. Since they are licensed freely, I guess Amazon lawfully keeps them for their own use - which is the benefit they get from the deal. Non-free licenses are hosted by iNat and they probably get deleted.

that’s the thing. it doesn’t look like they get deleted.

if you don’t believe me, try it, and tell me what you see when you try to delete a photo that is stored on the static.inaturalist.org server.

they’re licensed, but they could be under any number of licenses, so once the photo record is gone it’s impossible to tell which license applies. since it’s impossible to know how you can rightfully use the photo files, i would argue that they shouldn’t be out there.

You’ve given an example of a freely licensed photo hosted by Amazon, not by iNat

so again, not everything on the AWS server is freely licensed, as in CC0. my example was licensed for non-commercial use under CC BY-NC. you can see that in the second screenshot. and like i said before, it’s not great that you can no longer figure out how you can rightfully use the image, and really, it shouldn’t be available for use anyway since it should have been deleted.

i didn’t provide a specific example for an image file that’s on the “static” server, but based on what i’ve seen, it’s behaving the same way. like i said before, if you don’t believe me, try it, and tell me what you see. i’d love to have others verify the behavior.

I would always put my photos under a free licence, because this allows use in Wikipedia and papers and so on, which can only contribute to conservation.

However, if you publish your photo under a free (or semi-free) licence and somebody uses it (for example, uses it in a book or a webpage), from my understanding you cannot withdraw this licence any more, so the picture may stay available under that licence this way.

i’m not sure what your point here is.

just for clarification, i’m not saying i wanted to change my license. i’m saying that had i not recorded the screenshots in this post, it would be impossible to tell how that image file was licensed, since the photo record is gone.

and either way, i should be able to truly delete the file. as far as i’m aware, there’s no language anywhere that says that any image file you load to iNaturalist will remain forever on its servers even after you delete the observation and photo records.

You are right: if Amazon acts as a third party here and decides to keep freely licenced photos even after they are deleted on iNaturalist (which might be legally possible, which was my point above), they are required to follow the original licence (i.e., provide licence details and your name). I don’t see those provided in your link, so this might be a bug.

Amazon is not deciding to keep photos. Amazon is just the host and will act on whatever instructions iNat provides (or take no action based on no instructions provided by iNat).

one more time, for clarity:

I’ve had a similar question/concern: if, for instance, I load three images with an observation and later delete one of those (by unchecking it in the Edit Observation page), is that image actually deleted? Where does it reside?
All of my images are typically CC BY-NC, but is that relevant to an image I choose to delete?

this is actually a little different from what i’m talking about in this thread, but the last time i checked, if in the Observation Edit page in the website you uncheck a photo (instead of directly deleting that photo record), you will disassociate that photo from the observation.

this does not actually delete anything other than the association – at least not right away. for the moment, the photo record and the image files themselves remain exactly where they are. if the photo is not associated with any other observation, it will become orphaned (as described here: https://forum.inaturalist.org/t/find-your-own-orphaned-photos-uploaded-to-inaturalist/6610).

after a certain amount of time (i forget how long), the orphaned photo records will eventually be deleted. i haven’t checked whether the image files themselves are also properly deleted when the photo record deletion happens via this process.

looks like this has been resolved by https://github.com/inaturalist/inaturalist/commit/fc62a4ff3d05235da7110b8918a08f87976cee15.

i had been tracking some old photos that should have been deleted from both https://static.inaturalist.org and https://inaturalist-open-data.s3.amazonaws.com, and i can no longer access these. so it looks like the files have actually been deleted or else the server will no longer grant access to these. either way, this addresses the underlying issue.

it looks like there will be a job that runs automatically in the future to delete these files periodically.
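i haven’t reproduced the logic of the linked commit here, but the general idea of a periodic cleanup can be sketched in Python as: list the stored file keys, and flag any whose photo id no longer has a photo record. the `photos/<id>/<size>.<ext>` key layout is inferred from the URLs earlier in this thread, and `find_orphaned_keys` is a made-up name, so treat this as an illustration under those assumptions and not as the actual job.

```python
import re

# Key layout inferred from the URLs in this thread: photos/<id>/<size>.<ext>
KEY_PATTERN = re.compile(r"^photos/(\d+)/[^/]+$")


def find_orphaned_keys(storage_keys, live_photo_ids):
    """Return stored file keys whose photo id has no surviving photo record."""
    live = set(live_photo_ids)
    orphaned = []
    for key in storage_keys:
        match = KEY_PATTERN.match(key)
        # Keys that don't look like photo files are left alone.
        if match and int(match.group(1)) not in live:
            orphaned.append(key)
    return orphaned


# Example: only photo 237340171 still has a record.
keys = [
    "photos/176798241/original.png",
    "photos/237340171/medium.png",
    "other/readme.txt",
]
orphans = find_orphaned_keys(keys, {237340171})
```

a real job would then delete (or first quarantine) whatever ends up in `orphans`.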
