Image files stored on the media server don’t seem to get deleted when their observations and photo records are deleted.
For example, here’s a new test observation I just created (https://www.inaturalist.org/observations/138825559):
Here’s the associated photo record (https://www.inaturalist.org/photos/237340171):
And here’s the actual “medium” version of the file on the media server (https://inaturalist-open-data.s3.amazonaws.com/photos/237340171/medium.png):
After taking the screenshots above, I deleted the observation. If you click on the links above, the observation and photo record links now take you to a “no longer exists” page like so:
… but the link to the actual file still works. This means the file has not been deleted from the media server.
I’m also fairly sure there isn’t a background process that cleans up files no longer associated with photo records. I kept a link to a similar file that I tried to delete back in January 2022 (https://inaturalist-open-data.s3.amazonaws.com/photos/176798241/original.png), and I can still see the image when I go to that link. It should correspond to this photo record, which no longer exists because it was deleted long ago: https://www.inaturalist.org/photos/176798241.
Just for completeness, besides deleting an observation, I also tried deleting the photo record directly on both the website and the Android app (syncing afterward in the case of the Android app). Both approaches also resulted in deleted photo records with the image files remaining on the media server.
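For anyone who wants to repeat the check described above without clicking through links, here is a minimal sketch in Python. The function names (`http_status`, `classify`, `check_photo`) are made up for illustration, the URL patterns are simply copied from the links in this post, and it assumes that a deleted photo page or a deleted file answers with something other than HTTP 200. I don’t know how the site actually reports deleted records, so treat that as a guess.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to `url`."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still what we want.
        return err.code


def classify(record_status: int, file_status: int) -> str:
    """Compare the photo record's status with the image file's status.

    Assumption: 200 means "still there"; anything else means "gone".
    """
    record_exists = record_status == 200
    file_exists = file_status == 200
    if record_exists and file_exists:
        return "intact"
    if file_exists:
        return "orphaned file"  # record gone, file still served
    if record_exists:
        return "missing file"  # record present, file gone
    return "fully deleted"


def check_photo(photo_id: int, size: str = "medium", ext: str = "png") -> str:
    """Hypothetical helper: check one photo ID against both URLs."""
    record_url = f"https://www.inaturalist.org/photos/{photo_id}"
    file_url = (
        "https://inaturalist-open-data.s3.amazonaws.com"
        f"/photos/{photo_id}/{size}.{ext}"
    )
    return classify(http_status(record_url), http_status(file_url))
```

Calling `check_photo(237340171)` makes two live requests. If the behavior described in this post still holds, I would expect it to report an orphaned file, but I have not pinned down the exact status codes the servers return.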
EDIT: It may also be worth noting that I noticed this behavior because I was keeping track of an observation I had flagged as inappropriate a while back, to make sure the photos actually got deleted from the system at some point.
When I flagged the photo as inappropriate, the observation was still viewable, and it took a curator flagging it as spam to actually remove the photo from view on the observation and photo page. But the underlying photo files were still on the static.inaturalist.org media server. I thought at the time that that was probably okay since the photo files would probably go away when the photo record was finally deleted by staff. But when the photo record was finally deleted, the photo file remained, and actually it’s still out there today.
This probably isn’t great, since it effectively means that:
- files aren’t getting deleted from the server even when people think they’re deleting them.
- those files no longer have a record of who loaded the file and how the file was licensed.
- those files are probably taking up some unknown extra space out there on the media server.
- a bad actor could exploit this behavior to have iNaturalist host whatever images they load, and these images could remain undetected by the community.
Another EDIT: somewhat related, it looks like if you start with a licensed photo and then switch it to all rights reserved, the photo file is copied from the AWS server to the “static” server, but the image is not deleted from the AWS server. I waited at least an hour to see whether some process would come along and delete the AWS copy, but it’s still there. Here’s my test case:
I did not try the opposite test case, but I would expect that soon after a file is copied from one server to the other, it should be deleted from the first server.
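To illustrate the copy-then-delete behavior I would expect, here is a rough sketch that uses two local directories as stand-ins for the two servers. This is not how iNaturalist actually moves files (I have no idea what their code does); `move_between_stores` and `sha256_of` are hypothetical names, and a real version would talk to the storage service rather than the local disk.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def move_between_stores(src: Path, dst: Path) -> None:
    """Copy src to dst, verify the copy, then delete src.

    The source is removed only after the copy is confirmed identical,
    so a failed copy never loses the only remaining file.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    if sha256_of(src) != sha256_of(dst):
        dst.unlink()  # discard the bad copy, keep the original
        raise IOError(f"copy verification failed for {src}")
    src.unlink()  # the step that appears to be missing today
```

The point is just the order: copy, verify, then delete the original, so that only one copy is left once the move is done.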