Website unusable on Tor due to AWS

Platform: Website

App version number, if a mobile app issue: N/A

Browser, if a website issue: Tor Browser 15.0.13, 15.0.15 and surely other versions

URLs (aka web addresses) of any relevant observations or pages: https://inaturalist-open-data.s3.amazonaws.com/photos/646701568/medium.jpg

Screenshots of what you are seeing:

Description of problem:

Step 1: Use Tor.

Step 2: (Optional) Log in to iNat.

Step 3: Most images hosted on AWS fail to load.

This is an attempt to condense the information gathered in https://forum.inaturalist.org/t/most-images-fail-to-load-not-found/78710/. Discussions of whether this is off-topic or not have already been conducted there; I make arguments for why this is on-topic below.

In short, AWS, which is used by iNat as its image storage, blocks almost all Tor exit nodes. This didn’t use to be an issue – I have used iNat over Tor many times since I joined in May 2023, and images failing to load happened quite rarely. (The only issue has been iNat blocking some exits, but it was always only a minority, so changing the exit a few times quickly fixed it.) That changed sometime between April and May this year and hasn’t been resolved since then. It affects not only loading of images of existing observations but also any images I try to upload when creating a new observation.

In practice, the user experience looks like this: I log into iNat. Most observations’ images don’t load. I spend several minutes trying to find an exit for which the images load (i.e. that isn’t blocked by AWS). I open 10 observations. When I open another, the images fail to load again. I spend five minutes switching exits again before I find the next working one.

As you can see, this makes the website unusable. I did a small test and tried loading the AWS URL given above with different exits (more precisely, different Tor circuits). Out of 100 tries, 7 managed to load; the rest gave HTTP error 404 Not Found. Such a vast number of exits being blocked cannot be defended as an anti-abuse measure. As mentioned, iNat blocks some exits to prevent abuse, but the numbers are much lower. You simply can’t claim that 93% of Tor’s thousands of exit nodes engage in abuse.

Every reload of an iNat page takes at least 7 seconds for me, sometimes more. 7% of exits not being blocked means only every ~14th exit works. This means that finding a working exit takes at least 98 seconds on average. In practice it can be much worse because 1) sometimes you hit a streak of exits blocked by AWS – during the test, I encountered over 30 blocked in a row; 2) some of the exits not blocked by AWS happen to be blocked by iNat itself.
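As a rough sanity check on the arithmetic above, here is a minimal sketch. It assumes every new circuit is an independent draw with a 7% chance of reaching AWS, which is an idealization (Tor weights exit selection by bandwidth, and blocks may be correlated). The function names are made up for illustration.

```python
# Back-of-the-envelope model: each new circuit works with probability p,
# independently of the others (an assumption, not a measured property).

P_WORKING = 7 / 100      # 7 of 100 test circuits loaded the image
RELOAD_SECONDS = 7.0     # minimum time per page reload reported above


def expected_tries(p: float) -> float:
    """Mean number of circuits tried until one works (geometric distribution)."""
    return 1.0 / p


def expected_wait_seconds(p: float, seconds_per_try: float) -> float:
    """Mean time spent reloading until a working circuit is found."""
    return expected_tries(p) * seconds_per_try


def prob_blocked_streak(p: float, streak: int) -> float:
    """Probability that the next `streak` circuits are all blocked."""
    return (1.0 - p) ** streak


if __name__ == "__main__":
    print(f"expected tries: {expected_tries(P_WORKING):.1f}")
    print(f"expected wait:  {expected_wait_seconds(P_WORKING, RELOAD_SECONDS):.0f} s")
    print(f"P(30 blocked in a row): {prob_blocked_streak(P_WORKING, 30):.2f}")
```

Under these assumptions the mean comes out at roughly 14 tries and about 100 seconds, in line with the figures above, and a run of 30 blocked circuits is not a freak event: it has a probability of roughly 11%.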

Privacy is a human right, yet the current state of web technologies makes it trivial to take that privacy away from people. Modern browser fingerprinting techniques combined with approximate location based on one’s IP address allow assigning unique IDs to people which persist across any number of sessions – until they change their browser, their device, and/or their location. Unless one’s browser actively tries to mitigate browser fingerprinting, changing location alone is never enough. Tor Browser is one of the very few browsers capable of partially mitigating browser fingerprinting, being able to reduce one’s uniqueness to something like 1 in 50,000 people, but that’s still too unique when paired with approximate location. Using Tor for location anonymization is the only way to actually mitigate these attacks on our privacy online. Tor is also valuable to protect one’s privacy from their ISP or government because HTTPS connections include the server’s hostname (see TLS SNI), allowing the ISP and the government to see which websites a person uses despite the use of HTTPS encryption – that is, unless Tor is used. Tor encapsulates the HTTP(S) data in additional layers of encryption, allowing only the exit node to see the hostname (destination). iNat’s users deserve to be able to use the platform without compromising their privacy.

Personally, digital privacy matters a lot to me and I routinely use technologies such as Tor and Tor Browser to protect it. I make exceptions only for things which I truly cannot live without, and despite how much I like it and appreciate it, iNat is not one of those. I have had to stop using the site because of this issue and I have photos lying on my disk which have not made it as new observations due to this.

I don’t disagree with your arguments, but I can’t help but chuckle at “location anonymization” on a platform where you, you know. Post your exact location. Unencrypted. Publicly. As part of your observations :sweat_smile:

Well, revealing my location with such accuracy is something I didn’t want to do indeed. My observations used to have an “accuracy” of tens of km precisely for this reason. I have eventually decided to change this to make the data usable for research and to give back to the site. It has cost me considerable privacy, as you correctly note – which is exactly why I am not willing to give away any more of it. Especially when iNat seems to be considering using Cloudflare, a company which gatekeeps literally half of the Internet, with browser fingerprinting being an essential part of that. More care about users’ privacy would be welcome. The fact that Google Maps is being used for all location-related stuff is bad enough as it is.

AWS itself is a good topic. Ideally, iNat would move to a non-US provider at some point, given the hostility towards science right now. Not to go all political on the topic, and I know AWS is the absolute standard, but the way I see it, EU law is the only one right now that secures the freedom of science. That kinda goes hand in hand with your point about problems with AWS and Google Maps in terms of privacy. But I don’t know how realistic it is for such a large database to move to a different host like Hetzner.

Also on the Google Maps point: I gotta say, whatever special version of Maps iNat has embedded into the Android app is so much better than any map app I’ve ever used, especially OpenStreetMap. I use it when hiking all the time because it shows more detail than normal Google Maps and hiking apps when zooming in. No idea what they did to it but it is - unfortunately - really good.

Not a Tor user so can’t tell whether that setup is feasible, but… couldn’t you “hide” the Tor exit behind some web proxy, so that AWS does not detect it?

You → Tor browser → Tor exit → Proxy → AWS (happy)

I still see the same unusable BS map that is missing any detail in the Android App. In Europe, that is.

I do not expect that Tor users will post observations on iNaturalist…

I think there is a legitimate reason: a person might want to upload organisms (maybe even with exact locations), but the person herself and the place from where the upload happens must be kept secret (e.g. if a totalitarian country striving for war makes the use of precise maps illegal).

Your expectations are wrong. I have posted over 100 observations and all of them were uploaded via Tor.

Of course, the broader question of how many people use iNat over Tor can only be answered by someone who can read iNat’s access logs.

This is discussed in Tor docs:

There are many discussions on the Tor Mailing list and spread over many forums about combining Tor with a VPN, SSH and/or a proxy in different variations. X in this article stands for, “either a VPN, SSH or proxy”. […]

You → Tor → X

This is generally a really poor plan.

Some people do this to evade Tor bans in many places. (When Tor exit nodes are blacklisted by the remote server.)
[…]
Normally Tor switches frequently its path through the network. When you choose a permanent destination X, you give away this advantage, which may have serious repercussions for your anonymity.

Besides the above, from my cursory reading it would be fairly complicated to set up, possibly requiring a VPS that would handle the “Tor exit → Proxy” part. It is not a functionality that the Tor executable has built in.

Also, proxies tend to be abused the same way as Tor and as a result can be subject to blocks just like Tor exits. However, while reloading a Tor circuit (to get a different exit) takes two clicks, switching the proxy would – in the setup outlined above – likely include reconfiguring and restarting the program handling the “Tor exit → Proxy” part.

Now, if you meant “web proxy” as in something like proxysite.com… Those sites essentially act as middlemen between you and the site. As a result they always have complete access to all the data exchanged, in cleartext – there is effectively no encryption between you and the target site. Using such “proxy sites” for anything requiring login is thus obviously a very stupid idea (they will see any credentials entered).

TL;DR for true proxies: It’s fairly complex, infeasible for most Tor users, and likely won’t solve much anyway.
TL;DR for web proxy sites: That’s a horrible idea.

You are forgetting about the completely legitimate reason that someone might just not want their government to know they use iNat, plain and simple. Privacy is a legitimate end in itself.

Agreed! Convoluted at best, error-prone in careless hands, and it takes a private proxy + a browser properly configured (to use the proxy for a few websites only). No wonder Tor discourages/prevents downstream proxying.
Technically the iNat platform could also implement a form of masquerading on their side, a last-resort crutch intended for the few (?) users banned from AWS access.

Can you elaborate please? I am not sure if this is what you meant, but if you’re suggesting adding a mechanism for iNat to request the images from AWS on Tor users’ behalf – that’s a good idea and should actually work.

The website could be made to detect when a user’s IP is a Tor exit (up-to-date JSON of all exits and their IPs can be easily obtained). iNat already seems to have some mechanism of turning photo IDs into AWS URLs. When a Tor user is detected, the backend could fetch the AWS URL, encode the fetched image as a data: URI (basically just encode as base64 and assemble into a URI), and use the resulting URI as the image source instead of the AWS URL. In its more complex form, the frontend could asynchronously request this from the backend to allow lazy loading.
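To make the idea concrete, here is a minimal sketch of the synchronous variant. Nothing here reflects iNat’s actual backend: the function names (`is_tor_exit`, `to_data_uri`, `image_src_for`, `fetch_bytes`) are hypothetical, the exit list is assumed to be already downloaded and parsed into a set of IP strings, and the MIME type is assumed to be JPEG.

```python
import base64
import urllib.request
from typing import Callable, Collection


def is_tor_exit(client_ip: str, exit_ips: Collection[str]) -> bool:
    """True if the client's IP appears in the (pre-fetched) exit list."""
    return client_ip in exit_ips


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data: URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def fetch_bytes(url: str) -> bytes:
    """Server-side fetch of the image (hypothetical default fetcher)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def image_src_for(
    client_ip: str,
    photo_url: str,
    exit_ips: Collection[str],
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> str:
    """Return the value to put in the image's src attribute.

    Tor users get the image inlined as a data: URI fetched by the backend;
    everyone else keeps the direct storage URL.
    """
    if is_tor_exit(client_ip, exit_ips):
        return to_data_uri(fetch(photo_url))
    return photo_url
```

Passing the fetcher in as a parameter is only there so the logic can be tried without network access. Note that base64 makes the payload about a third larger than the raw image, which adds to the bandwidth cost of this approach.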

Pros I can think of:

  • does not require any extra server storage (iNat continues to benefit from AWS’s donated storage)
  • regular users continue to benefit from AWS’s low latency
  • Tor users are able to use the site

Cons I can think of:

  • increased server load and bandwidth proportional to the number of proxied images (1 outbound request per image proxied, small amount of CPU time spent encoding each image as base64)
  • some programming work would be required to implement it (complexity depends on whether lazy loading support is desired or not, but should be relatively easy in general)
  • potentially increased latency for Tor users (depends on server latency, whether proxying is done synchronously or asynchronously, and whether lazy loading is used or not)

By the way, this issue also affects people who visit iNat over Tor without having an account, making it impossible to view observations anonymously.