Standalone Uploader

I have been playing around with pyinaturalist for a few weeks. This has resulted in a script with which I can match the upload speed of the website. However, I am not much of a programmer, so I didn’t end up with anything that can really compete with the website uploader except in a few niche scenarios. One thing is really nice, though: I no longer worry about a crash on the website destroying my progress.

That drove home the point that the weak link in the present upload system on the PC is the web browser. When the internet is slow, uploading can really bog down, and when the browser crashes you can lose a lot of work. It would be really nice if the uploader came as a stand-alone program.

I am thinking of a combination of the existing iPhone uploader and the website uploader. Organize the photos, add the IDs and annotations, then upload either a batch of observations or all of them. This seems like it could result in quicker photo organization and a much lower risk of a crash losing the work. Upload times would of course be the same, but it would be possible to just let it run in the background.

you should wrap up your work and publish it as a Windows app… or wait… maybe you’re using a Mac/Unix machine… or should it be something that will work on a Chromebook?

I sort of published it, in that I posted it to GitHub so anyone who wants it can use it. Packaging Python into a .exe file is a bit challenging, though, so I haven’t done that. Using it in its present state would require installing Python, installing pyinaturalist, downloading the Python scripts, then organizing your photos and running the scripts.

The only reason anyone would be crazy enough to do that is if they happen to be sitting on a few thousand or tens of thousands of geotagged photos of the same species that they want to upload. If you had ten thousand photos of pigeons, one photo per pigeon, then this script would allow you to simply put them all in a folder named ‘Rock Pigeon’ and run the script. A few hours later they would all be uploaded. It should work on any operating system, but I have only tested it on Windows 10, so that might be wishful thinking.
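
For anyone curious, the core of the idea fits in a short sketch. This is a simplified illustration rather than the exact script on GitHub; it assumes pyinaturalist’s get_access_token() and create_observation() helpers (parameter names can differ a little between pyinaturalist versions), reads the capture date from EXIF, and leaves out GPS handling for brevity.

```python
# Simplified sketch of the folder-per-species idea; not the exact GitHub script.
from pathlib import Path

from PIL import Image
from pyinaturalist import create_observation, get_access_token  # names may vary by version

FOLDER = Path("Rock Pigeon")  # the folder name doubles as the species guess

token = get_access_token(
    username="your_username",      # placeholders - use your own credentials
    password="your_password",
    app_id="your_app_id",
    app_secret="your_app_secret",
)

for photo in sorted(FOLDER.glob("*.jpg")):
    # Read the capture date from EXIF tag 306 (DateTime), e.g. "2021:05:01 12:34:56".
    # A real script would also read the GPS tags and handle missing EXIF.
    exif = Image.open(photo).getexif()
    observed_on = exif.get(306, "").replace(":", "-", 2)  # -> "2021-05-01 12:34:56"

    # One observation per photo, species taken from the folder name.
    create_observation(
        species_guess=FOLDER.name,
        observed_on_string=observed_on,
        photos=str(photo),
        access_token=token,
    )
```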

But I digress. What I did isn’t all that useful. Being actually useful to a large audience would require a desktop program with a GUI similar to the website’s. That seems like it could be made quite a bit less fussy than the existing website.

2 Likes

I have this problem all the time; I have tried to put off uploading until the internet line gets upgraded, hopefully in a few weeks. Even now when I try to upload, it either crashes due to memory (if using MS Edge) or fails with the red ! in any other browser. It takes about 5-6 attempts to get an observation across due to internet problems.

There used to be a workaround:

  • Upload to Google Drive, placing the photos in the Google Photos folder
  • Make sure ‘Sync photos & videos from Google Drive’ is checked in the settings
  • Create an album in Google Photos
  • Sync iNat with Google Photos
  • Upload via Google Photos

Unfortunately, iNat still isn’t verified since Google’s new policy update.

The good thing about Google is that they have their own pipelines running from Australia to the USA, so the upload speed was much faster than anything else.

1 Like

If the image quality is decent I often post my photos on Flickr (pro account, so unlimited storage). The link from iNat to Flickr is a pretty good work-around. Of course, uploading to Flickr can also be a hassle if the internet is slow.

I wish the iNat interface for Flickr were more flexible; it’s sometimes tedious to get to an older image via the iNat window.

1 Like

ok. i agree with the general idea being discussed here, but i wonder if it could be accomplished by updating the existing loader so that it can recover from flaky connections and maybe even crashed browsers?

5 Likes

@pisum, that sounds way too sensible! :bulb:

@glmory you might want to vote for your own feature request (it doesn’t automatically do that :-)
(and thanks for the link to pyinaturalist, I didn’t know about that)

1 Like

Digression here; I found that waiting until absolutely all the metadata has loaded, on all photos, before starting to work on them has significantly reduced browser freezes/crashes. The other day I uploaded a batch of 22 observations with no problems(!) Of course it can take 5-10 minutes for everything to upload, but it’s better than redoing all that work.

I’m using Firefox in Sierra (Mac).

3 Likes

I can confirm this - other than uploading the photos, the main bottleneck seems to be loading the metadata. I found that I had to limit my batches to about 25 observations to make the process workable. But even then, I still found it was taking far longer than it really should, and every now and then I would have to abort it due to the browser locking up.

So in the end, I went down the same route as @glmory and wrote my own uploader. It also uses python, but not pyinaturalist, since I wanted more control over how the inat apis are used. Doing things this way dramatically reduces the total upload time, and I can just run it in the background while I do other things. It also allows me to add annotations at the same time and simplifies the process of grouping multiple photos.

Even if the issue with loading metadata was somehow fixed, I doubt whether I would want to use the web interface for uploading again. The only major thing missing from the current apis seems to be ID suggestions using inat’s computer vision - but I very rarely use that, so that’s not a big deal for me.
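
To give a rough idea of what “using the apis directly” looks like, here’s a heavily simplified sketch (not my actual code). It assumes the v1 endpoints and field names as I read them in the public docs, skips authentication entirely (the Authorization header is just a placeholder), and shows how grouping works: one POST creates the observation, then one POST per photo attaches it.

```python
# Rough, illustrative-only sketch of creating an observation and attaching
# photos via the iNat v1 API with requests. Endpoint paths, field names and
# response shape are assumptions based on my reading of the public docs.
import requests

API = "https://api.inaturalist.org/v1"
HEADERS = {"Authorization": "YOUR_API_TOKEN"}  # placeholder - real auth omitted

def create_observation(species_guess, observed_on, lat, lng, photo_paths):
    payload = {
        "observation": {
            "species_guess": species_guess,
            "observed_on_string": observed_on,
            "latitude": lat,
            "longitude": lng,
        }
    }
    # One request to create the observation record...
    resp = requests.post(f"{API}/observations", json=payload, headers=HEADERS)
    resp.raise_for_status()
    obs_id = resp.json()["id"]  # assumes the response contains the new observation's id

    # ...then one request per photo, which is how photos get grouped together.
    for path in photo_paths:
        with open(path, "rb") as f:
            requests.post(
                f"{API}/observation_photos",
                data={"observation_photo[observation_id]": obs_id},
                files={"file": f},
                headers=HEADERS,
            ).raise_for_status()
    return obs_id
```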

4 Likes

The metadata issue is so time-consuming that I have been trying to find a way of stripping all metadata except the date-time. There is a program for Linux - exiftool - that may be able to do it, but I haven’t tried it yet (it’s in the queue of Things To Do).
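
If it can, this is roughly what I have in mind - an untested sketch that just shells out to exiftool from Python, wiping everything except the original date/time (it assumes exiftool is installed and on the PATH, and the folder name is made up):

```python
# Untested sketch: strip all metadata except the capture date/time using exiftool.
import subprocess
from pathlib import Path

def strip_metadata_keep_date(folder):
    for photo in Path(folder).glob("*.jpg"):
        subprocess.run(
            [
                "exiftool",
                "-all=",                # remove every tag...
                "-tagsfromfile", "@",   # ...then copy back from the original file
                "-DateTimeOriginal",    # keeping only the capture date/time
                "-overwrite_original",  # don't leave *_original backup copies
                str(photo),
            ],
            check=True,
        )

strip_metadata_keep_date("to_upload")  # hypothetical folder name
```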

If anyone has another way of doing this, I’d be thrilled to know about it!

3 hours for 13 pictures in 8 observations… where can I get your uploader, bazwal?

I don’t know that it is actually real metadata (e.g. EXIF strings) that is slowing it down; if I upload large GIF images with no metadata, it still seems to take a long while searching for non-existent metadata.

That seems unusually excessive - what sort of internet connection do you have? It might be worth trying some of the online speed-testers to see what your upload speed is like (download speed is less relevant). And what is the average size (in megabytes) of the photos you are uploading? Do you crop them first?

My uploader is part of a bigger program at the moment so I’m not able to share it. But in any case, I don’t think it would help you much if you don’t have a reasonably good internet connection.

If you can send some samples to help@inaturalist.org (or make them available here via a Dropbox link or something similar - they just need to have their metadata intact) that would allow us to take a look and see if there’s anything about the photos themselves that is causing the slowdown, although RAM/internet speed are the likely culprits.

1 Like

The system has to resize images that are too large, and that will contribute greatly to the processing time of an upload. You could test it by doing a similar-size batch but with the photos all pre-scaled to the iNat maximums. I know my camera is set to take photos that are nearly double in size (or 4x the quantity of pixels!) compared to the iNat max. But even then, mine would not take more than 30 mins tops to upload 100 obs with approx 200 photos.
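
If you want to try that test, the pre-scaling step might look something like this rough sketch using Pillow (I believe the maximum is around 2048px on the longest edge, but check the current limit before relying on that number):

```python
# Rough sketch: pre-scale photos so the longest edge is at most ~2048px,
# writing copies to a separate folder rather than overwriting the originals.
from pathlib import Path
from PIL import Image

SRC = Path("originals")    # hypothetical folder names
DST = Path("prescaled")
DST.mkdir(exist_ok=True)

for photo in SRC.glob("*.jpg"):
    img = Image.open(photo)
    exif = img.info.get("exif")   # keep EXIF so iNat can still read date/GPS
    img.thumbnail((2048, 2048))   # downscale in place, preserving aspect ratio
    if exif:
        img.save(DST / photo.name, "JPEG", quality=90, exif=exif)
    else:
        img.save(DST / photo.name, "JPEG", quality=90)
```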

I think bottlenecks are going to be (in order of impact):

  • upload speed
  • activity on the site at time of upload
  • image size

I think the upload might be staggered, meaning that it doesn’t all get processed at once with the site coming to a grinding halt for everyone else while it processes. This might explain why an external uploader might be getting faster results. If this is the case, do we really want large processing-volume burdens being developed? It’s fine for occasional situations, but if the number of people using an external uploader grows, then you might be getting faster processing of your uploads while causing unacceptable delays in other parts of the system, such as simply loading observation views! I would encourage you to contact the developers to make sure your impact will not be detrimental.

1 Like

You are totally mistaken about all of this. If the developers were really worried about it, they wouldn’t have provided two fully documented public APIs and an interface for registering third-party applications. To quote from the API documentation I linked to earlier:

We will block any use of our API that violates our Terms or Privacy Policy without notice. Also note that we throttle API usage to a max of 100 requests per minute, though we ask that you try to keep it to 60 requests per minute or lower. If we notice usage that has serious impact on our performance we may institute blocks without notification. The API is intended to support application development, not data scraping.

It seems pretty clear to me that the developers know what they’re doing and have already thought all this through - so please don’t start spreading FUD about third-party uploaders.

The whole point of using these APIs is to consume fewer resources when uploading. A third-party uploader does all the pre-processing client-side and uses the minimum number of operations to upload the final data to the server. By contrast, the web interface does all the pre-processing server-side, and uses a whole bunch of costly interactive features that many people neither need nor want. So, for the same number of records, a well-behaved third-party uploader should usually put less load on the server whilst also being more efficient for the client/user.
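
To make “well-behaved” concrete, this is the kind of trivial client-side throttle I mean - a sketch that keeps a client at or below the 60 requests per minute suggested in the passage quoted above (the upload function is just a placeholder name):

```python
# Sketch of a simple client-side throttle: never make more than ~60 requests
# per minute (i.e. at least one second between calls), well under the
# documented 100/minute hard limit.
import time

MIN_INTERVAL = 1.0   # seconds between requests => max ~60 requests/minute
_last_request = 0.0

def throttled(func, *args, **kwargs):
    """Call func, but never more often than once per MIN_INTERVAL seconds."""
    global _last_request
    wait = MIN_INTERVAL - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()
    return func(*args, **kwargs)

# usage, with a hypothetical upload function:
# throttled(upload_one_observation, photo_path)
```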

2 Likes

If you have questions about how iNat works, please just ask. Regarding the website uploader, we do not resize images in the client. Well, actually we do, but only to make a thumbnail to upload for vision suggestions. The final cutdowns happen on the server, and yes, there are performance implications for the site as a whole. Improving server-side image processing is definitely on our radar, though frankly, it only creates serious problems during periods of extreme usage, like the CNC or the Penang Incident. A third party client that cuts everything down to 2048x2048 before upload might feel a bit faster due to reduced server-side processing time… or it might be faster b/c you’re uploading less data. We also only upload a max of 3 observations at a time in the website uploader, so if a 3rd party uploader creates more simultaneous upload requests it might get the whole job done faster, but could theoretically function as a Denial of Service attack if it swamped all our server processes.
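
(For what it’s worth, a third party uploader can mirror that behaviour in a couple of lines - e.g. this hypothetical sketch caps the number of simultaneous uploads at 3, the same limit the website uploader uses:)

```python
# Hypothetical sketch: run at most 3 uploads at a time instead of firing off
# hundreds of simultaneous requests.
from concurrent.futures import ThreadPoolExecutor

MAX_SIMULTANEOUS = 3  # same cap the website uploader uses

def upload_all(observations, upload_one):
    # upload_one() is whatever function actually POSTs a single observation
    with ThreadPoolExecutor(max_workers=MAX_SIMULTANEOUS) as pool:
        list(pool.map(upload_one, observations))
```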

I don’t know why the uploader is so slow for some people. I’m sure we all have our theories, but what would help us address the problem are concrete examples including excruciating amounts of contextual info: what browser you’re using, what your internet connection speed is like, specifically what files you’re trying to upload, etc. Some issues might be solved by a third party uploader, but not things like connection issues. While diagnoses might be helpful, reproducible examples will always be more helpful. There are a lot of variables.

Finally, we’re not going to make a standalone uploader, or at least not any time soon. We’ve toyed with the idea of a desktop app a lot, but ultimately, of the many potential uses for such an application (offline use, backups, etc.), faster image upload has never been on our radar. If you have the bandwidth to upload and the website’s not working, we should fix the website, not make more software that works better.

2 Likes

@bazwal someone is! And it’s not FUD… that is a competitive tactic in advertising used to chop down a competitor’s product. I am not competing with you, but I am interested in identifying and preventing potential future problems!

@kueda sorry, I do tend to speculate a bit on what is inside the black box. I’ll try to phrase questions rather than make speculations.

@kiwifergus FUD isn’t restricted to advertising; it has a broad range of applications. And I simply asked that you “don’t start spreading FUD”, which appeared likely to me since you didn’t seem to be aware of the terms of service for using the inat APIs. Hopefully my post made it clear why there is no need to have any concerns regarding third-party uploaders that abide by those terms.