Best Way to Download ~600 Images?

pisum · November 21, 2022, 4:22pm

i think the post referenced here is still more or less accurate, but i will add a couple of notes:

#1

since that post, another way to get a list of images has become available, although it probably only makes sense to use it if you’re trying to get a lot of images: https://forum.inaturalist.org/t/getting-the-inaturalist-aws-open-data-metadata-files-and-working-with-them-in-a-database/22135.

#2

the Windows command + cURL approach that i referenced in that post to actually download the files works, but it may not scale well because the cURL commands are executed serially, with a small delay between each execution. this approach is probably fine for downloading 600 images, but for larger sets you can improve performance by having each execution of cURL download multiple files (maybe, say, 100 files per cURL, since there’s a limit on the allowable length of commands in Windows).

so, for example, instead of 3 cURLs for 3 files:

curl https://inaturalist-open-data.s3.amazonaws.com/photos/221611030/medium.jpg -o img001.jpg
curl https://inaturalist-open-data.s3.amazonaws.com/photos/221611153/medium.jpeg -o img002.jpg
curl https://inaturalist-open-data.s3.amazonaws.com/photos/221611164/medium.jpg -o img003.jpg

you can do 1 cURL for 3 files:

curl https://inaturalist-open-data.s3.amazonaws.com/photos/221611030/medium.jpg -o img001.jpg https://inaturalist-open-data.s3.amazonaws.com/photos/221611153/medium.jpeg -o img002.jpg https://inaturalist-open-data.s3.amazonaws.com/photos/221611164/medium.jpg -o img003.jpg
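
if you have a long list of files, typing those combined commands by hand gets tedious, so here is a rough sketch of how they could be generated instead. it's written in Python (not part of the original Windows-only process), and the function name `batched_curl_commands` and the `batch_size` parameter are just names made up for illustration. it assumes that the URLs and filenames contain no spaces or characters that are special to the Windows command line:

```python
from typing import Iterator, List, Tuple


def batched_curl_commands(
    downloads: List[Tuple[str, str]], batch_size: int = 100
) -> Iterator[str]:
    """Yield one curl command per batch of (url, output_filename) pairs.

    Hypothetical helper: assumes URLs and filenames need no quoting.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(downloads), batch_size):
        batch = downloads[start:start + batch_size]
        # Each URL is followed by its own -o, as in the hand-written example.
        parts = [f"{url} -o {filename}" for url, filename in batch]
        yield "curl " + " ".join(parts)


if __name__ == "__main__":
    base = "https://inaturalist-open-data.s3.amazonaws.com/photos"
    downloads = [
        (f"{base}/221611030/medium.jpg", "img001.jpg"),
        (f"{base}/221611153/medium.jpeg", "img002.jpg"),
        (f"{base}/221611164/medium.jpg", "img003.jpg"),
    ]
    # Print the commands so they can be pasted into a .bat file.
    for command in batched_curl_commands(downloads, batch_size=2):
        print(command)
```

with `batch_size=2`, the three files above come out as two commands (two files in the first, one in the second). for a real run you would pick a batch size that keeps every generated line under the Windows command-length limit, which may mean fewer files per command when the URLs are long.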

there are other ways to make this even more efficient, but they would require a little more thought and coding, and i won’t go into that here because how you would optimize would depend on the particular situation and needs.

…

it’s also worth reiterating the point from the earlier post (explaining the Windows + cURL process) that there is a limit on how much stuff you should download from iNaturalist. nowadays though, that limit applies only to the stuff living outside of the AWS Open Data set. so when downloading CC0 images hosted on https://inaturalist-open-data.s3.amazonaws.com, you don’t need to observe the limit, but if you download unlicensed photos from https://static.inaturalist.org (say, if you’re trying to download your own all rights reserved images), you will still need to observe those limits or risk being blocked by iNat.
