Missing intermediate ranks and default photo in the taxonomy archive file?

I’m currently diving into the richness of iNaturalist data, and I couldn’t find an answer to my question in the documentation or on the forum.

I downloaded the iNaturalist Taxonomy DarwinCore Archive found at https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip [warning: this link downloads the file] which contains the whole taxonomy along with vernacular names.

On the iNaturalist.org website I can navigate through the taxonomy, and it includes intermediate ranks (subphylum, subclass, etc.) that are not included in the taxa.csv file of the archive.
This information is also missing from the iNaturalist data extracted from GBIF, although there are empty columns for intermediate ranks.

As a regular Python user, I know I can use the API to automate fetching this information, as well as the default taxon photo used on the website, since it seems to be included in the Taxa API.

I was wondering why this information is left out of the file even though it is present in the API.
Also, will my IP address be banned from the API if I use a Python script to fetch about 25,000 taxa (only taxa, not observations) to retrieve that information?

Thanks in advance for your answers!

As long as you stay within the recommended API limits, you should be fine. For this query, you can call /taxa with 30 taxon ids at a time, for a total of around 830 calls.

From the API documentation:

Please note that we throttle API usage to a max of 100 requests per minute, though we ask that you try to keep it to 60 requests per minute or lower, and to keep under 10,000 requests per day.
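
For anyone wanting to script this, here is a minimal sketch of batching the ids and pausing between calls. The names `chunked`, `fetch_in_batches` and the `fetch_batch` callback are hypothetical: `fetch_batch` stands in for whatever client call you actually use, and the one-second interval is just a conservative choice that stays at or under 60 requests per minute.

```python
import time
from typing import Callable, Iterator, List


def chunked(ids: List[int], size: int = 30) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_in_batches(
    ids: List[int],
    fetch_batch: Callable[[List[int]], List[dict]],  # hypothetical: your own API call
    size: int = 30,
    min_interval: float = 1.0,  # seconds between the start of two calls
) -> List[dict]:
    """Call `fetch_batch` once per chunk, never faster than `min_interval`."""
    results: List[dict] = []
    last_call = None
    for batch in chunked(ids, size):
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        results.extend(fetch_batch(batch))
    return results
```

With 25,000 ids and 30 ids per call this makes 834 calls, which at one call per second takes roughly 14 minutes and stays well under the daily limit.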

I altered the text of your link to make it less likely someone inadvertently downloads the zip file.

Thank you for your answer. Indeed, 830 calls seems reasonable. I’ll add safeguards to my code to make sure I don’t exceed one request per second.

it won’t help with default photos, but the AWS Open Dataset taxon metadata file should have ancestry. see https://github.com/inaturalist/inaturalist-open-data/tree/documentation/Metadata.

After querying the API using pyinaturalist, I get a dictionary containing a ‘default_photo’ value, which provides the AWS link and, most importantly, the id of the photo, so I can reconstruct the links to all photo sizes (square, medium, original). I also get eight ‘curated’ photos per entry, which is nice.

Below is a screenshot of that entry in the exported JSON file:

[Image: default_photo entry]
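
As an illustration of the link reconstruction, here is a small sketch that derives the other sizes from one known photo URL. The helper name `photo_urls` is hypothetical, and it assumes every size lives next to the others under the same photo id with the same file extension, as in https://inaturalist-open-data.s3.amazonaws.com/photos/108521205/square.jpg. That assumption may not hold for every photo.

```python
import re

SIZES = ("square", "medium", "original")

# Matches the trailing "/<size>.<extension>" part of a photo URL.
_SIZE_RE = re.compile(r"/(square|medium|original)\.(\w+)$")


def photo_urls(url: str) -> dict:
    """Return a {size: url} mapping derived from one known photo URL.

    Assumption: all sizes share the same prefix and file extension.
    """
    match = _SIZE_RE.search(url)
    if match is None:
        raise ValueError(f"unrecognised photo URL: {url}")
    prefix = url[:match.start()]
    extension = match.group(2)
    return {size: f"{prefix}/{size}.{extension}" for size in SIZES}
```

Deriving the links from an existing URL rather than from the bare id keeps whatever host and extension the API returned for that photo.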

But maybe you meant that I wouldn’t find the default photo for intermediate ranks within the returned dictionary. Indeed, I will need to adapt my code to first fetch information on the main ranks, then query again for the missing ancestors, but I will manage that.
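
A rough sketch of that second pass, assuming each returned taxon dictionary carries an `id` and an `ancestor_ids` list (my reading of the API response, so check it against your own output). The function name `missing_ancestor_ids` is hypothetical.

```python
from typing import Iterable, List


def missing_ancestor_ids(taxa: Iterable[dict]) -> List[int]:
    """Return ancestor ids referenced by `taxa` but not yet fetched.

    Assumption: each taxon dict has "id" and, optionally, "ancestor_ids".
    """
    taxa = list(taxa)
    known = {taxon["id"] for taxon in taxa}
    wanted = set()
    for taxon in taxa:
        wanted.update(taxon.get("ancestor_ids") or [])
    return sorted(wanted - known)
```

The returned ids can then be fetched in the same batched way as the first pass. Repeating the step until the list comes back empty guarantees every ancestor is present.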

what i was saying above was that you could get the complete taxonomy with ancestry for each taxon from a single csv file, but that file does not contain default photo information.
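
a minimal sketch of reading ancestry from that file, assuming it is tab-separated with columns `taxon_id`, `ancestry`, `rank_level`, `rank`, `name`, `active`, and that `ancestry` is a slash-separated list of ancestor ids (check the metadata documentation linked above). the three sample rows are made-up toy data, and `load_taxa` / `lineage` are hypothetical helper names.

```python
import csv
import io

# Toy rows for illustration only; the real file is far larger and uses other ids.
SAMPLE = (
    "taxon_id\tancestry\trank_level\trank\tname\tactive\n"
    "1\t\t70\tkingdom\tAnimalia\ttrue\n"
    "2\t1\t60\tphylum\tChordata\ttrue\n"
    "3\t1/2\t57\tsubphylum\tVertebrata\ttrue\n"
)


def load_taxa(handle) -> dict:
    """Index rows by integer taxon id (assumes a tab-separated file)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {int(row["taxon_id"]): row for row in reader}


def lineage(taxa: dict, taxon_id: int) -> list:
    """Return (rank, name) pairs from the root down to `taxon_id`."""
    row = taxa[taxon_id]
    ancestor_ids = [int(part) for part in row["ancestry"].split("/") if part]
    chain = [taxa[a] for a in ancestor_ids if a in taxa] + [row]
    return [(r["rank"], r["name"]) for r in chain]


taxa = load_taxa(io.StringIO(SAMPLE))
print(lineage(taxa, 3))
```

for the real file, pass an opened file handle to `load_taxa` instead of the in-memory sample.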

but maybe i should step back and ask why do you need a full taxon list with full ancestry and default photo?

Yes, I understand now.
I’m currently developing a web app to navigate through the taxonomy, so I’m thinking about the database and the way to display taxa as nicely as possible.

It’s mostly for educational purposes so I’m thinking in terms of ergonomics and I believe having photos to illustrate taxa is a nice touch.

this might be more okay if your web app is meant to just be an alternate presentation of the iNat taxonomy for iNat users, and it’s clear that that’s what it is, but if you’re presenting your app as just a general way to view the tree of life, you might be treading into territory where it’s improper to use photos served up by another site and possibly not licensed for your use. so tread carefully.

Yes, as pisum said, tread carefully and make sure you have permission where needed, along with proper credit when and where it is required.

It won’t be a general way to view the tree of life because I’m not a biologist, I’m more of a neophyte, so I’m not an authority on the subject. Thus the data will be presented as iNaturalist data, photographs will be properly licensed, or I won’t use them if I can’t.

I’m still researching the data, but my ultimate goal is to make a web app centered around cartography. The tree will be part of the application, as another way to navigate through the taxonomy.

I just saw that some default photos are not licensed and are “all rights reserved”. I guess that is more common when species are harder to photograph up close and/or when the photos are taken by professional photographers.

But, for those interested, the API returns a bunch of photos in ‘taxon_photos’, and some of them are under Creative Commons license so I can use them with proper credits. That’s probably what I’ll use.
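
Here is roughly how I would filter them, as a sketch only. It assumes each item of ‘taxon_photos’ wraps a `photo` dictionary with a `license_code` field (empty or null for “all rights reserved”, and starting with `cc` for Creative Commons) plus an `attribution` string. That is what I see in my output, but treat those field names as assumptions. `licensed_photos` is a hypothetical helper name.

```python
from typing import List


def licensed_photos(taxon: dict) -> List[dict]:
    """Keep only photos whose license code looks like a Creative Commons one.

    Assumption: taxon["taxon_photos"][i]["photo"]["license_code"] exists and
    is None (or empty) for "all rights reserved" photos.
    """
    kept = []
    for entry in taxon.get("taxon_photos") or []:
        photo = entry.get("photo") or {}
        code = photo.get("license_code")
        if code and code.lower().startswith("cc"):
            kept.append(photo)
    return kept
```

Each license still has its own conditions (attribution, non-commercial use, and so on), so the `license_code` and `attribution` values should be stored and displayed alongside the photo.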

So, this raises the question: can someone reclaim their photos and restrict their usage at any time?
I won’t be updating my database every day, so a license might be revoked in the meantime and I could be in trouble.

Yep. Or grant different rights for different uses to different users.
Moreover, they can delete their photo, or remove it as the taxon pic. And others can do it too.

Ok, so I’ll probably have to run some kind of regular automatic check for updates. Good to know.

Users can change their licenses, but if you were following the license they selected at the time you downloaded the photo, you can continue to use it under that license, i.e. a license change is not retroactive.

Yes, if you download the photo once under a CC license, you can continue to use it under that license. I don’t think that you would need to run updates solely for this reason. But you would probably want to for other reasons - sometimes observations are re-IDed and then taxon photos change. For instance, a species A is split into A and B, and the photo that was for Species A is now Species B.

… provided that one is able to prove that the copy was performed/obtained before the license change (what about direct linking?)

Ok, but I don’t really want to download the image. For example, I will use the link below to fetch the image displayed in an autocomplete search tool.
https://inaturalist-open-data.s3.amazonaws.com/photos/108521205/square.jpg

The image won’t be in my database, so it’s as if I’m re-downloading the image each time (which is actually what’s happening), so the license changes should impact that behavior.

Edit: but maybe I should download the photo to avoid burdening the AWS server?

even if the main focus of your proposed site is mapping, if you’re just designing an alternative (complementary) interface for the iNat system – particularly if it’s noncommercial – i think you’ll get a little more leeway for using the photos, regardless of how they are licensed or not (although if someone really wants to sue you, they could).

if your site is going to be more than that, then i would definitely curate and host your own collection of images, even if you copy them – properly licensed ones, of course – originally from iNaturalist.

if you’re not going to host the images yourself, i wouldn’t worry about overburdening the AWS server (which hosts all the licensed photos) because Amazon has effectively unlimited resources. but if you were to access non-licensed photos, those are housed on another server that costs iNat money to run. it may be worth noting that if folks switch from licensed to non-licensed, or vice versa, the photos will get pushed to the other server, and if you’re using a static copy of the photo links, those will get broken.

Thanks for all your answers and explanations, it helped a lot and gave me confidence in my project.

I think I’ll keep only licensed photos for my own peace of mind and take them from the Amazon server. I’ll reconsider in the future whether to host them myself.
I really don’t have any commercial intent; it is a personal project to educate myself, and maybe others too someday.
