Is there a tool / code snippet that allows downloading of taxonomy data from the site?

I know there are a lot of code snippets / tools / projects that have been mentioned on the forum at times (as an aside, putting together a wiki that consolidates them all in one place would be very helpful), to the point where I can’t remember what is out there.

Is there one out there that allows you to specify a taxon (say, by taxon ID) and then get a download of all taxa that are descendants of it?

Thx.


You can use the get taxa endpoint.

For example, all Sciurus species (taxon 45933, the same ID the script further down uses): https://api.inaturalist.org/v1/taxa?taxon_id=45933&rank=species

Did you want it in any particular format besides JSON?


JSON or any other data format is fine. What I don’t have a handle on, having not done it before, is the pagination: getting the full result set, not just the first 30.

When I wrote a script for retrieving mammal taxonomy, I did pagination the lazy way: I put the call into my browser, noted the number of results, then calculated how many pages I needed and did a for loop exactly that many times in Python. But you could just keep track of the total results and the per-page amount and keep going until pages * per_page takes you over the total results.
Roughly like the sketch below; I can post the full python if that would be useful?
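
Something like this (an untested sketch; taxon 45933 is just the Sciurus example from above, and error handling is omitted):

import json
import urllib.request

per_page = 200   # maximum the API allows per page
page = 1
while True:
    url = ('https://api.inaturalist.org/v1/taxa?taxon_id=45933&rank=species'
           '&per_page=' + str(per_page) + '&page=' + str(page))
    data = json.loads(urllib.request.urlopen(url).read().decode())
    for taxon in data['results']:
        print(taxon['id'], taxon['name'])
    if page * per_page >= data['total_results']:
        break   # we've now seen every result
    page += 1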

Sure, if you have easy access to it. My Python is a little rusty, but I’m sure I can muddle through.

jwidness also has some html / javascript that could be modified slightly to get stuff from the taxa API endpoint: https://github.com/jumear/stirfry/blob/master/iNat_Ungrafted_taxa.html.

Here’s a simple wrapper page for the API endpoint. It won’t automatically iterate through pages (as the above code will), but it will let you view and page through results from the API a little more easily than the raw JSON:
page: https://jumear.github.io/stirfry/iNatAPIv1_taxa.html
code: https://github.com/jumear/stirfry/blob/gh-pages/iNatAPIv1_taxa.html

Note that the API won’t return more than 10,000 records for a given set of parameters, but you can work around that limit by setting id_above / id_below.
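
For example, a rough sketch of that workaround (not the wrapper page’s own code): slice the taxon ID space into windows so that no single query can hit the 10,000-record cap. The window size and upper bound here are arbitrary assumptions; tune them to your taxon.

import json
import time
import urllib.request

base = 'https://api.inaturalist.org/v1/taxa?taxon_id=45933&per_page=200'
window = 100000   # assumed window size; shrink it if one window still exceeds 10,000 results
max_id = 1600000  # assumed upper bound on taxon IDs; raise as needed

for lo in range(0, max_id, window):
    page = 1
    while True:
        # id_above / id_below bound each query to one window of the ID space
        url = (base + '&id_above=' + str(lo) + '&id_below=' + str(lo + window + 1)
               + '&page=' + str(page))
        data = json.loads(urllib.request.urlopen(url).read().decode())
        for t in data['results']:
            print(t['id'], t['name'])
        if page * 200 >= data['total_results']:
            break
        page += 1
        time.sleep(1)   # be polite to the API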

UPDATE: I added a little thing to the wrapper page to generate a CSV file. Like the rest of the page, it’s quick and dirty, but it should work well enough (though I’ve limited it to 10,000 records for simplicity).

I cleaned it up a bit; let me know if you have questions or it doesn’t work. The output is a CSV.

import urllib.request
import urllib.error
import json
import csv
import time

# See https://api.inaturalist.org/v1/docs/#!/Taxa/get_taxa for more details on parameters.
# In particular, if there are more than 10,000 results, you'll need to pare the query down
# via parameters to get everything.
taxon = 45933      # specify the taxon number here
rank = 'species'   # use '' (empty quotes) if you don't want to specify a rank

# By default this asks only for active taxa, skips the full name list for each taxon,
# and requests 200 results per page (the maximum).
apiurl = 'https://api.inaturalist.org/v1/taxa?is_active=true&all_names=false&per_page=200'

def call_api(sofar=0, page=1):
    """Call the API repeatedly until all pages have been processed."""
    try:
        response = urllib.request.urlopen(apiurl + '&page=' + str(page) + '&taxon_id=' + str(taxon) + '&rank=' + rank)
    except urllib.error.URLError as e:
        print(e)
    else:
        responsejson = json.loads(response.read().decode())
        for species in responsejson['results']:
            # lots of possible data to keep; here it's name, taxon id, and observation count
            csvwriter.writerow([species['name'], species['id'], species['observations_count']])
        sofar += len(responsejson['results'])  # count what actually arrived rather than assuming a full page
        if sofar < responsejson['total_results']:  # keep calling the API until we've gotten all the results
            time.sleep(1)  # stay under the suggested API calls/min; not strictly necessary
            call_api(sofar, page + 1)

try:
    with open(str(taxon) + '.csv', encoding='utf-8', mode='w+', newline='') as w:  # open a csv named for the taxon
        csvwriter = csv.writer(w, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        call_api()
except Exception as e:
    print(e)

I would be interested in downloading the iNat taxonomy for all the Odonata around the world. I know that the Odonata taxonomy is based on the World Odonata List, which is publicly accessible, but I would like to download the vernacular names off iNat. Is this possible, and if so, how?

@robert_taylor I moved your post to this existing thread. Let me know if you’d like more information beyond what’s already mentioned above.

This is new to me, but I am keen to learn. Does iNaturalist have a page which describes how to use the iNat API? Or do you have to have some knowledge of programming to use this?

Do I place your script in one of the boxes in the get_taxa API? And if so, which box? Do I just change the taxon number to the one I want, e.g. 47792, and use empty quotes for rank?

Will it output all the vernacular names for each species or just one per species?

The 10,000 limit will not be a problem as there are only about 3,000 Odonata species. I would like the output in .csv.

I am sorry for all the questions.

Typically you would use a programming language to request the data from the API, then filter/format/output the data to your needs.

The script above is written in Python; you will need Python installed on your computer to run it. Or use something like Google Colab (Google account required) to run it online.

Yes, taxon ID 47792 will give you all Odonata. Empty quotes for rank will include all ranks from order down to species, about 7,300 results. (Including species['rank'] in the output may be helpful; see the sketch below.)
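
For instance, a hedged tweak to the script above (only the two settings and the output row change):

taxon = 47792   # Odonata
rank = ''       # empty string: don't filter by rank

# ...and in call_api(), include the rank in each row:
csvwriter.writerow([species['name'], species['rank'], species['id'], species['observations_count']])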

In the script above, species['name'] will output the Latin name; adding species['preferred_common_name'] will output the (i.e. one) vernacular name, if available. The language can be set with the 'locale' and/or the 'preferred_place_id' parameters, or is determined by the default language on your computer, I think.
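
Something like this (a sketch: as far as I can tell, preferred_common_name is simply absent when a taxon has no vernacular name, so .get() with a default avoids a KeyError):

# optionally request English names explicitly:
apiurl += '&locale=en'

# inside call_api(), guard against taxa with no vernacular name:
csvwriter.writerow([species['name'],
                    species.get('preferred_common_name', ''),
                    species['id']])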

Set all_names=true to get all the vernacular names, but you would then need a way to sort each language into the correct 'column' in the csv file; see the sketch below.
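
A rough sketch of that sorting, assuming each entry in species['names'] carries 'name' and 'locale' fields (inspect one API response first to confirm the exact field names):

# inside call_api(), group the names by language before writing the row
names_by_locale = {}
for n in species.get('names', []):
    names_by_locale.setdefault(n.get('locale', 'unknown'), []).append(n['name'])

# one column per language of interest; 'en' and 'af' are just examples
csvwriter.writerow([species['name'],
                    '; '.join(names_by_locale.get('en', [])),
                    '; '.join(names_by_locale.get('af', []))])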

[EDIT] This might not make much sense if you're just starting with programming, but hopefully it gives you some pointers once you get the hang of it.


Thank you for the advice. I am playing around on Colab. It will be a while till I have it figured out.

How do you get it to output the .csv file? Or, when I run it, where do I find the outputs?

I have set all_names to true. What can I add to get the common names? I assume that I will have to add something like csvwriter.writerow([species['name'], species['common_name']]) and then add several common-name columns so that I can recall all the common names?

In Colab, there are three icons on the left; the bottom one looks like a folder. Click on it. Any files created will appear there, and you will then be able to download them.

After setting all_names to true, all the names will be in species['names'], and you would need to loop through them all… hold on, I'll post some code / send you a link.

Thank you, I can download the .csv and I am getting the long list of the names now!

Thank you for your help, and I look forward to your link!

If you had used the page I wrote above, you would have just opened https://jumear.github.io/stirfry/iNatAPIv1_taxa.html?taxon_id=47792 in your web browser and clicked on export (near the top of the page).

https://api.inaturalist.org/v1/docs/#!/Observations/get_observations


Besides the API, there are two ways to get a full taxonomy download nowadays, though these are both monthly snapshots, as opposed to live data.

Option 1 (includes common names): the iNaturalist taxonomy DwC-A export, https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip

Option 2 (no common names): the taxa table in the iNaturalist open data on AWS, https://registry.opendata.aws/inaturalist-open-data/


It looks like the options discussed here just get you the list of taxa in iNaturalist. Is there a way to download the taxonomy, meaning both the taxa and synonyms?

Not sure what problems you're encountering. Getting data from the API via /v1/taxa will definitely provide synonyms (at least as they are defined within the system); for example, querying with is_active=any returns inactive (synonymized) taxa alongside the active ones.


Thanks! I’d tried the DwC-A taxonomy export and the AWS metadata, neither of which has synonyms. I didn’t see them in your earlier link either, but it looks like is_active=any fixes that.

Do you happen to have any tricks for dealing with the 10,000-record export limit? What I’m trying to get is the iNaturalist taxonomy for vascular plants in the continental United States, but due to the export limit, it looks like I have to search through and find a list of taxa that collectively cover the entirety of the vascular plants but individually have no more than 10,000 names each. This seems like it’ll end up being several hundred taxa, and thus several hundred queries and export files, which is a bit unwieldy.

I think synonyms should exist only on inactive taxa, since the iNat synonyms seem to be the result of taxon changes. So you could get the active taxa from the DwC-A or AWS and then get only the inactive ones from the API. This is roughly 60,000 taxa, which could probably be filtered down and/or segmented by a handful of logical groupings: https://jumear.github.io/stirfry/iNatAPIv1_taxa.html?is_active=false&taxon_id=211194. A sketch of that is below.
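
For instance, the script earlier in the thread could be pointed at just the inactive taxa (a hedged sketch; combine it with the id_above / id_below windowing above if any one grouping still exceeds 10,000 results):

# ask only for inactive taxa, which is where the synonyms live
apiurl = 'https://api.inaturalist.org/v1/taxa?is_active=false&all_names=false&per_page=200'
taxon = 211194   # vascular plants (Tracheophyta), per the link above
rank = ''        # all ranks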
