Is there a tool / code snippet that allows downloading of taxonomy data from the site?

I know there are a lot of code snippets / tools / projects that have been mentioned on the forum over time (as an aside, putting together a wiki that consolidates them all in one place would be very helpful), to the point where I can’t remember what’s out there.

Is there one out there that allows you to specify a taxon (presumably by taxon ID) and then get a download of all taxa that are descendants of it?

Thx.


You can use the get taxa endpoint.

For example, all Sciurus species:
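The call has roughly this shape (substitute Sciurus’s actual taxon ID for the placeholder; taxon_id and rank are the same parameters used in the script further down):

https://api.inaturalist.org/v1/taxa?taxon_id=&lt;sciurus_taxon_id&gt;&rank=species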

Did you want it in any particular format besides JSON?

JSON or any other data format is fine. What I don’t have a handle on, not having done it before, is the pagination and getting the full result set, not just the first 30.

When I wrote a script for retrieving mammal taxonomy, I did pagination the lazy way: I put the call into my browser, noted the number of results, then calculated how many pages I needed and ran a for loop exactly that many times in Python. But you could just keep track of the total results and the per-page amount and keep going until pages * per_page takes you past the total results.
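Roughly like this, as a sketch (urllib against the v1 taxa endpoint; taxon 45933, per_page=200, and keeping only the name are just example choices):

import json
import time
import urllib.request

# rough sketch of the "keep going until pages * per_page covers total_results" approach;
# taxon 45933 and per_page=200 are just example values
base = 'https://api.inaturalist.org/v1/taxa?taxon_id=45933&rank=species&per_page=200'

names = []
page = 1
total = None
while total is None or (page - 1) * 200 < total:
    with urllib.request.urlopen(base + '&page=' + str(page)) as response:
        data = json.loads(response.read().decode())
    total = data['total_results']
    if not data['results']:
        break  # safety net in case the reported total and the actual pages disagree
    names.extend(t['name'] for t in data['results'])
    page += 1
    time.sleep(1)  # stay under the recommended request rate

print(len(names), 'taxa retrieved')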
I can post some Python if that would be useful?

Sure, if you have easy access to it. My Python is a little rusty but I’m sure I can muddle through.

jwidness also has some HTML/JavaScript that could be modified slightly to get stuff from the taxa API endpoint: https://github.com/jumear/stirfry/blob/master/iNat_Ungrafted_taxa.html.

here’s a simple wrapper page for the API endpoint, which won’t automatically iterate through pages (as the code linked above will), but it will let you view and page through results from the API a little more easily than just the raw JSON:
page: https://jumear.github.io/stirfry/iNatAPIv1_taxa.html
code: https://github.com/jumear/stirfry/blob/gh-pages/iNatAPIv1_taxa.html

note that the API won’t return more than 10,000 records for a given set of parameters, but you can work around that limit by setting id_above / id_below.
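for example, here’s a rough sketch of splitting one big query into ID ranges (the cut points are arbitrary, and you’d want to check whether id_above / id_below are inclusive so you don’t skip or double-count the boundary IDs):

import json
import urllib.request

# sketch: split one big query into ID ranges with id_above / id_below so each
# slice stays under the 10,000-record cap (cut points below are arbitrary examples)
base = 'https://api.inaturalist.org/v1/taxa?taxon_id=45933&rank=species&per_page=200'
cuts = [None, 200000, 400000, None]  # None means no bound on that side

for lo, hi in zip(cuts, cuts[1:]):
    url = base
    if lo is not None:
        url += '&id_above=' + str(lo)
    if hi is not None:
        url += '&id_below=' + str(hi)
    with urllib.request.urlopen(url) as response:
        total = json.loads(response.read().decode())['total_results']
    print(lo, hi, total)  # then paginate within each slice as usual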

UPDATE: i added a little thing to the wrapper page to generate a csv file. like the rest of the page, it’s quick and dirty, but it should work well enough (though i’ve limited it to 10000 records for simplicity).

I cleaned it up a bit; let me know if you have questions or it doesn’t work. The output is a CSV.

import urllib.request
import urllib.error
import json
import csv
import time

# see https://api.inaturalist.org/v1/docs/#!/Taxa/get_taxa for more details on parameters
# in particular, if there are more than 10,000 results, you'll need to pare it down via parameters to get everything
taxon = 45933		# specify the taxon number here
rank  = 'species'	# use '' (empty quotes) if you don't want to specify a rank

# default query parameters: only active taxa, don't return all the names for each taxon, 200 results per page
apiurl = 'https://api.inaturalist.org/v1/taxa?is_active=true&all_names=false&per_page=200'
	
def call_api(sofar=0, page=1):
	"""Call the api repeatedly until all pages have been processed."""
	try:
		response = urllib.request.urlopen(apiurl + '&page=' + str(page) + '&taxon_id=' + str(taxon) + '&rank=' + rank)
	except urllib.error.URLError as e:
		print(e)
	else:
		responsejson = json.loads(response.read().decode())
		for species in responsejson['results']:
			# lots of possible data to keep, here it's name, taxon id, and observations count
			csvwriter.writerow([species['name'], species['id'], species['observations_count']])
		if (sofar + 200 < responsejson['total_results']):  # keep calling the API until we've gotten all the results
			time.sleep(1)  # stay under the suggested API calls/min, not strictly necessary
			call_api(sofar + 200, page + 1)

try:
	with open(str(taxon) +'.csv', encoding='utf-8', mode='w+', newline='') as w:  # open a csv named for the taxon
		csvwriter = csv.writer(w, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
		call_api()
except Exception as e:
	print(e)
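If you save it as, say, get_taxa.py (the filename is arbitrary) and run it with python3 get_taxa.py, it should write 45933.csv to the current directory, or whatever taxon number you set at the top.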