Request: get access to a taxon name source using the API

I am trying to find my way into taxon names used by iNaturalist.

I want to use the api to generate custom made texts based on some observations in order to publish:

  • specimen labels
  • species lists with some more info
  • paragraphs formatted according to scientific journal guidelines
  • etc

But for grabbing some of the info (scientific name authorship as the simplest example) I need to have access to the original database used by iNat as the name source (i.e. a given record id in Plants of the World Online, Catalogue of Life, or whatever).

Is there any way to get these sources from the taxon id, preferably using the API?

Thanks

PS - I found this thread and asked there. But that was opened for a different purpose (I am not requesting authorship availability in iNat: I just need access to the external source of a given iNat name).

PS2 - Some related links I found later:

Unfortunately there won’t be any clear/straight-cut answer in part because the referenced taxonomy on iNaturalist differs by taxa. For some, they use the taxon framework relationship system, others are one-by-one sourcing such as the original paper where the name was published, others are just the product of being auto-imported from external resources (Catalogue of Life, Encyclopedia of Life) or mass-imported, for the latter maybe from a spreadsheet that had no authors included. Some taxa on iNat are a combination of those, e.g. first imported from EOL but now have a taxon framework relationship that may have changed the concept of that taxon. Some won’t have any source attached because whoever created it didn’t attach one - would then the iNat username be considered the source? And, most confusingly, some will almost certainly be attached to a source that refers to a homonym with a different author than is meant for that taxon on iNaturalist.


Thanks a lot @bouteloua
From your comments, and after looking at the curatorial policies described in your link, it looks like I might find a number of different curatorial situations. Let me cite:

Most of our classifications come from our external name providers, but for some groups we try to adhere to different taxonomic authorities (…)
(…)
So, as a curator, what’s to be done? Basically, we try to match parts of our taxonomy to global taxonomic authorities. When that’s not possible, we try stitching together regional taxonomic authorities (…). And for everything else we cite the primary literature or use the names and classifications we import from name providers like the Catalogue of Life and uBio.

All that’s fine.
But in the end, my question is about how iNat documents all these curatorial choices, so the final user can see which sources were actually used for a particular taxon name (even if some taxa have no source at all, how can I check for that “no source” situation?).

In the thread I linked above, @deboas mentions the “taxonomy details” page available in the web UI for each taxon.
https://www.inaturalist.org/taxa/42048/taxonomy_details

This is more or less what I am looking for.
But I would like to use the api to get it in a more computer-friendly way: i.e. how can I get the same info in JSON format?

I don’t know of a way to get that info from the API, but there are references included in the iNaturalist Taxonomy DarwinCore Archive (linked at the bottom of this page). Here’s a quick example to get info for a given species in python:

import pandas as pd

df = pd.read_csv('taxa.csv')
record = df[df['scientificName'] == 'Thorius dubitus']
print(record.to_dict(orient='records'))

Which will give you a one-element list containing this record:

{
    'id': 27562,
    'taxonID': 'https://www.inaturalist.org/taxa/27562',
    'identifier': 'https://www.inaturalist.org/taxa/27562',
    'parentNameUsageID': 'https://www.inaturalist.org/taxa/27527',
    'kingdom': 'Animalia',
    'phylum': 'Chordata',
    'class': 'Amphibia',
    'order': 'Caudata',
    'family': 'Plethodontidae',
    'genus': 'Thorius',
    'specificEpithet': 'dubitus',
    'infraspecificEpithet': nan,
    'modified': '2019-11-23T10:41:18Z',
    'scientificName': 'Thorius dubitus',
    'taxonRank': 'species',
    'references': 'http://research.amnh.org/vz/herpetology/amphibia/?action=names&taxon=Thorius+dubitus'
}

Link: https://amphibiansoftheworld.amnh.org/?action=names&taxon=Thorius+dubitus

To convert just taxon IDs and reference URLs to a JSON file:

df[['id', 'references']].to_json('taxon_references.json')

There are lots of ways to do that, obviously. Dumping the CSV into a SQLite database is another convenient way to work with it.


Thanks a lot @jcook
So you suggest downloading a CSV file and opening it with pandas to search for taxa (in my case I prefer to use taxonID since I already got it from the observations API).
That’s a good idea. Not sure how much memory it can take. I presume the csv will include the whole taxonomic tree, and I just need to grab a few names each time, so an api access might be faster for my use case.

I’ll give it a try for sure. But anyway I think it wouldn’t hurt to have an API access to taxonomy_details page. Don’t you agree?

I was a bit confused because your bottom link points to an amphibian database website. But I searched for the words “iNaturalist Taxonomy DarwinCore Archive” and I guess you meant this file? (which I found linked at the iNaturalist developers page).

@bouteloua I wonder if it’s possible to move the thread from “General” to “Feature requests”.

Tried to do it myself when editing the title, but it seems I have no privileges to change that.

Thanks

Loading the whole table and working with it in-memory is faster than you might expect, but it depends on whether you’re doing a few large queries or many small queries over time.
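If memory is the concern, one option is to load only the columns you need and index them by taxon id. Here is a minimal sketch of that idea; it assumes the `id` and `references` columns shown in the example record above, and that ids are unique in the file. The helper names `load_references` and `lookup` are made up for illustration:

```python
import pandas as pd


def load_references(csv_path):
    """Load only the id and references columns, indexed by taxon id."""
    # usecols skips the rest of the columns, which keeps memory use down
    df = pd.read_csv(csv_path, usecols=["id", "references"])
    return df.set_index("id")["references"]


def lookup(refs, taxon_ids):
    """Map each taxon id to its reference URL, or None if there is none."""
    result = {}
    for taxon_id in taxon_ids:
        url = refs.get(taxon_id)  # None when the id is not in the file
        # An empty references field is read as NaN; report it as None too
        result[taxon_id] = None if pd.isna(url) else url
    return result


# Example usage (assumes taxa.csv from the archive is in the working directory):
# refs = load_references("taxa.csv")
# print(lookup(refs, [27562, 27563]))
```

A `None` value here covers both "taxon not in the file" and "taxon has no reference", so you would need to check the index separately if you want to tell those apart.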

For a single taxon at a time, SQLite may be more convenient. To import the file into a new table from the command line:

sqlite3 -csv taxa.db ".import taxa.csv taxon"

REFERENCES is a reserved keyword, so you may want to rename that column:

ALTER TABLE taxon RENAME COLUMN "references" TO reference_url;

Example query:

SELECT id, reference_url FROM taxon WHERE id IN (27562, 27563);
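The same query can be run from Python with the standard-library `sqlite3` module. This is only a sketch: it assumes the `taxon` table and the renamed `reference_url` column from the commands above, and `reference_urls` is a hypothetical helper name:

```python
import sqlite3


def reference_urls(conn, taxon_ids):
    """Return {taxon id: reference_url} for the ids found in the taxon table."""
    # Bind ids as strings: a table created by .import usually stores every
    # column as text, and SQLite converts the strings if the column is numeric.
    ids = [str(taxon_id) for taxon_id in taxon_ids]
    if not ids:
        return {}
    # One "?" placeholder per id, so values are bound rather than interpolated
    placeholders = ",".join("?" * len(ids))
    sql = f"SELECT id, reference_url FROM taxon WHERE id IN ({placeholders})"
    return {int(row[0]): row[1] for row in conn.execute(sql, ids)}


# Example usage (assumes taxa.db was created as shown above):
# conn = sqlite3.connect("taxa.db")
# print(reference_urls(conn, [27562, 27563]))
```

Ids that are not in the table are simply missing from the result. Taxa without a reference will most likely come back with an empty string rather than NULL, since that is how empty CSV fields are typically imported.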

Yeah, that would be nice to have. You could currently get that info through web scraping, but that would really only be viable for a small handful of taxa at a time, not large queries.

That’s just the reference link for the species in that example.

Ahh OK, I misunderstood your sentence (I was expecting a link to the DwC archive).

Thanks a lot for all your examples. Really helpful!!