Download taxonomy with synonyms

Platform(s): Website, iNaturalist Taxonomy DarwinCore Archive.

URLs: https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip

Description of need: At present, the DWCA taxonomy download is a taxon list without synonymies. In order to work with iNaturalist data, and in particular to deal with both iNaturalist data and other data using other taxonomies, the full iNaturalist taxonomy including synonymies would be much more useful. At present it’s difficult to get that information by other means—see recent discussion here.

Feature request details: Changing the DWCA taxonomy download—or providing an additional download and leaving that one unchanged, I suppose—that corresponds to an “active=any” rather than “active=true” query and includes the “synonym_id” field would resolve the issue.

I think this should be very easy to implement, even if not used by a large number of people.

An update: as of submitting the feature request, I had figured out how to download the iNaturalist taxonomy, but I hadn’t yet done so. If others are curious, the “how” follows.

First, the easy part: download all “active” names at https://www.inaturalist.org/taxa/inaturalist-taxonomy.dwca.zip.

To get all the “inactive” names, use this helpful tool created by @pisum: https://jumear.github.io/stirfry/iNatAPIv1_taxa.html?taxon_id=47604&is_active=false. On iNaturalist, navigate to the higher-level taxon whose taxonomy you want to download, e.g.: https://www.inaturalist.org/taxa/47604-Asteraceae. Copy and paste its taxon_id into the URL above. If there are more than 10,000 results, you won’t be able to download them all. In that case, open the iNaturalist pages for all taxa at the next rank down that are included within that taxon, copy and paste the taxon_id for each of these, and continue moving down the taxonomic hierarchy as needed.
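The "move down a rank when a taxon has too many results" procedure can be sketched as a small recursion. This is only an illustration: `split_taxa`, `count_results`, and `child_ids` are hypothetical names, and the two callables stand in for whatever you use to look up result counts and child taxa (by hand, or through the API).

```python
from typing import Callable, Iterable, List

LIMIT = 10_000  # download cap mentioned above


def split_taxa(taxon_id: int,
               count_results: Callable[[int], int],
               child_ids: Callable[[int], Iterable[int]],
               limit: int = LIMIT) -> List[int]:
    """Return taxon IDs whose result sets each fit under `limit`.

    count_results(taxon_id) -> number of inactive names under that taxon
    child_ids(taxon_id)     -> IDs of the taxa at the next rank down
    Both are supplied by the caller (hypothetical helpers).
    """
    count = count_results(taxon_id)
    if count == 0:
        return []            # nothing to download for this taxon
    if count <= limit:
        return [taxon_id]    # small enough to export in one go
    children = list(child_ids(taxon_id))
    if not children:
        return [taxon_id]    # cannot split further; needs another approach
    ids: List[int] = []
    for child in children:
        ids.extend(split_taxa(child, count_results, child_ids, limit))
    return ids
```

Each ID in the returned list corresponds to one copy/paste and export step in the manual process.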

Downloading the iNaturalist taxonomy for land plants took me a total of 65 iterations of the copy/paste taxon_id and export CSV process. Having done it… I think I’d rather not do it again.

i think this could be done more efficiently. knowing that https://jumear.github.io/stirfry/iNatAPIv1_taxa.html?taxon_id=47604&active=false brings back roughly 30,000 records, you could aim to get 3 sets of just under 10,000 records using the id_above and id_below parameters:
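A rough sketch of one way to window on IDs from a script, as a variation on the idea: instead of fixed id_above/id_below ranges, it repeatedly asks for the next batch above the highest ID seen so far. The endpoint `https://api.inaturalist.org/v1/taxa`, the `order_by=id` / `order=asc` parameters, and the `results` key in the response are assumptions about the API behind the tool, and `iter_inactive` and `fetch_page` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, List

API_URL = "https://api.inaturalist.org/v1/taxa"  # assumed endpoint


def fetch_page(params: Dict[str, str]) -> List[dict]:
    """Fetch one page of taxa; assumes the JSON body has a 'results' list."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["results"]


def iter_inactive(taxon_id: int,
                  fetch: Callable[[Dict[str, str]], List[dict]] = fetch_page,
                  per_page: int = 200) -> Iterator[dict]:
    """Yield inactive taxa under taxon_id, walking upward by ID."""
    id_above = 0
    while True:
        page = fetch({
            "taxon_id": str(taxon_id),
            "is_active": "false",      # note: is_active, not active
            "id_above": str(id_above),
            "order_by": "id",          # assumed to sort results by ID
            "order": "asc",
            "per_page": str(per_page),
        })
        if not page:
            return
        yield from page
        # Move the window past everything already seen.
        id_above = max(t["id"] for t in page)
```

Because every request starts a fresh window, no single query has to page past the 10,000-result cap. If you run this against the live site, add a pause between requests.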

this is more what i was thinking when i talked about “a handful of logical groupings” in the earlier post. sorry i didn’t provide more detail, since that may have saved on some extra work.

It could also be done more efficiently by me typing the tags correctly… accidentally writing “active” rather than “is_active” means I was actually just downloading the same data as in the DWCA file. Ho hum. Doing it correctly I end up with 65 downloads.

Also, for what it’s worth, that URL was merely illustrative—there isn’t a single iNaturalist taxon that corresponds with “land plants” a.k.a. Embryophyta, rather it would be at the highest level this set:

https://jumear.github.io/stirfry/iNatAPIv1_taxa.html?taxon_id=56327&is_active=false
https://jumear.github.io/stirfry/iNatAPIv1_taxa.html?taxon_id=311249&is_active=false
https://jumear.github.io/stirfry/iNatAPIv1_taxa.html?taxon_id=64615&is_active=false
https://jumear.github.io/stirfry/iNatAPIv1_taxa.html?taxon_id=211194&is_active=false

Also, I wasn’t aware of the “id_above” and “id_below” tags. Those might come in handy, thanks!

note that you can get multiple comma-separated taxa in one set: https://jumear.github.io/stirfry/iNatAPIv1_taxa.html?taxon_id=56327,311249,64615,211194&is_active=false
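For anyone scripting this, a small helper can build these URLs so the parameter name is never mistyped. This is only a sketch; `tool_url` is a hypothetical name, and the URL pattern is taken from the links in this thread.

```python
from typing import Iterable
from urllib.parse import urlencode

TOOL_URL = "https://jumear.github.io/stirfry/iNatAPIv1_taxa.html"


def tool_url(taxon_ids: Iterable[int], is_active: bool = False) -> str:
    """Build a tool URL for one or more comma-separated taxon IDs."""
    params = {
        "taxon_id": ",".join(str(t) for t in taxon_ids),
        "is_active": str(is_active).lower(),  # "true" or "false"
    }
    # safe="," keeps the commas readable instead of encoding them as %2C
    return TOOL_URL + "?" + urlencode(params, safe=",")


print(tool_url([56327, 311249, 64615, 211194]))
```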

That bit of functionality I have used before, just wasn’t occurring to me in this context. :-)

I am working through a database. The selection is lichenized and lichenophilic fungi, a group scattered across the kingdom Fungi. The SQL script spares me the taxon ID searches and the tracking of how many results each one returns.
So, yes. Importing synonyms into the taxonomic tree is a good idea.

Probably a dumb question, but how would one connect the inactive synonyms with the active taxon? Typically there is an active taxon field or something similar with lists of synonyms, that can be used to link the two.

These tables are very useful, but they don’t contain:

  • synonyms for Latin (scientific) names, as in the title of this thread
  • information about which vernacular name (for example, English) is the main one, because the first name listed in the datasheet is not always the main name. For example, in the case of Caloptilia stigmatella:
    • “willow leafcone caterpillar” is listed as first one in the “VernacularNames-english.csv” file extracted from the “inaturalist-taxonomy.dwca.zip” archive, but
    • “willow leafcone caterpillar moth” (listed as second one) is used as main name in the iNaturalist webpage:
      https://www.inaturalist.org/taxa/215900-Caloptilia-stigmatella

cool catch: the export is losing the position order used on the site.

i submitted a fix - https://github.com/inaturalist/inaturalist/pull/4895

regarding @aspidoscelis's main request of capturing synonymy properly in the export itself,

I think the easiest way is to use is_valid on the scientific names themselves. for example, on this taxon page - https://www.inaturalist.org/taxa/946464-Eupterote-axesta - the synonym acesta is linked with the original axesta, but its is_valid is false.

so I think a clean way to do this for DWC might be:

we export the junior synonym acesta as its own taxon in the export, link it to the senior name axesta through acceptedNameUsageID by finding such is_valid false scientific names during the export itself, and set its taxonomicStatus to synonym instead of accepted.
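To make the proposal concrete, here is a minimal sketch of what such rows could look like. It is not the actual export code: the input dictionary shape, the `dwc_rows` function, and the synonym `taxonID` scheme are all hypothetical, and the full synonym name "Eupterote acesta" is assumed from the genus. Only the Darwin Core column names are standard terms.

```python
import csv
import io
from typing import Dict, List

FIELDS = ["taxonID", "scientificName", "taxonomicStatus", "acceptedNameUsageID"]


def dwc_rows(taxon: dict) -> List[Dict[str, str]]:
    """One 'accepted' row for the taxon, plus one 'synonym' row for each
    scientific name whose is_valid flag is false."""
    accepted_id = str(taxon["id"])
    rows = [{
        "taxonID": accepted_id,
        "scientificName": taxon["name"],
        "taxonomicStatus": "accepted",
        "acceptedNameUsageID": "",
    }]
    invalid = [n for n in taxon["names"] if not n["is_valid"]]
    for i, name in enumerate(invalid, start=1):
        rows.append({
            "taxonID": f"{accepted_id}-syn{i}",  # hypothetical ID scheme
            "scientificName": name["name"],
            "taxonomicStatus": "synonym",
            "acceptedNameUsageID": accepted_id,  # points at the senior name
        })
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


example = {
    "id": 946464,
    "name": "Eupterote axesta",
    "names": [
        {"name": "Eupterote axesta", "is_valid": True},
        {"name": "Eupterote acesta", "is_valid": False},  # assumed full name
    ],
}
print(to_csv(dwc_rows(example)))
```

With rows like these, anyone consuming the archive could join synonyms to accepted taxa on acceptedNameUsageID without a separate API download.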