What do the iNaturalist /taxa/ URLs represent: taxa or taxonomic names?

rdmpage · August 18, 2020, 9:36pm

I’m trying to make sense of how iNaturalist models taxa, as part of a broader attempt to look at how other databases and projects such as Wikidata model taxa and taxonomic names. I’m hoping for some clarification here - for some background see Taxonomic concepts continued: iNaturalist. (I asked this question on Twitter https://twitter.com/rdmpage/status/1295629542867054592 but was encouraged to ask it here instead)

In some databases every different taxonomic name gets an identifier, regardless of whether it refers to the same species or not. In other databases, the identifier for a taxon remains unchanged, even if the name changes. Most databases seem to be somewhere in between.

Originally I thought a /taxa/ URL in iNaturalist modelled a taxon, such as a species. For example, the “Thrush-like Schiffornis” Schiffornis turdina https://www.inaturalist.org/taxa/8793 has been split into five taxa, one of which bears the same scientific name (Schiffornis turdina https://www.inaturalist.org/taxa/513975). Given that the composition of Schiffornis turdina has changed, there is an argument to be made that its taxon identifier should change, which is what iNaturalist does, so 8793 becomes 513975.

But then there are cases such as Heraclides rumiko 428606, which iNaturalist has moved to the genus Papilio, becoming Papilio rumiko, so 428606 becomes 509627. This suggests that the iNaturalist /taxa URLs don’t identify taxa, because Heraclides rumiko and Papilio rumiko are the same species (there’s some disagreement in the literature over whether Heraclides should be a separate genus to Papilio, but not that there is a species rumiko). Likewise, the transfer of the African piculet Sasia africana 18393 to Verreauxia africana 792894 doesn’t change anything about the African piculet, but simply reflects a proposal to have it in its own genus distinct from Sasia.

So, in summary, is there some place I can go to find out more about the rationale for how iNaturalist assigns identifiers (the number after /taxa/) to the taxa in its database? Specifically, why do these change when the taxonomic name changes?

loarie · August 18, 2020, 10:30pm

The current convention on iNaturalist is to create a new taxon (and thus a new identifier) if:

the scientific name changes (e.g. Heraclides rumiko -> Papilio rumiko)
a taxon is split, e.g. Schiffornis turdina (sensu stricto) carved off from Schiffornis turdina (sensu lato)

aisti · August 18, 2020, 11:11pm

I’d bet the identifier is an automatically incrementing field in the database. It just represents any unique row (“taxon”) and isn’t assigned by anybody. @loarie’s answer then represents the cases where a new row is added. (Is that right?)
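A toy sketch of that "new row per change" idea, purely illustrative: `Taxon`, `TaxonTable`, `rename` and `split` are made-up names, and this is not iNaturalist's actual schema. The two flags on each row mirror the `is_active` and `current_synonymous_taxon_ids` fields that show up in the API output quoted later in the thread.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Taxon:
    """One database row: an identifier plus the name it carries."""
    id: int
    name: str
    is_active: bool = True
    current_synonymous_taxon_ids: list = field(default_factory=list)


class TaxonTable:
    """Hypothetical table with an auto-incrementing identifier."""

    def __init__(self, start=1):
        self._next_id = count(start)
        self.rows = {}

    def add(self, name):
        taxon = Taxon(id=next(self._next_id), name=name)
        self.rows[taxon.id] = taxon
        return taxon

    def rename(self, old_id, new_name):
        """A name change adds a new row and retires the old one."""
        old = self.rows[old_id]
        new = self.add(new_name)
        old.is_active = False
        old.current_synonymous_taxon_ids = [new.id]
        return new

    def split(self, old_id, new_names):
        """A split adds one new row per output, even if a name is reused.

        How the retired row is linked to its outputs is not covered in
        this thread, so the sketch only deactivates it.
        """
        self.rows[old_id].is_active = False
        return [self.add(name) for name in new_names]


table = TaxonTable()
heraclides = table.add("Heraclides rumiko")
papilio = table.rename(heraclides.id, "Papilio rumiko")
print(heraclides.is_active, heraclides.current_synonymous_taxon_ids, papilio.id)
```

The identifiers here start at 1, so they will not match the real ones; the point is only that neither a rename nor a split ever reuses an existing identifier.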

loarie · August 18, 2020, 11:12pm

correct

rdmpage · August 19, 2020, 7:06am

@loarie Thanks for the reply. So these identifiers are essentially database record identifiers that track either names, or cases where the content of a name demonstrably changes. There is not (necessarily) a one to one relationship between an identifier and a taxon. It’s essentially the Darwin Core Archive model of one database row per name, with the tweak that there can be multiple rows with the same name. The relationships between names can be discovered via the API, e.g. https://api.inaturalist.org/v1/taxa/428606 tells us

"current_synonymous_taxon_ids": [
        509627
      ]

so we discover that this name has a synonym and

"is_active": false

tells us that Heraclides rumiko 428606 is not the current name. It’s interesting that iNaturalist links both Heraclides rumiko 428606 and Papilio rumiko 509627 to the same page in Wikipedia

"wikipedia_url": "http://en.wikipedia.org/wiki/Papilio_rumiko"

as Wikidata doesn’t have both links to iNaturalist.
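For anyone who wants to follow these links programmatically, here is a minimal sketch of reading the two fields above. It assumes the API wraps the record in a top-level `results` list; the sample JSON is trimmed by hand to the fields quoted in this post, and `current_taxon_ids` is a hypothetical helper name.

```python
import json

# Hand-trimmed sample limited to the fields quoted above. The "results"
# wrapper is an assumption about the response envelope.
SAMPLE = """
{
  "results": [
    {
      "id": 428606,
      "name": "Heraclides rumiko",
      "is_active": false,
      "current_synonymous_taxon_ids": [509627]
    }
  ]
}
"""


def current_taxon_ids(response):
    """Return the identifier(s) that currently stand in for this record.

    An active record stands for itself; an inactive one points at the
    ids listed in current_synonymous_taxon_ids (possibly none).
    """
    record = response["results"][0]
    if record.get("is_active", True):
        return [record["id"]]
    return list(record.get("current_synonymous_taxon_ids") or [])


print(current_taxon_ids(json.loads(SAMPLE)))
```

To run this against the live service, fetch https://api.inaturalist.org/v1/taxa/428606 with any HTTP client and pass the parsed JSON to the same function.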

Anyway, thanks very much for the clarification.

bouteloua · August 19, 2020, 12:02pm

rdmpage:

It’s interesting that iNaturalist links both Heraclides rumiko 428606 and Papilio rumiko 509627 to the same page in Wikipedia
"wikipedia_url": "http://en.wikipedia.org/wiki/Papilio_rumiko"
as Wikidata doesn’t have both links to iNaturalist.

There’s a redirect set up on the English Wikipedia https://en.wikipedia.org/w/index.php?title=Heraclides_rumiko&redirect=no

(I wish iNat used Wikidata for some things, but I don’t think there are any instances where they have.)

zygy · August 21, 2020, 5:15am

No, that is not correct. iNaturalist is a hybrid model. While there are many taxa with multiple IDs on iNaturalist (such as the Western Giant Swallowtail), there are also many IDs with multiple scientific names (such as 153517, scroll down to the Names section). The distinction is generally when the taxonomic change happened. If a species was renamed after it was created in iNaturalist, it gets a new ID. If a species was renamed before it was created in iNaturalist, the old synonym is often included at the same ID. Keep in mind, however, that iNaturalist does not have comprehensive synonym records.

zygy · August 21, 2020, 5:29am

FWIW, Wikidata has had lots of discussions about resolving their modeling ambiguity when it comes to taxa and taxon names, but they’ve never been able to come to consensus on a proper data model. Thus their system is a bit like iNaturalist’s: Wikidata items/IDs usually correspond to taxa except when taxonomic changes have happened since the creation of the initial item, in which case you get multiple items corresponding to taxon names rather than taxa. At least on iNaturalist there is a system for “blessing” accepted names, which doesn’t seem to exist on Wikidata.

cmcheatle · August 21, 2020, 1:55pm

If you ever want to either fall asleep immediately or give yourself a massive headache, try reading the discussion board on the Wikidata Taxonomy project: https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Taxonomy

The basic outcome as I understood it (and this is just that, my understanding) is that Wikidata is not an arbiter, it is a compendium of knowledge. Thus a species name can be both accepted and unaccepted at Wikidata, depending on which reference is being cited.

I even had a very frustrating discussion with a ‘power editor’ there about whether obvious mistakes should be incorporated at Wikidata, and their answer was yes. In this case I could point to dozens of references that cited a specific statistic, yet one webpage published a different statistic. The stat was not a matter of opinion, it was an easily researched number that could be documented. Yet I was told Wikidata must accept and publish the mistake.

In fact there is a very detailed discussion about this very topic going on right now as seen here

rdmpage · August 28, 2020, 4:43am

@zygy Hmmm, not sure I follow your argument about multiple names per id. There is one scientific name, one name crossed out, and some common names.

The API only gives me the one scientific name, see https://api.inaturalist.org/v1/taxa/153517 so as far as I can tell, the model is one name per id, with a name able to have more than one id. Trying to infer the model is tricky when the content on the web page can’t be replicated using the API.

rdmpage · August 28, 2020, 5:13am

@cmcheatle As one of the participants in the Wikidata discussion about taxonomy I feel your pain, but this is a tricky topic, especially when deciding what to do requires community consensus, and the way to represent data is being decided incrementally. Projects where the fundamental decisions on data structures are made by a few people (often a single developer) tend to be much easier to manage.

Regarding facts, Wikidata can accept multiple values for the same thing, ideally linked to a reference for that value. Sometimes values may be taken at different times, sometimes there is valid disagreement about a value. There is also a mechanism where people can rank different values, saying that one is “preferred”. This means there is a way to say “there are multiple values available but this one seems best in some sense”.
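That ranking idea can be sketched as follows. This is a simplified illustration rather than Wikidata's actual JSON: the flat `value`/`rank`/`reference` dictionaries and the sample figures are made up, and only the three rank names (`preferred`, `normal`, `deprecated`) come from Wikidata itself. The selection rule shown, preferred statements if any exist and otherwise the normal ones, is the usual "best rank" convention.

```python
def best_values(statements):
    """Pick the values a consumer would normally treat as 'best'.

    Preferred-rank statements win when present; otherwise fall back to
    normal-rank ones. Deprecated statements are never returned.
    """
    preferred = [s["value"] for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s["value"] for s in statements if s["rank"] == "normal"]


# Hypothetical statements for one property of one item.
statements = [
    {"value": 1200, "rank": "normal", "reference": "source A"},
    {"value": 1250, "rank": "preferred", "reference": "source B"},
    {"value": 9999, "rank": "deprecated", "reference": "source C"},
]

print(best_values(statements))
```

All three values stay in the data with their references; the rank only changes which one is surfaced by default.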

Without trying to gloss over Wikidata’s limitations (and at times it can drive me crazy), it is an extraordinary undertaking whose importance I think will only grow as time goes on.

cmcheatle · August 28, 2020, 5:33pm

As a lurker on the taxonomy project, and a member of a second Wikidata one, it unfortunately too often seems that there are two equally important competing streams: having the discussions to develop a standard data model and approach, and allocating time to clean up data from the overwhelming percentage of contributors who won’t ever read those debates or conclusions.

I can’t quite put it into words, but I do data stuff on Wikidata across several areas of interest, and taxonomy just seems ‘more broken’ than other areas. Of course, how to model something when the question is ‘what is this’ and the answer is ‘it depends on who you ask’ is never going to be fully clean.

zygy · September 7, 2020, 11:36pm

@rdmpage - Crossed out means that the name is not currently accepted, but you can have any number of synonyms under a single taxon ID in iNaturalist. And the synonyms are reflected in the functionality of the site even if they don’t show up in the API. For example, if you search for “Zygoballus bettini”, it will give you “Zygoballus rufipes” as the search result. It should be noted, however, that iNaturalist only allows one currently accepted scientific name per ID. It’s not too surprising that synonyms are not offered in the iNaturalist API. iNaturalist generally hides both synonyms and inactive taxa from the interface in order to minimize confusion, while Wikidata seems to prefer maximizing confusion!

jwidness · September 7, 2020, 11:45pm

You can get the non-accepted scientific names from the API, e.g. https://api.inaturalist.org/v1/taxa?taxon_id=153517&all_names=true
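A rough sketch of using that query. The URL builder reproduces the address above; the parsing half rests on assumptions I have not verified here, namely that each taxon record carries a `names` list whose entries have `name`, `locale` (with `"sci"` for scientific names) and `is_valid` keys. Check those against a real response before relying on them. The sample record is made up for illustration, and both function names are hypothetical.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.inaturalist.org/v1/taxa"


def all_names_url(taxon_id):
    """Build the /v1/taxa query that asks for every name on a taxon."""
    query = urlencode({"taxon_id": taxon_id, "all_names": "true"})
    return f"{API_ROOT}?{query}"


def split_scientific_names(taxon):
    """Return (accepted, not_accepted) scientific names for one record.

    ASSUMPTION: field names "names", "locale", "is_valid" and the "sci"
    locale marker are guesses at the response shape.
    """
    accepted, not_accepted = [], []
    for entry in taxon.get("names", []):
        if entry.get("locale") != "sci":
            continue  # skip common names
        bucket = accepted if entry.get("is_valid") else not_accepted
        bucket.append(entry["name"])
    return accepted, not_accepted


# Made-up record for illustration only, not a copy of the real response.
sample_taxon = {
    "id": 153517,
    "names": [
        {"name": "Zygoballus rufipes", "locale": "sci", "is_valid": True},
        {"name": "Zygoballus bettini", "locale": "sci", "is_valid": False},
        {"name": "an example common name", "locale": "en", "is_valid": True},
    ],
}

print(all_names_url(153517))
print(split_scientific_names(sample_taxon))
```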

system · November 6, 2020, 11:45pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
