Unique Identifiers: improving the database for data science

My colleague and I have been using iNaturalist’s taxonomic backbone to maintain a database of unique identifiers for a large, synthetic biodiversity dataset that combines diverse sources of historical and contemporary biodiversity data, including iNaturalist observations. Naturally, we have had to contend with shifting taxonomies, which means continually revising the correspondences between synonymous entities. Currently, when a new taxon is created to replace an old or synonymous name, it receives a unique identifier distinct from that of the old name. I understand why this is necessary, but it makes maintaining a database of correspondences between synonymous entities on iNaturalist a laborious process. Is there a database internal to the iNaturalist infrastructure that makes these correspondences explicit and that might be used to track and resolve these changes?

Apologies if this issue has been treated elsewhere; I haven’t been participating in the iNaturalist forum for quite some time!

Thanks for your consideration
Andrew

Edit: I removed the issue re: nominate infrataxa, as it has been addressed in the comments below.

Maybe I’m misinterpreting what you’re saying, but this is not correct. Whilst all V. peregrina ssp. peregrina fall within V. peregrina, the opposite is not true. The existence of a nominate subspecies implies that there must also be other subspecies, in which case some observations of V. peregrina may correspond to this other infraspecific entity. So they do not represent the same entity.

If there is more than one subspecies or variety, how do you know which species level IDs correspond to which lower level categories? For that reason, I don’t think nominate subspecies are redundant.

It is slightly annoying that splits and swaps change the taxon number. This is the 3rd post recently talking about how to handle taxonomy, so there are improvements to make.

The database issues here are beyond me. I did want to make a point about taxonomy – V. p. peregrina =/= V. peregrina. The nominate subspecies is one of several subspecies. (Although the others might not have been placed in the database…) The only reason the nominate subspecies is the nominate is that the type specimen(s) for the species belongs to it. (I don’t know what happens if it turns out that the type specimen was in fact an intergrade between two other subsequently described subspecies, although it wouldn’t surprise me if that has happened a few times.)

@thebeachcomber —you are correct. All instances of Veronica peregrina ssp. peregrina = Veronica peregrina. However, I have not stated that other infraspecific taxa = Veronica peregrina. Those other entities, e.g., Veronica peregrina var. xalapensis, are perfectly valid and should be registered as active, distinct from Veronica peregrina. But we should not have two registered entities that correspond with the same taxon.

@psweet please explain to me how Veronica peregrina is different from Veronica peregrina ssp. peregrina. The nominate subspecies may be one among several other subspecies or varieties, but it corresponds with the taxon name that does not carry the infraspecific name.

@egordon88: if there is an ID recognizing an infraspecific taxon distinct from the nominate variety or subspecies, then it carries a totally different name and unique ID on iNaturalist. So if a person does not know the subspecies or variety, they can apply the species name without the infraspecific annotation or identifier. If there are IDs that refine the resolution to an infraspecific name, that ID can be applied.

For example, if I see Veronica peregrina but I am unsure of the subspecies, I would just call it Veronica peregrina. If I recognize the variety as Veronica peregrina var. xalapensis, then I would use that taxon on iNat to identify it. There is no case where you need to discriminate Veronica peregrina from Veronica peregrina ssp. peregrina, because those are one and the same.

examples, please

The nominotypical subspecies is a subset of the species. It is not the same taxon and does not have the same geographic distribution as the species, although it does share the same type specimen and original description. If there are no other recognized subspecies in that species, then the nominotypical subspecies doesn’t exist.

@pisum: we maintain a variety of databases in which we have to implement taxon swaps to keep track of ongoing changes in iNaturalist’s internal taxonomy. A simple example is a species of Leptogium that was recently moved to Scytinium. Now, when I refresh one of these databases with observations from iNaturalist, I have to track the change between species formerly placed in Leptogium and those now recognized under Scytinium. On iNat, it is clear that in some cases there is an internal system that registers old names as synonymous with new names, but not in all cases. It would be nice if iNaturalist tracked the correspondences between names, so that those managing databases that incorporate iNaturalist data could easily manage these changes.
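
A minimal sketch of that bookkeeping, assuming a mapping from each inactive taxon ID to the ID(s) that replaced it is already available (one target for a swap, several for a split), for example built from the API field mentioned later in this thread. The function name `resolve_taxon` and the taxon IDs used with it are hypothetical placeholders, not iNaturalist's actual numbers:

```python
from typing import Dict, List, Set


def resolve_taxon(taxon_id: int, replacements: Dict[int, List[int]]) -> Set[int]:
    """Follow replacement links until reaching IDs with no further replacement.

    `replacements` maps an inactive taxon ID to the IDs that replaced it
    (hypothetical structure). IDs absent from the map, or mapped to an
    empty list, are treated as current.
    """
    current: Set[int] = set()
    seen: Set[int] = set()
    stack = [taxon_id]
    while stack:
        tid = stack.pop()
        if tid in seen:  # guard against cycles; a pure cycle yields no current ID
            continue
        seen.add(tid)
        targets = replacements.get(tid)
        if targets:
            stack.extend(targets)  # keep following swaps/splits
        else:
            current.add(tid)  # end of the chain: treat as the current taxon
    return current
```

For example, a single swap recorded as `{1001: [2001]}` resolves 1001 to `{2001}`, and a swap followed by a split, `{3001: [3002], 3002: [3003, 3004]}`, resolves 3001 to `{3003, 3004}`. Following the whole chain rather than a single hop matters when a name has changed more than once.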

No – the trinomial doesn’t correspond with the binomial! The binomial includes not only the nominate subspecies but also the other subspecies. Therefore, there can be IDs made to species that shouldn’t be identified as the nominate subspecies (or any subspecies, in the case of many intergrades or intermediates). A good example with birds is Dark-eyed Junco – making an ID of Junco hyemalis should not be taken to imply an ID of J. hyemalis hyemalis. Many people prefer to avoid the subspecies question altogether, and in the case of juncos there are plenty of individuals where it’s impossible to assign them to any particular subspecies – we need a binomial to deal with those that doesn’t imply the nominate subspecies!

Exactly. Not the same. I say that as someone who is no fan of subspecies (I try to ignore them), although they are useful in some cases.

OK, I finally see the need to maintain the species taxon alongside the nominate infraspecific taxon on iNat to recognize the set of infrataxa within a species. My original comment still stands: it would be nice if we had a way of readily resolving the correspondences between synonymous entities with reference to the iNaturalist API. Thanks for contributing to this discussion everyone.

for each inactive taxon, the API can provide an array of current_synonymous_taxon_ids, but i believe that this would rely on the particular taxon having been inactivated as part of a taxon change within the system.

here’s an example of a request that includes such information in the response: https://api.inaturalist.org/v1/taxa?q=Leptogium&is_active=false&per_page=100, and here’s a page that can help you see this in a more human-readable format: https://jumear.github.io/stirfry/iNatAPIv1_taxa?q=Leptogium&is_active=false&per_page=100.
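
a rough python sketch of the request above, standard library only. it assumes the v1 response wraps taxa in a `results` list and that each inactive taxon carries `current_synonymous_taxon_ids` as described; the `page` parameter (for paging past the first 100 results) and the helper names are assumptions, so check them against the API documentation before relying on this:

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.inaturalist.org/v1/taxa"


def build_inactive_taxa_url(query: str, page: int = 1, per_page: int = 100) -> str:
    """Build a /v1/taxa URL restricted to inactive taxa (hypothetical helper)."""
    params = {"q": query, "is_active": "false", "per_page": per_page, "page": page}
    return API_BASE + "?" + urllib.parse.urlencode(params)


def fetch_json(url: str) -> Dict[str, Any]:
    """Download and decode one page of results (performs a network request)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def synonym_map(response: Dict[str, Any]) -> Dict[int, List[int]]:
    """Map each inactive taxon ID to its current synonymous taxon IDs.

    A missing or null field becomes an empty list, which (per the reasoning
    in this thread) suggests no taxon change is on record for that taxon.
    """
    mapping: Dict[int, List[int]] = {}
    for taxon in response.get("results", []):
        mapping[taxon["id"]] = list(taxon.get("current_synonymous_taxon_ids") or [])
    return mapping
```

something like `synonym_map(fetch_json(build_inactive_taxa_url("Leptogium")))` would then give a dict from each inactive Leptogium taxon ID to its current synonym IDs.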

not sure which examples of taxa you’re thinking of that don’t have synonymous taxon information in the API response, but to the extent this information doesn’t exist, it would imply there’s no taxon change history in the system.
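
following that reasoning, a database workflow could separate inactive taxa that have synonym information from those that don't, and send the latter to manual review. a hypothetical helper, assuming a dict shaped like `{inactive_id: [current_ids]}`:

```python
from typing import Dict, List, Tuple


def partition_by_history(
    mapping: Dict[int, List[int]],
) -> Tuple[Dict[int, List[int]], List[int]]:
    """Split inactive taxa into those with recorded replacements and those without."""
    # Taxa with at least one current synonym can be resolved automatically.
    resolvable = {tid: ids for tid, ids in mapping.items() if ids}
    # An empty list suggests no taxon change history, so flag these for review.
    unresolved = sorted(tid for tid, ids in mapping.items() if not ids)
    return resolvable, unresolved
```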

Thanks so much for this, @pisum. I think the main limitation we’ve been confronted with may have to do with a lack of taxon change history in some cases. I think I read somewhere that this history only extends back so far?

At any rate, we will review what you’ve proposed and see if we can improve upon our system.