Is there a tool / code snippet that allows downloading of taxonomy data from the site?

Yes, I’m aware of that.

I don’t think this is well-documented or generally known. See, for instance, the discussion between @pisum and me above, where both of us were under the impression that “inactive taxa” are how iNaturalist handles synonymy. Or this discussion, which popped up when I searched to see whether the topic is clarified somewhere I’d missed. Or the curator guide, which discusses synonymy in the context of taxon changes but doesn’t appear to mention the synonyms on the ‘Taxonomy’ tab.

Also, in the raw API output for “Boechera 64070”:

"current_synonymous_taxon_ids": [868981, 902757, 1068235]

I don’t see it defined in a quick check of the API documentation, but it’s surely to be expected that people would interpret “current synonymous taxon” as having something to do with synonymy.
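For anyone who wants to check these fields without reading raw output by eye, here is a minimal Python sketch (standard library only). It assumes that the public v1 endpoint `https://api.inaturalist.org/v1/taxa/{id}` wraps the record in a `results` list and that the record carries an `is_active` flag. The helper names (`parse_taxon`, `fetch_taxon`, `synonymy_fields`) are hypothetical, and the offline sample only mirrors the Boechera 64070 values quoted above.

```python
import json
from urllib.request import urlopen

# Assumed public v1 endpoint; the response is assumed to wrap the record in "results".
API_TAXON_URL = "https://api.inaturalist.org/v1/taxa/{}"


def parse_taxon(payload: dict) -> dict:
    """Return the first taxon record from a /v1/taxa/{id}-style response."""
    results = payload.get("results") or []
    if not results:
        raise ValueError("response contains no taxon record")
    return results[0]


def fetch_taxon(taxon_id: int) -> dict:
    """Download one taxon record (needs network access)."""
    with urlopen(API_TAXON_URL.format(taxon_id)) as resp:
        return parse_taxon(json.load(resp))


def synonymy_fields(taxon: dict) -> dict:
    """Pick out the fields discussed in this thread, defaulting missing ones."""
    return {
        "id": taxon.get("id"),
        "name": taxon.get("name"),
        "is_active": taxon.get("is_active"),
        # A JSON null arrives as None; normalize it to an empty list.
        "current_synonymous_taxon_ids": taxon.get("current_synonymous_taxon_ids") or [],
    }


# Offline stand-in shaped like the Boechera 64070 output quoted above.
sample = {"results": [{"id": 64070, "name": "Boechera", "is_active": False,
                       "current_synonymous_taxon_ids": [868981, 902757, 1068235]}]}
print(synonymy_fields(parse_taxon(sample)))
```

Against the live API, `synonymy_fields(fetch_taxon(64070))` should produce the same kind of summary. It is still worth confirming the field names in the API documentation first.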

I’ve come to view similarities between iNaturalist taxonomy and the ICNafp as being, mostly, what in language learning you’d call “false friends”—there are many superficially similar concepts, but they don’t map onto each other in ways that are easily inferred or well documented. It’s a separate system that is best understood de novo rather than by analogy.

For instance, in the API output for Polemonium villosum, when an entry in the ‘names’ list is marked “is_valid”: false, this means it is a synonym of the entry marked “is_valid”: true. However, “current_synonymous_taxon_ids”: null means that there aren’t any associated taxon changes in the database. Understanding the ICNafp isn’t just unhelpful in figuring out what is meant; it’s misleading.

That’s probably a bit of a rabbit hole, though, sorry!
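To make the ‘is_valid’ reading concrete, here is a small sketch that partitions a taxon’s `names` entries by that flag. It never consults ‘current_synonymous_taxon_ids’, since the two fields say different things. The `lexicon` key and its "Scientific Names" value are assumptions about the record layout, and the sample entries are made up.

```python
from typing import Dict, List


def split_names(names: List[dict], lexicon: str = "Scientific Names") -> Dict[str, List[str]]:
    """Group one lexicon's entries from a taxon's `names` list by `is_valid`.

    Following the reading above: is_valid true marks the name iNaturalist
    uses; is_valid false marks names treated as synonyms of that name.
    The "lexicon" key and the "Scientific Names" value are assumptions.
    """
    groups: Dict[str, List[str]] = {"accepted": [], "synonyms": []}
    for entry in names:
        if entry.get("lexicon") != lexicon:
            continue
        key = "accepted" if entry.get("is_valid") else "synonyms"
        groups[key].append(entry["name"])
    return groups


# Made-up entries, not copied from any real record.
names = [
    {"name": "Genus accepted", "lexicon": "Scientific Names", "is_valid": True},
    {"name": "Genus olderus", "lexicon": "Scientific Names", "is_valid": False},
    {"name": "Example common name", "lexicon": "English", "is_valid": True},
]
print(split_names(names))
```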

More pragmatically… if “taxon concept” is our central concept, well, how does one know what the taxon concept is? You can look on the Taxonomy tab, but “stare at it with your eyeballs” doesn’t scale well…


Yes, sorry, I knew that but was mentioning it for the record for other readers.

That is an unfortunate choice of API terminology, since the taxon to which it refers (Boechera 64070) is inactive and not “current.” But technically those three taxa would in fact be synonyms of Boechera 64070 if it were ever to become the active Boechera concept again. A better name for the API term might be “successor_taxon_ids.”

Where the section on taxon changes in the Curator Guide says “names associated with the input get moved to the output,” it is referring to the names publicly visible on the Taxonomy tab (all lexicons, including scientific names). I agree this could be made more explicit. For one-to-many (split) changes like happened with Boechera 64070, names associated with the “old” input taxon do not get moved, for obvious reasons.

iNaturalist allows community-maintained synonymy in addition to that automatically generated by taxon changes: curators can manually add more synonyms. This allows us to be maximally explicit about what is meant by a particular taxon name (i.e., the “taxon concept”). The ICNafp isn’t designed to help us understand this; it’s just supposed to tell us which names are correct for a particular circumscription, position, and rank (for algae, fungi, and plants). If anything is misleading, it’s the unfortunate choice of “current_synonymous_taxon_ids” as the name for that API term, as I already mentioned.

As in most taxonomic works, by seeing the other ICNafp-compliant names (= types) included in and excluded from the circumscription, and by further discussion when needed (which in iNat generally takes place on taxon flags). If relevant names are missing from the Taxonomy tab for an active taxon, they can easily be added.

That said, except for deviations from POWO documented in Taxon Framework Relationships, the ICNafp-compliant names listed in the POWO record for an active iNat taxon serve to define the concept in use, and we generally don’t try to completely replicate that information on the iNat taxonomy tabs. For iNat purposes what is most important to have on the taxonomy tab are frequently used synonyms that people might use to look up a taxon on iNat.

So back to my original reason for commenting here, I’m still not understanding where plant names in use on iNaturalist are non-ICNafp-compliant in any pervasive sense. Certainly the names could be organized and labeled better if iNat was going to serve as a primary source of automated taxonomic data, but of course we already know that is not among its prime directives.

No worries; just wasn’t sure what to make of it. :-)

I’m not sure about that—describing taxa rather than names as synonymous is one of my “false friends”. It sounds like you would plug in the ICNafp definition of ‘synonym’, but it doesn’t fit—it’s not synonymy in that sense and there isn’t an obviously correct way to map a looser “spirit” of synonymy onto taxa. I thought about this for a couple minutes when I saw ‘current_synonymous_taxon_ids’, and decided that, if you just asked what “synonymous taxa” are without context, I’d probably end up at “taxa that are nomenclaturally different but taxonomically identical”. A recent example: Crusea diversifolia and Crusea simplex. However, that’s just me seeing a new and undefined term and making my best guess at what it ought to mean—this is not the process we want.

I think the “linear series of changes” model that this implies (and which iNaturalist uses) is a very bad idea…

I don’t think I would have inferred this was a statement about synonymy.

I’ve been dealing with these issues in the process of developing and documenting the nomenclature & taxonomy data structure that I’m using. Sometimes the ICNafp terminology and concepts are not the best choice for the kind of data management I’m doing. In those cases, I think the best approach is something like this: “Here’s the definition of this term. It is not an ICNafp term, but it is similar to […]. It differs from that concept in the following ways […], because it makes [some kind of data manipulation] easier / more legible / etc. Here are a couple lines of R code that can translate between my term and the ICNafp term.” In other words, use the ICNafp terms or explicitly map your terms onto ICNafp terms. Don’t leave people guessing!

I expect people familiar with the ICNafp to assume ‘is_valid’ means “is validly published”.

Yes, but: :-)

This is a database! The whole point is enabling us to manage and analyze data in ways that are more flexible, capable, and scalable than reading text. Reading text does not work at the scale of millions of observations—if the process for understanding the data doesn’t work at the scale of the data, we have a problem.

What about “Boechera 64070”? The circumscription is not directly documented. I think there’s a pathway to get there by tracing things through the taxon changes, but the need to understand idiosyncratic and iNaturalist-specific concepts increases substantially—figuring out what’s going on becomes much less intelligible as taxonomy. Over time, we should expect that most identifications will end up in this category. If there’s been a taxon change since the time the identification was made, the taxon concept associated with the identification isn’t directly documented.

I don’t think the link to “the taxon concept in use” is credible. How many users are clicking through to POWO and reviewing the taxon concepts there? Is it “the taxon concept in use” if the people using it don’t know what it is?

In other words, this is a prescriptive statement that I don’t think is useful as description.

Hopefully my comments above help to clarify what I mean. On one level we could say that, e.g., Crusea simplex is both the name applied to a taxon in iNaturalist and an ICNafp name. True, but checking to see if two character strings are the same doesn’t get us very far. Most of the information is in the conceptual frameworks and terminologies for understanding the names and the relationships between them. I think trying to understand iNaturalist’s taxonomy by analogy to ICNafp is much more difficult than trying to understand iNaturalist’s taxonomy as an independent system that does some similar things. The terms, concepts, and relationships are similar enough to give a superficial impression that we should be able to map one onto the other, but my experience is that the closer I look the more incompatible the two are. If I see the same word appear in both places, I think my best bet is to interpret these as homonyms. They look the same, they mean different things.

BTW, this is a case where USDA PLANTS gets it right. Want the taxonomy? Download the taxonomy table. It’s got what plants crave. It’s got synonyms.
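For comparison, turning a PLANTS-style download into accepted names plus synonyms takes only a few lines. The column headers used here ("Symbol", "Synonym Symbol", "Scientific Name with Author") are an assumption about the checklist layout and should be checked against an actual download. The rows are placeholders.

```python
import csv
import io
from typing import Dict


def synonyms_by_accepted(csv_text: str) -> Dict[str, dict]:
    """Group a PLANTS-style checklist by accepted symbol.

    Assumed layout: accepted rows have an empty "Synonym Symbol"; synonym rows
    carry the accepted taxon's code in "Symbol" and their own in "Synonym Symbol".
    """
    table: Dict[str, dict] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = table.setdefault(row["Symbol"], {"accepted": None, "synonyms": []})
        if row["Synonym Symbol"]:
            entry["synonyms"].append(row["Scientific Name with Author"])
        else:
            entry["accepted"] = row["Scientific Name with Author"]
    return table


# Made-up rows in the assumed layout.
sample_csv = '''"Symbol","Synonym Symbol","Scientific Name with Author","Common Name","Family"
"XXAL","","Genus alpha Author","example plant","Examplaceae"
"XXAL","XXBE","Genus beta Author","",""
'''
print(synonyms_by_accepted(sample_csv))
```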

Sorry, that was verbal laziness on my part. Substitute “names” for “taxa” and that’s what I meant.

Yep, another case of unfortunate API labeling of a data piece. A more accurate label would have been “is_accepted” or more fully, “is_accepted_by_iNaturalist.”
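Until labels like these change upstream, a data user could follow the “map your terms explicitly” approach and rename the fields locally. Here is a Python sketch that uses the alternative names suggested in this thread (`successor_taxon_ids`, `is_accepted_by_iNaturalist`). These are not real API fields, only hypothetical local relabels.

```python
from typing import Any, Dict

# Hypothetical local relabels, taken from the names suggested in this thread.
CLEARER_LABELS: Dict[str, str] = {
    "current_synonymous_taxon_ids": "successor_taxon_ids",
    "is_valid": "is_accepted_by_iNaturalist",
}


def relabel(record: Any) -> Any:
    """Recursively rename keys in parsed API JSON; values are left unchanged."""
    if isinstance(record, dict):
        return {CLEARER_LABELS.get(k, k): relabel(v) for k, v in record.items()}
    if isinstance(record, list):
        return [relabel(item) for item in record]
    return record


print(relabel({"current_synonymous_taxon_ids": [1068235],
               "names": [{"name": "Boechera", "is_valid": True}]}))
```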

Your qualification of “not directly” is well-placed. When Boechera 64070 was an active taxon, it was directly circumscribed by the names of the species taxa it contained (plus any synonyms on the Taxonomy tabs of those species). After the taxon change, those names (of taxa) are still accessible with the additional step of following the “successor taxa” returned by the API.
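The “additional step of following the successor taxa” can be scripted. This sketch assumes a hypothetical local cache of taxon records keyed by id, each with `is_active` and `current_synonymous_taxon_ids`, and walks successors until it reaches active taxa. It only recovers the successor taxa. Whether that reconstructs the old circumscription is a separate question. The sample assumes the three Boechera successors are active.

```python
from typing import Dict, List, Set


def active_successors(taxon_id: int, taxa: Dict[int, dict]) -> List[int]:
    """Follow current_synonymous_taxon_ids until reaching active taxa.

    `taxa` is a hypothetical local cache of taxon records keyed by id, e.g.
    built from earlier API calls. Inactive or unknown taxa are expanded into
    their successors; ones with no successors contribute nothing.
    """
    found: Set[int] = set()
    seen: Set[int] = set()
    stack = [taxon_id]
    while stack:
        current = stack.pop()
        if current in seen:  # guard against cycles in the change history
            continue
        seen.add(current)
        record = taxa.get(current, {})
        if record.get("is_active"):
            found.add(current)
        else:
            stack.extend(record.get("current_synonymous_taxon_ids") or [])
    return sorted(found)


taxa = {
    64070: {"is_active": False, "current_synonymous_taxon_ids": [868981, 902757, 1068235]},
    868981: {"is_active": True},
    902757: {"is_active": True},
    1068235: {"is_active": True},
}
print(active_successors(64070, taxa))
```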

It may not be easy or obvious, but I don’t think that translates to it not being credible. If the complete POWO synonymy were always listed on iNat taxon pages (or in some better data structure within iNat), would that make it any less likely that “the people using it don’t know what it is?” Most who are going to care will be the small subset who are taxonomically inclined, and they will know (or be motivated to figure out) how to get to the information.

Unless you are asserting that the names in POWO are also non-ICNafp-compliant, then I’m still unclear. It seems like the incompatibilities you are highlighting are about data structures and labels and ease of retrieval, and not about the data content itself. ICNafp is all about the content (e.g., the names).

I don’t think the error is yours. :-) The concept of synonymous taxa is implied by ‘current_synonymous_taxon_ids’. I don’t think the information in that field can be interpreted in terms of synonymous names.

That would be an improvement for scientific names, assuming that we’re both correct in believing that ‘is_valid’ is consistently used this way. I don’t know if it would make sense for common names, but I don’t know what the current ‘is_valid’ means in that context, either.

As of my comment yesterday, my guess was the same as yours in this case: the list of taxa that were included within “Boechera 64070” should be the same as the list of taxa included in the three taxa listed in ‘current_synonymous_taxon_ids’. I wrote “I think there’s a pathway” rather than “there is a pathway”, though, because I don’t know how future taxon changes will affect our ability to infer the list of taxa included within “Boechera 64070”.

However, I’ve been thinking about it today and have poked at a few more examples. I thought that the list of included taxa disappears after a taxon change that makes a taxon inactive, because they disappear from the inactive taxon’s page. However, there is still a list of included taxa in the API output. This is not the list of taxa that were included in that taxon prior to the taxon change. I believe it is the subset of those taxa whose names changed as a result of the taxon change. “Boechera missouriensis 204256” is still in “Boechera 64070” because after the change it became “Borodinia missouriensis 1068238”. “Boechera fendleri 334620” is not on the list because it is now included in “Boechera 1068235”. If you want the list of taxa that were included in “Boechera 64070”, then, at the time of the taxon change, that list would be: all taxa still included in “Boechera 64070” + all taxa included in “Boechera 1068235”. That’s nice, because it saves us the trouble of working backward through all the taxon changes at lower levels, i.e., of figuring out that “Borodinia missouriensis 1068238”, which was not included in “Boechera 64070”, corresponds with “Boechera missouriensis 204256”, which was.

However, was “Boechera quadrangulensis 1403771” included in “Boechera 64070”? I happen to know that the answer is no: that name did not exist during the time “Boechera 64070” was an active taxon on iNaturalist. I’m betting that the information to figure this out exists in the back end of iNaturalist, but I don’t know. I can’t find this information in the public web page or the API features documented in iNaturalist’s public API references. So, no, I don’t believe we can infer the list of taxa included in “Boechera 64070” by the additional step of following the IDs in ‘current_synonymous_taxon_ids’.
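The union described above, and the piece that is missing from it, can be written down directly. In this sketch the inputs are plain lists of names. `added_on` and `change_date` are hypothetical inputs standing in for information that, per the comment above, doesn’t appear to be available from the public API. “species C” plays the role of Boechera quadrangulensis, and all names and dates are placeholders.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional


def candidate_members(still_included: Iterable[str],
                      successor_included: Iterable[str],
                      added_on: Optional[Dict[str, date]] = None,
                      change_date: Optional[date] = None) -> List[str]:
    """Union of names still listed under the inactive taxon and under its successor.

    Without `added_on` (hypothetical: when each name entered the database) and
    `change_date`, names added to the successor after the change slip through.
    """
    names = set(still_included) | set(successor_included)
    if added_on is not None and change_date is not None:
        # Keep names with no known date, or names added on/before the change.
        names = {n for n in names if n not in added_on or added_on[n] <= change_date}
    return sorted(names)


still = ["species A"]                    # still listed under the inactive taxon
successor = ["species B", "species C"]   # listed under the successor taxon
print(candidate_members(still, successor))
print(candidate_members(still, successor,
                        added_on={"species C": date(2024, 1, 1)},  # placeholder date
                        change_date=date(2020, 1, 1)))             # placeholder date
```

Without the dates, the union wrongly includes “species C”, which is the Boechera quadrangulensis problem in miniature.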

The fact that the API lets us access a list of the lower-level taxa included in an inactive taxon also lets us find cases like this. None of the taxa included in the inactive “Primula subgenus Sphondylia 742750” should be moved to a new parent under the same name, so I think we can assume that the list of taxa accessible through the API is a faithful record of the taxa included in “Primula subgenus Sphondylia 742750” prior to the taxon change. We can compare that to the list of taxa included in “Evotrochis 1452560”. Is there a one-to-one correspondence between the two lists? Four of seven taxa in “Primula subgenus Sphondylia 742750” correspond with taxa in “Evotrochis 1452560”; likewise, four of seven taxa in “Evotrochis 1452560” correspond with taxa in “Primula subgenus Sphondylia 742750”. So, no, it is not safe to assume that the taxa included in the taxon listed in ‘current_synonymous_taxon_ids’ are the taxa that were included in the inactive taxon.
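The list comparison can be scripted as well. This sketch counts how many names in one list have a counterpart in the other, using the specific epithet as a crude, hypothetical matching key. It ignores epithet changes and homonyms. The names are made up and are not the actual Primula/Evotrochis lists.

```python
from typing import Callable, Iterable


def epithet(name: str) -> str:
    """Hypothetical matching key: the last word of a binomial."""
    return name.split()[-1]


def matched_count(a: Iterable[str], b: Iterable[str],
                  key: Callable[[str], str] = epithet) -> int:
    """Count members of `a` whose key also occurs among the keys of `b`."""
    b_keys = {key(name) for name in b}
    return sum(1 for name in a if key(name) in b_keys)


old = ["Oldgenus alpha", "Oldgenus beta", "Oldgenus gamma"]
new = ["Newgenus alpha", "Newgenus beta", "Newgenus delta", "Newgenus epsilon"]
print(f"{matched_count(old, new)} of {len(old)} old names have a counterpart")
print(f"{matched_count(new, old)} of {len(new)} new names have a counterpart")
```

Counting in both directions matters: a full one-to-one correspondence requires both counts to equal their list lengths.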

I also started checking what happens in other contexts, e.g. what about a taxon split at the species level? Neither of the output taxa has information related to that split in the API output. In this case the information definitely does exist in iNaturalist—we’re looking at it on the taxon split page—but I can’t find a way to access that information in a form that would be usable in scripting rather than accessible by the “look at text” method.

While the needed information may exist, if there is a reliable process for inferring the circumscriptions of inactive taxa across different contexts, I can’t figure out what it is… and I don’t think the public-facing resources provide all of the pieces we would need.

Absolutely!

Nomenclature is certainly more than a list of names.

E.g. the only way I can understand what this taxon change means is by figuring out how it relates back to ICNafp concepts—priority, superfluity, etc.

Sorry, forgot this bit:

Yes—that’s why I think associating identifications with names is probably the best overall way of dealing with the situation. I.e., the identifier applies the name under which they know the taxon, the observation is associated with a taxon separately, based on what taxon includes that name.

Associating the identifications with taxon concepts is an implicit claim: “This is the taxon concept the identifier was using.” That claim is probably false a lot of the time.
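Here is a sketch of the name-first approach: the identification keeps the name the identifier used, and the link to a taxon is derived by looking up which taxon currently includes that name. The `taxa` structure (taxon label mapped to its included names) is hypothetical. Names claimed by more than one taxon are left unresolved rather than guessed.

```python
from typing import Dict, List, Optional


def build_name_index(taxa: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every included name (accepted or synonym) to the taxon that includes it.

    Names included in more than one taxon are dropped as ambiguous.
    """
    index: Dict[str, str] = {}
    ambiguous = set()
    for taxon, names in taxa.items():
        for name in names:
            if name in index and index[name] != taxon:
                ambiguous.add(name)
            index.setdefault(name, taxon)
    for name in ambiguous:
        del index[name]
    return index


def resolve(identified_name: str, index: Dict[str, str]) -> Optional[str]:
    """Return the taxon for the name the identifier used, or None if unresolved."""
    return index.get(identified_name)


taxa = {"taxon 1": ["Name alpha", "Name beta"], "taxon 2": ["Name gamma", "Name beta"]}
index = build_name_index(taxa)
print(resolve("Name alpha", index), resolve("Name beta", index))
```

When the taxonomy changes, only the index is rebuilt. The stored names on the identifications stay as the identifiers entered them.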

But I’m saying the error was mine - I meant names. iNat calls them taxon records, but when a taxon record is first created in iNat, all it consists of is (for plants) an ICNafp-compliant name, its rank, its parent at the next higher rank, and (hopefully) a source reference (usually POWO). It only starts representing a taxon, in the sense I think you mean, when the name becomes attached to observation records, and when other names are (and are not) included in synonymy (manually or via POWO). Again, to me it’s a problem with data labeling, not data content.

At least on the public-facing interface, is_valid is implied to be relevant only to scientific names and meaningless for all others. But maybe in the back end it is used to indicate which common name in a particular lexicon is the default (=currently accepted) for that lexicon? I wouldn’t know, but maybe you can tell from your API explorations?

That makes sense now that you point it out.

If that name (or a nomenclatural synonym thereof) does not come up in the list of “all taxa still included in Boechera 64070 + all taxa included in Boechera 1068235”, then you have your answer: B. quadrangulensis 1403771 was not (and is not) part of the concept of Boechera 64070, but is instead part of a different Boechera concept.

Yes, but ICNafp is nothing more than the rules by which the correct name of a taxonomic group is determined. As long as iNat is using correct names, then there is nothing out of compliance. Granted it may be harder to determine that here than in some other taxonomic contexts, and may require reference to external sources like POWO. But that is no different than it is among other taxonomic works (including POWO), even without iNat in the picture.

I would venture no more so or less so than when identifications are made by specimen collectors, or in herbaria, museums, or other contexts outside of iNaturalist. And that’s to be expected since most such identifications are not being made by a taxonomic expert in the group under consideration, here or elsewhere.

In the context of iNaturalist, the best available assumption for plants is that the then-current POWO concept was being used (unless iNat taxonomy was deviating). And whether intentionally or accidentally, I would venture that ends up being true a lot of the time, especially since the vast majority of iNat users are not doing significant identification work there. Being able to reliably test such an assumption is probably no more or less difficult on iNat than it is in most other identification contexts.

Substituting “names” for “taxa” increases my disagreement, though.

If I understand correctly, you’re saying that, when “Boechera 64070” is an inactive taxon and the others are active, we shouldn’t interpret ‘“current_synonymous_taxon_ids”:[868981,902757,1068235]’ as synonymy in the ICNafp sense. However, this is just a polarity error: if “Boechera 64070” were active and the others inactive, it would be correctly interpreted as synonymy.

I think interpreting it as synonymy is an error either way. It sort of works in this case (except for “Boechera is a synonym of Boechera”). In other cases, this interpretation will be misleading.

I don’t know if there’s a consistent usage, but common names do have associated ‘is_valid’ values. Since these data exist, the name of the attribute they’re stored under presumably has some effect. I’ve only happened across one common name with ‘“is_valid”: false’, where it marks a name that is culturally offensive.

“Boechera quadrangulensis 1403771” is included in “Boechera 1068235”, but neither it nor a nomenclatural synonym was or is included in “Boechera 64070”.

My viewpoint is that the purpose of associating an identification with a taxon is to let downstream data users know that “identification as [taxon]” can be interpreted as “identification as [any of the names included in taxon]”. Surely someone who considered Borodinia and Yosemitea to be synonyms of Boechera and decided today to list all names in the genus would include Boechera quadrangulensis. However, that particular nomenclatural circumscription of Boechera is not associated with any identifications in iNaturalist.

Well, my point was: I don’t expect ICNafp rules to predict the behavior of iNaturalist’s taxonomy. I’ve learned that “is considered an error under the ICNafp” and “is considered an error in the iNaturalist taxonomy” should be evaluated separately.

In herbaria, most of the time we have no record of the taxon concept associated with an identification. That’s different from having an incorrect record.

If you ID a plant using, say, Welsh’s Utah Flora and type the result into iNaturalist, what are the odds that the taxon you keyed to in the Utah Flora has the same circumscription as the taxon you ended up with in iNaturalist?

Presumably that’s a special case of a broader question we would want to answer: What proportion of observations belong to taxa that have the same—or close enough that it really doesn’t matter—circumscriptions across most identification resources?

That should be answerable, but it’s not the kind of question I’d expect people to have good intuitions about. I doubt “just assume it’s 100%” is a good idea.
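Here is one way that question could be made operational, as a sketch. Represent each resource’s circumscription of a taxon as a set of included names, score agreement with Jaccard overlap against the first resource, and count the observations whose taxon agrees across all resources. The data structures and the `threshold` for “close enough that it really doesn’t matter” are hypothetical choices, and the sample data are placeholders.

```python
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap of two circumscriptions expressed as sets of included names."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def share_consistent(observations: List[str],
                     circumscriptions: Dict[str, Dict[str, FrozenSet[str]]],
                     threshold: float = 0.9) -> float:
    """Fraction of observations whose taxon is circumscribed alike across resources.

    observations: taxon label for each observation.
    circumscriptions: resource -> taxon label -> set of included names.
    threshold: hypothetical cut-off for "close enough"; each resource is
    compared with the first one.
    """
    if not observations or not circumscriptions:
        return 0.0
    consistent = 0
    for taxon in observations:
        versions = [by_taxon.get(taxon, frozenset()) for by_taxon in circumscriptions.values()]
        if all(jaccard(versions[0], v) >= threshold for v in versions[1:]):
            consistent += 1
    return consistent / len(observations)


res = {
    "resource A": {"taxon X": frozenset({"n1", "n2"}), "taxon Y": frozenset({"n3"})},
    "resource B": {"taxon X": frozenset({"n1", "n2"}), "taxon Y": frozenset({"n3", "n4"})},
}
obs = ["taxon X", "taxon X", "taxon Y", "taxon Y"]
print(share_consistent(obs, res))
```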