Improving iNaturalist's nomenclature & taxonomy

earthknight · October 6, 2022, 6:42am

It seems more like they’re saying that since it’s a very niche desire that may well be limited to just you (and perhaps a few others), it makes a lot more sense to handle this via your own individual data manipulation rather than trying to revamp the entire iNat system to satisfy the desire of a vanishingly small portion of the user base.

jdmore · October 6, 2022, 8:12am

In this example, the two identification records were created 4 months ago and 7 days ago. Any number of independent comments (and/or other people’s IDs) could theoretically have been added during that interval. By collapsing the two ID events into one visual display, the sequence would be changed and become harder to interpret.

I do agree that this example represents a single ID, but only because the two names involved are nomenclatural synonyms that map to each other 1-to-1. But if this had been a split or a merge, instead of a swap, the mapping of taxa in old vs. new iNat taxonomy would be 1-to-many or many-to-1, and could not be considered the same ID.

So for example, if this had been a split of Yucca treculiana (s.l.) => Y. torreyi + Y. treculiana (s.s.), the new and replaced ID records could not be said to represent the same ID.
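
To make the 1-to-1 versus 1-to-many distinction concrete, here is a minimal Python sketch. The `TaxonChange` class, its fields, and the placeholder swap names are hypothetical illustrations, not iNaturalist's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonChange:
    """Hypothetical record of a taxon change: input names map to output names."""
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]

    def kind(self) -> str:
        # Classify the change by the cardinality of the mapping.
        if len(self.inputs) == 1 and len(self.outputs) == 1:
            return "swap"
        if len(self.inputs) == 1 and len(self.outputs) > 1:
            return "split"
        if len(self.inputs) > 1 and len(self.outputs) == 1:
            return "merge"
        return "other"

    def preserves_id(self) -> bool:
        # Only a 1-to-1 swap lets the old and new records count as the same ID.
        return self.kind() == "swap"


# Placeholder names for a swap between nomenclatural synonyms.
swap = TaxonChange(("Old name",), ("New name",))
# The split discussed above.
split = TaxonChange(
    ("Yucca treculiana (s.l.)",),
    ("Yucca torreyi", "Yucca treculiana (s.s.)"),
)

print(swap.kind(), swap.preserves_id())
print(split.kind(), split.preserves_id())
```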

BTW I do still wish this proposed change would be implemented for taxon change IDs:
https://forum.inaturalist.org/t/improve-wording-of-ids-created-by-taxon-changes/3705

Actually I’m referring back to the concept you originally stated as

My assumption is that this whole discussion is predicated on allowing an iNaturalist user to track and filter observations using different taxonomy (and taxonomic concepts) than the “official” iNat taxonomy. So if one of my two examples above represents the user’s preferred taxonomy, and the other is iNat’s current taxonomy, they are indeed two different taxonomies that do not agree. Depending on the direction in which you are trying to reconcile the two taxonomies, it is either a 1-to-many problem

Y. treculiana (s.l.) => Y. treculiana (s.s.) + Y. torreyi

or a many-to-1 problem

Y. treculiana (s.s.) + Y. torreyi => Y. treculiana (s.l.)

I know you understand what I am saying here, so I must still be missing some other concept that you are trying to convey when you say that those two different names (and the concepts they reflect) are in agreement.

pisum · October 6, 2022, 1:18pm

i wouldn’t frame the general concept as a “niche desire”, but the specific proposal does effectively end up being one because the proposal is incomplete, in my opinion, as it attempts to address only a subset of the possible taxon changes that could occur in the system, and it fails to contemplate how the system would tell which cases it could handle, or what should be done when an unhandled case is encountered. it also fails to explain how the system will differentiate between an identification that was swapped and accepted by the identifier vs an identification that was swapped but not accepted by the identifier.

but the bigger issue is that it is incomplete in terms of its description of what would actually need to be changed to achieve its goals. primarily, the proposal seems to conflate the taxon on identifications with the taxon on observations, which, although related, are very different conceptually. it describes specifically how to store taxon on an identification but fails to describe how taxon should be handled in the context of the observation. i doubt that the proposal purposely intends to leave the observation taxon entirely alone because that would severely limit the resulting new functionality. but then as i noted before, changing the way observation taxon is handled in the system has widespread consequences.

but there are other non-technical considerations that i think are just glossed over. just for example, for disagreements in the taxonomy that don’t already exist in the taxon change history, it fails to contemplate how this will be accomplished – specifically, who will add a taxon that is not generally accepted, just to do a taxon swap (or as proposed, some simplified variation of a taxon swap). i sort of doubt that you want just anyone to be able to make those associations, but i think it’ll be a hard sell to have existing curators add these, if they weren’t willing to make a deviation. and remember how i noted above that the system needs to know what kind of situations it would be able to handle? well, now you need curators to manually evaluate and apply those same rules when working this kind of task.

in the end, the proposal here is just incomplete, with messy consequences – not something anyone will take action on anytime soon as a system change.

aspidoscelis · October 6, 2022, 4:24pm

We agree that there’s some potential for the meaning of past comments to become less clear. My point is that I don’t think there are two identification events. An identification is the act of a person applying a name to an observation. The “added as a taxon swap” object is neither an identification nor a comment, so removing it has no effect on whether or not we’re seeing every identification and every comment.

I would still say there was one identification. :-)

To me, the underlying problem is that we have this class of objects that are stored as identification records but aren’t identifications. They’re statements of the relationship of an identification to a taxonomy. This makes correctly interpreting what’s going on more difficult, both in the UI and for anyone working with the data in a scripting environment. Intuitive assumptions about the nature of the data tend to fail when conceptually different objects are conflated in the data structure.

I agree!

The taxonomies do not agree, but that doesn’t mean the identifications are in disagreement. We have to interpret the identifications within an internally consistent viewpoint—correct for the difference between the taxonomies—in order to tell if they agree or disagree. I’ll try expanding the original example to be more explicit, in case that helps.

Taxonomy 1: Some people on iNaturalist, and let’s assume for the sake of argument I’m one of them, consider Yucca torreyi and Yucca treculiana to be separate species.

Taxonomy 2: Some, and let’s say person 2 is one of them, consider Yucca torreyi to be a synonym of Yucca treculiana.

Suppose I identify an observation as Yucca torreyi and person 2 identifies it as Yucca treculiana. If we view these identifications extensionally, my concept of Yucca torreyi includes some set of plants, let’s call it set 1, and person 2’s concept of Yucca treculiana includes some set of plants, let’s call it set 2. We would say that the two identifications agree when any one of the following holds: set 1 = set 2; set 1 is a subset of set 2; set 2 is a subset of set 1. We would say the two identifications disagree when: set 1 includes plants excluded from set 2 and set 2 includes plants excluded from set 1.

If we interpret both identifications as if they had been made under taxonomy 1, we would say they disagree, as Yucca torreyi and Yucca treculiana are non-overlapping sets in that taxonomy. If we interpret both identifications as if they had been made under taxonomy 2, we would say they agree, as Yucca torreyi and Yucca treculiana are identical sets in that taxonomy.

Instead, we should use the set definitions from the taxonomies of the identifiers. Yucca torreyi of taxonomy 1 is a subset of Yucca treculiana of taxonomy 2; set 1 is included within set 2. My identification is more precise—specifies a smaller set—than person 2’s identification, but they are not in disagreement. The situation is analogous to one identification as Yucca baccata and a second identification as Yucca (with the “no, I’m not saying it isn’t that species” option selected in the pop-up dialogue).
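
As a rough illustration of reading each identification under its identifier's own taxonomy, here is a small Python sketch. The population labels `A` and `B` and the two dictionaries are invented for the example: `A` stands for the plants that taxonomy 1 calls Yucca torreyi, and `B` for the plants that taxonomy 1 calls Yucca treculiana.

```python
# Hypothetical populations: A = plants taxonomy 1 calls Y. torreyi,
# B = plants taxonomy 1 calls Y. treculiana.
TAXONOMY_1 = {
    "Yucca torreyi": frozenset({"A"}),
    "Yucca treculiana": frozenset({"B"}),
}
# Taxonomy 2 treats Y. torreyi as a synonym of Y. treculiana,
# so both names cover the same plants.
TAXONOMY_2 = {
    "Yucca torreyi": frozenset({"A", "B"}),
    "Yucca treculiana": frozenset({"A", "B"}),
}


def compatible(set_1: frozenset, set_2: frozenset) -> bool:
    """True when one set contains the other (equal sets included)."""
    return set_1 <= set_2 or set_2 <= set_1


mine = "Yucca torreyi"       # identification made under taxonomy 1
theirs = "Yucca treculiana"  # identification made under taxonomy 2

# Forcing both names into a single taxonomy gives two different answers.
both_under_1 = compatible(TAXONOMY_1[mine], TAXONOMY_1[theirs])
both_under_2 = compatible(TAXONOMY_2[mine], TAXONOMY_2[theirs])
# Reading each name under its identifier's own taxonomy.
each_under_own = compatible(TAXONOMY_1[mine], TAXONOMY_2[theirs])

print(both_under_1, both_under_2, each_under_own)
```

Under these assumptions, only the "both under taxonomy 1" reading produces a conflict; the other two readings treat the IDs as compatible.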

Since this is a bit of a rabbit hole, I suppose I ought to link this back to iNaturalist… my interpretation is that currently iNaturalist can not handle this case well, and there is not a feasible change that I can think of that would resolve the issue. However, let’s suppose iNaturalist adopts taxonomy 2. If we store the names used by the identifiers, my identification would look like:
“Yucca torreyi
accepted name → Yucca treculiana”
It would still be possible to correctly infer the meaning of my ID. Anyone identifying a plant as Yucca torreyi is treating Yucca torreyi as a distinct species from Yucca treculiana. This also means that if iNaturalist later switched to taxonomy 1, then hey, we already know to interpret my ID as “Yucca torreyi” rather than as “Yucca torreyi or Yucca treculiana”!

aspidoscelis · October 6, 2022, 4:41pm

My proposal here is somewhere in between where iNaturalist is now and what I think would be the ideal end state. This is a response on my part to constraints that are outside my control—I know that a complete proposal has absolutely no chance of being implemented by iNaturalist for the foreseeable future, so I am trying to find some smaller proposal that might merely have a very small chance of being implemented.

In other words, I agree entirely that my proposal here is too limited and incomplete, but you’re interpreting an external constraint as an error on my part.

Agreed, I’ve not attempted to nail everything down and I’ve left some problems unsolved. I’ve focused on storing and being able to interact with the name used by the identifier in the identification records because this is a more minimal and thus more feasible change than propagating that out to the community ID on observations. Yes, this means it’s incomplete. I would certainly welcome any suggestions for improvement. Finding flaws is, in my opinion, easy but not very helpful.

aspidoscelis · October 6, 2022, 4:52pm

To me, the interesting question here is “How do we improve iNaturalist?” not “Did aspidoscelis do a good job?” @pisum — do you have any suggestions for how to move the discussion toward the first question and away from the second?

pisum · October 6, 2022, 6:07pm

i think the interesting question here is “Does iNaturalist need to be improved?” or, looked at another way, “Should we change iNaturalist to handle this use case?”, and that’s the question that I’ve been trying to help you answer. i get that you don’t like that answer, but i don’t know that any amount of additional discussion is going to change the answer. if i had suggestions that would magically resolve the big issues here, i would let you know, but i just don’t have that kind of magic to offer.

jdmore · October 6, 2022, 6:08pm

Thanks for clarifying, I think I see now the crux of the misunderstanding. You say

I say that statement is true as far as it goes, but that the conditional should be or instead of and.

That makes more sense to me now, thanks. That is not the case, however, for taxonomic splits, where the system is actually making a geographically guided identification (which sometimes can’t be resolved, and is kicked back to a common ancestor instead).

aspidoscelis · October 6, 2022, 6:55pm

Well, “and” is the definition used presently in the iNaturalist identification system. E.g., if an observation’s current community ID is Yucca baccata and I enter an identification as Yucca, the pop-up dialogue lets me pick between a “not disagreeing” option (the identification is simply “Yucca”) and a “disagreeing” option (the identification is “Yucca but not Yucca baccata”).

Were I ignoring the question of compatibility with existing iNaturalist usage, I might be inclined towards three categories: agreeing (set 1 = set 2); differing precision (set 1 is a subset of set 2 or set 2 is a subset of set 1); disagreement (set 1 and set 2 both contain elements not included in the other).
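
A minimal Python sketch of that three-way classification, assuming each identification has already been resolved to a set of plants (the function name and the labels `A`, `B` are made up for illustration):

```python
def compare(set_1: set, set_2: set) -> str:
    """Classify two identifications by the sets of plants they cover."""
    if set_1 == set_2:
        return "agreeing"
    # Proper subset in either direction: one ID is just more precise.
    if set_1 < set_2 or set_2 < set_1:
        return "differing precision"
    # Otherwise each set holds at least one element the other lacks.
    return "disagreement"


print(compare({"A", "B"}, {"A", "B"}))
print(compare({"A"}, {"A", "B"}))
print(compare({"A"}, {"B"}))
```

Note that partially overlapping sets also land in "disagreement" here, since each contains something the other excludes.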

aspidoscelis · October 6, 2022, 7:18pm

The answer to those questions is independent of the completeness of a particular proposal, though. For instance, the answer to, “Should we change iNaturalist to handle this use case?” could well be, “Yes, but in an entirely different way than the one suggested,” or even, “Yes, but we’re not sure how.”

If neither of us has a perfect solution, surely imperfect solutions are the available space for us to explore.

jdmore · October 6, 2022, 8:09pm

I guess that’s my point - agreement or disagreement cannot be determined by looking at the taxa and their circumscriptions alone. It requires additional user input on each ID event in cases like you described. It wasn’t clear to me if or how your proposal intended to query that user input when assessing agreement or disagreement between alternate taxonomies.

In the case where taxonomic splits add common-ancestor IDs due to geographic ambiguity, such user input will not be available, as the system always adds “agreeing” IDs by default.

aspidoscelis · October 6, 2022, 10:14pm

That’s not an issue in that particular example, though. We know that Yucca torreyi does not include Yucca treculiana because there is no taxonomy in which that is the case and by ICNafp rules there cannot be such a taxonomy. For person 2’s identification as Yucca treculiana, we know that they’re using taxonomy 2 because I said so in the setup, so we know that they don’t consider “Yucca treculiana but not Yucca torreyi” to be an option. And, of course, that’s why I specified what taxonomy was being used, to make the example unambiguous! :-)

Ok, now let’s make the example ambiguous. I identify the observation as Yucca torreyi and person 2 identifies it as Yucca treculiana, but we only have that information to go on. My identification is still clear without any additional user input. We don’t know which taxonomy person 2 is using. If that person is using taxonomy 2, their identification is agreement, of the “less precise than” flavor; if using taxonomy 1, their identification is disagreement. In order to distinguish those alternatives, we would need more information.

Under the current system, if iNaturalist is using taxonomy 2 both identifications would be recorded as “Yucca treculiana”. So, for each identification, we have the following set of possibilities, which are not distinguished: 1) the identifier is using taxonomy 2; 2) the identifier is using taxonomy 1 and believes the plant is Yucca torreyi but not Yucca treculiana; 3) the identifier is using taxonomy 1 and believes the plant is Yucca treculiana but not Yucca torreyi.

Suppose we switch to users_taxon_id + taxon_id and allow identifiers to use the name they believe to be correct regardless of whether iNaturalist considers it to be an active taxon or not. Now we can infer that my ID is possibility “2” from the list above, person 2’s is “1” or “3”. If we want to narrow person 2’s identification down to “1”, we need more information. However, my proposal is based on, “How do we get more useful information into the data with the absolute minimum change?” not “How do we get all of the useful information we might want into the data?” There isn’t an easy general solution when it comes to problems of the kind represented by “narrowing person 2’s identification down to ‘1’”. In that particular example, we can easily enumerate the possible options and imagine a dialogue box question to select between them, but that’s because there are only two names. The real world, annoyingly, is not that simple. Many species have long lists of synonyms and I don’t know of a generalizable approach that wouldn’t quickly become quite complicated.
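
A rough Python sketch of that inference, assuming iNaturalist is on taxonomy 2. The `Identification` class is hypothetical and stores names rather than numeric IDs for readability; `users_taxon` stands in for the proposed users_taxon_id and `taxon` for the existing taxon_id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Identification:
    user: str
    taxon: str                         # name iNaturalist accepts (taxon_id)
    users_taxon: Optional[str] = None  # name the identifier chose, if stored


def possible_readings(ident: Identification) -> set[int]:
    """Return which of possibilities 1-3 from the list above remain open."""
    if ident.users_taxon is None:
        # Current system: only the accepted name is stored, nothing is ruled out.
        return {1, 2, 3}
    if ident.users_taxon == "Yucca torreyi":
        # Using this name implies treating it as distinct from Y. treculiana.
        return {2}
    if ident.users_taxon == "Yucca treculiana":
        return {1, 3}
    raise ValueError(f"name not covered by this example: {ident.users_taxon}")


current = Identification("me", "Yucca treculiana")
mine = Identification("me", "Yucca treculiana", "Yucca torreyi")
theirs = Identification("person 2", "Yucca treculiana", "Yucca treculiana")

print(possible_readings(current), possible_readings(mine), possible_readings(theirs))
```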

Regarding how iNaturalist would use this information in doing its agree / disagree logic for getting to a community ID, that’s easy: it wouldn’t. At least, I’m not proposing that it would. Having more information creates more options for how to use and interpret the data in more sophisticated ways—by automated processes in iNaturalist, by users in the iNaturalist UI, and for uses of the data outside iNaturalist. I think exploring those options with regard to automated processes in iNaturalist would be a good idea in the long term, of course.

If we restructure the identifications to use “users_taxon_id” and “taxon_id”, we could imagine a split that first updates the identification records by the following rule: “when users_taxon_id = Yucca torreyi, update taxon_id to Yucca torreyi; when users_taxon_id = Yucca treculiana, update taxon_id to (Yucca torreyi or Yucca treculiana).”
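
That rule could be sketched in Python roughly as follows. This is only an illustration under the assumptions of the example: names are used in place of numeric IDs, and the "(Yucca torreyi or Yucca treculiana)" outcome is represented as a set of candidate names, which is not how iNaturalist stores a taxon today.

```python
def split_candidates(users_taxon: str) -> frozenset[str]:
    """Return the candidate taxa for an identification after the split."""
    if users_taxon == "Yucca torreyi":
        # The identifier's chosen name already picks one side of the split.
        return frozenset({"Yucca torreyi"})
    if users_taxon == "Yucca treculiana":
        # Ambiguous: the identifier may have meant the broad or the narrow sense.
        return frozenset({"Yucca torreyi", "Yucca treculiana"})
    raise ValueError(f"name not covered by this split: {users_taxon}")


# Hypothetical stored values of users_taxon_id, keyed by identifier.
stored = {"me": "Yucca torreyi", "person 2": "Yucca treculiana"}
updated = {user: split_candidates(name) for user, name in stored.items()}

for user, candidates in updated.items():
    status = "resolved" if len(candidates) == 1 else "ambiguous"
    print(user, sorted(candidates), status)
```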

Compared to all identifications being ambiguous, an alternative in which some identifications are still ambiguous, but some aren’t, is an improvement.

jdmore · October 6, 2022, 11:13pm

In a hypothetical split of Y. treculiana (s.l.), this is already how it works in iNaturalist. Except, when

update taxon_id to (Yucca torreyi or Yucca treculiana)

can’t be decided based on geographic location, it updates to Yucca instead, and the burden shifts back to the community to review and re-identify the affected observations as one species or the other.
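
A simplified Python sketch of that fallback behavior. The function and its arguments are hypothetical; in particular, how the system decides which output taxa are plausible at an observation's location is not modeled here and is simply passed in as a list.

```python
def resolve_split(outputs_in_range: list[str], common_ancestor: str) -> str:
    """Pick the one output taxon plausible at the location, else fall back."""
    if len(outputs_in_range) == 1:
        # Geography singles out exactly one output taxon.
        return outputs_in_range[0]
    # Zero or several candidates: geography cannot decide, so use the ancestor.
    return common_ancestor


print(resolve_split(["Yucca torreyi"], "Yucca"))
print(resolve_split(["Yucca torreyi", "Yucca treculiana"], "Yucca"))
```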

aspidoscelis · October 6, 2022, 11:20pm

You mean in a scenario where iNaturalist would have changed from taxonomy 1 to taxonomy 2 and then back to taxonomy 1, right? So long as iNaturalist is using taxonomy 2, you can’t enter an identification as Yucca torreyi, so a split couldn’t act on that data.

If iNaturalist already has some of the tools to use this data, that seems to strengthen the argument in favor of recording that data in more contexts!

jdmore · October 6, 2022, 11:46pm

It depends on how “messy” the current iNat taxonomy is. In some cases, alternate taxonomies like that have existed in parallel in iNat for a while, before community input causes a curator to discover it and clean it up. In that case, they would leave existing Y. torreyi IDs alone, and split Y. treculiana with the existing Y. torreyi taxon being one of the output taxa for the split.

But yes, otherwise Y. torreyi wouldn’t be added to the system until just before the split, and would probably be left inactive until then (and so not available as an ID choice).

aspidoscelis · October 7, 2022, 12:00am

Right, Yucca torreyi might be available for use in identifications by accident.

Do taxon changes act on identification records other than the most recent record on the observation attributed to each user, though? I assumed that’s what you meant in your earlier comment, that a taxon split would assign an observation to Yucca torreyi based on identifications that had been withdrawn due to a prior taxon change. At least, that’s the behavior in the current system that would be analogous to my potential rule based on users_taxon_id values.

jdmore · October 7, 2022, 12:17am

No, they only act on active identifications, not those that have been withdrawn for whatever reason, and a user can only have one active ID at a time on an observation. Sorry if I gave an impression otherwise.

aspidoscelis · October 7, 2022, 12:47am

Thanks for the clarification!

jasonhernandez74 · October 7, 2022, 2:00am

What if iNaturalist distinguished between typing in Yucca treculiana vs. typing in Yucca torreyi, such that, even though they might both display as Yucca treculiana, it would remember which name you actually wanted to use? Then, if they are later re-split, it would go by the name that had been typed in when deciding which observation was which species.

aspidoscelis · October 7, 2022, 2:01am

Yup, that’s basically what I’m suggesting. :-)

(With added details like, let users see / search based on the “Yucca torreyi” identifications.)
