Improving iNaturalist's nomenclature & taxonomy

aspidoscelis · October 5, 2022, 2:27am

Greetings,

This is formatted as a feature request, because I originally submitted it as one. However, the iNaturalist admins are reluctant to allow new feature requests related to flexibility & interoperability of iNaturalist data in the context of alternative taxonomies. So, I post it here instead.

Knowing that supporting multiple taxonomies within iNaturalist is not a viable option for the foreseeable future, I have attempted to come up with the minimal set of modifications that would make iNaturalist compatible with multiple taxonomies. The most important of these would be allowing observations to be identified as inactive taxa, which could potentially be done without any other modifications to iNaturalist. The remaining components of my proposal are intended to make iNaturalist identification data easier to understand correctly and to work with, both within the iNaturalist UI and when exported to some other workspace.

Platform(s): All.

URLs: Any pages where identifications show up.

Description of need:
The flexibility, transparency, and cross-platform interoperability of identifications are limiting factors in some use cases.

Suppose we have two taxonomic viewpoints. I’ll use a Yucca example. Taxonomy 1: Some people on iNaturalist, and let’s assume for the sake of argument I’m one of them, consider Yucca torreyi and Yucca treculiana to be separate species. Taxonomy 2: Some, and let’s say person 2 is one of them, consider Yucca torreyi to be a synonym of Yucca treculiana. If both I and person 2 are IDing plants to the best of our ability, we’re going to end up with a lot of unproductive non-disagreements. I call a plant Yucca torreyi, they call it Yucca treculiana, but we don’t actually disagree about what the plant is. How do we resolve this? One approach is to decide on the “official” taxonomy and try to get everyone to follow it. This has a few problems, three of which seem most significant to me.

If we pick one side and declare it correct, we’re also telling the other side they’re wrong. And we’re asking them to make IDs that they believe to be incorrect. This tends to be discouraging and alienating. Sure, we can tell people to get over it, that an emotional reaction here is silly. Inconveniently, people have emotional reactions whether they’re silly or not. Trying to tell them what emotional reactions they should have is, again, discouraging and alienating. Do you want to spend time on a platform that devalues your viewpoint?

Related to the above, but with less focus on emotion: Asking people to use a taxonomy that they think is incorrect is likely to degrade their ability to provide accurate IDs. It imposes an additional cognitive load (you have to do constant “correct taxonomy” to “iNaturalist taxonomy” translations) and you may be asking people to use a taxonomy that they do not understand as well. Also, in practice, many people just aren’t going to do it, often without even realizing it. Suppose iNaturalist accepts taxonomy 1. I’m happy with that, but person 2 might just identify everything as Yucca treculiana, leading to unproductive non-disagreements as mentioned above. If person 2 is a casual iNatter just entering the name they believe to be correct in the app, there’s a good chance person 2 won’t even know that the iNaturalist taxonomy differs from their own. Suppose iNaturalist accepts taxonomy 2. Person 2 is happy, but most of the time I can’t enter the identification I believe is correct. Do I ID Yucca torreyi as Yucca, because that’s the most precise, correct ID I’m allowed to provide? Do I ID Yucca torreyi as Yucca treculiana, going along with the iNaturalist taxonomy whether I think it’s correct or not? Do I just steer clear of providing IDs on any of the things? (As someone who encounters this situation, personally I have done all three in different contexts, but for taxa I particularly care about I usually opt for genus-level ID.) How well will anyone be able to infer my intent from the IDs I provide?

The third big problem I see is coordination between groups in and outside of iNaturalist. For just about any research in which it can be useful to trade less complete per-observation documentation (compared to physical specimens) for the greater number and wider accessibility offered by digital observations, iNaturalist is the best tool out there. OK, I haven’t systematically tested every platform out there, but so far as I can tell none of them are even close. I think iNaturalist was intended to be a citizen science tool and has, perhaps somewhat accidentally, ended up being one of the best sciencey science tools out there. But! Suppose you’re working on a big ecological monitoring project. You’d like to have as good documentation of field crews’ plant IDs as you can, but you’re collecting plant observation data at a scale that physical specimens can’t remotely cope with. iNaturalist is the obvious solution. Collecting digital observation data without iNaturalist would require you to reinvent the wheel, and your organization has IT constraints that would make it very difficult to ever achieve the kind of accessibility, feedback, and collaboration outside your organization that iNaturalist offers. (This is the situation I’m in for my day job.) How well can you take advantage of all the capabilities that iNaturalist offers without committing your organization to using iNaturalist’s taxonomy? When the taxonomies differ you start running into all kinds of problems. It’s confusing for the field crews to translate between “our name” and the “iNaturalist name”. The basic goal of documenting which name the field crews gave each plant gets a lot harder when iNaturalist doesn’t allow you to enter the name the crews used. And when the name changes afterward due to subsequent IDs, the user-friendly ways of searching observations stop working (unless the crews all opt out of community ID!).
Suppose you’ve got a situation like the yuccas above: if the field crews record both Yucca torreyi and Yucca treculiana on a plot in southern New Mexico, I’m going to want to check those IDs. If iNaturalist only recognizes Yucca treculiana, it’s hard to keep track of what’s going on. More generally, getting my organization to commit to the iNaturalist taxonomy increases internal resistance to iNaturalist as a solution. It also makes it more important to me and my organization that the iNaturalist taxonomy be “correct”, and as the specific details of that taxonomy become more important to more people, the likelihood of zero-sum conflicts over taxonomy goes up.

I know there are potential workarounds for many of these issues. I also know that they’re convoluted and cumbersome enough that you start to lose the advantages that make iNaturalist great. Within iNaturalist as it is now, the best solution I’ve come up with for addressing these issues in a systematic fashion is to shoehorn a parallel taxonomy into a new observation field and ignore the iNaturalist taxonomy entirely. This is not a good solution for many reasons, but it’s doable and gets a lot of birds with one stone.

When I’ve tried to have this discussion in the past, I’ve mostly gotten two kinds of responses: people telling me that these aren’t actually problems, I’m just doing taxonomy or iNaturalist (or both) wrong; people thinking I am proposing something that would be very confusing to iNaturalist users and very difficult or just not feasible to implement technically. With regard to the first category of response: Please, don’t. Just accept that other people have different experiences and use cases than you. With regard to the second category, I think I can identify a pretty small and minor set of changes to iNaturalist that should not adversely affect the user experience for most iNaturalist users or involve anything radical on the technical side.

Previous responses are also why I’m going into a level of detail here that probably seems wildly excessive. I’ve been led to believe that the baseline level of skepticism is very high when it comes to believing that there is any real concern here that could be taken seriously, and that there is a way to make progress that wouldn’t be wildly destabilizing to iNaturalist as a whole.

Feature request details:
There are two parts:

1. At present, when there is a change in the iNaturalist taxonomy, this is handled by creating new identification records. Instead of creating new identification records, have a change in the taxonomy be reflected by a change in the content of a new field or fields for existing identification records.
2. Allow any name, including “inactive” names, to be entered as an identification. Use the new field(s) from “1” to store the corresponding accepted name in the current iNaturalist taxonomy.

For both parts, have user settings where the default is going to be simpler and close to the current iNaturalist experience. For “1”, a default like “just show me iNaturalist’s current accepted name, and only use that name when searching by taxon”, with the alternative being “show me both the original name and iNaturalist’s current accepted name, and give me the option to search by either one”. For “2”, a default like “only let me enter iNaturalist’s current accepted name as an ID” with the alternative being “allow me to enter any name in iNaturalist as an ID”. Under the default settings, everything looks the same except that the greyed out original IDs wouldn’t be there. We could go from this:

To this:

Or maybe even drop the little taxon swap flag (I’m not sure how useful it is for most users) and get this:

I think the current greyed out original ID / separate new taxon swap ID system is a compromise that is not great for anyone. I’m guessing it’s confusing and unnecessary for most users who aren’t interested in the taxonomic details without really meeting the needs of those who are. All I can say for certain, though, is that as a user who wants the details, I find it counter-intuitive and unhelpful.

I’m also thinking that, viewed under default settings, if I entered an inactive name that iNaturalist maps onto Tomostima cuneifolia, my ID would show up just like the second or third images above.

For users selecting the “more detail” option for “2”, I’m imagining the UI would look something like this:

In terms of what the underlying change in the data would look like, let’s take the two identifications in the first image above and make a table with the minimal set of fields needed to convey the situation. I end up with the table below, where “id” is an ID number for each identification record, “taxon_id” is an ID number for each taxon, “is_change” is “false” for an identification entered by the user and “true” for one created automatically by a taxon change, and “taxon_active” is “true” when the taxon is active and “false” when it is inactive:

If we reformat the same data as suggested in “1”, we create a new field, “users_taxon_id”, to store the ID number of the taxon entered by the user, drop the “is_change” field, and rename “taxon_active” to “users_taxon_active” to indicate that it describes the taxon in “users_taxon_id”, not the one in “taxon_id”:

The current code to implement taxon swaps would basically need to be changed to update taxon_id to the “new” taxon_id, rather than creating a new identification record.
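To make the contrast concrete, here is a minimal Python sketch of both behaviours, using only the simplified fields named above (`id`, `taxon_id`, `is_change`, `taxon_active`, `users_taxon_id`, `users_taxon_active`). These are illustrative fields from this proposal, not iNaturalist's actual schema; the class names, function names, and taxon ID numbers are hypothetical.

```python
from dataclasses import dataclass, replace

# Hypothetical, simplified records using only the field names from this post.
# They are NOT iNaturalist's real database schema.


@dataclass(frozen=True)
class CurrentID:
    id: int
    taxon_id: int
    is_change: bool     # True if created automatically by a taxon change
    taxon_active: bool  # True if taxon_id is currently an active taxon


@dataclass(frozen=True)
class ProposedID:
    id: int
    users_taxon_id: int       # the taxon the user actually entered
    users_taxon_active: bool  # True if users_taxon_id is an active taxon
    taxon_id: int             # iNaturalist's current accepted taxon


def swap_current(ids, old, new, next_id):
    """Current behaviour (simplified): keep the old record, add an automatic one."""
    out = []
    for rec in ids:
        if rec.taxon_id == old:
            out.append(replace(rec, taxon_active=False))
            out.append(CurrentID(next_id, new, is_change=True, taxon_active=True))
            next_id += 1
        else:
            out.append(rec)
    return out


def swap_proposed(ids, old, new):
    """Proposed behaviour: update taxon_id in place; no new records are created."""
    out = []
    for rec in ids:
        if rec.taxon_id == old:
            rec = replace(
                rec,
                taxon_id=new,
                # The user's own entry becomes inactive if it was the swapped taxon.
                users_taxon_active=rec.users_taxon_active and rec.users_taxon_id != old,
            )
        out.append(rec)
    return out


# Hypothetical taxon IDs: Draba cuneifolia -> Tomostima cuneifolia.
DRABA, TOMOSTIMA = 101, 202
print(swap_current([CurrentID(1, DRABA, False, True)], DRABA, TOMOSTIMA, next_id=2))
print(swap_proposed([ProposedID(1, DRABA, True, DRABA)], DRABA, TOMOSTIMA))
```

The swap in the current model produces two records (the greyed-out original plus the automatic one), while the proposed model leaves one record whose `users_taxon_id` still records what the user entered and whose `taxon_id` now points at the accepted taxon.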

Let’s suppose I want to create an identification as Draba cuneifolia after that taxon swap was implemented and Draba cuneifolia became inactive on iNaturalist. Currently, when we type text in the taxon box, the system searches for that text among active taxa. Instead, search all taxa, put the matching taxon_id in “users_taxon_id” and its taxon_active value in “users_taxon_active”, and, if “users_taxon_active” is “false”, do a quick lookup to find the current accepted taxon_id and put that in “taxon_id”. We end up with:

If I want to create an identification under the current active name, Tomostima cuneifolia, then users_taxon_active is true and the value in users_taxon_id is simply copied to taxon_id.
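The entry step described in the last two paragraphs can be sketched in Python as follows. This assumes each inactive taxon carries a hypothetical `accepted_id` pointer to its replacement and uses exact name matching for simplicity; the real taxon picker does fuzzy matching, and none of these names are iNaturalist's API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Taxon:
    taxon_id: int
    name: str
    active: bool
    accepted_id: Optional[int] = None  # hypothetical pointer used by inactive taxa


def resolve_accepted(taxon_id: int, taxa: Dict[int, Taxon]) -> int:
    """Follow accepted_id pointers until an active taxon is reached."""
    seen = set()
    taxon = taxa[taxon_id]
    while not taxon.active:
        if taxon.taxon_id in seen or taxon.accepted_id is None:
            raise ValueError(f"no accepted taxon for {taxon.name}")
        seen.add(taxon.taxon_id)
        taxon = taxa[taxon.accepted_id]
    return taxon.taxon_id


def create_identification(ident_id: int, name: str, taxa: Dict[int, Taxon]) -> dict:
    """Search ALL taxa (not only active ones), then fill both taxon fields."""
    matches = [t for t in taxa.values() if t.name == name]  # exact match only
    if not matches:
        raise LookupError(f"no taxon named {name!r}")
    chosen = matches[0]
    return {
        "id": ident_id,
        "users_taxon_id": chosen.taxon_id,
        "users_taxon_active": chosen.active,
        # Active: the value is simply copied. Inactive: look up the accepted taxon.
        "taxon_id": resolve_accepted(chosen.taxon_id, taxa),
    }


# Hypothetical taxon IDs for the Draba / Tomostima example.
taxa = {
    101: Taxon(101, "Draba cuneifolia", active=False, accepted_id=202),
    202: Taxon(202, "Tomostima cuneifolia", active=True),
}
print(create_identification(3, "Draba cuneifolia", taxa))
print(create_identification(4, "Tomostima cuneifolia", taxa))
```

Following the pointer in a loop (with a guard against cycles) also covers a name that has been swapped more than once.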

The real data has more fields, of course, but that’s the basic idea: restructure the data to give a straightforward and explicit relationship between what the user called the plant and what the iNaturalist taxonomy calls it, and treat a user who enters a taxon after it has become inactive exactly the same as a user who entered it before it became inactive.

What if we want to use a name that just isn’t in iNaturalist? We already have the “taxon_active” field; all we need is a little checkbox or something in the “new taxon” UI so that, when creating a new taxon, we can say it’s inactive. Then we need one of those little taxon lookup doohickies from the taxon swap page so that we can tell it what the currently accepted taxon is. The parts exist; they’re just not attached to each other in the UI at the moment. One could also simply use the new taxon interface as-is, then create and commit a new taxon swap for it. Functional enough, but inefficient.
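As a rough illustration of attaching those two parts, here is a Python sketch in which the "inactive" checkbox and the accepted-taxon lookup form a single step. The `Taxon` record, its hypothetical `accepted_id` field, and the `add_taxon` function are assumptions for this example, not how iNaturalist's curation tools are built.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Taxon:
    taxon_id: int
    name: str
    active: bool
    accepted_id: Optional[int] = None  # hypothetical: set only for inactive taxa


def add_taxon(taxa: Dict[int, Taxon], taxon_id: int, name: str,
              active: bool = True, accepted_id: Optional[int] = None) -> Dict[int, Taxon]:
    """Return a copy of `taxa` with a new taxon added.

    An inactive taxon must name an existing, active accepted taxon, so the
    new name is usable for IDs immediately without a separate taxon swap.
    """
    if taxon_id in taxa:
        raise ValueError(f"taxon {taxon_id} already exists")
    if active:
        accepted_id = None
    else:
        accepted = taxa.get(accepted_id)
        if accepted is None or not accepted.active:
            raise ValueError("an inactive taxon needs an active accepted taxon")
    return {**taxa, taxon_id: Taxon(taxon_id, name, active, accepted_id)}


# Hypothetical IDs: Yucca treculiana is accepted, Yucca torreyi added as inactive.
taxa = {10: Taxon(10, "Yucca treculiana", active=True)}
taxa = add_taxon(taxa, 11, "Yucca torreyi", active=False, accepted_id=10)
print(taxa[11])
```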

The various places where you can search observations by taxon would need a little checkbox or something for “search ‘taxon_id’” vs. “search ‘users_taxon_id’”. Same functionality, just a switch to point at one field vs. the other.
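A minimal Python sketch of that switch, assuming identification records are plain dicts carrying the two proposed fields. It matches the exact taxon only; a real taxon search would also include descendant taxa. The function name and the taxon ID numbers are hypothetical.

```python
def search_by_taxon(records, taxon_id, field="taxon_id"):
    """Return records whose chosen field matches taxon_id.

    field="taxon_id"        -> iNaturalist's current accepted name (default)
    field="users_taxon_id"  -> the name the identifier actually entered
    """
    if field not in ("taxon_id", "users_taxon_id"):
        raise ValueError(f"unsupported field: {field}")
    return [r for r in records if r[field] == taxon_id]


# Hypothetical IDs: Yucca treculiana = 10 (accepted), Yucca torreyi = 11 (inactive synonym).
records = [
    {"id": 1, "user": "aspidoscelis", "users_taxon_id": 11, "taxon_id": 10},
    {"id": 2, "user": "person2", "users_taxon_id": 10, "taxon_id": 10},
]
print([r["id"] for r in search_by_taxon(records, 10)])
print([r["id"] for r in search_by_taxon(records, 11, field="users_taxon_id")])
```

The default search returns both IDs under the accepted name, while switching the field finds only the plants someone actually called Yucca torreyi.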

How feasible this is to implement on iNaturalist, I don’t really know. When it comes to coding, I use R and I have delusions of adequacy. If I were working with tabular data in R and a shiny web app, I know that the changes I’m describing here would not be a big deal, though they would take longer than I expect because everything does. The scale of iNaturalist would create a lot of difficulties I don’t know anything about. Luckily, people at iNaturalist, unlike me, know about those difficulties. So I’m hoping the ineptness differential might cancel out the scaling difficulty differential. :-)

Does this solve the problem?:
It gets a lot of the way there and would lay groundwork for further improvements.

Let’s return to the Yucca example. If iNaturalist uses taxonomy 1, there’s still potential for unproductive non-disagreements. If iNaturalist uses taxonomy 2, both I and person 2 can enter the IDs we believe to be correct. For users using the default settings, our non-disagreements get hidden and don’t affect the community ID or any of the other functionality they’re interacting with. We can also both search for “the plants I called Yucca treculiana” easily. For users who set their preferences to more taxonomic / identification detail, they can look at our non-disagreements, search for the plants each of us identified as one taxon or the other, and so on. So, big improvement in one context, little change in another context.

With regard to the “you’re wrong” issue, personally I would consider this major progress, though not perfect. If iNaturalist adopted taxonomy 2, I might still be irritable about it on occasion. To me, though, data integrity related to accurately capturing my ID and having it be findable in the data is much more important. If I think it’s Yucca torreyi, I can ID it as Yucca torreyi, and I and others who may care can easily and reliably tell that I IDed it as Yucca torreyi, that gets rid of something like 90% of my heartburn. Having the community ID also be Yucca torreyi would be nice, but the community ID is not something over which I feel like I should have control, while my ID is. I don’t know how idiosyncratic I am in this regard.

With regard to putting people in a position where they’re less able to do good ID work: I’m inclined to think this problem would be basically solved in a structural sense. People would be able to use the names they believe to be correct and leave translating to the iNaturalist taxonomy to iNaturalist. Names that just aren’t in iNaturalist would continue to be an issue to some extent.

I think coordination would basically be in the same boat, solved in a structural sense but with some ongoing content limitations arising from names that just aren’t in iNaturalist. Crews could just enter the name as used in our organization without worrying about whether iNaturalist uses the same name. Searching for “the plant the crew called by this name” would become easy and reliable whether our taxonomy is the same as iNaturalist’s or not–just select the “search users_taxon_id” filter. And so on.

There are quite a number of details that would remain to be worked out, but I think these are basically details that aren’t handled well now, so immediately solving all of them doesn’t seem like a reasonable expectation. I’m ignoring these details for the moment because, believe it or not, this is me trying to be concise.

Also: yes, Yucca treculiana is the correct spelling. :-)

earthknight · October 5, 2022, 4:14am

That would seem to introduce an enormous amount of unnecessary complexity and informational confusion at multiple levels.

I certainly understand the desire, especially in the case of taxon changes where species-level identifications are lost and everything has to be redone (looking at you, Siberian/Amur Stonechat split, among others). But in cases such as the Yucca one given, I’d prefer that iNat keep using a single nomenclature source (such as Kew’s Plants of the World database) and stick with it, even if errors or necessary changes are found later. If you enter a name that’s considered older or a synonym, iNat generally recognizes it, and the species’ info page usually also has additional information about any debate over whether different names refer to different species or are synonyms.

Personally, I don’t think this is a necessary change, and I think it is one that would add to complexity, confusion, and potential misinformation.

aspidoscelis · October 5, 2022, 4:20am

I’m curious where you see complexity and confusion arising.

jdmore · October 5, 2022, 4:26am

After reading through your ideas a couple of times, I may still be missing something, but I’m wondering if the following would be adequate (if not ideal) to address your use case. If so, I think it would fit even better into the existing iNaturalist infrastructure and functionality (and therefore maybe have a better chance of actually being implemented someday). It definitely puts more of the burden on those who want to use alternate taxonomy, but I’m guessing that use case will remain pretty small relative to the overall user base of iNaturalist, so maybe such burden is appropriate.

- Per your item #2, allow selection of inactive names for identifications (maybe by adding a check-box for “include inactive names” in the name-picker).
- Ignore IDs of inactive taxa for purposes of calculating current iNaturalist Community ID, Observation ID, agreeing/disagreeing IDs, ID counts and leaderboards, etc.
- Improve observation search capabilities to make it easier to filter observations by any ID taxon (active or inactive) or any ID-taxon + ID-user combination.
- Taxon swaps and other changes create both inactive taxon records and inactive identification records. Have that remain the case. To create/revive an identification using an inactive taxon after a taxon change, the identifier would need to add a new, active identification on the observation, selecting an inactive taxon as the ID taxon.
- A user could still have only one active ID at a time on any given observation. Their choice whether they want it to be of an active or inactive taxon, with the above consequences pertaining depending on the choice.

If this last item is a deal-breaker, maybe it wouldn’t be too difficult to allow one active ID each for an active taxon and an inactive taxon at one time. But again, the more change to existing functionality, the less likely implementation becomes…
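The alternative in the list above can be sketched in Python to show which IDs would feed into the Community ID. The dict records and the `current` flag are simplified stand-ins (assumptions, not iNaturalist's schema), and the sketch does not reproduce iNaturalist's actual Community ID calculation; it only selects the IDs that would be eligible for it.

```python
def add_identification(idents, user, taxon_id):
    """One current ID per user: adding a new one retires the user's previous ID."""
    updated = [dict(i, current=False) if i["user"] == user else i for i in idents]
    updated.append({"user": user, "taxon_id": taxon_id, "current": True})
    return updated


def community_eligible(idents, active_taxa):
    """IDs of inactive taxa are ignored for Community ID, agreement counts, etc."""
    return [i for i in idents if i["current"] and i["taxon_id"] in active_taxa]


def filter_ids(idents, taxon_id=None, user=None):
    """Search by any ID taxon (active or inactive), optionally combined with a user."""
    return [
        i for i in idents
        if (taxon_id is None or i["taxon_id"] == taxon_id)
        and (user is None or i["user"] == user)
    ]


# Hypothetical IDs: Yucca treculiana = 10 (active), Yucca torreyi = 11 (inactive).
ACTIVE = {10}
idents = add_identification([], "aspidoscelis", 11)
idents = add_identification(idents, "person2", 10)
print(community_eligible(idents, ACTIVE))
print(filter_ids(idents, taxon_id=11, user="aspidoscelis"))
```

Under this design the inactive-taxon ID stays searchable by taxon and user, but only the active-taxon ID counts toward the Community ID.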

Also, BTW,

That checkbox (radio button, actually) already exists, last time I looked…

Unless the names are nomenclatural synonyms, I would argue that you do actually disagree about what the plant is, because the names represent different taxonomic concepts (circumscriptions). In your example, Yucca treculiana contains all of Yucca torreyi, but Yucca torreyi does not include all of Yucca treculiana.

aspidoscelis · October 5, 2022, 4:42am

In case this clarifies: Suppose iNaturalist uses taxonomy 2 and considers Yucca torreyi a synonym of Yucca treculiana. I go to an observation and type “Yucca torreyi” in the ID box. iNaturalist looks up the name and gives me Yucca treculiana as an option in the list. I select Yucca treculiana and my original ID as Yucca torreyi goes poof and disappears. Here’s the change I’m suggesting: it gets saved in users_taxon_id.

Suppose you have no interest in what’s in the users_taxon_id field. You leave your user settings at the defaults and you never see it.

marina_gorbunova · October 5, 2022, 5:30am

I see the underlying problem, but our botanists just deal with iNat sticking to POWO. If someone wants a split, or thinks instead it’s one species, you can contact them, and if you have a paper showing you’re correct, they’ll make a change. Otherwise, there are pretty much always child taxa you can use. If there are none, nothing can be done, and you still believe you’re correct, add a description or tag and just ID the species you can choose. Why add a completely new “user taxon” if you can already see what users think it is because they write about it?

aspidoscelis · October 5, 2022, 5:31am

Greetings, Jim,

I think the only substantive difference between my proposal and your suggested alternative is that I think it would be better to restructure the identification data so that a taxon swap changes the “taxon_id” field in an existing identification record rather than creating a new identification record. However, while I think this would reduce potential misinterpretation of the data & make it easier to work with, it’s not a necessity for bare minimum interoperability of iNaturalist data with data using other taxonomies.

That change in structure then has a bearing on your last bullet. Everything gets reframed from “which identification record do you want to look at” to “which field do you want to look at”. I’m not sure of any reason why one would want to have multiple active IDs, nor any reason why I, at least, would want to opt out of community ID.

If your ID is there and you can choose a setting to see and interact with observations labelled by that ID, I’m sure there are still people who would find the idea of the community ID being something else grating. However, this gets into “I want to control how other people interact with my data even though I can interact with my data in precisely the way I want” territory. At that point I think it’d be reasonable to just say, “Sorry, but no,” and kill off the opt out option entirely.

Even without that change in structure, I can’t think of a reason I would want to have more than one active ID. So, not a deal-breaker, not even something I had thought was on the table.

“That checkbox (radio button, actually) already exists, last time I looked…”

That’s convenient. :-) I guess I just forgot about it, since I’ve never used it. I’m not sure, in the current iNaturalist system, when I would want to.

“Unless the names are nomenclatural synonyms, I would argue that you do actually disagree about what the plant is, because the names represent different taxonomic concepts (circumscriptions).”

It’s easier if we think about the hierarchical relationships separately from the names and ranks. If one person identifies a plant as “A or B” and a second person identifies it as “A”, do they disagree? “A or B” and “A” are different things, but not incompatible; one is a subset of the other.

aspidoscelis · October 5, 2022, 5:42am

Imagine that iNaturalist as a whole worked this way. If you want to know what taxon each observation is, you have to open it and read through what’s in the Notes and Comments.

jdmore · October 5, 2022, 5:44am

Being able to do both would allow one to simultaneously attach one’s preferred inactive taxon to an observation, and also contribute to the community ID using active iNat taxonomy.

It’s often done when creating new output taxa prior to a taxon split or other change. If the time until committing the change is likely to be extended, having new output taxa sitting there actively available for new IDs may not be a good thing.

It depends on whether you interpret the broader taxonomic concept as “A or B” or “A and B.”

aspidoscelis · October 5, 2022, 7:21am

Suppose I add an identification record as an inactive taxon, and iNaturalist automatically generates a second “added as part of a taxon swap” identification record. As a visual UI, the whole grayed out with a line through it schtick is not good, but if I can filter by “my most recent identification that isn’t one of the taxon swap thingies”, it’s just visually annoying.

To me, at least, restructuring it as a single identification record and presenting it in a format like my third image reduces the visual annoyance dramatically. “If there are two lines of text, I want to look at the first one,” is the kind of visual processing rule that is quickly and easily learned. The restructured data would also be easy for me to interpret and work with in R scripts, whereas working with the data in its current form would require me to stare at the API documentation for a while and run various different test cases to figure out whether or not different field values really mean what I think they mean. There’s probably less common ground among people when it comes to what’s intuitive in coding than for a visual interface, though.

Ah, that makes sense.

Well, “A or B” lets people play nicely together, so why not stick with that? :-)

I was being simplistic; the longer answer is that I think we need to translate between taxonomies to understand what’s going on. So here’s a version with more verisimilitude. Person 1 says:
A is a synonym of B;
this plant is B.

Person 2 says:
A is not a synonym of B;
this plant is A.

The first statement gives a taxonomy, the second an identification. If we translate person 1’s identification “this plant is B” into the taxonomy of person 2, we get “this plant is A or B”. Why not “this plant is A and B”? Well, for the same reason that “言” is not a translation of “toast” into German. In the taxonomy of person 2, “A and B” is not an identification, it’s gibberish. (We could treat it as untranslatable gibberish rather than translating to “A or B”, but I can’t think of a context in which this would be remotely helpful.)

If we translate person 2’s “this plant is A” into the taxonomy of person 1, we get “this plant is A”. Or “this plant is B”, or “this plant is A or B”, or “this plant is A and B”. All of these are equally valid identifications in the taxonomy of person 1, and they’re all completely equivalent. (Yes, I’m leaving out priority—I said “more verisimilitude” not “perfect verisimilitude”).

In other words, neither taxonomy allows us to distinguish “A or B” and “A and B” as separate options. In one taxonomy, “A and B” isn’t a valid ID at all. In the other, “A or B” means precisely the same thing as “A and B”. (And if we want to pick concise English that is coherent in both taxonomies, “A or B” is it.)
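The translation argument above can be checked with a small Python sketch. It assumes each name's circumscription can be represented as a set of hypothetical units ("a", "b") and, as in the text, ignores priority. A translated ID is read as a disjunction ("N1 or N2"); note that "A and B" as an intersection is empty in taxonomy 2 and identical to "A or B" in taxonomy 1.

```python
from typing import Dict, FrozenSet, Set

Taxonomy = Dict[str, FrozenSet[str]]  # name -> circumscription (hypothetical units)

# Person 1: A is a synonym of B, so both names cover the same plants.
TAXONOMY_1: Taxonomy = {"A": frozenset({"a", "b"}), "B": frozenset({"a", "b"})}
# Person 2: A and B are separate.
TAXONOMY_2: Taxonomy = {"A": frozenset({"a"}), "B": frozenset({"b"})}


def meaning(names: Set[str], taxonomy: Taxonomy) -> FrozenSet[str]:
    """The plants covered by the identification "N1 or N2 or ..."."""
    return frozenset().union(*(taxonomy[n] for n in names))


def translate(names: Set[str], source: Taxonomy, target: Taxonomy) -> Set[str]:
    """Target-taxonomy names overlapping the source ID, read as "N1 or N2 or ..."."""
    covered = meaning(names, source)
    return {n for n, units in target.items() if units & covered}


print(sorted(translate({"B"}, TAXONOMY_1, TAXONOMY_2)))  # person 1's "B" for person 2
print(sorted(translate({"A"}, TAXONOMY_2, TAXONOMY_1)))  # person 2's "A" for person 1
```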

jdmore · October 5, 2022, 8:05am

As long as you can filter observations for IDs of any taxon (active or not), I don’t see how this would be an issue. The IDs touched by a taxon swap would be recently-active and newly-activated iNaturalist taxa, not a previously already-inactive taxon.

To me this is a separate and larger issue, whether or not any suggestions from this topic ever get implemented. I would love to have a checkbox in the observation view to “hide inactive IDs.” But sometimes I would also want to view those inactive IDs in sequence with everything else. (Particularly since hiding them would throw any contemporaneous comments out of context.)

I’m guessing that coding convenience is probably not going to be a strong argument here given the relatively “niche” use-case.

Maybe I’m still missing something, but it seems like this supports my original suggestion that the differing IDs…

aspidoscelis · October 5, 2022, 9:15am

Yeah, that was my setup for saying “this is a bad UI issue, not an underlying functionality issue”.

Hence my suggestion to go to a user-selectable toggle between something like the two alternatives below. :-)

The issue of comments is something I’ve thought about a little, but not in detail. The taxon swap IDs shouldn’t have anything in the comments field, although I’m sure there are weird exceptions I haven’t thought of. So, if we combine the separate identification records into one, just plop in the comments from the user’s ID and drop the comments from the automated ID. This surely means that some of the comments end up being odd or confusing when they show up outside of their original context. On the other hand, interpreting what’s going on with a bunch of grayed out IDs is pretty weird and counterintuitive as it is, so I’m not sure the net change on that front is negative.
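A sketch of how existing records might be collapsed, written in Python under loudly stated assumptions: records arrive in chronological order, and each automatically generated record carries a hypothetical `previous_id` pointing at the record it replaced (the simplified table above has no such field). The user's comment is kept and the automated record's comment is dropped, as described.

```python
def collapse(records):
    """Fold each chain of swap-generated records into the user's original ID.

    Assumes chronological order and a hypothetical `previous_id` field on
    automatically generated (is_change=True) records.
    """
    by_id = {r["id"]: r for r in records}
    merged = {}
    for r in records:
        if not r["is_change"]:
            merged[r["id"]] = {
                "id": r["id"],
                "users_taxon_id": r["taxon_id"],  # what the user entered
                "taxon_id": r["taxon_id"],
                "comment": r["comment"],          # the user's comment is kept
            }
        else:
            # Walk back through any earlier swaps to the user's original record.
            root = by_id[r["previous_id"]]
            while root["is_change"]:
                root = by_id[root["previous_id"]]
            merged[root["id"]]["taxon_id"] = r["taxon_id"]  # latest swap wins
            # The automated record's own comment (normally empty) is dropped.
    return list(merged.values())


records = [
    {"id": 1, "taxon_id": 101, "is_change": False, "comment": "my original note", "previous_id": None},
    {"id": 2, "taxon_id": 202, "is_change": True, "comment": "", "previous_id": 1},
]
print(collapse(records))
```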

You’re probably right from a rhetorical standpoint. My intuition, though, is that there will always be very few people doing any coding with iNaturalist data compared to the number of people making observations and IDs on the platform, but that almost all use of iNaturalist data beyond our own little ecosystem is going to go through the hands of those few people in one way or another.

If the data format’s confusing and requires a lot of familiarity with iNaturalist’s particular quirks to correctly interpret, they’re going to be tripped up by that just like anyone else—often without realizing it.

I can’t tell what that disagreement is.

Yucca and Yucca treculiana are different taxonomic concepts. That doesn’t mean they disagree. They could disagree, if by “Yucca” I really meant “some kind of Yucca but not Yucca treculiana”. In the case of synonymy, that possibility is excluded. There is no “Yucca treculiana but not Yucca torreyi” when the two are synonyms.

The option that’s gibberish is the one that would produce disagreement.

aspidoscelis · October 5, 2022, 10:07am

I think you’re viewing the taxonomy and the identification as a single, inseparable unit, whereas from my point of view they can and should be separated and treated differently. I think a comparison of two IDs is only meaningful if they are translated to a common taxonomy. Once in a shared taxonomy, we can compare the sets of plants that each ID assigns a particular plant to. When all members of set 1 are also members of set 2, an ID as set 1 is not in conflict with an ID as set 2. The taxonomic concepts are different, but we’re not comparing taxonomic concepts, we’re comparing identifications and we need to do the translation step first.
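Once both IDs are expressed in a shared taxonomy, the comparison described above is plain set inclusion. A minimal Python sketch, assuming hypothetical units for the Yucca example; it models the position stated in this post only, and the "partly overlapping" case is an extra category added here for completeness.

```python
def compare(id1: frozenset, id2: frozenset) -> str:
    """Compare two IDs already translated into one shared taxonomy."""
    if id1 <= id2 or id2 <= id1:
        return "not in conflict"  # one ID's set of plants contains the other's
    if id1 & id2:
        return "partly overlapping"
    return "conflicting"  # no plant could satisfy both IDs


# Hypothetical units in a shared taxonomy; Yucca treculiana in the broad sense
# (including Yucca torreyi) covers both.
TORREYI = frozenset({"torreyi"})
TRECULIANA_BROAD = frozenset({"torreyi", "treculiana_s_str"})
TRECULIANA_STRICT = frozenset({"treculiana_s_str"})
print(compare(TORREYI, TRECULIANA_BROAD))
print(compare(TORREYI, TRECULIANA_STRICT))
```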

I think the key point here, for me, is that constraining the set of allowed identification inputs doesn’t in any way reduce the complexity of the underlying processes, we’re just limiting the system’s ability to record what’s going on and do accurate downstream translations or QA/QC. How much do we trust that giving people a constrained set of ID input options and a kind of ad hoc set of synonymies means that the translation step is happening accurately?

aspidoscelis · October 5, 2022, 10:13am

For that matter, suppose iNaturalist considers Yucca torreyi a synonym of Yucca treculiana… if you’re arguing that these are disagreeing IDs, you should probably find iNaturalist’s current behavior rather alarming!

(Sorry, getting carried away. On this topic I think the actual technical fixes are pretty straightforward, but the conceptual issues are hard to deal with without wanting to tackle all of them at once.)

jdmore · October 5, 2022, 10:35am

I was actually thinking about the stand-alone comments that might refer to a previous but now inactive/hidden ID in the activity stream on the observation.

But even with comments attached to IDs, I think I’m hearing you agree that those could lose their original intent if transferred to a replacement ID using a different name. As messy as it can be visually, there’s no good substitute for seeing each and every ID and comment in chronological sequence.

There is:

1. “Yucca treculiana including Yucca torreyi,” and
2. “Yucca torreyi but not Yucca treculiana”

1 agrees with (includes) 2
2 disagrees with (does not include) 1

I don’t think that supports the idea that

At best, you don’t know whether or not you agree or disagree.

To me, an identification always implies a particular taxonomy, via the name being used (and the junior synonyms included or excluded from that name). Unless two names are based on the same type, there can be no bi-directional translation between the two.

pisum · October 5, 2022, 12:22pm

how do your proposed changes work in the following situations?

recently, Andropogon tenuispatheus was carved out from Andropogon glomeratus. A. glomeratus still exists, but the carve-out was done in a way where all of the A. glomeratus in my area was translated to A. tenuispatheus. what if i believe that A. tenuispatheus is a case of unnecessary taxon splitting? with your proposed changes, am i allowed to continue to identify these as A. glomeratus while others identify this as A. tenuispatheus, without resulting in conflicting identifications?
suppose i’m searching for cases where people have identified plants as A. glomeratus. how does the system differentiate between a case where i’m looking for the original identification of the user vs. the translated identification?
when i export the results, i assume that even though i identified it as A. glomeratus, the results will still be exported as A. tenuispatheus, right? or am i wrong about that?

aspidoscelis · October 5, 2022, 4:11pm

Yup.

From my viewpoint, the IDs are the acts made intentionally by users on that observation—this is one ID:

So I think collapsing the two identification records into one still means we are seeing each and every ID and comment in chronological sequence.

The “added as part of a taxon swap” identification records don’t report identifications made on that observation, they report changes in the iNaturalist taxonomy. That change history certainly is useful in some contexts and it still exists in both alternatives. The question is whether there’s a net benefit to copying part of the taxonomy change log into each observation’s identification records and presenting that bit of the change log as if it were in fact a set of identifications made by users. I don’t think so, in part because I view these faux identifications as misinformation that we have to train ourselves to interpret in a way that is at odds with how they are presented. Of course, once we’ve trained ourselves to do it, we tend to dismiss the possibility that this is misleading to others because we aren’t misled.

Not in the same taxonomy, there isn’t. You’re making an implied translation into a third—unstated and self-contradictory—taxonomy.

In any case none of this changes in my proposal. The alternatives are:

a record that says “aspidoscelis identified this as Yucca torreyi, which in our taxonomy is a synonym of Yucca treculiana”;
a record that says “aspidoscelis identified this as Yucca treculiana”.
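The first alternative could be represented, for instance, as a single identification record that stores both the name the user chose and the taxon it currently maps to in iNaturalist’s taxonomy. This Python sketch is hypothetical: the `IdentificationRecord` layout, its `users_taxon` and `taxon` fields, and the `apply_synonymy` helper are assumptions for illustration, not iNaturalist’s actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IdentificationRecord:
    """One identification made by a user on one observation (hypothetical).

    `users_taxon` is the name the user actually chose; `taxon` is where that
    name currently sits in iNaturalist's taxonomy.
    """
    user: str
    users_taxon: str
    taxon: str


def apply_synonymy(record: IdentificationRecord, old: str, new: str) -> IdentificationRecord:
    """Reflect a taxon swap (old -> new) by updating only the mapped taxon.

    The user's original choice is kept, and no extra "added as part of a
    taxon swap" record is created.
    """
    if record.taxon == old:
        return replace(record, taxon=new)
    return record


original = IdentificationRecord("aspidoscelis", "Yucca torreyi", "Yucca torreyi")
after_swap = apply_synonymy(original, "Yucca torreyi", "Yucca treculiana")
print(after_swap)
# IdentificationRecord(user='aspidoscelis', users_taxon='Yucca torreyi', taxon='Yucca treculiana')
```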

The relationship between my identification of the plant and the name Yucca treculiana is identical in both cases. If you are correct, and I neither agree nor disagree with an identification of the plant as Yucca treculiana, surely this is an argument in favor of the first option.

I disagree, but I’m not sure I can give a satisfactory reply here without going too far down more rabbit holes. :-) I’ll send a couple of documents I’ve written in case they clarify anything.

In any case, I’m not proposing any change in the amount of taxonomic translation that happens on the iNaturalist side. It’s just about preserving more of the information needed to understand the taxonomic translations that are already happening.

aspidoscelis · October 5, 2022, 4:59pm

This would be handled identically to the current system.

You can identify any plant you like as Andropogon glomeratus, but if you and other users are defining the taxon differently, there will be some set of observations where you have apparently conflicting IDs because of disagreement in taxonomy rather than disagreement about the identity of those observations. I don’t think this can be resolved without changes to iNaturalist that seem infeasible for the foreseeable future, so it’s in my set of “current problems that would still be problems under all alternatives”.

Suppose we think about taxonomic translation as a two-step process:

Look up the set of names included within the input taxon in the observer’s input taxonomy;
Look up the taxon (or taxa) in which the names in that set are included in the data user’s output taxonomy.

In this case, the input taxonomy (yours) is:

And the output taxonomy (iNaturalist’s) is:

If we’re translating from the input to the output, basically we walk left from “taxon” to “name” in the input taxonomy to get the set of included names, drop down to the second taxonomy, then walk right from “name” to “taxon” using that set of names. Andropogon glomeratus → (Andropogon glomeratus or Andropogon tenuispatheus). In order for iNaturalist to capture the data that would be needed to handle this properly, there would need to be some new structures for users to provide user-specific input taxonomies. In order for iNaturalist to do the translation internally, there would also have to be support for “A or B” identifications.
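The two-step translation can be sketched in Python. The two taxonomy tables referred to above are not reproduced here, so the dictionaries below are assumed contents for the Andropogon case: the input taxonomy maps each taxon to the names it includes, and the output taxonomy maps each name to the taxon that includes it. The `translate` helper is hypothetical.

```python
from __future__ import annotations

# Assumed input (observer's) taxonomy: taxon -> set of names it includes.
input_taxonomy: dict[str, set[str]] = {
    "Andropogon glomeratus": {"Andropogon glomeratus", "Andropogon tenuispatheus"},
}

# Assumed output (data user's) taxonomy: name -> taxon that includes it.
output_taxonomy: dict[str, str] = {
    "Andropogon glomeratus": "Andropogon glomeratus",
    "Andropogon tenuispatheus": "Andropogon tenuispatheus",
}


def translate(taxon: str, source: dict[str, set[str]], target: dict[str, str]) -> set[str]:
    """Translate an identification from the input to the output taxonomy.

    Step 1: look up the names included in `taxon` in the input taxonomy.
    Step 2: look up the output taxon containing each of those names.
    More than one result corresponds to an "A or B" identification.
    Assumes every name appears in the output taxonomy (KeyError otherwise).
    """
    names = source[taxon]                    # walk left: taxon -> names
    return {target[name] for name in names}  # walk right: names -> taxa


print(sorted(translate("Andropogon glomeratus", input_taxonomy, output_taxonomy)))
# ['Andropogon glomeratus', 'Andropogon tenuispatheus']
```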

What I’m imagining is a checkbox somewhere: the default would be “search in taxon_id”, and the alternative “search in users_taxon_id”. If I were dealing with tabular data in R, the code might look something like:

```r
checkbox <- FALSE  # hypothetical UI state: TRUE if "search in users_taxon_id" is ticked
taxon_search_field <- "taxon_id"

if (checkbox) {
  taxon_search_field <- "users_taxon_id"
}
```

Then the search function acts on whatever’s specified by taxon_search_field.
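A search that acts on whichever field the checkbox selects could look like this in Python. The rows are made-up examples, names stand in for numeric IDs to keep it readable, and `search` is a hypothetical helper. It matches exact names only; a real taxon search would presumably also include descendant taxa.

```python
from __future__ import annotations

# Hypothetical identification rows: "taxon_id" is the taxon in iNaturalist's
# taxonomy, "users_taxon_id" is what the user entered.
rows = [
    {"obs": 1, "user": "user_a", "taxon_id": "Andropogon tenuispatheus",
     "users_taxon_id": "Andropogon glomeratus"},
    {"obs": 2, "user": "user_b", "taxon_id": "Andropogon glomeratus",
     "users_taxon_id": "Andropogon glomeratus"},
    {"obs": 3, "user": "user_a", "taxon_id": "Andropogon tenuispatheus",
     "users_taxon_id": "Andropogon tenuispatheus"},
]


def search(rows: list[dict], taxon: str, use_users_taxon: bool = False) -> list[int]:
    """Return observation numbers whose selected taxon field equals `taxon`.

    `use_users_taxon` plays the role of the checkbox: False searches
    "taxon_id" (the default), True searches "users_taxon_id".
    """
    field = "users_taxon_id" if use_users_taxon else "taxon_id"
    return [row["obs"] for row in rows if row[field] == taxon]


print(search(rows, "Andropogon glomeratus"))                        # [2]
print(search(rows, "Andropogon glomeratus", use_users_taxon=True))  # [1, 2]
```

With the checkbox on, user_a’s original A. glomeratus ID on observation 1 is found even though it is stored under A. tenuispatheus in the iNaturalist taxonomy.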

I haven’t tried to nail things down at this level of detail yet, but I think the ideal solution, and one I don’t think would be a pain to implement, would be a checkbox at the export step that says “give me a table with all the identification records, too”. Then if I want to assign a new ID field to the observations based on “most recent users_taxon_id by the observer”, that’s easy to do. Otherwise, presumably the values in the observation table would be the same as they currently are in exports.
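Deriving “most recent users_taxon_id by the observer” from such an identification table might look roughly like this. The table layout (an `observers` mapping plus one row per identification with a `created` timestamp) and the `observer_taxon` helper are assumptions for illustration; the values are made up.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical exported tables; column names and values are illustrative only.
observers = {1: "user_a", 2: "user_b"}
identifications = [
    {"obs": 1, "user": "user_a", "created": datetime(2022, 5, 1), "users_taxon_id": "Yucca"},
    {"obs": 1, "user": "user_a", "created": datetime(2022, 6, 2), "users_taxon_id": "Yucca torreyi"},
    {"obs": 1, "user": "user_b", "created": datetime(2022, 7, 3), "users_taxon_id": "Yucca treculiana"},
    {"obs": 2, "user": "user_b", "created": datetime(2022, 8, 4), "users_taxon_id": "Yucca treculiana"},
]


def observer_taxon(identifications: list[dict], observers: dict[int, str]) -> dict[int, str]:
    """Map each observation to the observer's most recent users_taxon_id.

    Observations on which the observer made no identification are omitted.
    """
    latest: dict[int, dict] = {}
    for ident in identifications:
        obs = ident["obs"]
        if ident["user"] != observers.get(obs):
            continue  # skip identifications by anyone other than the observer
        if obs not in latest or ident["created"] > latest[obs]["created"]:
            latest[obs] = ident
    return {obs: ident["users_taxon_id"] for obs, ident in latest.items()}


print(observer_taxon(identifications, observers))
# {1: 'Yucca torreyi', 2: 'Yucca treculiana'}
```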

For the moment, I’m ignoring some further details, such as whether “taxon_id” and “users_taxon_id” are given as ID numbers or as names. That should just be a bit of data wrangling, though: conceptually easy, but practically annoying.
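The ID-number-versus-name detail amounts to a lookup join. This is a minimal Python sketch, assuming a separate table mapping taxon ID numbers to names is available. The numbers are placeholders, not real iNaturalist taxon IDs, and `add_names` is a hypothetical helper. IDs missing from the lookup are left as `None` so they can be spotted rather than silently dropped.

```python
from __future__ import annotations

# Hypothetical lookup from numeric taxon IDs to names (placeholder numbers).
taxon_names = {101: "Yucca torreyi", 102: "Yucca treculiana"}

rows = [
    {"obs": 1, "taxon_id": 102, "users_taxon_id": 101},
    {"obs": 2, "taxon_id": 102, "users_taxon_id": 999},  # 999 missing from lookup
]


def add_names(rows: list[dict], lookup: dict[int, str]) -> list[dict]:
    """Return copies of `rows` with a *_name column next to each *_id column."""
    named = []
    for row in rows:
        new_row = dict(row)
        for field in ("taxon_id", "users_taxon_id"):
            # field[:-3] strips "_id"; unknown IDs become None rather than raising
            new_row[field[:-3] + "_name"] = lookup.get(row[field])
        named.append(new_row)
    return named


for row in add_names(rows, taxon_names):
    print(row["obs"], row["taxon_name"], row["users_taxon_name"])
# 1 Yucca treculiana Yucca torreyi
# 2 Yucca treculiana None
```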

vreinkymov · October 5, 2022, 5:40pm

Thanks for suggesting this; it’s been interesting learning more about how people interact with iNat, especially from your comments in “Why opt out of Community ID”.

Could using the Scientific Name observation field address some of these concerns? It’s been available for nearly a decade and would at least provide some searchability, plus a record that isn’t changed by taxon splits or disagreements, no?

Otherwise, you might be running into an issue similar to Excel vs. rolling out something custom in R. iNat is like Excel: it’s what most people use to do things with data, and while it’s not always perfect, it gets the job done. R would probably do it better and faster, but it’s hard to convince people to switch.

pisum · October 5, 2022, 6:14pm

if it’s not going to handle a common case like this, then i don’t think what you’re suggesting here is the right path.

that’s a pain to implement.

that’s even more of a pain to implement. taxon is one of the main keys in the system. so a change like this is no small change.
