Improving iNaturalist's nomenclature & taxonomy

I really like this idea, especially since POWO (Plants of the World Online)'s lack of proper coverage of Southeastern US plant taxonomy makes me want to rip my hair out on occasion*.

I do agree it would be complex to implement, and I would want some input from coders on how hard this would be to set up.

*So many species are synonymized under POWO that are valid splits (looking at Prunus alabamensis here, synonymized with Prunus serotina on iNat yesterday to follow POWO but a valid species genetically and morphologically). This is a discussion for another thread.

8 Likes

That’s the best solution I’ve found within the current structure of iNaturalist, though I would probably create a new observation field rather than using “Scientific Name” or any other existing observation field.

I don’t think this is a very appealing solution, though, for a variety of reasons. For instance, if I were to go to an observation field based approach for my own data and identifications, I would also cease using the iNaturalist taxonomy entirely; entering duplicate IDs on everything is just not viable, to me.

Ah, but I’m not suggesting anything that you couldn’t just do in Excel! Meant literally and metaphorically—I’ve played the “yes, you can implement relational databases in Excel although you really shouldn’t” game. :-)

I think iNaturalist is reinventing the wheel. There’s a whole pile of stuff in the iNaturalist taxonomy (taxon frameworks, taxon swaps, deviations…) that is unique to iNaturalist and just as complicated as any alternative I could come up with. iNaturalist is developing its own parallel-ish version of taxonomy, where a desire to create a simplified and more user-friendly approach to taxonomy is gradually giving us something just as complicated, but incompatible. So the solution isn’t to make it more complicated, but to restructure for compatibility and interoperability rather than becoming more entrenched in idiosyncrasy.

4 Likes

It’s certainly not a perfect path. It’s an attempt to find the most improvement with the least change to iNaturalist. I’m interested in any alternatives, of course.

Could be. My estimation of what’s easy vs. difficult is mostly based on “would I know how to do this in a web application created by the shiny package in R?” In this case, the answer is, “Sure, that’s easy. Do a join, drop the extra fields, and send the output to the download widget.” This is about as simple as relational data tasks get, so I’m making the assumption that a platform that handles a ton of relational data can handle it. I could be wrong, of course, since I’m making my best guesses without having worked with the code in the development environment the iNaturalist folks are using.
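For anyone who wants to see what "do a join, drop the extra fields" amounts to, here is a minimal sketch in plain Python rather than R/shiny. The table layouts and column names (`observation_id`, `name_as_entered`, and so on) are made up for illustration and are not iNaturalist's actual schema or export format.

```python
import csv
import io

# Hypothetical tables; the columns are illustrative, not iNaturalist's schema.
observations = [
    {"observation_id": 1, "observed_on": "2023-04-01", "place": "Alabama"},
    {"observation_id": 2, "observed_on": "2023-04-03", "place": "Georgia"},
]
identifications = [
    {"identification_id": 10, "observation_id": 1, "identifier": "user_a",
     "name_as_entered": "Prunus alabamensis", "internal_flag": "x"},
    {"identification_id": 11, "observation_id": 1, "identifier": "user_b",
     "name_as_entered": "Prunus serotina", "internal_flag": "y"},
    {"identification_id": 12, "observation_id": 2, "identifier": "user_a",
     "name_as_entered": "Prunus serotina", "internal_flag": "z"},
]


def join_identifications(observations, identifications, keep_fields):
    """Inner join on observation_id, keeping only the requested fields."""
    obs_by_id = {obs["observation_id"]: obs for obs in observations}
    rows = []
    for ident in identifications:
        obs = obs_by_id.get(ident["observation_id"])
        if obs is None:
            continue  # identification without a matching observation
        merged = {**obs, **ident}
        rows.append({field: merged[field] for field in keep_fields})
    return rows


def to_csv(rows, fields):
    """Serialize the joined rows as CSV text, ready to hand to a download."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


fields = ["observation_id", "place", "identifier", "name_as_entered"]
joined = join_identifications(observations, identifications, fields)
csv_text = to_csv(joined, fields)
```

This says nothing about how hard the same thing is inside iNaturalist's own code base; it only shows the size of the task when the two tables are already in hand.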

Following your comment earlier, though, I might suggest that if iNaturalist can’t handle a basic task like this, it’s probably not on the right path. :-)

I’m curious why that would be the case.

2 Likes

to deliver something like identification details, you’re basically asking that iNat would deliver an optional second CSV file that contains identification details, along with the existing observation CSV. i would argue that most people wouldn’t know what to do with data delivered in this way. they would almost need to be delivered a single file with the identification flattened out, each into their own columns. but then that kind of flattened format actually makes it harder for those who know how to work with relational data.
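As a rough illustration of the "flattened" single-file layout, here is a small Python sketch that turns one-row-per-identification data into one row per observation with numbered columns. The column names (`id1_user`, `id1_name`, ...) are hypothetical and not an existing iNaturalist export format.

```python
# Hypothetical long-format identification rows (one row per identification).
identifications = [
    {"observation_id": 1, "user": "user_a", "name": "Prunus alabamensis"},
    {"observation_id": 1, "user": "user_b", "name": "Prunus serotina"},
    {"observation_id": 2, "user": "user_a", "name": "Prunus serotina"},
]


def flatten(identifications):
    """Pivot to one row per observation, with idN_user / idN_name columns."""
    wide = {}
    for ident in identifications:
        obs_id = ident["observation_id"]
        row = wide.setdefault(obs_id, {"observation_id": obs_id})
        # Each identification already stored adds two columns to the row.
        n = (len(row) - 1) // 2 + 1
        row[f"id{n}_user"] = ident["user"]
        row[f"id{n}_name"] = ident["name"]
    return list(wide.values())


flat = flatten(identifications)
```

The rows end up with different numbers of columns depending on how many IDs an observation has, which is the awkwardness for anyone used to relational data.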

if you’re going to get 2 different files, and you have to join them separately, i would argue that you should have the ability to figure out how to extract the same data from the existing API, into whatever format you like, and process the data however you like.

taxon shows up on just about every single part of the system. if you’re fundamentally changing the way taxon works, you’re going to have to modify every part of the system. and then you have to ask for what benefit? we both already agree that

… so why would you want to put that much effort into redoing your entire system if the end result is not something that is robust enough to handle common use cases?

…

if you think you can handle your particular use case outside of the system easily enough, i think you should write your own interfaces or processes to automate your use case. the case for action that you’ve presented here is not strong, in my opinion. so if you want anyone to take action, you really just need to perform the actions yourself, i think.

the other way to handle your issues outside of the system is just to make the case that your preferred taxonomic structure is the one people should use. so you could make that argument with the curators of the taxa in iNat, or if iNat is getting stuff from POWO and curators are unwilling to deviate in specific cases, then make your argument to POWO.

3 Likes

I’ve always been puzzled why iNat seems to change my identifications after a swap, and why carrying out a swap updates the content in each affected observation record, sometimes many thousands of them. Why does it need to do that?

A collection management system I was involved in designing has a clear separation between name changes due to re-identifications, and name ‘interpretations’ due to changes in taxonomic opinion over time. I’m sure there are similar systems out there. iNat seems to lump these two quite different things too closely together. In those systems a specimen will get an initial identification, and perhaps subsequent re-identifications, but otherwise there are no other changes applied to identification records.

Separate to that is a taxonomic interpretation system. That deals with the nomenclature and taxonomy, and potentially multiple taxonomies, and the consequences of lumping and splitting. That is used dynamically to interpret the names currently applied to specimens but otherwise does not change them. There is no need to update thousands of records - just the way those records are currently interpreted.

When you make that fundamental separation between identification(s) and interpretation(s) then more sophisticated options become easily available - like handling multiple parallel taxonomies. The problem with such systems is that you potentially confuse end-users who might not understand the difference between the name used by the current ‘identifier’ and the current interpretation(s) of that name as presented on the screen, and I suspect it makes scaling to large data-sets problematic because of the difficulty in creating a supporting search index.
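To make the separation concrete, here is a minimal Python sketch, not a description of any real collection management system or of iNaturalist. Identification records are immutable, and the current name is looked up at read time in whichever taxonomy is selected. The two taxonomies ("lumped" and "split") and their mappings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identification:
    """An identification as entered; never rewritten by taxonomic changes."""
    observation_id: int
    identifier: str
    name_as_entered: str


# Hypothetical parallel taxonomies: each maps a name to the accepted name(s)
# it is interpreted as. A name can map to several candidates (not always 1:1).
TAXONOMIES = {
    "lumped": {
        "Prunus alabamensis": ["Prunus serotina"],
        "Prunus serotina": ["Prunus serotina"],
    },
    "split": {
        "Prunus alabamensis": ["Prunus alabamensis"],
        "Prunus serotina": ["Prunus serotina", "Prunus alabamensis"],
    },
}


def interpret(ident, taxonomy_name):
    """Interpret a stored name under one taxonomy, leaving the record as is."""
    mapping = TAXONOMIES[taxonomy_name]
    # Names the taxonomy does not know about are passed through unchanged.
    return mapping.get(ident.name_as_entered, [ident.name_as_entered])


ident = Identification(1, "user_a", "Prunus alabamensis")
lumped_view = interpret(ident, "lumped")
split_view = interpret(ident, "split")
```

A lump or split then means editing the mapping rather than touching thousands of records. The cost is the read-time lookup, plus the indexing and user-confusion problems mentioned above.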

5 Likes

The data doesn’t exist in the API. :-)

Suppose you duplicate taxon_id and give the second copy a different name, “taxon_id2”. What needs to change to use taxon_id2 rather than taxon_id? References to that name.

I’m hopelessly lazy at handling variables in an intelligent fashion in my own scripts, so probably I would do it the stupid way, by a find / replace and checking for unintended bycatch. Although I am not a good coder, I do know that good coders have centralized variable definitions so that they don’t have to replace a ton of instances scattered all over the place. When I bowdlerize scripts from good coders, I rejoice at the glorious simplicity and ease this creates. :-) However, even doing it the stupid way, it’s not difficult, just tedious.

I’m assuming that aspects of coding that are sufficiently basic that I’m aware of the problem and aware of the solution—even if not very good at implementing the solution, myself—are probably not very challenging to people who, unlike me, actually know what they’re doing. That assumption might be wrong, but I think it’s a reasonable assumption.

For my own data handling in an R workspace, that’s where I am now.

I’m baffled by your viewpoint, here. It seems like you’re just saying that any change in iNaturalist is prohibitively difficult, while any data manipulation outside iNaturalist is trivially easy.

1 Like

Yes, your thoughts parallel mine in many ways. I think keeping different kinds of data separate and well-defined generally looks like more work in the short term, but lumping carries a lot of hidden costs in the long term.

Trying to avoid throwing too much information at users who will find it distracting or confusing rather than helpful is definitely an issue. This is not really the aspect of this I’m best suited to thinking through since I’m at the opposite extreme, but I don’t think the current system is very good at this, and I think there are better solutions to be found. I’ve also been trying to think about this in terms of what kind of information would transfer well. New users are going to have to learn some new concepts either way, and some subset of new users are going to want to get more involved in taxonomy over time. So it would be nice to have the concepts they’ve learned on iNaturalist provide some good preparation. As someone going the opposite direction, though, I’m finding that a lot of the concepts you need to learn to figure out how iNaturalist works are pretty counter-intuitive to someone coming from a taxonomic background. I can only assume that this is the case in both directions.

it’s because you would need to do the translation just once at the time of the change vs every single time and point taxon is read / used, and you don’t need to keep a separate complete history of such changes.

not all taxon changes are simple 1:1 translations.

the observation taxon is determined based on a community ID algorithm applied to potentially many IDs, not just based on one ID.

3 Likes

Reducing the apparent conflict between UI legibility and back end processes here is part of the rationale for my suggested change in data structure for identification records. Store the user’s original ID and the accepted name in the iNaturalist taxonomy in a single record. You only need to do the translation once, but you keep the original ID and the accepted name together, with their relationship clearly identified. In the UI you wouldn’t get this weird “it’s changing my identifications” dynamic where it’s hard to keep track of the relationship between the name a user put on an observation and the name an automated process in iNaturalist put on an observation.
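A minimal sketch of what such a record could look like, in Python. The field names (`name_as_entered`, `accepted_name`, `relationship`) and the `apply_swap` helper are hypothetical. This is not how iNaturalist stores identifications, and it only covers the simple 1:1 swap case.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IdentificationRecord:
    """One record holding both the name the user chose and the accepted name."""
    observation_id: int
    identifier: str
    name_as_entered: str  # what the identifier actually typed; never changed
    accepted_name: str    # current name in the site taxonomy
    relationship: str     # "same" or "synonym_of" (illustrative vocabulary)


def apply_swap(record, old_name, new_name):
    """Translate once at swap time, keeping the original name alongside."""
    if record.accepted_name != old_name:
        return record  # not affected by this swap
    relationship = "same" if record.name_as_entered == new_name else "synonym_of"
    return replace(record, accepted_name=new_name, relationship=relationship)


before = IdentificationRecord(
    observation_id=1,
    identifier="user_a",
    name_as_entered="Prunus alabamensis",
    accepted_name="Prunus alabamensis",
    relationship="same",
)
after = apply_swap(before, "Prunus alabamensis", "Prunus serotina")
```

The swap still runs once per affected record, but the UI can show "entered as X, currently treated as Y" instead of appearing to rewrite the user's ID.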

And, as @cooperj mentions, this is not a new problem in iNaturalist, but something collection management software has been dealing with for a while. There might not be perfect solutions, but there are solutions that are known to be viable because they’ve been implemented. There are probably implemented solutions that are better than mine, and I just don’t know it.

1 Like

Among other things, it would increase nomenclature confusion among users.

As previously mentioned, many of the specific issues you raise concerning debates over taxonomy are already addressed if you look at the species specific information (not just the observation page).

This suggestion would seem to just add layers of unnecessary complexity.

If there is a taxon change that revitalizes a previously merged taxon, then you already get a notice about it, so it’s not like anything is lost there either.

iNat has to find a balance between being streamlined & user friendly, and being data rich (which is often the opposite of the previous two goals). Sticking with a single source for nomenclature (and said sources are listed), as well as already having accessible documentation concerning changes admins have made to iNat taxons, meets both goals decently well, and addresses pretty much all the concerns you’ve raised.

Does it take an extra step or two? Yes, but for someone who is concerned about that sort of thing it’s an easy couple of clicks to make to get more information, and the links are already right there on the observation page.

3 Likes

I’m afraid you’ve lost me. I can’t really connect anything in your post to what I’ve written; e.g., “specific issues”—I don’t know which issues you’re thinking of, or what information in the species pages might address them.

Species pages already show synonyms, so info on potential different IDs is there.

You know what’s hilarious about this whole thing? We’ve just admitted that scientific names have all the same shortcomings as the common names we disdain.

6 Likes

Well that’s always been the case. All scientific names are wrong in one way or another since they’re just human constructs that try to neatly define what’s not readily open to being categorized.

7 Likes

Then it shouldn’t be too difficult to allow someone to choose which synonym they consider most representative of a particular organism in a particular observation

Or just use one name, because they all mean the same thing? If those are really different species, write to e.g. POWO and they can change it, and then iNat will too. If experts say it’s actually one species with no subtaxa, then everything else has no point: just learn the new name, or use a common name that stays the same.

That’s not one of the problems I’m trying to solve. :-)

(The point is to have a record of the name an identifier actually applied to an observation, not to have a list of names an identifier might possibly have applied.)

If they’re synonyms, you can say all of those names were applied to the observation. I think it’s fine to add the previous name to a new name, or, if you choose a synonym, to have it just written on the ID. But honestly, I don’t see how iNat, with so few workers, big problems to solve, and aims to reach, will implement the search tool when we wait three years for some changes. If your aim is to show the user’s ID, maybe it’ll be enough to just have it there, and maybe in the future there’ll be a new URL and all the other stuff?

Whenever we’re dealing with taxonomic synonymy, we have to start from the assumption that there are two alternatives that are in principle equally valid: that two names are synonyms, and that they are not.

Even with nomenclatural synonyms, there can be meaningful information conveyed by which one is used. Figuring out when that might be the case is left as an exercise for the reader. :-) I’m not sure how practical it would be to try to do anything useful with that information in a database context, though.