I really like this idea, especially since POWO (Plants of the World Online)'s lack of proper coverage of Southeastern US plant taxonomy makes me want to rip my hair out on occasion*.
I do agree it would be complex to implement, and I would want some input from coders on how hard this would be to set up.
*So many species are synonymized under POWO that are valid splits (looking at Prunus alabamensis here, synonymized with Prunus serotina on iNat yesterday to follow POWO but a valid species genetically and morphologically). This is a discussion for another thread.
That's the best solution I've found within the current structure of iNaturalist, though I would probably create a new observation field rather than using "Scientific Name" or any other existing observation field.
I don't think this is a very appealing solution, though, for a variety of reasons. For instance, if I were to go to an observation-field-based approach for my own data and identifications, I would also cease using the iNaturalist taxonomy entirely; entering duplicate IDs on everything is just not viable, to me.
Ah, but I'm not suggesting anything that you couldn't just do in Excel! Meant literally and metaphorically: I've played the "yes, you can implement relational databases in Excel although you really shouldn't" game. :-)
I think iNaturalist is reinventing the wheel. There's a whole pile of stuff in the iNaturalist taxonomy (taxon frameworks, taxon swaps, deviations...) that is unique to iNaturalist and just as complicated as any alternative I could come up with. iNaturalist is developing its own parallel-ish version of taxonomy, where a desire to create a simplified and more user-friendly approach to taxonomy is gradually giving us something just as complicated, but incompatible. So the solution isn't to make it more complicated, but to restructure for compatibility and interoperability rather than becoming more entrenched in idiosyncrasy.
It's certainly not a perfect path. It's an attempt to find the most improvement with the least change to iNaturalist. I'm interested in any alternatives, of course.
Could be. My estimation of what's easy vs. difficult is mostly based on "would I know how to do this in a web application created by the shiny package in R?" In this case, the answer is, "Sure, that's easy. Do a join, drop the extra fields, and send the output to the download widget." This is about as simple as relational data tasks get, so I'm making the assumption that a platform that handles a ton of relational data can handle it. I could be wrong, of course, since I'm making my best guesses without having worked with the code in the development environment the iNaturalist folks are using.
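For readers who want to see how small the task is, here is a minimal sketch of the "join, drop the extra fields, send to download" step. It is written in Python rather than R/shiny, and the table layouts, column names, and sample rows are all invented for illustration; none of it reflects iNaturalist's actual schema or export code.

```python
import csv
import io

# Hypothetical sample data; column names are assumptions, not iNaturalist's schema.
observations = [
    {"observation_id": 1, "observer": "alice", "observed_on": "2021-05-01"},
    {"observation_id": 2, "observer": "bob", "observed_on": "2021-05-03"},
]
identifications = [
    {"identification_id": 10, "observation_id": 1, "identifier": "alice",
     "taxon_name": "Prunus serotina", "internal_flag": "x"},
    {"identification_id": 11, "observation_id": 1, "identifier": "carol",
     "taxon_name": "Prunus alabamensis", "internal_flag": "y"},
    {"identification_id": 12, "observation_id": 2, "identifier": "bob",
     "taxon_name": "Quercus alba", "internal_flag": "z"},
]

# Fields kept in the export; everything else is dropped after the join.
EXPORT_FIELDS = ["observation_id", "observer", "observed_on", "identifier", "taxon_name"]


def join_for_export(observations, identifications):
    """Inner-join identifications to observations, keeping only the export fields."""
    obs_by_id = {obs["observation_id"]: obs for obs in observations}
    rows = []
    for ident in identifications:
        obs = obs_by_id.get(ident["observation_id"])
        if obs is None:
            continue  # skip identifications with no matching observation
        merged = {**obs, **ident}
        rows.append({field: merged[field] for field in EXPORT_FIELDS})
    return rows


def to_csv(rows):
    """Serialize the joined rows to CSV text, as a download step might."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=EXPORT_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


export_rows = join_for_export(observations, identifications)
export_csv = to_csv(export_rows)
```

The result has one row per identification, with the observation fields repeated on each row. Whether the real platform can do this as cheaply at scale is a separate question that this sketch does not answer.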
Following your comment earlier, though, I might suggest that if iNaturalist can't handle a basic task like this, it's probably not on the right path. :-)
to deliver something like identification details, you're basically asking that iNat would deliver an optional second CSV file that contains identification details, along with the existing observation CSV. i would argue that most people wouldn't know what to do with data delivered in this way. they would almost need to be delivered a single file with the identifications flattened out, each into their own columns. but then that kind of flattened format actually makes it harder for those who know how to work with relational data.
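To make the "flattened" format concrete, here is a small Python sketch that turns one-row-per-identification data into one-row-per-observation data. The column names (`id1_identifier`, `id1_taxon_name`, and so on) and the sample rows are hypothetical, chosen only to illustrate the shape of the output.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per identification.
identifications = [
    {"observation_id": 1, "identifier": "alice", "taxon_name": "Prunus serotina"},
    {"observation_id": 1, "identifier": "carol", "taxon_name": "Prunus alabamensis"},
    {"observation_id": 2, "identifier": "bob", "taxon_name": "Quercus alba"},
]


def flatten(identifications):
    """Turn one-row-per-identification data into one-row-per-observation data."""
    grouped = defaultdict(list)
    for ident in identifications:
        grouped[ident["observation_id"]].append(ident)

    wide_rows = []
    for observation_id, idents in sorted(grouped.items()):
        row = {"observation_id": observation_id}
        # Each identification gets its own numbered pair of columns.
        for n, ident in enumerate(idents, start=1):
            row[f"id{n}_identifier"] = ident["identifier"]
            row[f"id{n}_taxon_name"] = ident["taxon_name"]
        wide_rows.append(row)
    return wide_rows


wide_rows = flatten(identifications)
```

Note that the number of columns depends on how many identifications the busiest observation has, so rows end up ragged. That is the part that makes the wide format awkward for anyone who would rather work with the relational form.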
if you're going to get 2 different files, and you have to join them separately, i would argue that you should have the ability to figure out how to extract the same data from the existing API, into whatever format you like, and process the data however you like.
taxon shows up on just about every single part of the system. if you're fundamentally changing the way taxon works, you're going to have to modify every part of the system. and then you have to ask for what benefit? we both already agree that
... so why would you want to put that much effort into redoing your entire system if the end result is not something that is robust enough to handle common use cases?
...
if you think you can handle your particular use case outside of the system easily enough, i think you should write your own interfaces or processes to automate your use case. the case for action that you've presented here is not strong, in my opinion. so if you want anyone to take action, you really just need to perform the actions yourself, i think.
the other way to handle your issues outside of the system is just to make the case that your preferred taxonomic structure is the one people should use. so you could make that argument with the curators of the taxa in iNat, or if iNat is getting stuff from POWO and curators are unwilling to deviate in specific cases, then make your argument to POWO.
I've always been puzzled why iNat seems to change my identifications after a swap, and why carrying out a swap updates the content in each affected observation record, sometimes many thousands of them. Why does it need to do that?
A collection management system I was involved in designing has a clear separation between name changes due to re-identifications, and name "interpretations" due to changes in taxonomic opinion over time. I'm sure there are similar systems out there. iNat seems to lump these two quite different things too closely together. In those systems a specimen will get an initial identification, and perhaps subsequent re-identifications, but otherwise there are no other changes applied to identification records.
Separate to that is a taxonomic interpretation system. That deals with the nomenclature and taxonomy, and potentially multiple taxonomies, and the consequences of lumping and splitting. That is used dynamically to interpret the names currently applied to specimens but otherwise does not change them. There is no need to update thousands of records - just the way those records are currently interpreted.
When you make that fundamental separation between identification(s) and interpretation(s) then more sophisticated options become easily available, like handling multiple parallel taxonomies. The problem with such systems is that you potentially confuse end-users who might not understand the difference between the name used by the current "identifier" and the current interpretation(s) of that name as presented on the screen, and I suspect it makes scaling to large data-sets problematic because of the difficulty in creating a supporting search index.
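The separation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the design of any particular collection management system: the class, the `TAXONOMIES` table, and the names `taxonomy_a` and `taxonomy_b` are hypothetical, and the Prunus example is borrowed from earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identification:
    """A name applied to a specimen by an identifier; never rewritten."""
    specimen_id: int
    identifier: str
    name_as_identified: str


# Hypothetical taxonomies: each maps a name as identified to the name(s) it is
# currently interpreted as. A split would map one name to several candidates.
TAXONOMIES = {
    "taxonomy_a": {"Prunus alabamensis": ["Prunus serotina"]},     # lumped
    "taxonomy_b": {"Prunus alabamensis": ["Prunus alabamensis"]},  # kept separate
}


def interpret(identification, taxonomy_name):
    """Return the current interpretation(s) of an identification under one taxonomy."""
    taxonomy = TAXONOMIES[taxonomy_name]
    # Names the taxonomy does not mention pass through unchanged.
    return taxonomy.get(
        identification.name_as_identified,
        [identification.name_as_identified],
    )


ident = Identification(1, "carol", "Prunus alabamensis")
under_a = interpret(ident, "taxonomy_a")
under_b = interpret(ident, "taxonomy_b")
```

A taxonomic change here means editing one entry in a taxonomy table rather than rewriting thousands of identification records. The cost is that every read goes through `interpret`, which is where the search-index concern comes in.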
Suppose you duplicate taxon_id and give the second copy a different name, "taxon_id2". What needs to change to use taxon_id2 rather than taxon_id? References to that name.
I'm hopelessly lazy at handling variables in an intelligent fashion in my own scripts, so probably I would do it the stupid way, by a find / replace and checking for unintended bycatch. Although I am not a good coder, I do know that good coders have centralized variable definitions so that they don't have to replace a ton of instances scattered all over the place. When I bowdlerize scripts from good coders, I rejoice at the glorious simplicity and ease this creates. :-) However, even doing it the stupid way, it's not difficult, just tedious.
I'm assuming that aspects of coding that are sufficiently basic that I'm aware of the problem and aware of the solution (even if not very good at implementing the solution, myself) are probably not very challenging to people who, unlike me, actually know what they're doing. That assumption might be wrong, but I think it's a reasonable assumption.
For my own data handling in an R workspace, that's where I am now.
I'm baffled by your viewpoint, here. It seems like you're just saying that any change in iNaturalist is prohibitively difficult, while any data manipulation outside iNaturalist is trivially easy.
Yes, your thoughts parallel mine in many ways. I think keeping different kinds of data separate and well-defined generally looks like more work in the short term, but lumping carries a lot of hidden costs in the long term.
Trying to avoid throwing too much information at users who will find it distracting or confusing rather than helpful is definitely an issue. This is not really the aspect of this I'm best suited to thinking through since I'm at the opposite extreme, but I don't think the current system is very good at this, and I think there are better solutions to be found. I've also been trying to think about this in terms of what kind of information would transfer well. New users are going to have to learn some new concepts either way, and some subset of new users are going to want to get more involved in taxonomy over time. So it would be nice to have the concepts they've learned on iNaturalist provide some good preparation. As someone going the opposite direction, though, I'm finding that a lot of the concepts you need to learn to figure out how iNaturalist works are pretty counter-intuitive to someone coming from a taxonomic background. I can only assume that this is the case in both directions.
it's because you would need to do the translation just once at the time of the change vs every single time and point taxon is read / used, and you don't need to keep a separate complete history of such changes.
not all taxon changes are simple 1:1 translations.
the observation taxon is determined based on a community ID algorithm applied to potentially many IDs, not just based on one ID.
Reducing the apparent conflict between UI legibility and back end processes here is part of the rationale for my suggested change in data structure for identification records. Store the user's original ID and the accepted name in the iNaturalist taxonomy in a single record. You only need to do the translation once, but you keep the original ID and the accepted name together, with their relationship clearly identified. In the UI you wouldn't get this weird "it's changing my identifications" dynamic where it's hard to keep track of the relationship between the name a user put on an observation and the name an automated process in iNaturalist put on an observation.
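A minimal sketch of that record structure, assuming a simple 1:1 swap. The class and field names (`IdentificationRecord`, `original_name`, `accepted_name`) and the `apply_swap` helper are hypothetical illustrations, not iNaturalist's data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IdentificationRecord:
    observation_id: int
    identifier: str
    original_name: str  # what the user actually entered; never changed
    accepted_name: str  # name in the site taxonomy; updated by taxon changes


def apply_swap(records, old_name, new_name):
    """Apply a 1:1 taxon swap: update accepted_name only, once, at swap time."""
    return [
        replace(rec, accepted_name=new_name) if rec.accepted_name == old_name else rec
        for rec in records
    ]


records = [
    IdentificationRecord(1, "carol", "Prunus alabamensis", "Prunus alabamensis"),
    IdentificationRecord(2, "bob", "Quercus alba", "Quercus alba"),
]
swapped = apply_swap(records, "Prunus alabamensis", "Prunus serotina")
```

After the swap, the first record still shows what the identifier typed alongside the name the taxonomy now accepts, so the UI can display both. Splits and other changes that are not 1:1 would need more than this helper provides.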
And, as @cooperj mentions, this is not a new problem in iNaturalist, but something collection management software has been dealing with for a while. There might not be perfect solutions, but there are solutions that are known to be viable because they've been implemented. There are probably implemented solutions that are better than mine, and I just don't know it.
Among other things, it would increase nomenclature confusion among users.
As previously mentioned, many of the specific issues you raise concerning debates over taxonomy are already addressed if you look at the species specific information (not just the observation page).
This suggestion would seem to just add layers of unnecessary complexity.
If there is a taxon change that revitalizes a previously merged taxon, then you already get a notice about it, so it's not like anything is lost there either.
iNat has to find a balance between being streamlined & user friendly, and being data rich (which is often the opposite of the previous two goals). Sticking with a single source for nomenclature (and said sources are listed), as well as already having accessible documentation concerning changes admins have made to iNat taxa, meets both goals decently well, and addresses pretty much all the concerns you've raised.
Does it take an extra step or two? Yes, but for someone who is concerned about that sort of thing it's an easy couple of clicks to make to get more information, and the links are already right there on the observation page.
I'm afraid you've lost me. I can't really connect anything in your post to what I've written; e.g., "specific issues": I don't know which issues you're thinking of, or what information in the species pages might address them.
You know what's hilarious about this whole thing? We've just admitted that scientific names have all the same shortcomings as the common names we disdain.
Well that's always been the case. All scientific names are wrong in one way or another since they're just human constructs that try to neatly define what's not readily open to being categorized.
Then it shouldn't be too difficult to allow someone to choose which synonym they consider most representative of a particular organism in a particular observation.
Or just use one name because they all mean the same thing? If those are really different species, write to e.g. POWO and they can change it, then iNat will. If experts say it's actually one species with no subtaxa, then everything else has no point; just learn a new name or use a common name that stays the same.
That's not one of the problems I'm trying to solve. :-)
(The point is to have a record of the name an identifier actually applied to an observation, not to have a list of names an identifier might possibly have applied.)
If they're synonyms you can say all of those names were applied to the observation. I think it's fine to add the previous name to a new name, or, if you choose a synonym, to have it just written on the ID. But honestly I don't see how iNat, with so few workers, big problems to solve, and goals to reach, will implement the search tool when we wait 3 years for some changes. If your aim is to show the user's ID, maybe it'll be enough to just have it there, and maybe in the future there'll be a new URL and all the other stuff?
Whenever we're dealing with taxonomic synonymy, we have to start from the assumption that there are two alternatives that are in principle equally valid: that two names are synonyms, and that they are not.
Even with nomenclatural synonyms, there can be meaningful information conveyed by which one is used. Figuring out when that might be the case is left as an exercise for the reader. :-) I'm not sure how practical it would be to try to do anything useful with that information in a database context, though.