Plans for 2019 Clements bird taxonomy updates

Thanks rjq - yes, these are all examples where species were added as ‘wholly new’ when they should have been split. As you say, I think retroactive splits like you’ve drafted are the best way to handle these. I’ve been dragging my feet on these because they all touch hundreds if not thousands of observations, which means we should schedule a time to commit them so they don’t bog down the site (which leads into cmcheatle’s question).

Each time a taxon change is committed, all of the current identifications become non-current and a bunch of new identifications are created. This triggers a cascade of jobs involving reassessing the community IDs of observations, and a bunch of things involving checklists. It’s causing issues for two reasons:

  1. A lot of these things just weren’t built to operate at our current scale, particularly things involving lists, and we’re spending more and more time keeping the site from getting bogged down. The longer-term solution, which we’re working on, is to refactor some of these things so that they work at a larger scale. But in the meantime, it just means that sometimes we’ll see issues with the site and can trace them back to someone having committed a swap involving tens of thousands of observations.

  2. Until we refactor, some of these big taxon changes involving thousands of observations can take many hours to fully process. While things are still processing, we can’t do things like redeploy the site. We’re increasingly finding that we have to hold off on fixing something while we wait hours for some taxon change to finish processing. The solution will probably be to (a) refactor to make these changes more efficient, but also probably (b) make it so that taxon changes are ‘scheduled’, i.e. someone sets them up and they are committed later, when load is lower and at some less frequent interval. In addition to helping with the load problems, this could also help with the ‘making sure taxon changes get properly reviewed before being committed’ problem. But I think the latter is much harder since it requires some sort of protocol for how things get reviewed.
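
A rough sketch of what option (b) could look like, just to make the idea concrete. Everything here is hypothetical: the names `TaxonChange`, `ready_to_commit` and `due_changes`, the off-peak window, and the size threshold are assumptions for illustration, not how the site actually works.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical values: the thread does not give real numbers.
OFF_PEAK_HOURS = range(2, 6)      # assumed low-load window (server time)
LARGE_CHANGE_THRESHOLD = 1000     # assumed cutoff for a "big" change


@dataclass
class TaxonChange:
    name: str
    affected_observations: int
    reviewed: bool = False


def ready_to_commit(change: TaxonChange, now: datetime) -> bool:
    """Return True if a drafted change may be committed at time `now`."""
    if not change.reviewed:
        return False                       # unreviewed drafts always wait
    if change.affected_observations < LARGE_CHANGE_THRESHOLD:
        return True                        # small changes can go any time
    return now.hour in OFF_PEAK_HOURS      # big changes wait for low load


def due_changes(queue, now):
    """Ready changes, smallest first, so quick ones are not stuck behind slow ones."""
    ready = [c for c in queue if ready_to_commit(c, now)]
    return sorted(ready, key=lambda c: c.affected_observations)
```

With a queue like this, a reviewed 3-observation swap would go through mid-afternoon, while a reviewed 20,000-observation split would sit until the off-peak window.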

Thanks for the feedback! I added this key https://www.inaturalist.org/pages/how+taxon+changes+work

Wow. Much to think about. That said, I’m not sure building this toolset should be a precondition for finding more taxon curators for different parts of the tree. There’s an increasingly large pile of ungrafted taxa for locked-down parts of the tree, and I presume at least some of those could be handled relatively simply, rather than as part of a complex set of changes like this.

I don’t touch taxonomy curation (yet), but is there any way to indicate on a flag or during the draft process how many taxon changes will take place / how many observations would be affected?
That way curators could clean up “easier” (or at least not-server-ruining) taxa, while this is getting set up. And it might be useful even once it is established.

When you create a taxon change, the number of associated observations that will be impacted is shown directly under each taxon.

Thanks for the key! Here are a few initial thoughts. A lot will probably come down to how the user interface actually works though.

Colors (post-change tree only): green, orange, blue, yellow

I know you say the tree diagrams will become “less confusing to interpret with practice” but I worry it’s so overwhelming that hardly anyone will want to try. The four colors aren’t viscerally meaningful in a way that is helpful to my understanding. In other places on iNat, colors like green = “good”, yellow/orange=“slow down” or “alert”, and red/pink=“bad” or “alert” are sometimes used, and these are color meanings that are broadly used elsewhere, and globally. To me green means “new” more than any of the other meanings here. Orange and blue have little to no visceral meaning to me in this context, and after rereading multiple times and drawing out diagrams I had to reference the key every time to distinguish them. Maybe input taxa and anything old and/or unchanged should be grey or black to help reduce the color load.

Some symbols and/or words could be used instead of colors. Like a larger circle for a broader/sensu lato taxon or half circle for a narrower/sensu stricto one, or a little sparkly new icon for “new” taxa, rather than yellow.

The connecting lines could be normal for 1:1 swap, thicker for merging, and dotted for splitting. Some changes in that vein would help cut down on the overwhelming (to me) colors.

also here’s what one of these looks like with simulated deuteranopia:

In this example, why is S. alario alario yellow? Isn’t its “straightforward analogue” in the former tree the sensu stricto S. alario?

Alternative, with half moon showing sensu stricto S. alario and full green moon showing new sensu lato S. alario.

Comparing these two schematics, where Picus viridis is split into two species, with an old sensu lato version and a new sensu stricto version.

The trees also muddle left and right. In one sense, left to right means from old to new. In the other sense, left to right means from parent to descendant. Just playing around with ideas: could they be split, so that parents are above descendants, or does that just make it far too messy to visualize connecting lines when the trees and changes are very large?

Our convention is also to show a root (in this case Estrilda) above the taxon involved in the swap to help orient us. But note that the root is altered by the swap.

I’m not understanding / seeing how the root is altered. Is the root the black triangle? Or the base of each tree figure? Maybe to distinguish old/new the background could be a different color on one side (in my mock-ups, grey).

In general ‘move children’ should only be used if you’re absolutely sure it won’t create a mess and a better option is to manually handle the children first by moving them or swapping them.

This should prob require some validator / reminder when you check that box, indicating what is going to be done. The current text doesn’t explain the kind of mess it can create with incorrectly named epithets.

If you are manually handling these changes, note that the order of operations is important.

Could it also allow typing in the new output taxa rather than making them beforehand / on a different page? e.g. just type in the output taxon in an area on the output tree figure?

As a rule of thumb, taxon ranges, conservation statuses, atlases, and listed taxa (and associated establishment means) should always be reviewed after a taxon is broadened. Taxon names should also be checked (for example, ‘eastern newt’ might not be appropriate after ‘western newt’ is lumped into it). These types of content are more commonly associated with taxa of rank species, but this is not always the case.

Can associated content be displayed on a summary page so that curators can easily notice/find and update this information?

we need to be aware that adding Timor Leaf Warbler narrowed what we mean by Timor Leaf Warbler and we may need to update associated content - specifically we would need to update the range map to remove Rote Island and check that any atlases are still correct.

This part is a major bottleneck. Most curators aren’t able and/or willing to update ranges (requiring GIS tools and knowledge…and time), but we can pretty easily and quickly update atlases. I am more likely to just delete the outdated range myself. I don’t think creating new range maps should be a requirement of any taxon changes.

Remember that in the figures we omit drawing children whose names, ranks, and parent names don’t change

In cases like this:


Maybe there could be an option to show all children, including the unchanged ones? It is confusing to not see C. blanfordi blanfordi depicted.

Anyway, just a few random thoughts while reading through this. Thanks for all your work on it! :)

I did a little audit of the current Taxon Frameworks with Taxon Curators (these are the branches of the tree that are off limits to ‘normal’ curators).

They are (remember that overlapping downstream taxon frameworks take priority, e.g. the Bird taxon framework referencing Clements takes priority over the Chordata taxon framework referencing WoRMS where they overlap):
Life down to phylum (Catalogue of Life)
Animalia down to order (WoRMS)
Mollusca down to family (WoRMS)
Cephalopoda down to ssp (WoRMS)
Ocypodidae down to ssp (WoRMS)
Araneae down to ssp (World Spider Catalog)
Lepidoptera down to family (van Nieukerken et al., 2011)
Odonata down to sp (World Odonata List)
Chordata down to ssp (WoRMS)
Mammals down to sp (MDD)
Amphibians down to sp (ASW)
Reptiles down to ssp (RD)
Birds down to ssp (Clements)
the 6 Fish classes down to ssp (FishBase)
Tracheophyta down to class (Catalogue of Life)
the 4 gymnosperm classes down to ssp (POWO)
Hypericaceae down to ssp (POWO)
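
The ‘downstream takes priority’ rule can be sketched roughly like this. It is only an illustration: the function name, the simplified rank list, and the choice to return nothing when the nearest framework doesn’t reach the taxon’s rank are all assumptions, and only a few frameworks from the list are included (with Aves standing in for Birds).

```python
# Simplified rank order from coarse to fine; a larger index means a finer rank.
RANKS = ["kingdom", "phylum", "class", "order", "family",
         "genus", "species", "subspecies"]

# A few frameworks from the list: root taxon -> (finest covered rank, reference).
FRAMEWORKS = {
    "Animalia": ("order", "WoRMS"),
    "Chordata": ("subspecies", "WoRMS"),
    "Aves": ("subspecies", "Clements"),   # "Birds" in the list
    "Lepidoptera": ("family", "van Nieukerken et al., 2011"),
}


def governing_framework(ancestry, rank):
    """Return the reference governing a taxon, or None if none covers it.

    ancestry: taxon names from the root down to the taxon itself.
    Walking upward from the taxon, the first framework root found is the
    most downstream one, so it wins over any framework further up.
    """
    for name in reversed(ancestry):
        if name in FRAMEWORKS:
            finest_rank, reference = FRAMEWORKS[name]
            covered = RANKS.index(rank) <= RANKS.index(finest_rank)
            # Assumption: if the nearest framework stops above this rank,
            # the taxon is treated as not covered by any framework.
            return reference if covered else None
    return None
```

So a bird genus such as Estrilda resolves to Clements rather than WoRMS, while a Lepidoptera species falls outside the Lepidoptera framework because that one only goes down to family.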

In my opinion, the majority of these are working well. The major problems with these that I’m aware of seem to be:

  1. Life down to phylum (Catalogue of Life)
    People don’t like CoL’s Fungi phyla but no one can propose alternatives (similar with Virus phyla), so there are some open flags and weird stuff here. But this isn’t a curation issue as much as a community consensus issue.

  2. Anything referencing POWO (e.g. gymnosperms)
    POWO isn’t quite ready for show time. As folks have mentioned, it’s missing a lot and has a lot of errors. In this case, it seems less of a curation issue and more of an issue of the reference maybe being a bridge too far from what the community will tolerate.

  3. Anything referencing FishBase (e.g. fish classes)
    FishBase also isn’t quite ready for show time. As folks have mentioned, it’s missing a lot and has a lot of errors. The community seems to prefer Catalog of Fishes, which doesn’t have an API, but maybe we should switch anyway and try to do the best with regular exports. Again, more of an issue of an inadequate reference than a curation bottleneck.

  4. Birds, Mammals, Amphibians, and Reptiles
    Of these, mammals is working well. The problem with the remaining 3 is that each change usually involves a complicated split that touches lots of other taxa. The main purpose of my doc was to try to make handling frameworks like these for ‘well known taxa’ easier where there’s lots of observations and distribution content involved.

Aside from hybrids, extinct taxa, and additional nodes (tribes, subgenera, etc.), which I guess are all unbounded, all of these frameworks should have all the species already added. So the bottleneck isn’t having curators graft taxa, which in these cases will introduce lots of duplication, but rather doing the tricky work of figuring out what the arrival of taxon X means for existing taxa Y, Z, etc. (e.g. ‘oh X was split from Y’).

So I guess I agree that more hands could help keep up with the demand for hybrids, extinct taxa, and additional nodes. But there is a tree bloat cost to adding all these. I kind of wouldn’t mind new policies about what our plan here is exactly before opening the floodgates (is our plan to have every extinct taxon in the tree? I hope not).

One other quick option would be to release Fungi and Virus phyla, Fishes, and Gymnosperms from curated taxon frameworks (i.e. make it so all curators can edit them) if the references are a bridge too far, unless we can come up with an easier process for agreeing on and crafting deviations.

As for Birds, Reptiles, Amphibians, and Mammals - with mammals we have a great duo of taxon curators (bobby23 & jwidness) who are doing awesome work. I’d love to recreate this with larger groups of curators for Birds, Reptiles, and Amphibians, but I think we do need better materials (if not tools) on how to do curation in these ‘well known’ branches with lots of observations, associated content and few ‘wholly new’ taxa (e.g. lots of splits) before we can get more taxon curators working on them. Anyway, that was my main thinking behind the doc I pasted last week.

We sort of have a spectrum here now, with branches with no reference on one extreme (e.g. Lepidoptera below family), branches with a reference but no taxon curators in the middle (e.g. most of plants), and branches with a reference and taxon curators on the other end (e.g. birds, reptiles, mammals, amphibians). I’d be curious about what folks think about the pros and cons of each of these approaches. We certainly could move in either direction, i.e. move more branches to one end of the spectrum or the other. My vision is to move more towards the end of the spectrum with good references and larger teams of taxon curators - but IMO this direction requires the materials (if not tools) I mentioned above.

Thanks for all the feedback

I didn’t put too much thought into the colors; if you can propose a better set that would be awesome.

Re: the S. alario alario example, it’s kind of arbitrary what the analogue is. You could do it the way you propose, thinking of S. a. alario as being the analogue of S. alario (i.e. 2 swaps), and the output S. alario as a non-analogue ‘internode’.
That would work in this example, but it wouldn’t work more generally when the input S. alario has other children. The ‘swing lump’ way works in those more general situations.
But yes, I agree that S. alario and S. a. alario are the same in this example.

Re: the vertically opening trees as opposed to horizontal, I tried that, but big trees get very wide, so it’s harder to render them on a webpage (where the width is fixed but the height is unbounded).

The root is never altered; we can style it differently if that would make that clearer.

I agree that range maps are a pain to deal with. But it sucks to lose all that information with taxon changes.

I tried showing all children in trees but they get way too busy (many nodes have like 20 children with just one involved in the change). I know I’m used to these from staring at them so long, but for me being able to ignore a bunch of children that aren’t really involved makes these much easier to get my head around.

In terms of using things like POWO, to me, given a choice between the ‘least worst option’ and no reference at all, I still really favour the least worst choice. Going back to totally open, with all the conflicting views, regional sources, etc., is not a good step in my mind.

To me you hit it in terms of how to better document deviations. It still seems they get really limited use, and too many changes continue to get implemented with no sourcing documented, which makes backtracking harder.

Somewhere there needs to be a look at how to manage areas that do one big update say annually vs continually updated things. For example the taxon I curate publishes updates daily. It can get taxing to check and implement changes every day, and if you don’t you can get behind quickly with a lot of manual one at a time work to catch up.

Right now this really is more of a ‘how to make this easier’ discussion. It is not just about adding more curators, and especially not just about adding taxonomic experts. The dirty little secret about being a taxon curator (I do have this role), for those reading this who do not have the role, is that it really has little to do with taxonomic knowledge and much more to do with understanding how the iNat system manages taxonomy and how to do that management properly. And it is really complicated to understand the full flow and implications of what you type into a taxon change.

I manage a group far less sexy and high profile than birds (spiders), but it has 5 times the species, a reference that updates daily, etc. As an example, another curator, who has likely forgotten more about spiders than I know, has recently been adding draft changes. And every single one of them has been wrong. Not wrong in the sense of not being supported by the science or the reference the site uses, but wrong in terms of what they have typed into the taxon change. For instance, you don’t move a genus to a different parent by doing a 1 to 1 swap between the genus and the family. I shudder to think how much work it would have been to undo that manually had they been allowed to commit that change.

The tools need to be better for managed taxa implementation but even more so does the onboarding and training of people before you let them loose. And making sure they understand that a lot of the role is pure grunt repetitive work, not some sexy role of making grand decisions about taxonomy.

I agree that it would be greatly beneficial to ease the learning curve of taxonomy curation on iNat. Whether it be through more tools, improved UI, mentoring, a more granular or examples-rich tutorial, all of the above…

In the long run, as I accrue more knowledge and experience, I do want to start doing taxonomic curation. But it’s pretty intimidating, and with the system as it is I expect it will take me a very long time before I’m confident doing anything but the most minor of changes, (maybe over a year?). (For context about where I started at, I’m a college freshman, and seven months ago, before I started on iNat, I thought that “tree” was logically a taxonomic category lol)

There are really 2 separate streams of tools that are needed here.

  • what I will call the bird stream. Scott, or whoever takes over the birds role, and likely the mammal curators etc., need a suite of tools to manage a relatively low volume of changes in high profile areas with lots of observations. Their primary issue is complexity, in particular ensuring observations don’t go wrong.
  • then curators who manage groups like mine. We need a suite of tools to manage a high volume of changes in lower profile areas with fewer observations. Our primary issue is pure volume.

To compare: Clements may do a couple of hundred changes in their annual update, but they could impact tens of thousands of observations. Most of the changes I do impact few and in many cases no observations. But the reference tied to my group published almost 5000 changes last year. That’s not because of some major one-off thing; that is the flow of change in a group with 50000+ species and the rate of publishing today. Not all 5000 of those require work by me, but possibly 2000 do. If each of those takes a couple of minutes, you get the idea. My need is to process a large volume of changes as quickly and efficiently as possible. However, if such a suite of tools exists, it opens the door to a lot of chaos if used by curators who do not understand how they work.
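
To put a rough number on that workload, taking ‘a couple of minutes’ as exactly 2 (an assumption; the real per-change time isn’t stated):

```python
changes_needing_work = 2000   # "possibly 2000" from the post
minutes_per_change = 2        # assumption: "a couple of minutes" read as 2

total_hours = changes_needing_work * minutes_per_change / 60
print(f"{total_hours:.1f} hours per year")
```

That comes to roughly 67 hours a year of one-at-a-time manual edits, before any relationship or distribution updates.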

For me a well-curated taxon means more than just having all the species entered, so the tools I need are, yes, for doing things in the taxon change framework, but also for managing the taxon framework relationship data, managing the geospatial data, etc. For my taxon I spent a huge amount of effort ensuring the range of every species possible is documented (remember this is 50000+ species), every nation has its own dedicated national spiders checklist that I made, etc., but the tools to manage this are hugely time consuming.

It is more high profile if the bird or mammal updates are delayed, but there is still an audience of users who want less sexy things curated. I still get complaints about, for example, why tribes and subtribes are not implemented for spiders. I have no good answer to those questions other than that it is 1000+ changes I need to do one at a time manually, and then I need to do 1000+ TFR updates manually one at a time. That answer is not really satisfactory to the audience who want to see it.

Sorry for the long post, just want to contribute the experience of someone who is doing the taxa curation role, and has a different set of needs than the discussion was covering.

Thanks! I just made a lengthy post about fern taxonomy in a linked thread, but your situation seems substantially similar to that in vascular plants. Plants of the World Online (POWO) claims to have about 1.1 million “names”–they index synonyms as well, and an estimate from Kew a few years ago was that there are about 400,000 vascular plant species. Ferns & lycophytes are about 13,000 of those. I estimate we have somewhat less than one-fifth of them in iNat, as there is high diversity in certain areas, particularly South America east of the Andes, Malesia, the Philippines, and China, where we have relatively few observations and hence few imports of names. I am working on a genus-by-genus basis to add all species with full synonyms and atlases, but of course this is very much a work in progress.

Right now, POWO updates something like hemiannually, I think, although I believe as they get up to speed they plan to update monthly or weekly. I know in the past Scott has made an automated run to fill in taxon frameworks when the iNat name matches the POWO name and there’s no relationship. The pattern in flowering plants seems rather patchy: every monocot in our taxonomy is linked to the taxon framework, but there are over 55,000 flowering plants with no relationships. I’ve been working heavily on ferns, and given the differences in generic delineation I’ve described there, I don’t think an automated process to fill in relationships would be helpful for that specific framework. I can see that being another story in flowering plants, or maybe being run on a family-by-family basis there.

The vast majority of my taxonomic changes in ferns lately have been to names with 0 observations, moving them to a new combination or placing them in synonymy. Rarely are more than 10 observations involved, I’d estimate. (Swaps with large effects are best done with some community feedback, apart from the effect on server load.) On the other hand, they’re not super urgent, so if I could queue them up to wait for approval, that wouldn’t be a big deal.

We did have an interesting issue recently that I thought of in conjunction with Scott’s post. A new species, Phegopteris excelsior, was recently split from Phegopteris connectilis. (It is the “tall form” referred to in Flora Novae-Angliae.) This species got imported, and I started setting up a taxon split in the prescribed fashion: create a new P. connectilis corresponding to the more limited circumscription, create atlases, split the old P. connectilis. However, there was some protest, because P. excelsior, which as far as we can tell is pretty uncommon, is sympatric with P. connectilis from New York state to the Maritime Provinces. If we had gone ahead with the split, it would have blown up every P. connectilis observation in that region. Ultimately, some of us went out to manually run through those observations with the Identify tool and identify/comment on the few that were or might be P. excelsior. That’s very ad hoc, but in practice, it seems to have been less disruptive than doing it the “right” way. It may be very different for birds, but I would think that for plants in general, you can sometimes have problems where it’s not entirely clear which old circumscription a new taxon fell into. Knocking every species that it might have been filed under up to genus, even when it’s circumscribed by atlases, could be very disruptive!

Appreciate your reflections, which in many ways reflect my experience with ferns.

One suggestion and one question:

Maybe the title of this thread should be changed to reflect the broader taxonomy management discussion.

What’s a good rule of thumb for how many observations are impacted before a taxon change should get postponed for a good time?

One thing that struck me reading your post is your comment about updating atlases. Independently I also updated distribution records but did it with the checklist functions. Just shows one of the barriers of having at least 3 different ways to track distribution data.

I’ve been using atlases because they’ve been incorporated into the taxon splitting process: observations of the taxon being split that fall into only one atlas of the child taxa will be automatically assigned to that taxon.
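
As a simplified sketch of that rule (not the site’s actual code): the function name is made up, places are reduced to plain names rather than real geometries, and the fallback for ambiguous observations is assumed to be the shared parent, such as the genus.

```python
def assign_after_split(obs_place, atlases, fallback):
    """Pick the output taxon for one observation of the taxon being split.

    atlases: output taxon name -> set of place names in its atlas.
    fallback: taxon used when zero or several atlases match (assumed here
    to be the shared parent, e.g. the genus).
    """
    matches = [taxon for taxon, places in atlases.items() if obs_place in places]
    return matches[0] if len(matches) == 1 else fallback


# Illustrative place sets only, not real distribution data.
atlases = {
    "Phegopteris connectilis": {"New York", "Nova Scotia", "Norway"},
    "Phegopteris excelsior": {"New York", "Nova Scotia"},
}

print(assign_after_split("Norway", atlases, "Phegopteris"))    # only one atlas matches
print(assign_after_split("New York", atlases, "Phegopteris"))  # both atlases match
```

The second call shows why sympatric output taxa are awkward: every observation in the overlap matches more than one atlas, so none of them can be assigned automatically.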

When you add a country or its subdivision to an atlas, the taxon is automatically added to the list for that country or subdivision. (But if you “explode” the country first before adding, the taxon will only appear in the lists of the subdivisions you add.) I believe the process also works in reverse: when you create a new atlas for a taxon, if the taxon has already been added to country checklists, those countries will already be present in the atlas. (I think you may have to select countries and explode them if the taxon is only on the lists for subdivisions of countries.)

Unfortunately, the “marked” function on atlases is broken (even if manually refreshed, atlases with out-of-range observations do not persist as marked), which robs them of most of their utility for detecting misidentifications and range extensions.

@choess - can you post a link to the comment you wrote and referenced above? I don’t remember seeing it; any stealing of thoughts you wrote without attribution was unintentional.

I guess I am a little gun-shy about adding to atlases for 2 reasons: being unclear how much processing overhead an atlas creates, especially the jobs that check for out-of-range observations, and being uncertain what direction the site will go with distribution data. I guess I felt more confident in the site having checklists or some variant of them in their long term plans.

https://forum.inaturalist.org/t/are-pteridophytes-considered-to-be-under-a-taxon-framework-with-an-api/9128 It’s mostly about the specifics of what I do to curate ferns than the broader issues for iNaturalist curation.

I do have some misgivings about the amount of time I’ve put into fern atlases (World Ferns has been incredibly helpful here, although I do try to cross-check against the secondary literature I have on hand), and what use will be made of them in the future, given the breakage of the atlas marking feature. However, the atlas documentation suggests that this is how iNaturalist plans to move forward in managing taxa and their circumscriptions, and they do interoperate just fine with lists. On the other hand, if you’re maintaining good lists of your taxa, then setting up atlases in the future should be very easy: just create one, switch it active, and the correct countries are already loaded. So I wouldn’t take on the extra overhead of creating spider atlases yet unless we get some explicit guidance that this is a good thing (or unless they’re going to help you manage observations in a taxon split).

Yeah, my assumption, and that is all it is, is that at least in their current guise, atlases could not become the sole source of distribution data. They lack a major feature right now which is the ability to do stuff outside geopolitical borders, so for instance there is no ability to do a list or tracking of species in Algonquin Park or Yellowstone or a naturalist society study area etc.