Hi folks,
We’re taking some steps to limit and schedule large taxonomy ancestry changes. These occur when a curator changes the iNaturalist taxonomy by editing a taxon’s parent or committing a taxon change, and the taxon (or the input taxon) has lots of downstream observations.
Background
One of the major features of iNaturalist is the ability to search observations (circles) by any node on the taxonomy (squares). For example, by searching for taxon 4 you can filter just the red observations, or by searching for taxon 2 you can filter the red, blue, and yellow ones.
To do this quickly, iNaturalist actually stores a snapshot of the relevant taxonomic ancestry on each observation.
This means that if the ancestry is altered, for example if someone inserted taxon 8 between 1 and 2, then every observation associated with that branch needs to have its ancestry updated.
A consequence of this is that if a branch has lots of observations, updating all the ancestries can take a very long time and really slows the site down. Unfortunately, as iNaturalist continues to grow, more and more branches of the taxonomy have enough observations that changing their ancestry comes at a real cost to site performance.
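To make the trade-off concrete, here is a minimal Python sketch of the idea. It assumes a toy in-memory model: the taxon numbers, observation ids, and function names are all made up for illustration, and this is not how iNaturalist's actual code or database is organized.

```python
# Toy taxonomy: taxon id -> parent id (None marks the root). Hypothetical data.
parents = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}

def ancestry(taxon_id, parents):
    """Return the path from the root down to taxon_id."""
    path = []
    while taxon_id is not None:
        path.append(taxon_id)
        taxon_id = parents[taxon_id]
    return path[::-1]

# Each observation stores a snapshot of its taxon's ancestry.
observations = [
    {"id": 101, "taxon_id": 4},
    {"id": 102, "taxon_id": 4},
    {"id": 103, "taxon_id": 5},
    {"id": 104, "taxon_id": 3},
]
for obs in observations:
    obs["ancestry"] = ancestry(obs["taxon_id"], parents)

def search(observations, taxon_id):
    """Searching only looks at the stored snapshot, so it never walks the tree."""
    return [o["id"] for o in observations if taxon_id in o["ancestry"]]

def insert_between(parents, observations, new_id, parent_id, child_id):
    """Insert new_id between parent_id and child_id.

    Every observation under child_id needs its snapshot rewritten;
    the return value is how many observations were touched.
    """
    parents[new_id] = parent_id
    parents[child_id] = new_id
    rewritten = 0
    for obs in observations:
        if child_id in obs["ancestry"]:
            obs["ancestry"] = ancestry(obs["taxon_id"], parents)
            rewritten += 1
    return rewritten

before = search(observations, 2)
rewritten = insert_between(parents, observations, new_id=8, parent_id=1, child_id=2)
print(before, rewritten)
```

Searching stays cheap because it only reads the snapshot, but inserting taxon 8 between 1 and 2 rewrites every observation under taxon 2. In this toy case that is three observations; on a real branch it can be hundreds of thousands.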
What are we doing about this?
1 - Trying to improve the root of the problem
We’re working on improving parts of our core infrastructure to make these ancestry updates faster and less disruptive. But this is hard work and it’s going to take time/resources to improve.
2 - Replace the ‘only staff can alter taxonomy with ranks of order and coarser’ rule with ‘only staff can alter taxonomy of nodes containing >200k downstream observations’
Currently only staff can alter the ancestry of taxa with rank order or coarser. But rank isn’t as good a proxy for which branches are disruptive to change as the number of observations on that branch. For example, even though Placozoa is a phylum it has no observations (including descendants) so moving its position on the tree will not be very disruptive. In contrast, even though dabbling ducks are a genus (Anas), moving the taxon would require reindexing more than a quarter million observations.
As a result, we’re altering this functionality so that only staff or taxon curators can alter the ancestry of taxa with more than 200,000 downstream observations, or commit taxon changes whose inputs have more than 200,000 downstream observations. For example, in the Fungi Kingdom there are 18 nodes in 7 lineages that would exceed this threshold at the moment:
Phylum Basidiomycota (2946626 obs)
..Subphylum Agaricomycotina (2628426 obs)
....Class Agaricomycetes (2801134 obs)
......Order Polyporales (515820 obs)
........Family Polyporaceae (290416 obs)
......Order Russulales (233840 obs)
......Subclass Agaricomycetidae (1547457 obs)
........Order Agaricales (1460054 obs)
..........Suborder Agaricineae (642072 obs)
..........Suborder Marasmiineae (207766 obs)
..........Suborder Pluteineae (203821 obs)
........Order Boletales (227934 obs)
Phylum Ascomycota (962695 obs)
..Subphylum Pezizomycotina (868941 obs)
....Class Lecanoromycetes (678834 obs)
......Subclass Lecanoromycetidae (511547 obs)
........Order Lecanorales (376880 obs)
..........Family Parmeliaceae (215079 obs)
For these taxa, curators should flag the taxon and mention iNaturalist staff member @loarie or the respective taxon curator (if the branch is covered by a taxon framework with taxon curators). There the change can be discussed and scheduled for a time when it will minimally impact site performance (i.e. non-peak hours). Our goal is to resolve such flags within a month.
Assuming the yellow nodes in the figure below have >200k observations, that means curators wouldn’t be able to do step C here (move 2 from 1 to 8). Please don’t try to circumvent these restrictions with a sequence of allowable moves, as doing so may result in your curator status being revoked.
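As a rough sketch of the new rule, the check could look something like the Python below. The function and parameter names are hypothetical, and "taxon curator" here means a taxon curator of a framework that covers the taxon in question. The observation counts are the ones quoted in this post.

```python
# Threshold from this post; the names below are illustrative assumptions.
STAFF_ONLY_THRESHOLD = 200_000

def can_change_ancestry(downstream_obs, is_staff=False, is_taxon_curator=False):
    """Return True if this user may move the taxon or commit the taxon change.

    Below or at the threshold, any curator may proceed. Above it, only
    staff or a taxon curator covering the branch may.
    """
    if downstream_obs <= STAFF_ONLY_THRESHOLD:
        return True
    return is_staff or is_taxon_curator

# Placozoa: a phylum with no observations, so rank alone no longer blocks it.
print(can_change_ancestry(0))
# Family Parmeliaceae (215079 obs): an ordinary curator should flag instead.
print(can_change_ancestry(215079))
# The same taxon, handled by staff.
print(can_change_ancestry(215079, is_staff=True))
```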
3 - Alerts on changes impacting >1,000 observations
We’d like curators to consider the costs of altering the taxonomy before making such changes. Are the benefits of the node you are inserting worth the costs? Have you spent enough time discussing this change in flags to make sure it won’t be reverted (thus incurring the costs twice) or to make sure you’re making this change as efficiently as possible? As a result, we’re planning to add “are you sure” warnings before updating the ancestry of a taxon with more than 1,000 downstream observations or committing a taxon change where such a taxon is an input.
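A minimal sketch of when the planned warning would appear, assuming hypothetical function names and that each input taxon's downstream observation count is already known:

```python
# Warning threshold from this post; function names are illustrative assumptions.
WARN_THRESHOLD = 1_000

def needs_confirmation(downstream_obs):
    """True if moving this taxon should trigger an "are you sure" prompt."""
    return downstream_obs > WARN_THRESHOLD

def taxon_change_needs_confirmation(input_counts):
    """True if any input taxon of a taxon change is over the warning threshold."""
    return any(needs_confirmation(count) for count in input_counts)

# A taxon change with two inputs: one small, one with 4,500 downstream observations.
print(taxon_change_needs_confirmation([120, 4_500]))
```

A single large input is enough to trigger the prompt, since committing the change reindexes the observations of every input.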
4 - Increase the number of taxon frameworks with taxon curators to cover more of the tree
Taxon Frameworks cover branches of the tree starting at a node and descending to a particular rank. For example, there’s a taxon framework on Earwigs that extends down to subspecies. They are meant to encourage stabilizing the taxonomy and considering a branch more holistically by more explicitly mapping the iNat taxonomy to external references. When taxon frameworks have taxon curators, only these curators can alter the taxa covered by the framework. For example, the taxon framework on Mollusks that extends down to family has taxon curators @jonathan142, @loarie, and @bobby23. Note that other curators can make changes to parts of the tree downstream of such a taxon framework (e.g. genera within a mollusk family).
We’re planning to add more taxon frameworks with curators to more parts of the tree of life with lots of observations where they are a good fit to further help reduce unplanned, disruptive taxon changes. For example, a taxon framework on Beetles (Order Coleoptera down to family) would not only address all nodes currently beyond the 200k threshold (except a lineage of Ladybugs in Family Coccinellidae), but would encourage coordination and discussion around changes to this clade as a whole.
Order Coleoptera (2636731 obs)
..Suborder Polyphaga (2337591 obs)
....Infraorder Scarabaeiformia (421278 obs)
......Superfamily Scarabaeoidea (420362 obs)
........Family Scarabaeidae (335405 obs)
....Infraorder Cucujiformia (1470870 obs)
......Superfamily Tenebrionoidea (218127 obs)
......Superfamily Coccinelloidea (384997 obs)
........Family Coccinellidae (378741 obs)
..........Subfamily Coccinellinae (320623 obs)
............Tribe Coccinellini (305858 obs)
......Superfamily Cerambycoidea (264468 obs)
........Family Cerambycidae (266370 obs)
......Superfamily Chrysomeloidea (320743 obs)
........Family Chrysomelidae (316113 obs)
....Infraorder Elateriformia (307141 obs)
......Superfamily Elateroidea (245229 obs)
..Suborder Adephaga (223936 obs)
Likewise, since taxon curators of a particular taxon framework will have permissions to get around the 200k threshold described in bullet 2, this will let us spread the burden that bullet 2 places on iNaturalist staff while still ensuring these changes are scheduled. We’ll be coordinating with taxon curators individually on how to schedule disruptive changes within the branches that they curate.
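To illustrate the Beetles example, here is a small Python sketch that checks which nodes over the 200k threshold would fall outside a framework running from order down to family. The rank list and function name are assumptions for illustration, only a subset of the nodes listed above is included, and every node is assumed to sit inside Coleoptera.

```python
# Hypothetical rank ordering, coarsest to finest (only the ranks used here).
RANK_ORDER = ["order", "suborder", "infraorder", "superfamily",
              "family", "subfamily", "tribe"]
THRESHOLD = 200_000

def covered_by_framework(rank, root_rank="order", down_to="family"):
    """True if a taxon of this rank lies between the framework's root rank
    and the rank the framework descends to (inclusive)."""
    pos = RANK_ORDER.index(rank)
    return RANK_ORDER.index(root_rank) <= pos <= RANK_ORDER.index(down_to)

# (name, rank, observation count) taken from the list in this post.
beetle_nodes = [
    ("Order Coleoptera", "order", 2636731),
    ("Suborder Polyphaga", "suborder", 2337591),
    ("Superfamily Coccinelloidea", "superfamily", 384997),
    ("Family Coccinellidae", "family", 378741),
    ("Subfamily Coccinellinae", "subfamily", 320623),
    ("Tribe Coccinellini", "tribe", 305858),
    ("Suborder Adephaga", "suborder", 223936),
]

uncovered = [name for name, rank, obs in beetle_nodes
             if obs > THRESHOLD and not covered_by_framework(rank)]
print(uncovered)
```

The only nodes left over are the subfamily and tribe within Coccinellidae, which is the Ladybug lineage mentioned above. Those would still fall under the staff-only rule from bullet 2.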
5 - Better resources and training for curators
This remains a top priority.
Conclusions
Thanks for your patience with this. We realize taxonomy is constantly updating and we appreciate all the work done by curators to help fix errors and keep iNaturalist taxonomy up to date. We hope these changes will help this work continue while minimally impacting site performance as iNaturalist continues to grow.
We’d also appreciate any other feedback on what we could do to keep the benefits of a crowd-curated iNat taxonomy while working towards a more stable taxonomy.