It seems increasingly common for taxa to be added again even though they already exist on iNat under the same name. For example, Albizia julibrissin has been added 10+ times (https://www.inaturalist.org/taxon_names/76104/edit). This wastes curator and user time, and raises several questions:
Is it possible to find out which users are doing this, so they can be contacted and told that the taxon already exists?
Is it possible to prevent existing taxa from being re-added, except by curators?
What is the recommended action: mark as inactive, merge, or delete?
What is odd about this one is that each of the records has a different photo (or photos) associated with it. Looking at the photos, they all appear to have been imported from Wikimedia Commons, which suggests an issue with that import process may also be contributing to this.
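For anyone wanting to hunt for these duplicates in bulk, here is a minimal sketch of the grouping step. It assumes you already have taxon records as plain dictionaries (for example, exported or fetched some other way); the `id`, `name` and `is_active` keys and the sample ids are made up for illustration and are not claimed to match iNat's actual schema.

```python
from collections import defaultdict


def find_duplicate_names(taxa):
    """Return {normalised name: [ids]} for active names that occur more than once."""
    by_name = defaultdict(list)
    for taxon in taxa:
        # Skip records that have already been inactivated by a curator.
        if taxon.get("is_active", True):
            by_name[taxon["name"].strip().lower()].append(taxon["id"])
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


# Hypothetical records; the ids are invented for this example.
sample = [
    {"id": 1, "name": "Albizia julibrissin", "is_active": True},
    {"id": 2, "name": "Albizia julibrissin", "is_active": True},
    {"id": 3, "name": "Litsea diversifolia", "is_active": True},
    {"id": 4, "name": "Albizia julibrissin", "is_active": False},
]
duplicates = find_duplicate_names(sample)
```

Note that a name-only comparison like this would also report legitimate homonyms (the same name used in different kingdoms), so anything it finds would still need a human look before merging or inactivating.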
Yes, I’ve noticed this ongoing issue too. I wonder if it would be too hard for the devs to add a little validation code to these import processes, so that it would at least not allow duplicate ungrafted names to be added.
Allowing a single ungrafted name might be necessary in the case of hemihomonyms where one homonym does not yet exist in the iNat taxonomy.
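To make the suggested rule concrete, here is a rough sketch of what such a validation check might look like. It is not how the iNat importer actually works; the record shape (a `name` plus a `parent_id` that is `None` for ungrafted taxa) and the function name are assumptions for illustration only.

```python
def may_import_ungrafted(name, existing_taxa):
    """Return True if one more ungrafted taxon called `name` may be added.

    A grafted taxon with the same name does not block the import, which
    leaves room for a hemihomonym that is not yet in the tree. An existing
    ungrafted taxon with the same name does block it.
    """
    wanted = name.strip().lower()
    for taxon in existing_taxa:
        same_name = taxon["name"].strip().lower() == wanted
        is_ungrafted = taxon["parent_id"] is None  # assumed marker for "ungrafted"
        if same_name and is_ungrafted:
            return False
    return True
```

With this rule, the first ungrafted copy of a name gets through and every later one is refused until a curator has grafted or removed the first.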
Or could we just disallow all ungrafted imports, and instead direct users to the taxon flagging process if they feel a name needs to be added?
I can see how “self-service” additions to the iNat taxonomy would have been essential during its early years. But as the iNat taxonomy “matures,” maybe that is not such essential functionality any more?
Speaking as one of the most active curators on the site, before these imports are disallowed, I’d really like to have clarity on:
how many taxa are currently being added through this functionality
what percentage of the ones added work just fine versus cause a problem
Curators are already getting overwhelmed with flags, and the iNat database is missing literally millions of species that someone might have an interest in adding.
When I crunched some numbers a few months ago, I found the majority of new taxa are pulled from the name importer rather than manually entered, and most of them graft without intervention. It’s hard to give exact numbers on how many graft correctly because curators fix most of the broken ones pretty quickly. At the time we were also averaging more than 300 new taxa per day, which is IMO way more than curators can enter manually.
I’m not sure @jdmore is being interpreted correctly though. @krancmm seems to want to prevent all name imports, but I think jdmore wants to prevent name imports that will fail to graft. The thing is, the name importer doesn’t have prior knowledge of what will fail to graft. But we could give it at least a head start – grafts will fail on all locked taxa, and on unresolved (i.e. no ancestry provided) names.
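As a rough illustration of that head start, the two cases could be checked before the import is accepted. This is only a sketch under my own assumptions: the lineage arrives as a list of ancestor names from the external provider, "locked" is read as "any ancestor in that lineage is locked", and the function and parameter names are hypothetical.

```python
def graft_will_fail(ancestor_names, locked_names):
    """Predict the two known failure cases before attempting a graft.

    ancestor_names: lineage supplied by the external provider, root first.
    locked_names:   set of taxon names that are locked against new descendants.
    """
    if not ancestor_names:
        # Unresolved name: the provider gave no ancestry to graft onto.
        return True
    # Any locked ancestor in the lineage blocks the graft.
    return any(name in locked_names for name in ancestor_names)
```

A `False` result does not guarantee the graft will succeed; it only rules out the failures that can be known in advance.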
Yes, exactly, only those. My follow-up statement about self-service taxonomy additions possibly not being so essential anymore probably muddled that point – and in any case appears to be wrong!
Thinking ideally and maybe not practically, it seems like the system could validate the import result during or immediately after import, and if found to be ungrafted, pop up an error message to the user directing them to flag an ancestor taxon for curation instead, and return them to the ID or other process from which the import was initiated. (And at the same time, automatically delete the ungrafted taxon and any ID that might have been created with it at import time.)
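In sketch form, the flow I have in mind looks something like the following. Everything here is hypothetical: the in-memory dictionaries stand in for whatever storage the site really uses, and `parent_id is None` is my assumed marker for an ungrafted taxon.

```python
class UngraftedImportError(Exception):
    """Raised when an imported name could not be placed in the tree."""


def finalize_import(taxon, taxa, identifications):
    """Keep a grafted import; undo an ungrafted one and explain what to do next."""
    if taxon["parent_id"] is not None:
        return taxon  # grafted successfully, nothing to clean up

    # Remove the ungrafted taxon and any IDs that were created with it.
    taxa.pop(taxon["id"], None)
    stale = [ident_id for ident_id, ident in identifications.items()
             if ident["taxon_id"] == taxon["id"]]
    for ident_id in stale:
        del identifications[ident_id]

    raise UngraftedImportError(
        f"'{taxon['name']}' could not be grafted; "
        "please flag an ancestor taxon for curation instead."
    )
```

The caller would catch `UngraftedImportError`, show the message, and send the user back to the ID form or wherever the import was started.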
Incoming taxa are set up to graft as soon as possible, but that can sometimes take a number of minutes. I think it might be difficult to get the grafter to run fast enough to immediately give feedback to the user wanting to use the name.
Thanks, that helps me understand the inner workings a little better. I wonder (to further showcase my ignorance), is it figuring out the graft position (or lack thereof) in the tree that is resource-intensive, or is it post-graft indexing etc.? If the latter, maybe one could just query tree position in real time initially, before completing the process?
I noticed the duplicates in the ‘Ungrafted Taxa’ box on my home page and inactivated them soon after. The list above is from a later API query of the most recently inactivated taxa.
As I recall, I had just used the ‘Add batch’ / ‘Add Batch of Taxa’ function on the check_lists page, pasting a list of 25 taxa into the text box (not uploading a CSV). I’m not convinced I added them all, though. Only two of the five names listed above were on my list: Namibostreptus kymatorhabdus and Praeterpediculus niger, not Ancistrocercus excelsior, Litsea diversifolia or Carmenta haematica. After inactivating the duplicates I tried again to test whether that was the cause, but instead successfully imported a single instance of most of the names from the external providers.