Creation of multiple copies of already existing taxa

Taxa are increasingly being added multiple times even though they already exist on iNat under the same name. For example, Albizia julibrissin has been added 10+ times (https://www.inaturalist.org/taxon_names/76104/edit). This wastes curator and user time, and raises several questions:

  1. Is it possible to find out which users are doing this, so they could be contacted and told that the taxon already exists?
  2. Is it possible to prevent existing taxa from being re-added, except by curators?
  3. What is the recommended action: mark as inactive, merge or delete?

Unless you are the creator, you can’t delete it. If there is no associated data (identifications or observations), it is likely best to just set it inactive.

They are caused by people using the COL or EOL import tool. It does not graft properly, so they think the import did not work and do it again.

Here is another example: https://www.inaturalist.org/taxa/search?utf8=%E2%9C%93&q=Halesia+carolina

What is odd about this one is that each of the records has different photos associated with it. Looking at the photos, it appears they have all been imported from Wikimedia Commons, which suggests there may be some kind of issue with that import process also contributing to this.

Yes, I’ve noticed this ongoing issue too. I wonder if it would be too hard for the devs to add a little validation code to these import processes, so that it would at least not allow duplicate ungrafted names to be added.

Allowing a single ungrafted name might be necessary in the case of hemihomonyms where one homonym does not yet exist in the iNat taxonomy.
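To make the suggestion concrete, here is a rough Python sketch of the kind of check I mean. Everything in it is hypothetical (the `Taxon` record, the `is_grafted` flag, the `can_import` function) and it is not how the iNat importers actually work. It just shows the idea: refuse an import when an ungrafted taxon with the same name and rank already exists, while still letting a first ungrafted copy through for the hemihomonym case.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Taxon:
    # Hypothetical, simplified record; not the real iNat schema.
    name: str
    rank: str
    is_grafted: bool  # True once the taxon has a parent in the tree


def can_import(name: str, rank: str, existing: Iterable[Taxon]) -> bool:
    """Allow an import unless an ungrafted taxon with the same name and rank exists.

    A grafted taxon with the same name does not block the import, so one
    ungrafted copy can still come in for a hemihomonym that is missing
    from the tree.
    """
    key = (name.strip().lower(), rank.strip().lower())
    for taxon in existing:
        same = (taxon.name.strip().lower(), taxon.rank.strip().lower()) == key
        if same and not taxon.is_grafted:
            return False
    return True
```

With a check like this, the second and later attempts to import the same ungrafted name would be refused, which is where the user could be pointed to flagging instead.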

Or could we just disallow all ungrafted imports, and instead direct users to the taxon flagging process if they feel a name needs to be added?

I can see how “self-service” additions to the iNat taxonomy would have been essential during its early years. But as the iNat taxonomy “matures,” maybe that is not such essential functionality any more?


Albizia julibrissin julibrissin has just been added (and inactivated) again. Seems like whoever is doing this will continue to do so…

Some code has been added to try to prevent the addition of duplicate taxa. We’ll see how it goes.


I added a new bug report that is similar, though I don’t think there’s been any discussion about synonyms in this thread.


Wholeheartedly concur. By far the best solution: disallow imports from outside entities.


Speaking as one of the most active curators on the site, before disallowing them is implemented, I’d really like to have clarity on:

  • how many taxa are currently being added through the functionality
  • what percent of the ones added work just fine versus cause a problem

Curators are already getting overwhelmed with flags, and the iNat database is missing literally millions of species that someone might have an interest in adding.


When I crunched some numbers a few months ago, I found the majority of new taxa are pulled from the name importer rather than manually entered, and most of them graft without intervention. It’s hard to give exact numbers on how many graft correctly because curators fix most of the broken ones pretty quickly. At the time we were also averaging more than 300 new taxa per day, which is IMO way more than curators can enter manually.

I’m not sure @jdmore is being interpreted correctly though. @krancmm seems to want to prevent all name imports, but I think jdmore wants to prevent name imports that will fail to graft. The thing is, the name importer doesn’t have prior knowledge of what will fail to graft. But we could give it at least a head start – grafts will fail on all locked taxa, and on unresolved (i.e. no ancestry provided) names.
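As a rough illustration of that head start, here is a Python sketch. The `IncomingName` shape, the `locked_taxa` set and the `graft_will_fail` function are all made up for this post and are not the importer's real data model. It also assumes that "fails on locked taxa" means the graft fails whenever any ancestor supplied by the provider is locked.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class IncomingName:
    # Hypothetical shape of a name returned by an external provider.
    name: str
    ancestors: List[str]  # ancestor names, root first; empty if unresolved


def graft_will_fail(incoming: IncomingName, locked_taxa: Set[str]) -> bool:
    """Return True for the two cases that are known in advance to fail."""
    if not incoming.ancestors:
        return True  # unresolved: no ancestry provided
    # Assumption: a locked taxon anywhere in the ancestry blocks the graft.
    return any(ancestor in locked_taxa for ancestor in incoming.ancestors)
```

A `False` result only means neither known failure case applies. The graft could still fail for other reasons that the importer cannot see beforehand.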


Yes, exactly, only those. My follow-up statement about self-service taxonomy additions possibly not being so essential anymore probably muddled that point – and in any case appears to be wrong!

Thinking ideally and maybe not practically, it seems like the system could validate the import result during or immediately after import, and if found to be ungrafted, pop up an error message to the user directing them to flag an ancestor taxon for curation instead, and return them to the ID or other process from which the import was initiated. (And at the same time, automatically delete the ungrafted taxon and any ID that might have been created with it at import time.)
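In sketch form (Python, with entirely hypothetical names such as `ImportedTaxon`, `validate_import` and the `delete_taxon` callback, and assuming the graft result is already known when the check runs), the flow I have in mind would be something like:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ImportedTaxon:
    # Hypothetical record for a freshly imported taxon.
    id: int
    name: str
    parent_id: Optional[int]  # None means the taxon is ungrafted


@dataclass
class ImportResult:
    ok: bool
    message: str


def validate_import(
    taxon: ImportedTaxon, delete_taxon: Callable[[int], None]
) -> ImportResult:
    """Keep a grafted import; remove an ungrafted one and tell the user why."""
    if taxon.parent_id is not None:
        return ImportResult(True, f"{taxon.name} was imported and grafted.")
    # Ungrafted: clean up and direct the user to the flagging process.
    delete_taxon(taxon.id)
    return ImportResult(
        False,
        f"{taxon.name} could not be placed in the tree. "
        "Please flag an ancestor taxon for curation instead.",
    )
```

The cleanup is passed in as a callback so that deleting any ID created at import time could be handled in the same step.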

Incoming taxa are set up to graft as soon as possible, but that can sometimes take a number of minutes. I think it might be difficult to get the grafter to run fast enough to immediately give feedback to the user wanting to use the name.


That will really help for data

Thanks, that helps me understand the inner workings a little better. I wonder (to further showcase my ignorance :wink:), is it figuring out the graft position (or lack thereof) in the tree that is resource-intensive, or is it post-graft indexing etc.? If the latter, maybe one could just query tree position real-time initially, before completing the process?

I’m not sure why the task isn’t completed immediately. @bouteloua any ideas?

Another spate of duplicates, some grafted multiple times in the same genus:

| Name | Iconic | Rank | ID |
|---|---|---|---|
| Ancistrocercus excelsior | | species | 1151266 |
| Ancistrocercus excelsior | | species | 1151264 |
| Ancistrocercus excelsior | | species | 1151265 |
| Namibostreptus kymatorhabdus | | species | 1151274 |
| Ancistrocercus excelsior | Insecta | species | 1151255 |
| Ancistrocercus excelsior | Insecta | species | 1151272 |
| Ancistrocercus excelsior | Insecta | species | 1151299 |
| Ancistrocercus excelsior | Insecta | species | 1151271 |
| Ancistrocercus excelsior | Insecta | species | 1151282 |
| Ancistrocercus excelsior | Insecta | species | 1151296 |
| Ancistrocercus excelsior | Insecta | species | 1151295 |
| Litsea diversifolia | Plantae | species | 1151297 |
| Ancistrocercus excelsior | | species | 1151273 |
| Ancistrocercus excelsior | | species | 1151260 |
| Ancistrocercus excelsior | | species | 1151257 |
| Ancistrocercus excelsior | | species | 1151259 |
| Ancistrocercus excelsior | | species | 1151258 |
| Namibostreptus kymatorhabdus | | species | 1151252 |
| Ancistrocercus excelsior | | species | 1151256 |
| Ancistrocercus excelsior | | species | 1151254 |
| Praeterpediculus niger | | species | 1151278 |
| Praeterpediculus niger | | species | 1151302 |
| Ancistrocercus excelsior | | species | 1151289 |
| Ancistrocercus excelsior | | species | 1151279 |
| Ancistrocercus excelsior | | species | 1151276 |
| Ancistrocercus excelsior | | species | 1151270 |
| Ancistrocercus excelsior | | species | 1151267 |
| Ancistrocercus excelsior | | species | 1151277 |
| Praeterpediculus niger | Animalia | species | 1151294 |
| Ancistrocercus excelsior | | species | 1151290 |
| Carmenta haematica | | species | 1151275 |
| Carmenta haematica | | species | 1151284 |
| Carmenta haematica | | species | 1151286 |
| Carmenta haematica | | species | 1151285 |
| Carmenta haematica | | species | 1151292 |
| Carmenta haematica | | species | 1151280 |
| Ancistrocercus excelsior | | species | 1151251 |
| Namibostreptus kymatorhabdus | | species | 1151268 |
| Praeterpediculus niger | Animalia | species | 1151287 |
| Ancistrocercus excelsior | | species | 1151300 |
| Ancistrocercus excelsior | | species | 1151281 |
| Ancistrocercus excelsior | | species | 1151288 |

Thanks, I’ll add this to my weekly report.

Can you describe how you added these?

I noticed the duplicates in the ‘Ungrafted Taxa’ box on my home page and went to inactivate them soon after. The list above is from a later API query of the most recently inactivated taxa.
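For anyone wanting to tally a list like that, here is a minimal Python sketch. It assumes the rows have already been fetched and reduced to (name, taxon ID) pairs. `duplicate_names` is just an illustrative helper, not part of any iNat API or client library. The sample rows are a handful taken from the list above.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def duplicate_names(rows: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (name, count) for names with more than one taxon ID, most frequent first."""
    # set() drops exact repeats of the same (name, ID) pair before counting.
    counts = Counter(name for name, _taxon_id in set(rows))
    return [(name, n) for name, n in counts.most_common() if n > 1]


sample = [
    ("Ancistrocercus excelsior", 1151266),
    ("Ancistrocercus excelsior", 1151264),
    ("Ancistrocercus excelsior", 1151265),
    ("Namibostreptus kymatorhabdus", 1151274),
    ("Namibostreptus kymatorhabdus", 1151252),
    ("Litsea diversifolia", 1151297),
]
print(duplicate_names(sample))
# → [('Ancistrocercus excelsior', 3), ('Namibostreptus kymatorhabdus', 2)]
```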

As I recall, I had just used the ‘Add batch’ / ‘Add Batch of Taxa’ function on the check_lists page, pasting a list of 25 taxa into the text box (not the CSV upload). I’m not convinced I added them all. Only two of the five names listed above were on my list: Namibostreptus kymatorhabdus and Praeterpediculus niger, not Ancistrocercus excelsior, Litsea diversifolia or Carmenta haematica. After inactivating the duplicates I tried again to test whether that was the cause, but instead successfully imported a single instance of most of the names from the external providers.
