Looking at the distribution map for taxon Eleodes goryi: applying the GBIF overlay throws up a gazillion data points that are not at all in agreement with the data at GBIF.
Safari 14.1.2 interface on Catalina 10.15.7
taxon page: https://www.inaturalist.org/taxa/334271-Eleodes-goryi
The same problem seems to apply to other species in the genus Eleodes, such as E. osculans (but I confess that I do not have time to test them all). Possibly, the genus level data is what is being displayed incorrectly on the species level map overlays.
The problem does NOT apply to an arbitrary plant taxon, Sanvitalia aberti. And the problem does NOT apply to an arbitrary insect taxon, Egira variabilis.
The incomparable Bouteloua showed me how to fix this problem. So this morning I’m going through all the species of Eleodes and checking/repairing the GBIF link. So far about one in 6 or 8 species is compromised in this way; certain subgenera are worse than that. But I do not yet see a pattern to it!
The fix is described here: https://www.inaturalist.org/flags/544104#activity_comment_0f93c38c-8299-4474-81d0-e9af519c8d62
At least, in the case of a populous genus like Eleodes, it’s rather obvious when the map data is incorrect. But there may be other taxa for which this is not the case! So it would be useful if anybody can puzzle out the pattern or cause of the issue!
Looks that way; I suppose the thing to do is flag the genus.
Curators can fix the issue species-by-species, as explained here: https://www.inaturalist.org/flags/544104
It is caused by taxon updates made on either site. There is no easy way to find the affected taxa, because on the iNat side the data is stored in a very obscure place with no API access.
i think it would be technically possible to somewhat automate the search for mismatches by scraping / parsing the HTML in the taxon’s taxon scheme page (ex. https://www.inaturalist.org/taxa/326191/schemes) and then seeing if the GBIF taxon has the same name (ex. https://api.gbif.org/v1/species/9734350). that said, you would have to do this one taxon at a time, and it would not be a super efficient process.
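for what it's worth, here is a rough python sketch of that per-taxon check. lots of assumptions: i'm guessing that the schemes page links to GBIF with a URL containing `gbif.org/species/<id>` (i haven't verified the markup), and the function names are just made up for illustration. the GBIF side uses the same `https://api.gbif.org/v1/species/<id>` endpoint as the example above and compares against its `canonicalName` field.

```python
import json
import re
import urllib.request

GBIF_SPECIES_URL = "https://api.gbif.org/v1/species/{}"

# ASSUMPTION: the schemes page links to GBIF with URLs that contain
# "gbif.org/species/<id>". The real markup may differ, so adjust as needed.
GBIF_LINK_RE = re.compile(r"gbif\.org/species/(\d+)")


def extract_gbif_ids(schemes_html):
    """Return the distinct GBIF IDs found in the page HTML, in order of appearance."""
    seen = []
    for gbif_id in GBIF_LINK_RE.findall(schemes_html):
        if gbif_id not in seen:
            seen.append(gbif_id)
    return seen


def fetch_gbif_record(gbif_id):
    """Fetch one GBIF species record as a dict (one HTTP request per taxon)."""
    url = GBIF_SPECIES_URL.format(gbif_id)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def names_match(inat_name, gbif_record):
    """Compare the iNat name with GBIF's canonicalName, ignoring case and extra spaces."""
    def normalize(name):
        return " ".join(name.lower().split())

    gbif_name = gbif_record.get("canonicalName") or ""
    return normalize(inat_name) == normalize(gbif_name)
```

you would download the schemes page for a taxon, run `extract_gbif_ids` on it, and then call `names_match(inat_name, fetch_gbif_record(gbif_id))` for each ID. a species that comes back `False` (for example because GBIF returns just the genus name) would be a candidate for manual review.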
if someone had access to the iNat database, or if iNat staff included GBIF ID in their taxon exports or taxon API, then you could get GBIF IDs for all taxa at the same time (or at least in chunks), but then you would still have to look things up over at GBIF, and you’d still have to do that in chunks, as far as i’m aware. so still not a super efficient process.
…
now for some speculation… i’m not familiar with the process for creating new species, but i wonder if it involves using the parent (genus) as the template for the new species? if so, that might explain how the genus-level GBIF ID might end up on multiple species, and in such a case, i guess you would just want to either procedurally or technologically prevent the GBIF ID from getting copied from another taxon.
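if that speculation is right, the symptom would be one GBIF ID attached to several iNat taxa. assuming you somehow had a taxon-to-GBIF-ID mapping in hand (iNat doesn't currently export one, so this is hypothetical), a minimal python sketch for spotting shared IDs could look like this. the function name and the IDs in the example are made up.

```python
from collections import defaultdict


def find_shared_gbif_ids(taxon_to_gbif_id):
    """Return {gbif_id: [taxon names]} for every GBIF ID used by more than one taxon."""
    taxa_by_id = defaultdict(list)
    for taxon, gbif_id in taxon_to_gbif_id.items():
        if gbif_id is not None:  # skip taxa with no GBIF link at all
            taxa_by_id[gbif_id].append(taxon)
    return {
        gbif_id: sorted(taxa)
        for gbif_id, taxa in taxa_by_id.items()
        if len(taxa) > 1
    }


# Hypothetical mapping; the numbers are placeholders, not real GBIF IDs.
example = {
    "Eleodes": 111,
    "Eleodes goryi": 111,      # same ID as the genus: suspicious
    "Eleodes osculans": 222,
    "Sanvitalia aberti": None,
}
shared = find_shared_gbif_ids(example)
```

anything that shows up in `shared` would still need a human (or a GBIF lookup) to decide which taxon the ID really belongs to.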
not sure if there’s really still a bug here though. seems like a data problem (which for the original reported problem has already been fixed), and even if the source of the bad data could be prevented with a code change, that might be more of a feature request than a bug fix.
I tried to get all the Schinia you listed today. That ate about an hour and a half, but your including the taxon links really sped up the process. Thank you!
The following do not currently (9/10/2021) appear on GBIF: purpurascens, bieneri, chilensis, imperialis, copiosa, chanzyi, subrosea, pseudomia, multiplex.
Let me know if I missed any, or made any mistakes!
There are over a million known species of insects, and curators are supposed to check and repair them one by one. Greeeeeeeeeat.
So far I’ve done two genera, in something like a week.
Pretty obviously, we need a bot for this.
Nearly all the taxa I fixed have a (current) 7 digit GBIF number starting with a 9. At least this aspect of the problem is nonrandom! I have to wonder whether indexing changes were made at GBIF, causing the links to break.
If this were to become a bug report, our admins would need something structured enough to tackle. How could this be formulated to make it productive? It looks to me like GBIF involvement is needed (if my hunch is correct).
Data being out of date is not a bug. It would be a bug if the code itself were adding the wrong identifiers.
The root cause is changes in taxonomy being made on either side, but mostly changes at GBIF. Any bot or automated process to fix this would have to constantly monitor the GBIF database (and likely the whole database, as it is unlikely any API has access to a change log). That’s a hugely intensive process, and one GBIF likely would not approve of.