Correcting of widely misused taxa by an automatic process

neylon · January 21, 2024, 1:51am

This url would be a useful tool if you guys wanted to take a crack at fixing the dataset: https://www.inaturalist.org/observations?place_id=97394&subview=map&taxon_id=47735&view=identifiers&without_ident_user_id=derhennen,szucsich,upupa-epops,buggybuddy,biosam,insulindian_phasmid,jellyfishww,graytreefrog,douch,jacksonmeans

Change the names to be the people who can be considered reliable with their ID’s (you said there were about 5?) and send it out to them. This will show only the observations that lack a reliable ID, and then you can go to the filters to open those observations in the identify page and start correcting. That will make it a much more manageable number to take care of. Once that’s finished, everyone can start verifying ID’s.
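For anyone who would rather generate that link than edit it by hand, here is a minimal sketch. The parameter names and values are copied from the URL above; the function name `unreviewed_url` and the idea of passing the trusted identifiers as a list are just illustrative assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.inaturalist.org/observations"


def unreviewed_url(place_id, taxon_id, trusted_identifiers):
    """Build a search URL for observations that none of the trusted identifiers has ID'd.

    The parameter names are the ones in the URL posted above;
    the function itself is a hypothetical helper.
    """
    params = {
        "place_id": place_id,
        "subview": "map",
        "taxon_id": taxon_id,
        "view": "identifiers",
        # Comma-separated logins of the identifiers considered reliable.
        "without_ident_user_id": ",".join(trusted_identifiers),
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


trusted = ["derhennen", "szucsich", "upupa-epops"]
print(unreviewed_url(97394, 47735, trusted))
```

Swap the `trusted` list for whoever the group agrees on and share the printed link.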

upupa-epops · January 21, 2024, 3:36am

In this case it’s not so much unresolved taxonomy as the fact that we don’t know which European species have been introduced or spread where in North America.

These kinds of organisms spread very easily in garden soil. A few years ago I found a European slug species in Ontario that had only previously been found in California and Newfoundland. There are plenty of myriapods that have been introduced from Europe to North America but haven’t been found in [this observation’s province/state] yet.

If both species of a certain millipede genus have been confirmed in Ohio, should I assume that both are present in Ontario, or can I ID it to species because the genus is distinctive and that’s the only species confirmed here?

spiphany · January 21, 2024, 9:28am

In other words, there are known unknowns. There is a difference between being cautious about ID’ing because hypothetically some similar unknown species could exist (i.e. unknown unknowns), and being cautious because the taxon is not well studied in the region and it is known that many species have not been documented – we do not fully know what possibilities there are, so things like range are only of limited use as a criterion.

This is a good point. However discouraging it might be as an observer to have one’s observation pushed back to a higher level, it’s important to remember that identifiers do not derive any pleasure from doing this either. It is far more satisfying to be able to suggest an alternative, and pretty depressing to have to say over and over: this can’t be ID’d to species.

I’ve seen cases of genera where people persistently suggest a certain species because it happens to be well known (not necessarily most common) but it cannot be reliably distinguished from a couple of other widespread species without (e.g.) examination of the adult genitalia. But the species IDs may end up remaining uncorrected because IDers have discovered that they get so much ill-will from doing so that they prefer to focus their attention on other genera where their efforts feel more constructive.

upupa-epops · January 21, 2024, 8:23pm

I just took a look at @derhennen’s website and discovered he has some newer resources including this list of sources and this key for families of Julida.

cooperj · January 21, 2024, 10:51pm

I’d prefer to see a mechanism allowing users to vote down a CV suggested taxon in a chosen area. My group, the fungi, have an increasing number of regionally cryptic species which cannot be reliably identified by photos alone. Sure, if there is adequate micro-data or even better a sequence, then they can be identified to species, but that is a small subset. Currently the CV makes suggestions for species that are wildly out of range (wrong hemisphere!) and often based on poor quality training data. The new Geomodel is much worse than the older ‘seen nearby’ in this respect.

If the number of down votes reaches some threshold in a particular area then the CV would suggest the next higher taxon in that area. And if that higher taxon is still too narrow then that too could be voted down, and so on. Then at least the problem is tackled at the point of origin.
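To make the idea concrete, here is a rough sketch of how such a fallback could work. Nothing here reflects how the CV is actually implemented: the function name, the threshold of 5, the region labels, and the vote counts are all made up for illustration.

```python
from typing import Dict, List, Tuple


def suggest_taxon(
    lineage: List[str],
    region: str,
    downvotes: Dict[Tuple[str, str], int],
    threshold: int = 5,  # hypothetical cut-off, not an iNaturalist setting
) -> str:
    """Return the most specific taxon that has not been voted down in this region.

    `lineage` is ordered from most specific (species) to broadest.
    """
    for taxon in lineage:
        if downvotes.get((taxon, region), 0) < threshold:
            return taxon
    # Everything was voted down: fall back to the broadest taxon given.
    return lineage[-1]


lineage = ["Cylindroiulus caeruleocinctus", "Cylindroiulus", "Julidae", "Julida"]
# Illustrative vote counts only.
votes = {
    ("Cylindroiulus caeruleocinctus", "North America"): 7,
    ("Cylindroiulus", "North America"): 5,
}
print(suggest_taxon(lineage, "North America", votes))
print(suggest_taxon(lineage, "Europe", votes))
```

With these made-up votes the North American suggestion climbs to the family, while the European suggestion stays at species, since the votes are counted per area.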

upupa-epops · January 22, 2024, 12:52am

I’m not entirely sure which ranks above species the CV knows about, or how they’re weighted, but I wish all of the ranks above species were given a heavier weighting towards showing up as the top option. And even in secondary options (is it even possible for a non-species rank to show lower in the list?). The top species suggestion would still be in the list somewhere, but in so many taxa it would be preferable to choose a genus or family as the default if you’re inexperienced.

doppelhans · January 22, 2024, 1:04am

An automated process to add mass comments would be a really good start.

I know the downside of a mass ID tool is that correct IDs would also be changed to a higher level.

doppelhans · January 22, 2024, 1:22am

I think there is a misunderstanding of what ID means for Bombus (a small group, see notes by neylon or by cthawley) and for millipedes. upupa-epops already stated that for most of the millipede records (an animal class) it is almost impossible to say more than the family.

If I went through all the misidentified North American Cylindroiulus caeruleocinctus, I could only give them the much higher rank Juliformia, because even the order is unclear to me.

And as far as I know from my North American colleagues, they have already screened all the garbage IDs for known and identifiable species like the Narceus americanus complex. So if they went through the rest, they would also just put them in a much higher group.

So why not give us a tool to make IDs for a bigger group, even if there is a voting process beforehand? Experts could then vote on whether the false North American Cylindroiulus caeruleocinctus or several other false mass IDs should be changed to Julida or even Juliformia.

This would help to rescue the IDs for European Cylindroiulus caeruleocinctus and several other non-North American species. Currently, the wrong assignment of (undeterminable!) North American records to European species destroys the identifiability of the European species.

neylon · January 22, 2024, 2:59am

Alright, I’ll expand my example: according to NWF, there are 1400 millipede species in the US and CA. There are more than 3000 bee species in the same area, and many of these also can’t be taken to species from photos, but people still add CV-based species ID’s. We also have the oh-so-fun job of setting these back to higher taxa by the thousands (see Augochlorini, Andrena, Sphecodes, Dialictus, Eumelissodes, Megachile, etc).

I did move three of my Millipedes to Class after learning that the genera I had were likely wrong, but they had been sitting lonely and un-ID’d for 4 years.

I’m unsure what you mean regarding European vs American species. Are you referring to introduced European species in North America that would potentially get lost amid the mis-ID’s? Or the European species that are actually in Europe?

doppelhans · January 22, 2024, 8:24am

I can only tell you that the American specialists are actively refusing to correct the American chaos as long as there is NO helpful tool.

Regarding European vs American: Please read my introduction! C.c. is a European species: European species that are actually in Europe.

szucsich · January 22, 2024, 8:51am

As the leading identifier in Myriapoda, I can only support doppelhans’s idea. Since I am not an expert in Diplopoda and Chilopoda, my main work there goes into cleaning; as an expert in other groups, my main work could go there instead (I just do not want to abandon myriapods).
So an automatic correction tool would greatly help to make ID time more effective.

neylon · January 22, 2024, 11:51am

I did read your intro, and that’s why I’m confused: you can pull up maps on iNat and GBIF that only show certain regions. I’m not seeing how one continent being messed up means the whole thing is. If Europe is accurate and properly reviewed, then that sounds great. North America still needs a reviewer, and we know that research papers can’t use that dataset currently.

doppelhans · January 22, 2024, 1:45pm

The massively wrong use of common European species in North America also influences the ID proposals for Europe. It is not the continents that are messed up, but the CV.

iNat is an identification tool, not a GIS tool where wrong determinations do not matter.

I can just run a script and change the IDs for a batch of North American records. I have already tested it. For me as a European expert that would be fine, because it would greatly improve the ID quality of the computer vision.

But this would lead to a war of experts, because I don’t know exactly what other experts would choose. That’s why I proposed finding a proper discussion process.

If iNat will not provide one, we will implement our own one.

DianaStuder · January 22, 2024, 2:33pm

I would like to say - no, it is not an orchid.
Without iNat sneering back at us - SO - what IS it??
And forcing us to add a ‘random broad’ ID.

Acknowledge taxon specialists.
Let them ID what they know!
It is NOT an orchid.

Downvote an ID!

sedgequeen · January 22, 2024, 4:29pm

Wouldn’t simply downvoting something cause iNaturalist to shift it up to the next category, which may also be wrong? Why not have us shift it up to the next category that we know applies?

mabuva2021 · January 22, 2024, 4:35pm

Coming from someone who identifies flies, which like millipedes are a mess of IDs, here are a few tips I’ve found that can help:

1.) Timing when you do corrections can be helpful. It’s currently winter in the northern hemisphere so there aren’t as many new observations being made. That should make it easier to correct things.

2.) Start by correcting all misids that have gotten to RG. These are usually the source of the CV’s incorrect suggestions and by getting rid of them first, it makes sure you aren’t fighting the current of new incorrect obs as you clear old ones.

3.) I’ve found it helpful to compartmentalize a clean up into regions, e.g. “Today i’m going to correct all the misided obs for a species in this one state”. Then tomorrow do a different state, then another. By doing it this way it helps with burnout and can make the problem look less daunting.

I know inat is also currently conducting an experiment to assess the accuracy of identifications, so hopefully something useful will come from that: https://www.inaturalist.org/blog/88501-experiments-to-estimate-the-accuracy-of-inaturalist-observations
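Tips 2 and 3 can be combined by preparing one link per region that shows only Research Grade observations of the problem taxon. A minimal sketch, with assumptions: `taxon_id=47735` is taken from the URL earlier in this thread, `quality_grade=research` is the filter I’d expect for Research Grade, and the place names and IDs below are placeholders rather than real lookups.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.inaturalist.org/observations"


def cleanup_batches(taxon_id, places):
    """Return {place name: URL}, one Research Grade search per place."""
    urls = {}
    for name, place_id in places.items():
        params = {
            "place_id": place_id,
            "taxon_id": taxon_id,
            "quality_grade": "research",  # start with IDs that already reached RG
        }
        urls[name] = f"{BASE_URL}?{urlencode(params)}"
    return urls


# Placeholder place IDs: look up the real ones on the site before using this.
places = {"region A": 1001, "region B": 1002}
for name, url in cleanup_batches(47735, places).items():
    print(name, url)
```

Working through one printed link per day keeps each batch small and makes progress easy to report back.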

sedgequeen · January 22, 2024, 4:48pm

I can see that having an automatic method of correcting problem ID’s that are interfering with CV could be useful – if it only lasted a short time, in case there really are individuals of the problem species in the area it shouldn’t be.

Given that iNaturalist isn’t likely to do anything like that any time soon, if at all, why not solve the problem yourself? I don’t mean ID all the Cylindroiulus caeruleocinctus yourself. Write a journal post about this species and where it is and isn’t known to live. Include links to observations with good photos and a recommendation of a higher category to label it. Probably write a little paragraph of explanation for each identifier to paste into the observations as they go. Some commenters here would help. You know millipede people in Europe who could be recruited to spend a few hours, at least, on the project. Advertise it here and on the IdentFriday endless forum thread, with a date, maybe a weekend, for a focused ID push. Post on the progress, too, to encourage people. Follow @mabuva2021’s excellent suggestions. You could have this problem solved long before iNaturalist could write the relevant code, test it, and upload it.

tiwane · January 22, 2024, 4:57pm

This would likely fall under the category of Machine Generated content, which is a suspendable offense on iNaturalist.

I understand your frustration, but as others here have said the best thing to do is create resources and recruit people who can help knock the observations’ IDs back to something that’s accurate.

iNaturalist doesn’t “sneer” at anyone - it’s a website. The ID system is intentionally designed to force you to make an additive, constructive ID rather than just tell someone they’re wrong and contribute nothing else.

And the coarse ID you add isn’t “random,” or at least it shouldn’t be. It’s based on the evidence provided and what ID you think can be determined by that evidence.

DianaStuder · January 22, 2024, 7:20pm

I hear you.

But someone (not me) who studies orchids, and 100% knows it is not an orchid.
Then IDs as plant or dicot … and tips us into Ancestor Disagreement. Not the specialist’s intention, and neither additive nor constructive.
At best it prevents a second identifier misusing Agree as a thank you.
Wrong + Ancestor Disagreement = 5 IDs needed to reach RG.
We don’t always have 5 competent identifiers available.
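The arithmetic behind that “5” can be checked with a small sketch. It assumes a simplified reading of the community ID rule, not iNaturalist’s actual code: the correct taxon must hold more than two-thirds of the IDs counted for or against it, and both the wrong species ID and the disagreeing coarse ID count against it.

```python
def ids_needed(ids_against: int) -> int:
    """Smallest n such that n / (n + ids_against) is strictly more than 2/3.

    Simplified assumption about the community ID threshold,
    not the site's real implementation.
    """
    n = 1
    # Integer form of n / (n + ids_against) <= 2/3, avoiding float rounding.
    while 3 * n <= 2 * (n + ids_against):
        n += 1
    return n


print(ids_needed(1))  # only the wrong ID counts against
print(ids_needed(2))  # wrong ID plus the ancestor disagreement
```

Under these assumptions the first case needs 3 agreeing IDs and the second needs 5, since 4 out of 6 is exactly two-thirds and does not clear the bar.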

cooperj · January 22, 2024, 8:05pm

Just to clarify in case it was misinterpreted. My suggestion was not to downvote an ID but to downvote the appearance of the taxon as a CV suggestion in an area.
