Confidence Scores for IDs

Yes, I definitely do that at times. But when the ‘broader ID’ is Dicots, well, things can easily get quite lost. Being able to add an ID that indicated a lower level of confidence might meet the dual purposes of putting it where experts were more likely to see it and hopefully making people less likely to blindly agree.

(I’d be more ready to do ‘give it a try’ IDs if I was confident observers wouldn’t just agree, because I’m happy to withdraw if needed, but it’s harder if others have already agreed. As it is, I tend to stick to genus at most unless I’m pretty confident, even if there’s only one local species.)

Having said all that, I could be wrong, but I can’t see it being implemented.

1 Like

Given that leaving a comment indicating that I am not confident about my ID seems to have very little effect on whether observers agree or not, I doubt having a mechanism by which users could mark their IDs as being low confidence would serve as much of a deterrent. If people aren’t noticing comments or agreeing anyway, I doubt they would behave any differently if there is a notice next to the ID. (Many users will also happily select IDs that the CV indicates it is not confident about, or that are obviously way out of range, so…)

1 Like

You’re right, I’m probably over-optimistic. It’s a nice idea, though… :-) At the least, hitting ‘Agree’ to a half-strength ID should make their ID also half-strength, totalling one rather than two?
(Then there are the users who’ll select the CV suggestion while adding a note that it’s wrong??)
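
The half-strength arithmetic suggested above could be sketched roughly like this. It is only an illustration: the weights, the function names, and the rule that an agreeing ID inherits the weight of the ID it agrees with are all assumptions, not anything iNaturalist currently implements.

```python
# Hypothetical weights: a normal ID counts 1.0, a tentative ("half-strength") ID counts 0.5.
FULL = 1.0
HALF = 0.5


def add_agreement(weights, agreed_with_index):
    """Return a new list in which the agreeing ID inherits the weight
    of the ID it agrees with (assumed rule, per the suggestion above)."""
    return weights + [weights[agreed_with_index]]


def total_support(weights):
    """Total support for a taxon is the sum of the weights of its IDs."""
    return sum(weights)


tentative_then_agree = add_agreement([HALF], 0)
normal_then_agree = add_agreement([FULL], 0)

print(total_support(tentative_then_agree))  # one tentative ID plus one 'Agree'
print(total_support(normal_then_agree))     # one normal ID plus one 'Agree'
```

Under these assumptions, a tentative ID plus a blind 'Agree' adds up to the support of a single normal ID, rather than two.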

2 Likes

I like the idea as well. Currently all the “Tentative” IDs may get lost or people still blindly agree with them.

Maybe, this could be addressed by simply* removing the agree-button on “half-IDs”? I think most of the blind agreeing can be attributed to the ease of doing so. Most would probably not go to the lengths of typing out the taxon name in the search bar.

(*or perhaps not so simply. I don’t know how difficult this would be to implement)

1 Like

Would they be correct 99% of the time? I see lots of green bottles on iNaturalist from the UK which have been identified as sericata, mostly from photos which don’t show the level of detail needed for an ID. I had assumed this is one of those feedback loops where the more green bottles are identified as sericata, the more certain the computer is that green bottles are sericata, so the more confident people are in picking sericata for their ID.

National Biodiversity Network (an amalgam of UK biological databases) has 1964 records of sericata for UK, iNaturalist has 1137.
For L. caesar NBN has 2162, iNaturalist 191.
For L. illustris, NBN 623, iNaturalist 11.
For Neomyia viridescens, NBN 991, iNaturalist 23.
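
To make the comparison easier to read, the counts above can be turned into shares within each database. This is a back-of-the-envelope sketch: it only compares these four species with each other, and it says nothing about which database is closer to the truth.

```python
# UK record counts as quoted in the post above.
records = {
    "Lucilia sericata": {"nbn": 1964, "inat": 1137},
    "Lucilia caesar": {"nbn": 2162, "inat": 191},
    "Lucilia illustris": {"nbn": 623, "inat": 11},
    "Neomyia viridescens": {"nbn": 991, "inat": 23},
}


def shares(source):
    """Share of each species among the four listed, for one database."""
    total = sum(counts[source] for counts in records.values())
    return {name: counts[source] / total for name, counts in records.items()}


nbn_share = shares("nbn")
inat_share = shares("inat")

for name in records:
    print(f"{name}: NBN {nbn_share[name]:.1%}, iNaturalist {inat_share[name]:.1%}")
```

Among these four species, sericata makes up roughly 34% of the NBN records but roughly 83% of the iNaturalist ones.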

1 Like

Well, that depends on whether there are IDers who are willing to disagree with the species-level ID in such cases. Particularly in situations where one species is known to be far more common than others, some IDers will instead add a non-disagreeing genus ID because this is less stressful than trying to diplomatically explain the rather counter-intuitive reasoning that “statistically considered, it most likely is that species but I am disagreeing because the photos aren’t good enough to confirm it.”

An algorithm that calculates ID reliability would presumably not consider a non-disagreeing genus ID to be an indication that the species ID is incorrect (there would be no way to tell based merely on the IDs which of the IDers is more skilled, because a polite non-disagreeing “not enough evidence” genus ID is effectively identical to an “I don’t know enough to confirm the more specific ID” genus ID.)

1 Like

Maybe not in Europe, but in North and South America, male Lucilia sericata are easy to recognize even in poor-quality photos once someone has been trained. (So I agree, do not try this at home without a good reference collection.) The idea that insect species cannot be separated without looking at bristles and terminalia is true for certain groups, but generally a myth. It was perpetuated in the pre-iNaturalist era by taxonomic keys that avoided differentiating species by habitus-level characteristics like patterning and shape, because these characters are harder to describe reliably in keys or to understand without a reference collection. Many taxonomic experts, for instance, never even bothered using patterning in flies because patterning becomes obscured when flies are stored in alcohol.

6 Likes

A big part of this is that iNat users stick to cities where sericata is much more common than anything else. Whereas any natural history database will be biased toward the countryside where scientists have been more interested in collecting things.

3 Likes

I am glad to see there is some more thought on this…

While fully discrete consensus IDs (without confidence estimates) are surely convenient operationally, it sounds like there are weaknesses and pitfalls to this.

On the matter of wanting iNat to be a serious platform:

My gamification comment was not a challenge to the seriousness of iNat, but rather a note on how to further encourage higher engagement, which I understand is fundamental to the platform. I don’t think the core iNat system would be as popular as it is without that fun factor, which already exists (and should be credited!)

Improving definitions and calculations of consensus IDs could increase the level of quality and seriousness of the platform, reduce error rates, mitigate poisoning of CV training by overconfident but wrong consensus IDs, and ease frustrations by identifiers with professional backgrounds who see an overwhelming number of wrong, low confidence IDs–far more than they have time to address directly.

It seems like a recurring complaint by actual expert identifiers is that there is an unacceptably high misidentification and error rate in taxa where IDs are inherently less confident, especially by uninformed identifiers who click the first CV hit without verifying, or click ‘agree’ just because they like a photo.

This, as well as the apparent tiffs occurring between those worried about other people’s IDs over-influencing their observations (as in the prior source thread), could potentially be addressed by implementing a confidence score approach to [at least] consensus IDs.

So I get why there is some resistance to considering how to implement confidence scores for IDs–including operational complexity and social concerns–but there could be statistical approaches that improve quality and address existing concerns.

1 Like

“There is no progress in the history of knowledge, but a continuous and sublime recapitulation.”
Quoted from Umberto Eco’s character Jorge de Burgos in “The Name of the Rose”.
:rofl:

Sorry, but I cannot let it stand like that.
Changing IDs based on improved knowledge and skills is exactly the intention and not at all absurd.

Yes and no.
I personally also see the difficulty of applying a statistical approach all the way down to species level.
But no, because even the CV today should be able to assign a meaningful probability to whether there is a bird in a bird picture or not, and help flag and sort out most of the trash posted by overly ambitious species hunters.
And this will help experts to focus on the value-added stuff.

Maybe a low-threshold confidence score on observations to mark whether or not an observation shows a living thing from the proposed family.

And another confidence score at species level to highlight taxa (not observations) that cannot be distinguished from normal photos.

Both could guide those who want to contribute seriously and repel those who abuse the platform as a photo album.

Except that it would not necessarily reflect improved knowledge and skill, merely additional IDs. If someone joins iNat with pre-existing skill at identifying a particular taxon, their skill is not going to be meaningfully different if it is the first ID they provide or the 100th.

It also happens that someone may add lots of IDs based on misinformation that take a very long time to get corrected because there are few specialists actively working on that taxon, with these wrong IDs in the meantime garnering equally uninformed agrees from observers and therefore being treated as more reliable. And yet these later IDs are no less wrong than the earlier ones just because the user has added more of them.

Nope. The CV will suggest an ID for anything, even a scribble on a piece of paper, because this is what it is programmed to do. It compares images without knowing what parts of these images are essential and what parts are not. It is therefore not able to determine whether an organism is not present, just as it is not able to tell whether the photo shows some organism that it has not been trained on.

2 Likes

Valid intent, but what is different compared to the process flow today?
I have observations made during vacation. I assigned an ID proposal. Meanwhile, I have six confirmations of this taxon, which I am not familiar with. If those confirmations are all from tourists like me (which is not unlikely), we are exactly at the point you describe, but nobody might even be aware that a confirmation is questionable. If I initially assign only genus, the next :skull_and_crossbones: lurks around the corner and IDs it to species, and the whole schmu starts again.

Again true. But if the first CV proposal is a snail, the second a mushroom and the third a moss while the proposed ID is a frog, you may agree that something is weird and that the risk is high that there is no visible frog in the picture at all.
Take the set of all taxa (order, family, genus, species, etc.) as points in a vector space. If the summed length of the vectors between the first three CV-proposed IDs and the originator’s proposed ID is high, then the observation is highly divergent and carries a high risk of being trash. If it is low, divergence is low and the difference likely does not reach beyond the family level.
Real life example
https://www.inaturalist.org/observations/308388426
Proposed ID: white-throated dipper (bird)
CV ID1: A crayfish
CV ID2: A moss
CV ID3: Another crayfish
Then a salamander, an insect…
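
A rough sketch of such a divergence signal, for illustration only. Instead of a true vector-space embedding it uses a simpler stand-in: the fraction of ranks two lineages do not share, counted from the kingdom down. The lineages are simplified placeholders typed in by hand (the actual crayfish and moss species suggested by the CV are not stated above), and the threshold is an arbitrary assumption.

```python
# Simplified, hand-typed lineages (kingdom downwards). Placeholders for
# illustration, not data pulled from iNaturalist.
LINEAGES = {
    "dipper": ["Animalia", "Chordata", "Aves", "Passeriformes", "Cinclidae"],
    "wagtail": ["Animalia", "Chordata", "Aves", "Passeriformes", "Motacillidae"],
    "crayfish_a": ["Animalia", "Arthropoda", "Malacostraca", "Decapoda", "Astacidae"],
    "crayfish_b": ["Animalia", "Arthropoda", "Malacostraca", "Decapoda", "Cambaridae"],
    "moss": ["Plantae", "Bryophyta"],
}

THRESHOLD = 0.6  # hypothetical cut-off for "highly divergent"


def divergence(a, b):
    """0.0 for identical lineages, 1.0 when not even the kingdom matches."""
    shared = 0
    for rank_a, rank_b in zip(a, b):
        if rank_a != rank_b:
            break
        shared += 1
    return 1 - shared / max(len(a), len(b))


def mean_divergence(proposed, suggestions):
    """Average divergence between the proposed ID and the CV suggestions."""
    return sum(divergence(proposed, s) for s in suggestions) / len(suggestions)


top_three = [LINEAGES[k] for k in ("crayfish_a", "moss", "crayfish_b")]
score = mean_divergence(LINEAGES["dipper"], top_three)
print(f"score={score:.2f}, flag for review: {score > THRESHOLD}")
```

With these placeholder lineages the dipper example lands well above the assumed threshold, while a dipper whose top suggestion was another passerine family would stay below it. As noted earlier in the thread, the CV suggests something for any image, so a signal like this could at most prioritise observations for human review, not decide that no organism is present.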

My proposal: Create an automated signal to support saying “goodbye” to the mess in the most efficient way.

…as you may see, I am used to turning systemic manure like the above example into something value-added. That is indeed still my full-time profession today.
