Certainty and uncertainty in identification

I will do this in cases where I am quite confident about the higher-level ID but not so confident about the species, as many people above have mentioned. But to make this clearer, I try to not just put the best-guess species name in the comment field by itself, but instead something like “Probably [species name]” or “Probably [species name] but I can’t rule out [other species name] from this photo”.

1 Like

This. And yet because scientific names are often presented as “the” name of the taxon, this can create the impression that the taxon with that name is an objective fact of nature.

In a case like that, I would be the second identifier who comes along and sighs, “unless you have a reason to think it is something other than the obvious one…” I have had some really exasperating pages – five or six species in a row, all of them showing the characteristics of an obvious, ubiquitous species, yet all of them identified only to genus. By the sixth one, my sigh was almost a shout: “unless you have a REASON to think it is something OTHER than the OBVIOUS one…”

If I have a reason to think it is something other than the obvious one, I will say what trait that is.

I have said that many times myself. Most of the replies in this thread reinforce that impression:
“I only do when I am certain”
“I tend not to ID to species unless I am certain.”
“I place an ID only when I am certain.”
“I would rather be boldly less correct but right, than boldly add something to gbif at 80% certainty.”
“Like many have said, I try to stick with things I am certain of, to avoid scenarios of “the blind leading the blind”.”

To which I reply in the words attributed to Albert Einstein: “If we knew what we were doing, it wouldn’t be called research.” The irony of this thread is that science only moves forward in uncertainty. If there is certainty about something, then there is no scientific question to be asked about it. If you read many scientific papers, you will find terms such as “95% confidence intervals.” You will see statistical significance expressed as the likelihood that the observed difference is due to random chance; it may be p<0.001, but you will never see p=0. You will see the discussion section describe the limitations of the study, and how those limitations require caution in interpreting the results.

If scientists had the attitude of many identifiers on here, nothing would ever be published.

6 Likes

The difference, compared to research that we might do as individuals, is that someone else’s iNat photo might not be easily interpreted. I’ve looked at photos where I think it’s some species I’m familiar with but there’s just something lacking in the image that makes me hesitate to give a firm ID.

2 Likes

I prefer to identify only when I’m certain. However, being certain and being right aren’t the same; sometimes I find I was wrong when I was certain I knew. (Learning happens!) And sometimes I click on the wrong button although I know what the creature actually is.

How certain is certain? 90% certain is good enough for me! Maybe less.

What about when I’m not certain? That varies. For one thing, my confidence varies from day to day. More important, I want to ID many observations that I’m not completely certain about. If species A is common in the area and species B is rare and the photo doesn’t provide the needed details to make the distinction, I’ll ID the observation as species A; Shore Pine rather than Scots Pine on the Oregon Coast, for example. Not only is this probable, but misidentification won’t confuse the actual distribution of Shore Pines. Also, I find that the observer matters to me. If I think the creature is probably species A and an observer who I consider skilled says it’s A, I’ll agree to A, but if the observer is not somebody I know, I won’t add an ID or I’ll ID it at a higher rank.

Sometimes I just can’t ID the organism from the photo. It’s too blurry or the necessary bits aren’t shown or it’s from an area where I don’t know enough about alternatives. Sad. I don’t ID these – usually.

If I ID to a higher rank but say, “Probably species X” that’s usually because (1) I feel certain it’s X but I have to admit that the photo isn’t showing the traits I need or (2) I recognize that I don’t know that species in that geographic area well enough to call it but . . . .

Taxonomic uncertainties do present real problems but for most species, the species concepts are clear. (I think of Darwin’s correspondence with Hooker about the pattern of species clarity in their respective organisms of study, barnacles and plants. Both men found that some species are clearly different but some are found in pairs or groups with intermediate states that confuse the situation.) I mean, American Robin clearly isn’t Rufous-backed Robin or European Robin or the little birds called robins in Australia and New Zealand, nor is it a Blue Jay or a Meadowlark. No real problem there. We can say unequivocally that the name American Robin is correct or wrong when applied to even a moderately clear photo.

7 Likes

I can see where you’re coming from, but let me give you an example:

Two species of march fly, Bibio articulatus and Bibio abbreviatus, are only distinct from each other in the length of the hind basitarsus and the color of the setae on the mid and hind legs. If the legs aren’t visible in a photo, or are blurry, then even if everything else is clear you’re kind of at a loss.

Now, there are almost 1000 observations of B. articulatus on iNat, compared to 0 observations of B. abbreviatus, which is an extremely rare species. Is this observation B. articulatus? Probably. Do I gain anything by putting it to species ID when there is an extremely tiny but non-zero chance it is B. abbreviatus? Not really.

So, even though the ID of the species is probably known, it could be more harmful if that ID ended up being incorrect. For me, I would rather have a second person look at it or just leave it at genus level. Assuming that it must be B. articulatus just because it’s common seems like bad science in my opinion, because if you do that enough times you’re going to make a wrong assumption eventually.
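To put a rough number on the "enough times" point, here is a minimal sketch. It assumes each probability-based ID is wrong independently with the same small chance; the 0.1% figure is purely hypothetical and is not derived from any real Bibio data, and the function name is made up for illustration.

```python
def prob_at_least_one_error(p_wrong: float, n_ids: int) -> float:
    """Chance that at least one of n_ids independent IDs is wrong,
    when each one is wrong with probability p_wrong (an assumption)."""
    return 1.0 - (1.0 - p_wrong) ** n_ids


# Hypothetical: a 0.1% chance per observation that it is really the rare lookalike.
for n in (10, 100, 1000):
    print(n, round(prob_at_least_one_error(0.001, n), 3))
```

Under those assumptions, a single ID is almost certainly fine, but across a thousand of them a mislabelled record somewhere in the set becomes more likely than not.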

12 Likes

Everyone’s different. For me, statistics and probability don’t really apply. You are given a couple of photos, a date-time, and a location. That process is more like what a private investigator does. Gathering evidence and drawing conclusions, not at all like hypothesis testing.

1 Like

Fair argument; however, from my perspective the difference here is in expertise. In order to do quality research to the degree that it is publishable, you have to have a degree of understanding about the field. If someone was unfamiliar with algebra, they’d have a difficult time investigating infinite series, much less the differentiation and integration thereof. Euler didn’t just sit down and suddenly have his formula appear in his mind; he likely worked quite hard from the fundamental principles he was already familiar with. Besides, a calculated statistical confidence relies upon having good collection methods as well. Sure, you can have a theoretically high degree of statistical confidence from a sampling of a few thousand data points, but a low variance could very well be a red flag for poor methodology or even data tampering.

So, when we’re looking at systems like computer vision and shape/image identification, they’re based on a collection of statistical analyses, and feeding them incorrect data is generally irresponsible, as it increases the likelihood of unexpected behavior. Plus, looking at human psychology, if there are two parties who express confidence in an identification, it’s much more likely for someone sitting on the fence to go ahead and lean towards the agreement of the other two. Supposing that the first identification was perhaps offered by the identification algorithm, and the second identification was offered by someone who was “trying to learn”, the third identification may come from someone who was less certain–suddenly it’s classified as “research grade”, and that’s one way that I expect we could be propagating errors through this system.

…and sure, a lot of these errors are likely eliminated by some supervisory algorithm that excludes extreme outliers, but that won’t necessarily apply to every group of observations; I imagine that this is especially true of species of concern that have a very small number of samples and a lot of hopeful naturalists trying to document them.

I’m not saying that we shouldn’t attempt to do our best to identify what we can, only that we should also acknowledge the importance of not overstepping the bounds of our personal expertise. While science is fundamentally about taking a step into the dark to poke the bear of knowledge, it’s good to do it practically, and to maintain some degree of safety in the process. For me, it’s not about being afraid of being wrong, it’s about wanting to avoid unnecessary and avoidable error propagation.

2 Likes

As one of the people who “timidly” identifies to genus and suggests a species in a comment:

If it is in Britain or Ireland, I put a species name if I am as certain as I would be if I saw it live. If the photo was taken in mainland Europe, I am more likely to identify to genus and suggest a species in the comment because although it looks just like the species I am suggesting, there might be look-alikes that I’m not aware of. The further away from the British Isles, the less likely I am to put a species identification.

6 Likes

If I do add a tentative ID, I will @mention a relevant trusted identifier. That can often provoke a useful comment, which I can make part of my learning curve.

4 Likes

I thought similarly for a long while until I realized I do it too in my own IDs. Generally, when I have a hunch but I’m not certain, I leave it at genus but give my two cents as to what species I think it might be. With nearly any clade, I’m by no means an expert and I wouldn’t want my ID going against that of someone more confident than I am.

1 Like

I have the impression that I might behave differently according to my “mood” - sometimes I risk an identification which I’m not 100% sure of (but I decide that my doubt is sufficiently small to ID) and at other times I don’t. One of the reasons why I’m cautious (besides that I obviously don’t want to cause mistakes on the portal) is that there are quite a few users who agree with others’ identifications without appropriate knowledge. If there are no other identifications, then one such person can push my, say, uncertain ID to Research Grade, and that’s not what would be wanted, I suppose.
I could leave my guesses out of the comment section altogether, but I think that for some observers it can still be interesting to know what species their observation could possibly be.

7 Likes

If I am not sure of an ID, then I put cf. in the “tell us why” area below the ID.
Of course, if I am not IDing the family I know well … then my ID could be wrong even when I think it is correct. In this case I usually get informed quite fast. :grinning:

1 Like

I had to chuckle. I recognize the frustration, but I usually feel it in a slightly different situation – genus ID when there is only one possible species. Ginkgo, for example. Bison in North America. Caution may be good, but in these cases it wastes the few seconds I spend adding the species name to the ID and means somebody else has to provide an ID, too.

3 Likes

Yes. I prefer to limit my IDs to when I am right. :-)

2 Likes

I agree. I recognize that it is exasperating when it seems like IDers are refusing to offer an ID for a common species because of a virtually nonexistent possibility that it might be a rare lookalike. However, there are solid reasons why this may be desirable in many cases. It isn’t just IDers being overcautious.

The issue is that there are different knowledge claims implied by IDing a species based on probability and IDing based on being able to positively distinguish it from lookalikes.

With the former approach, there is always a risk that the presence of the rarer species will be overlooked because one simply assumes that it is the more common species. And this has a tendency to be self-reinforcing (it “can’t be” the rarer species because there are no/few records of it, but nobody has actually taken a closer look to see whether it is there). In other words, such IDs are misleadingly specific.

To give an example of why this in fact matters in something more than a purely theoretical sense:

In Germany we have two species of Xylocopa that can be distinguished based on photos only under certain conditions. One of the species (X. violacea) is widespread, and in most areas of Germany it is in fact the only documented species in the genus. The second species (X. valga) is well established in the neighboring countries to the south, and a few years ago a population was discovered in one of the warmer regions of SW Germany. Since then, its known presence in the country has been extended bit by bit further northwards (though it is still far from being common anywhere). It is clear that the species is expanding its range due to climate change, which has created certain dilemmas for those trying to ID these bees.

In some parts of Germany – say, the far north of the country – it is probably quite safe to assume that observations are of X. violacea, since the probability of it being X. valga there is essentially zero, with this value increasing gradually as we move closer to the locations where the latter species has been documented. But given the uncertainty of its current range, where do we draw the line?

So in practice, most observations end up getting left at (sub)genus, regardless of what part of Germany they were seen in (the exception is X. violacea males, who generally are identifiable even from suboptimal photos).

For scientific use of the data, there are certain advantages to this approach: someone trying to compare ranges or track the expansion of the two species will (ideally) have a set of only those observations in which it was possible to securely determine the species. Otherwise, they would have to go through every single observation of X. violacea to check whether it is a verified X. violacea or only a probable one. This system also does not prevent the data from being usable for others who are doing studies where a probability-based ID may be sufficient (say, for phenology studies in an area where X. valga has not been documented) – in this case, they simply download all the Xylocopa records and assume that, even in the unlikely case there are a few unrecognized X. valgas hiding in the data, they are within the statistical margin of error.
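The two uses of the data described above can be sketched roughly as follows. This is only an illustration: the record layout (`taxon` and `rank` fields) and both function names are hypothetical and are not the format of any real iNaturalist or GBIF export, and subgenus-level IDs are ignored for simplicity.

```python
def securely_identified(records, species):
    """Records identified to exactly this species (for range tracking)."""
    return [r for r in records
            if r["rank"] == "species" and r["taxon"] == species]


def whole_genus(records, genus):
    """Every record in the genus, whether left at genus or taken to species
    (for studies where a probability-based ID is good enough)."""
    return [r for r in records
            if r["taxon"] == genus or r["taxon"].startswith(genus + " ")]


# Made-up example records, not real observations.
records = [
    {"id": 1, "taxon": "Xylocopa violacea", "rank": "species"},
    {"id": 2, "taxon": "Xylocopa", "rank": "genus"},
    {"id": 3, "taxon": "Xylocopa valga", "rank": "species"},
    {"id": 4, "taxon": "Xylocopa", "rank": "genus"},
]

print(len(securely_identified(records, "Xylocopa violacea")))
print(len(whole_genus(records, "Xylocopa")))
```

The range study takes only the first, smaller set; the phenology study takes the second and accepts that a few records in it may be the rarer species.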

6 Likes

Certainty also leads to errors.

4 Likes

and in this context, tons of type-II errors!

1 Like

On the face of it, this seems self-contradictory. If you have a genuine reason to make a species-level ID (i.e. because it’s supported by the evidence), it should make no difference who else subsequently agrees with you, or why they do so. What if it had been the observer who had (for reasons unknown to you) made the first ID? Are you saying you’d refuse to add a supporting ID, even though it could be confirmed from the evidence?

In what sense would it be “incorrect”? This could be taken to imply that the same valid ID from one identifier is somehow worth less (or more) than another - in other words, it appears to be some kind of ad hominem argument (or an argument from authority, depending how you look at it). I assume this is not what you actually meant. But if you didn’t, I’m not quite sure what else to make of it: I never heard of anyone refusing to do something right purely because they were afraid that someone might agree with them (“blindly” or otherwise).

2 Likes

No. I am not saying that.
I would add a supporting ID if I believed I had sufficient knowledge of that taxon and distribution to say it was correct. This would not be a “suggestion” though, this would be more of a “determination” as far as I’m concerned.

In some cases (with users I know wouldn’t blindly agree) I might add a species level ID more as a “suggestion” as the OP mentions. But in general I simply wouldn’t do that, as it’s probably safer to just write in as text.

I think ID-ing in one way or another depending on how well you know the user is a fair approach given the current user interface and algorithm. But I understand if you feel otherwise.

A blind agreement is of course worth less than an informed determination.
One can never know how blind or not the agreement actually is though…

Extrapolating this to the real world (if this is what you mean) makes no sense to me.
My decision-making here is unique to iNat and a response to the design of the user interface / algorithm in play.

4 Likes

That would fall under this situation: