Impact of obscure external species records on IDing abundant species

When IDing a genus, I first try to determine all species ID options for the location, including unobserved species, by checking external sources. In cases where there seems to be only one species in a location, I occasionally come across obscure records of a second later on. These often lack photos or descriptions, or aren’t mapped to databases, and the papers describing them are often dated, making them difficult to evaluate or to ask the authors about.

In some cases the obscure species doesn’t prove to be relevant, e.g. if it looks different, was later revised, or was misidentified. But it’s often impossible to know, since it may (even if unlikely) just be uncommon, rare, unconfirmed or dubious yet not disproven, or just rarely studied or sampled/observed. After learning of an obscure second species record, a maximally cautious approach would be to ID only to genus. Like many, I mostly do that, but occasionally still tentatively ID the abundant species. The decision depends on how likely or valid you infer the obscure species to be, how far you searched for the answer, and whether the answer is known anywhere (sometimes expert sources share the same uncertainty). In these scenarios, do you ID only to genus, ID the abundant species, or decide between genus and species case by case? (I do the last.)

One consideration is that some users (probably everyone at certain times) don’t always consult external sources, but base IDs mostly on current iNat obs. or Place checklists (although these may include unobserved species). Doing so isn’t always ideal, although it’s also true that the species most abundant in nature and most often observed are the most likely IDs (and are the best described in external sources). A final scenario is when there’s a single proven record of a species occurring far outside its known range, where it isn’t established. If that species is similar to an abundant established species, it can raise the same question of whether to now ID only to genus. Adding to the complexity, rare but undetected out-of-range occurrences may well have happened anywhere. What’s your ID response to those out-of-range scenarios?

Examples:

  • The European tube wasp (Ancistrocerus gazella) has frequently been IDed in Australia, but external sources also list a second species, which has far fewer accessible photos and much less ID info by comparison.

  • For Florida, US, iNat and most external sources include records of only one mouse-eared bat (Myotis austroriparius), but GBIF includes a single older, possibly doubtful or unconfirmed record of a second eastern US species.

  • Hawaii, US was once thought to have only three metallic sweat bees (Dialictus), but it has since become unclear whether there are now additional species.

  • The potter wasp Oreumenoides edwardsii, which has been observed on iNat, was once reported by an external source to have been sighted far outside its known Asian range.

I tend to wield Ockham’s Razor a lot around here - what’s the most likely explanation for the observation?

That involves looking at the critter observations in the immediate area, and knowing the populations across the country in general (admittedly relatively easy here in NZ, not so much for other parts of the world, perhaps.)

Generally I’m not going to believe for example that a Northern Hemisphere spider has just been ID’d in NZ when there is another spider looking exactly like the obs, for which there’s 10,000 good local identifications.

Per Carl Sagan (borrowed from Laplace): “Extraordinary claims require extraordinary evidence.” So if the evidence is presented, fabulous! If not, it gets a meh and an explicit disagreement (which I use extremely rarely.)

Given iNat isn’t (and isn’t intended to be) an authoritative resource, I think that suffices most of the time. Also, knowing that my IDs will be under scrutiny by people who know a lot more than I do, I am comfortable that if I am wrong, the IDs will change over time (always toward the more correct end of the spectrum.)

To go even farther, it’s always possible that some iNat observer could have spotted the only individual of any species in a given area that has been transported way outside its range…it’s just really, really unlikely in most cases. @russellclarke’s invocation of Laplace via Carl Sagan is entirely appropriate here.

In practice, I wouldn’t let one previous observation (even if fully verified) of another similar species in an area prevent you from making IDs of a common species there. I would just go ahead and ID as the common species. Even the occasional hitchhiker still doesn’t prevent me personally from making IDs in this situation.

For instance, in Florida, there are 9 species of anoles observed (and a couple more potentials that aren’t on iNat). However in about 70% of the state, only populations of 2 of those species are established. In those areas, I confidently ID to those two species and keep a notion that I could see a rare hitchhiker. Florida’s a weird place, and anoles are good at stowing away. Even then, those hitchhikers might be at a frequency of 1/1000 observations. That’s not a high enough frequency (for me) to not ID the other anoles to species.

Once/if there’s good proof that a reproducing population has set up in an area though, then I will start to consider those other species and be a bit more conservative. For instance, with observations with bad pics, I might only ID to genus instead of to species.

There’s rarely going to be any perfect bright line in these situations, but in general I don’t think you need to worry too much about one-off rare species having a major impact.
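To put rough numbers on the hitchhiker frequency mentioned above, here is a minimal Python sketch. The 1/1000 rate is the ballpark figure from the anole example; the function names and the assumption that observations are independent of each other are illustrative, not anything measured.

```python
def expected_hitchhikers(n_obs: int, rate: float) -> float:
    """Expected number of hitchhiker individuals among n_obs observations."""
    return n_obs * rate


def chance_of_at_least_one(n_obs: int, rate: float) -> float:
    """Probability that at least one of n_obs observations is a hitchhiker,
    assuming (hypothetically) that observations are independent."""
    return 1.0 - (1.0 - rate) ** n_obs


if __name__ == "__main__":
    rate = 1 / 1000  # ballpark hitchhiker frequency from the anole example
    for n_obs in (10, 100, 1000):
        print(n_obs,
              expected_hitchhikers(n_obs, rate),
              round(chance_of_at_least_one(n_obs, rate), 3))
```

Under these assumptions, any single species-level ID is very unlikely to be wrong because of a hitchhiker, even though a hitchhiker probably does turn up somewhere once the pile of observations gets large enough.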

I agree.

Although there are also some new species locality records that come uniquely from iNat, probably helped by the fact that people often have camera phones wherever they go. For example, the wasp Delta esuriens has only been observed in California and Florida on iNat or BugGuide. It may also be an example of where it’s difficult to determine whether or not a species has become established (maybe it has).

I generally feel like it’s incumbent upon the poster to provide their reasoning as to why they’re choosing the obscure species instead of the common one. As @russellclarke said, extraordinary claims and so on. If they haven’t provided anything, or it appears they’re just guessing based on the common name or the visually similar lookup, I’m pretty skeptical. If it’s something I’m really unfamiliar with, I’ll usually tag someone with more experience in that taxon for an extra opinion.

@brian_d I’ve been running into those “dubious” records a lot lately with some of the herbarium records Calflora uses for their databases. There’s a ton of rare plant species in my county that were supposedly found once, in like 1890, hundreds of miles outside of their actual range… but they still appear in all the checklists for the area because of it.

In this context we might use “obscure” to mean “poorly documented in the identification resources available to us” or “poorly documented in the locality in question”.

If we’re trying to decide between species A and species B in place X, if our occurrence data for place X indicates that species A is observed about 10 times more often than species B, I think it’s generally reasonable to interpret that as indicating that our focal observation is about 10 times more likely to be species A. There’s no shortage of contexts in which that interpretation will be incorrect, but it’s generally reasonable.

In more extreme cases, for instance if there are about 1000 times as many observations of species A as of species B, it might be reasonable to go further and say that an identification as species B represents an “extraordinary claim”. Personally, though, I’d be reluctant to arrive at that kind of conclusion in the absence of a lot of other contextual information.
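The counts-to-likelihood reading above can be sketched in a few lines of Python. This is only an illustration: the function names and the example counts are made up, and it takes observation counts at face value as a proxy for how likely each species is, which (as noted) will be wrong in plenty of contexts.

```python
def odds_a_over_b(count_a: int, count_b: int) -> float:
    """Odds of species A versus species B implied by raw observation counts."""
    if count_b <= 0:
        raise ValueError("need at least one observation of species B")
    return count_a / count_b


def share_a(count_a: int, count_b: int) -> float:
    """Fraction of the A-or-B observations in place X that are species A."""
    total = count_a + count_b
    if total <= 0:
        raise ValueError("need at least one observation in total")
    return count_a / total


if __name__ == "__main__":
    # Hypothetical counts for place X: a 10:1 case and a 1000:1 case.
    for count_a, count_b in ((1000, 100), (10000, 10)):
        print(count_a, count_b,
              odds_a_over_b(count_a, count_b),
              round(share_a(count_a, count_b), 4))
```

Even at 10:1, roughly one focal observation in eleven would be species B on this reading, which is why the ratio alone is a starting point rather than an identification.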

If we’re trying to decide between species A and species B and the identification resources available to us provide a lot of information about species A and very little about species B, to my mind this provides no information whatsoever about the likelihood of our focal observation being one species vs. the other.

So far as I can tell, some contexts in which misidentification rates are likely to be very high are related to this scenario. Suppose you’ve spent many years learning the grasses of Kansas, and then you do some field work in Wyoming. In your mind, the grasses of Kansas are the “common ones” while those found in Wyoming but not Kansas are “obscure”. That subjective perception of what’s common and what’s obscure isn’t just uninformative, it is usually actively misleading. People who aren’t aware of this and actively working against just trusting their instincts are likely to be both wildly inaccurate in their identifications and to think that they know what they’re doing.

(And I definitely find myself tripping over that last kind of scenario regularly. Whenever I’m somewhere new, there’s some plant that I think I know and it turns out I’m wrong.)

Worse, they are relying on the computer vision, which is frankly abysmal for a lot of the critters I identify, especially spiders.

Yeah - it’s surprisingly good for a lot of the large plants, at least here in California, but absolutely abysmal for mosses and lichens. And of course once you have enough incorrect IDs, those start feeding into the computer vision suggestions and the problem just snowballs.

For me it depends why the species is obscure.

Is it because it’s from the other side of the world, or because its range is restricted to a single mountaintop? In that case, it’s almost certainly not that species, and Occam’s Razor is the way to go.

But sometimes species just get ignored for no apparent reason. Usually someone IDs something as a certain species, and whether it’s correct or not, people latch onto it and think that everything that looks like that is the same species. In that case, it’s a bit of a vicious cycle where people ID it as the ‘common’ species because there are lots of records of it, but then of course that creates new records of the species which people will then use to justify their reasoning.

A perfect example is the genus Austroscolia. Take a look at the sightings of A. soror and the sightings of A. nitida. From that, you’d think it was pretty obvious that A. soror is very common and A. nitida is quite rare. But from the literature I read up on, it seems that both species are supposed to be equally common where their ranges overlap (mostly Queensland and NSW). The species are very similar, and it’s probable that an initial correct ID of A. soror on one of these guys was used as a reference and snowballed into them all being called that, even though that’s certainly not true.

This sort of thing happens again and again, and it’s important to properly rule out everything reasonable before going to an ID. If a species has been recorded in the area, I think it’s very important to at least check the circumstances of the record before dismissing it. If it was a rare vagrant, an escapee, or an accidental introduction that failed to establish, then fair enough. But if it’s just an old taxon that nobody has done much work on, then it’s certainly fair game and you should be confident in ruling it out before just choosing the ‘common’ species.

I had a fun story along those lines. I was spending a bunch of time in parts of southwestern New Mexico that are not particularly remote, but also aren’t that frequently visited. One of the common plants I was seeing was a little aster that I identified as Diaperia verna, since that seemed to be the only decent candidate among the species known to occur in New Mexico. After a couple of years of seeing these plants and calling them Diaperia verna, I happened to post some observations of them to iNaturalist. Luckily, James Morefield happens to be on iNaturalist and is one of the most knowledgeable botanists for these little asters. Turns out I’d been seeing not one, but two species that were new records for the state, Logfia depressa and Stylocline sonorensis. And I’d been misidentifying them for years because I assumed they were probably the “common species known from the area” and never really thought about it further than that.

There’s still a lot we don’t know.

I’ve found several cool species just because I took a picture of a “common” thing that turned out not to be! For example, I had no idea about our many species of endangered native Shoulderband snails until one day I decided I may as well upload a pic of the “garden snail” shell I found on a hike and it got identified as the critically imperiled Pomo Bronze Shoulderband Snail.

So that’s why I try to photograph everything, even the common things I’ve seen a million times!

This may be, but I’ve also often come across statements from literature and ID keys that didn’t turn out to be exactly as written in the real world. I don’t know the specific species you mentioned, but do you suspect many iNat nitida are misIDed as soror? If so I agree, but if not I’d suggest that the one species getting many more observations (in a country with extensive wasp sampling) may indicate it’s actually more abundant in nature too, given that sampling size is also somewhat high.

I also tried to raise this more difficult version of the question. I agree so far with the consensus in the comments about doubtful species. But a more difficult dilemma occurs if the obscure species seems to probably be an accurate record, yet one with virtually no other info or occurrence records, and from long in the past. That’s the most costly situation, because we may hold ourselves back from IDing the species due to the obscure record even though the obscure species may not even look similar, if present at all. This question would be worthwhile for people to consider addressing in more replies.

A few other clarifications re: replies by multiple people above. By a species being “common” I mean abundant on iNat, GBIF, and/or in the literature (not only based on iNat). By a species being “obscure” I mean there’s difficulty finding information about its occurrence status or ID basis even when considering all sources, and it may have only one or no iNat or GBIF records.

That sounds about right :joy: there’s always discrepancies, and maybe some things that were true in the past are not true any more. In the case of Austroscolia though it seems to be that the literature is true - collections seem to have them both in fair numbers, and all the literature is in agreement. The species cannot be told apart without a close look, and the vast majority of people are not even aware that A. nitida exists.

I agree this is a very difficult question, and something that I doubt we will ever get a satisfactory answer to. In my opinion, it is better to be correct at a higher taxon than to be incorrect at a lower taxon, but it really depends where your ‘threshold’ is. Different people will have different thresholds of acceptability. E.g. if one species is green all of the time and another species is blue 99.9% of the time, but green 0.1% of the time, should we ID solely based on colour? I think here we should refrain from a species ID going just by colour. But if two species are identical but well-separated by range, should we refrain from IDing any of them because of the possibility of vagrants? I think we would be fine to ID them in this case. But as before, it will be different for different people. I think the key thing is that we should state our assumptions - e.g. “I am IDing as species X because it is green, and species Y is rarely ever green”. That way at least there is a record of why we chose one or the other. It’s a tough issue no matter how you look at it.
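One way to make the green/blue ‘threshold’ question concrete is a small Bayes' rule sketch in Python. The colour frequencies are the ones from the example above (species X always green, species Y green 0.1% of the time); the prior values for how common X is relative to Y are hypothetical, and the function name is just for illustration.

```python
def posterior_x_given_green(prior_x: float,
                            p_green_given_x: float = 1.0,
                            p_green_given_y: float = 0.001) -> float:
    """Probability that a green individual is species X, by Bayes' rule.

    prior_x is the assumed share of X among all X-or-Y individuals at the
    location (a hypothetical input, not something measured here).
    """
    prior_y = 1.0 - prior_x
    green_and_x = prior_x * p_green_given_x
    green_and_y = prior_y * p_green_given_y
    return green_and_x / (green_and_x + green_and_y)


if __name__ == "__main__":
    # Hypothetical priors: X and Y equally common, X uncommon, X very rare.
    for prior_x in (0.5, 0.01, 0.001):
        print(prior_x, round(posterior_x_given_green(prior_x), 4))
```

The point of the sketch is that colour alone does not settle the ID: the same green individual is almost certainly X when the two species are equally common, but close to a coin flip when X is about as rare as green individuals of Y are. Stating the assumed abundance alongside the colour reasoning is what makes the ID checkable.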

I don’t think I understand what cost you’re concerned by, here. From my viewpoint, the relevant costs and benefits are related to the amount of novel information an observation provides. If a species is poorly known in an area, or poorly known globally, a new observation of it, accompanied with reliable identifications, is at the top end of the information scale. Ergo, the difference in information between making or not making the correct ID is at the high end of the scale as well. If a species is already abundantly documented in an area, a new observation of it is towards the bottom end of the information scale and there is very little difference in information associated with making or not making the correct ID. “Holding back” from the ID in that context does not make much difference.

I’m also probably not understanding the specific context you’re imagining very well. I deal mostly with plants in the western U.S., and in that context my experience is that it is rare for there to be published names for which neither the original publication, nor any specimen images, nor relevant subsequent taxonomic works are available. So when you mention it being hard to find information, my default mental response is something like, “Well, yeah, sometimes you have to put in the work to find it.” If I were frequently running into plant names for which none of this information were available, I think my conclusion would be somewhat the opposite of yours. I’d be starting to think our understanding of that genus is so poor that we may not be able to make any species-level IDs with any kind of confidence.

I’m pretty sure I also stumbled on an undocumented population of snails around here. As you say, it appears to just not be possible to get a species-level ID given current understanding of the genus, because it is unclear what applying Occam’s razor would even mean; is the simplest explanation that it is a widely isolated population of a described species with a similar habitat, an abundant nearby species migrating into habitat that would appear to be unsuitable, or something completely undescribed? Hopefully, the existence of the observation can help someone answer these questions. Also, I now know to keep my eye out for them to take more observations.

I mean that if there’s a very high chance that hundreds or more genus-level obs. in a location are all actually the same abundant species, then it’s a loss of data specificity to cautiously leave them at genus just because of one known obscure record of a second species.

I mostly ID wasps and bees, along with mammals and some plants, and I’m unsure whether I’ve seen this issue in plants either. When I say little to no info is available on some species (e.g. wasps), I mostly mean info that could help with ID or indicate whether the species looks similar to an abundant, well-known one. Names are often mentioned in at least one source, e.g. species catalogs or taxonomic revisions. Yet I’ve come across many that are only mentioned in a 19th or 20th century paper, with no description, ID key, photos, or occurrence record on iNat or GBIF. As you imply, in some cases further info could be gained by corresponding with external experts, or by visiting or borrowing from museum specimen collections.

Yet some species seem so obscure that their only publications are posthumous, or the species was only collected/observed once or a few times, meaning there are few specimens among all global museums. It can also be a time-consuming process to find which museums have them and to learn whether accessing them is possible. I assume even museum taxonomists sometimes find accessing external material difficult or time-consuming. One thing that would greatly help is increasing museum and taxonomist awareness of the usefulness of digitizing specimen records, especially with photos, whether by submitting them to GBIF or to iNat (iNat is preferred, because the record then gets added to two databases and helps identifiers). It would also help for taxonomists to become more aware of iNat, and to welcome questions from iNat identifiers, or consider joining iNat as identifiers or observers. Most external experts I’ve emailed have replied helpfully. Overall, these global databases can become, or already are, the most useful and fastest (additional) way to share and integrate global museum data. Any records or specimen photos that aren’t added to online databases are in effect “lost” to many people. And moving toward digitizing everything is useful both for people training in these fields and for external experts.

[quote=“brian_d, post:18, topic:30558, full:true”]
I mean that if there’s a very high chance that hundreds or more genus-level obs. in a location are all actually the same abundant species, then it’s a loss of data specificity to cautiously leave them at genus just because of one known obscure record of a second species.[/quote]

My feeling is that all of the relevant variables here increase proportionally with scale. Meaning, if the potential information loss from refraining from IDing a single observation of a common and well-documented species is quite small, while the potential information loss from misidentifying a single observation of a very poorly-known species is much greater, then the same is true if we’re talking about hundreds of observations. The concern we should have about the underlying problem–that we don’t know enough about the genus to identify species within it–should also increase proportionally.

And if the “common species” is so abundant and well-documented that we might encounter this kind of problem on hundreds of observations, I think the potential information loss on that side of the balance is still quite small. For instance, there are 24,073 observations of the Carolina chickadee on iNaturalist. Even if we simply deleted 5,000 of them, this really wouldn’t have any appreciable effect on our knowledge of the species.

I know just enough about insects to know that it’s a different world from plants. :-) However, I think what you’re describing would lead me to believe that we really don’t know enough to identify species in taxonomic groups where you’re encountering this issue frequently. If basically no one knows how to tell two species apart, even the belief that one of them is “common” may or may not have any basis in fact–all of our knowledge becomes suspect, for the well-documented species along with those that are obscure.

As someone who used to work in and still works with herbaria: That is absolutely, 100% true. :-)

In the plant world this is definitely a high priority for herbaria. The limit is funding and staff. Support for specimen collections and taxonomists from universities and funding agencies like NSF has always been poor and it has not been improving.

iNaturalist isn’t really built to be, or intended to be, a repository for specimen data. When I was working in an herbarium, I would not have considered it a viable option for making our data public. GBIF, personally, I find to be a totally unusable mess. That opinion is definitely not universally shared, but neither is it unique to me. :-) In any case, their basic mission is data aggregation; I think all of their data is pulled from other resources rather than having GBIF be the primary “home”. In the plant world, 99% of my specimen data comes from SEINet (swbiodiversity.org), and occasionally from other herbarium databases. Although I don’t spend much time with arthropods, SCAN (scan-bugs.org) is my first stop for them.

I think the biggest limitation here is, again, funding and staff. Most university faculty are basically working 60+ hours a week on all the stuff they get paid to do, and helping the public with identification is rarely on that list.

From my point of view, there are also some severe limitations to the usability of iNaturalist for taxonomists. It’s a dead horse I’ve beaten many times in the iNaturalist forum, so I’ll spare you the details. :-) The very brief version is that taxonomy is about figuring out how many species, genera, etc., are out there. Uncertainty and disagreement are fundamental to the process. iNaturalist is built on the assumption that we’ll all use a single, consensus list of “correct” names.

And there are also some taxonomists who just don’t believe that images are real data or worth their time. I push against this belief when the occasion arises and I think it is declining, but it is still a factor.

Digitization is definitely a high priority at many museums - GBIF and other data portals have come out of this.

The limiting factor is just time and resources (as always). Museums don’t have much funding as is, so staff have to make hard decisions about whether to collect new specimens, ID/process older ones, digitize existing ones, maintain aging buildings/equipment, etc. Most collections are moving towards digitization in some form, and I haven’t met a museum professional who doesn’t know the high value of these data. They’ll get there eventually, but it would be quicker with more support (which honestly isn’t very likely to happen).
