Species range creep from misidentifications

alisonnorthup · July 13, 2020, 4:37pm

Vision has become very good at identifying organisms in many cases - it is truly impressive! But I have noticed (particularly since the latest large-scale Vision update) that certain plants with limited range have begun “popping up” in unexpected locations. This has always happened occasionally, but the problem seems much worse now, and I suspect that Vision is paying less attention to locations than it did in the prior update. I think it is important to get a handle on this issue since iNaturalist learns from itself and bad information can proliferate, with both people and Vision learning things incorrectly and then reapplying that information.

An example of artificial range creep - Oxalis drummondii
The species where I have seen this artificial range creep most dramatically is Oxalis drummondii, a species that is range-limited to Texas (maaaaybe Arizona) and Northeastern Mexico. It happens to look very similar to Oxalis violacea, which has a much larger range (most of Eastern North America), with the biggest visual differences being leaflet shape and flowering season. Oxalis drummondii has no presence in the horticultural trade and is not spreading from its native range, so it should never show up outside of its native range.

I had been curating this species’s IDs, but I recently stopped fixing the bad IDs to see what would happen, and also because I have a baby now and very limited free time for fixing IDs. Sure enough, now there are observations all over Eastern North America, a few along the Pacific Coast of North America, and also in Europe! I don’t recall this species ever getting mis-ID’d so far outside of its native range before the latest update of Vision.

Take a look at the artificial range creep here:
https://www.inaturalist.org/taxa/165989-Oxalis-drummondii
compared to the United States range map:
http://bonap.net/MapGallery/County/Oxalis%20drummondii.png

How bad is it?
Looking through 22 of the observations of O. drummondii outside of its range (all incorrectly ID’d), and checking the Vision suggestions for each, I found that O. drummondii was listed in the following positions by Vision:
1st: 3 times
2nd: 10 times
3rd: 3 times
4th: 2 times
6th: 1 time
not listed: 3 times (but ID shows up as a Vision suggestion, so it was listed in the past)
Vision also suggests genus Oxalis first for most of these, but users are frequently overlooking that suggestion and choosing one of the species IDs, often not the first recommendation.
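
For anyone who wants to check the arithmetic, here is a minimal Python sketch that sums the tally above and works out how often O. drummondii was offered but not as the first suggestion. The variable names are only illustrative; the counts are the ones listed above.

```python
# Positions at which Vision listed O. drummondii, taken from the tally above.
# The key None stands for "not listed".
position_counts = {1: 3, 2: 10, 3: 3, 4: 2, 6: 1, None: 3}

# Total number of out-of-range observations checked.
total = sum(position_counts.values())

# Cases where the species was the first suggestion.
listed_first = position_counts[1]

# Cases where the species was offered, but below the first suggestion.
listed_not_first = sum(
    count for position, count in position_counts.items()
    if position not in (1, None)
)

share_not_first = listed_not_first / total

print(f"observations checked: {total}")
print(f"listed first: {listed_first}")
print(f"listed, but not first: {listed_not_first} ({share_not_first:.0%})")
```

In other words, in most of these cases the wrong species was picked from further down the list rather than from the top.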

Most of the observations had “Needs ID” status, but two were “Research Grade” (including one in Europe - yikes!) and three were “Casual” (incorrectly ID’d).

The fact that some are reaching research grade is bad news for iNaturalist as a data source, and also as a learning tool (proliferation of bad information). The fact that the observations are proliferating without someone manually correcting them suggests to me that an adjustment needs to be made. The fact that users are choosing suggestions that are not the top suggestion implies that future improvements to Vision’s accuracy will not be enough to fix this problem.

What type of fix is possible?
Could Vision make fewer suggestions, not always the default of genus + 8 species suggestions? Could Vision once again put more weight on “Seen nearby”? Could range maps be taken into account in Vision suggestions? I don’t know what the best fix would be.
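
To make the “more weight on Seen nearby” idea concrete, here is a rough Python sketch. It is not how Vision actually works (I don’t know the internals); the function name `reweight_by_location`, the `out_of_range_penalty` parameter, and the scores are all made up for illustration. The idea is simply to shrink the score of any taxon with no nearby records and then re-rank.

```python
from typing import Dict, List, Set, Tuple


def reweight_by_location(
    vision_scores: Dict[str, float],
    seen_nearby: Set[str],
    out_of_range_penalty: float = 0.1,  # hypothetical weight, not an iNaturalist setting
) -> List[Tuple[str, float]]:
    """Down-weight taxa with no nearby records, then renormalise and rank."""
    adjusted = {
        taxon: score * (1.0 if taxon in seen_nearby else out_of_range_penalty)
        for taxon, score in vision_scores.items()
    }
    total = sum(adjusted.values())
    if total == 0:
        return []
    ranked = sorted(adjusted.items(), key=lambda item: item[1], reverse=True)
    return [(taxon, score / total) for taxon, score in ranked]


# Made-up scores for a photo taken well outside O. drummondii's range.
scores = {"Oxalis drummondii": 0.5, "Oxalis violacea": 0.4, "Oxalis stricta": 0.1}
nearby = {"Oxalis violacea", "Oxalis stricta"}

ranking = reweight_by_location(scores, nearby)
for taxon, weight in ranking:
    print(f"{taxon}: {weight:.2f}")
```

With a penalty like this the out-of-range species is still offered, just lower down, so a genuine range extension is not ruled out entirely.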

bouteloua · July 13, 2020, 4:40pm

@alisonnorthup just a heads up that I moved this from #feature-requests to #general since it looks more like a summary of an issue or brainstorming topic than a specific actionable request, and also because, per staff request, we currently aren’t approving new requests related to computer vision.

I’m on my phone, but maybe someone could link to some of the old existing topics related to improving computer vision’s suggestions.

alisonnorthup · July 13, 2020, 4:42pm

Thanks @bouteloua, I actually just moved it to #curators without noticing the change you made.

bouteloua · July 13, 2020, 4:47pm

Oh haha I thought it was just my fat thumbs that misclicked, and I had moved it back to General. I do think it’s a topic that affects all users pretty similarly, whether or not they’re a curator.

malisaspring · July 13, 2020, 4:54pm

The CV clean up post you want linked is this one, right?

https://forum.inaturalist.org/t/computer-vision-clean-up-wiki/7281/77

alisonnorthup · July 13, 2020, 5:19pm

Thanks @malisaspring. I just added a bunch of species to that wiki, as well as an additional “Common Theme”, which is “Two or more species that are visually similar but have non-overlapping ranges, with each suffering misidentifications outside of range”.

It seems like there might be two categories of Computer Vision-mediated mistakes (with gray area between). In some cases the mistakes were already present on iNaturalist, and Computer Vision is amplifying them. In other cases, especially in the case of non-overlapping ranges, the mistakes weren’t really occurring at all until Computer Vision started making suggestions of look-alike species.

matthias55 · July 13, 2020, 5:44pm

Thanks for pointing out this issue.
Three suggestions, two broad and one specific:

1. When a user picks a species from the list of CV suggestions that is visually similar but not seen nearby, have a pop-up box that says something like, “This species has not been reported from your area. Are you confident in your ID?”
2. On the iNaturalist taxon page, have a tab that lists the actual, known range of the species in question. This might not be possible for all taxa, but for NA plants we could use BONAP & I think it would be quite helpful.
3. For Oxalis drummondii in particular, create a Wikipedia page for the “about” section and make sure that it includes information about the range. What you wrote above (Texas, NE Mexico, portions of AZ) would be helpful for the casual user to know.

Oh, and congrats on your baby!

liquidanbar · July 13, 2020, 6:11pm

There are similar problems with some bees, but I’m curious about specific places where things co-exist and are hard to differentiate? I spent the weekend cleaning up carpenter bees in Texas…

lotteryd · July 13, 2020, 6:17pm

In such cases a knowledgeable person could disagree back to the genus level with a comment like, “it could be x or y but we’d need to see traits a and b to decide”. Genus can still get to research grade so it’s not a “loss” on that front.

liquidanbar · July 13, 2020, 6:18pm

That’s exactly what I did :)

geographerdave · July 13, 2020, 6:25pm

I have the same experience/issue with several species of Rhododendron being reported out of range, and with cultivars, which are abundant in urban areas worldwide, being identified as species.
For west coast North America species this includes:
R. macrophyllum (being identified on east coast N. America and Europe, and in urban settings where it is rarely found or grown, even in/near its range).
R. albiflorum (being misidentified out of range)
R. menziesii (being misidentified on east coast N. America with the native R. pilosum)
R. occidentale (doesn’t grow east of its natural range).
Since there are a bazillion cultivars, and many if not most people grow them and not species rhododendrons, is it possible to have a pop-up box ask for rhododendrons “is this plant wild, or growing in a garden?”
When I go thru the identifications for the few rhododendrons I am most interested in and see a house, wall, mown grass, or other human construction I think the plant is a cultivar.
Thanks
David

cthawley · July 13, 2020, 6:37pm

Anecdotally, I see this happening with lizards. I’m seeing more and more Sceloporus mis-IDed as brown anoles, seemingly because of AI suggestions. These aren’t too hard for humans to tell apart, and the mis-IDs seem to be mostly newer users (or at least users with fewer observations) who may be more likely to use AI. In my anecdotal case, I think most of these are fixed, but I am definitely concerned about cycles of reinforcement, where bad IDs can proliferate across the board.
In taxa that get a lot of attention (lizards being one), these are probably mostly caught. However, in taxa with fewer experts on iNat, who can’t go through a large proportion of observations, this can be quite difficult to deal with.
For what it’s worth, I think this potential issue is “philosophically” linked to the “agree reflex” issue, where some users reflexively agree to another user’s ID based only on the perception of the other user’s expertise (and not on their own). When these two issues come together, it isn’t hard to envision how you could have a lot of incorrect IDs resulting from humans using incorrect AI suggestions, and then other humans confirming those. It’s a tough issue to address I think.

brandonwoo · July 14, 2020, 3:56pm

I’ll just chime in here to say that this is certainly a problem with taxa that have fewer experts, especially in insects. Within my precious Orthoptera, there are several commonly mis-identified (by AI) taxa that I commonly go through and correct - but if I leave it for even just a few weeks, they pile up like crazy and appear to show dramatic range expansions. An example is the genus Dichromorpha, which is a New World group. I just checked and there are over 70 observations (several of them RG) from all across Europe, Africa, and Asia. And the more that pile up in a given place, the more tend to accumulate right around that place.

That example is fairly straightforward since I can just go through all of those observations from outside the New World and knock them down to family or whatnot (some of them aren’t even grasshoppers!). I think the problem becomes more complicated with taxa that actually do have an introduced range. The genus Phaneroptera is a good example of this. This genus is Old World but has a couple of introduced populations in five U.S. states. The populations in CA and NY are very well-photographed, and I think as a result of overconfident AI, it always appears as though their introduced range is expanding when it mostly is not. Once in a while the AI actually does detect a real range expansion - the MA population was apparently unnoticed until I looked at the observations and saw they were actually what they claimed to be (which is super cool!).

I basically agree with the points made above (and elsewhere) that it should be stressed more to new users that AI can be incorrect and that you must check the range, among other things. I would say that AI should not suggest a taxon unless it has been reported at least once before in the country (or state/province even). Additional people IDing in the harder groups would certainly help, but that seems more like trying to fix a problem instead of preventing it from happening in the first place. It’s also somewhat tiring to constantly be having to knock things back down to family, especially the ones in parts of the world where I’m not really familiar with the fauna - I just know that it sure ain’t the species suggested. I would rather spend my time concentrating on IDing within the regions I know best, because I can more often give species IDs instead of just giving the order or family.
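
The “at least one prior record in the country” rule could look something like the Python sketch below. This is only an assumption about how such a filter might be written: `filter_by_prior_records`, the place codes, and the records table are invented for illustration and are not real iNaturalist data or API calls.

```python
from typing import Dict, List, Set


def filter_by_prior_records(
    suggestions: List[str],
    records_by_place: Dict[str, Set[str]],
    place: str,
) -> List[str]:
    """Keep only suggested taxa with at least one prior record in `place`."""
    known_here = records_by_place.get(place, set())
    return [taxon for taxon in suggestions if taxon in known_here]


# Illustrative (made-up) table of which genera have prior records per country.
records = {
    "US": {"Dichromorpha", "Phaneroptera"},
    "FR": {"Chorthippus", "Phaneroptera"},
}

cv_suggestions = ["Dichromorpha", "Chorthippus"]
shown_in_france = filter_by_prior_records(cv_suggestions, records, "FR")
print(shown_in_france)
```

One thing to keep in mind is that a hard filter like this would also hide genuine first records, such as the MA population mentioned above, until a person identified one manually.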

deboas · July 14, 2020, 4:25pm

This would help enormously. “Seen nearby” should not be constrained by season (as it is now), but ideally options that have not been Seen Nearby should either not be offered or it should be more clearly highlighted to novice users that these are unlikely and shouldn’t be selected unthinkingly.

This seems a real problem with all of the cases identified in the clean-up wiki, and many that have not been included there. Could one solution to this be to have the ability to flag taxa that should always (at least by novice users) be identified at a higher level? This could involve a vote on taxon pages - like IDs in that at least two and a clear majority of users would need to vote for what ID should be offered if the AI “thinks” it’s identified something. For example, “Trombidium holosericeum” should never be suggested, but if the AI lands on that species, it should offer the Family, for example.

This would require a second database of “safe IDs” to be linked to the taxonomic database used by the AI, but as the AI is not continuously updated, and as this would likely affect only perhaps a few thousand species, it doesn’t seem like it would be too onerous to implement. The AI would be trained exactly as it is now, but instead of offering species-level IDs for cases highlighted and agreed by the community, it would offer a “safe ID”. What do others think?
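
As a rough illustration of the “safe ID” idea, here is a Python sketch. Everything in it is hypothetical: the `SAFE_IDS` table, the function names, and in particular the reading of “at least two and a clear majority” as “at least two votes and more than two-thirds of all votes” are my own assumptions, not anything iNaturalist has implemented.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical community-agreed overrides: species -> coarser taxon to offer instead.
SAFE_IDS: Dict[str, str] = {
    "Trombidium holosericeum": "Trombidiidae",
}


def agreed_safe_id(
    votes: List[str],
    min_votes: int = 2,         # assumed reading of "at least two"
    majority: float = 2 / 3,    # assumed reading of "a clear majority"
) -> Optional[str]:
    """Return the winning safe ID if it has enough votes and a clear majority."""
    if not votes:
        return None
    taxon, count = Counter(votes).most_common(1)[0]
    if count >= min_votes and count / len(votes) > majority:
        return taxon
    return None


def apply_safe_ids(suggestions: List[str], safe_ids: Dict[str, str]) -> List[str]:
    """Swap flagged species for their safe ID, dropping duplicates in order."""
    result: List[str] = []
    for taxon in suggestions:
        safe = safe_ids.get(taxon, taxon)
        if safe not in result:
            result.append(safe)
    return result


offered = apply_safe_ids(["Trombidium holosericeum", "Trombidiidae"], SAFE_IDS)
print(offered)
```

The lookup happens after the AI has produced its suggestions, so the model itself would not need retraining for this to work.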

bouteloua · July 14, 2020, 4:29pm

This was the main one I was thinking of: “Better use of location in Computer Vision suggestions”.

DianaStuder · July 14, 2020, 9:07pm

I have knocked back that South African one

alisonnorthup · July 15, 2020, 11:21pm

@brandonwoo, @geographerdave, @cthawley, and @deboas, thanks for sharing these great examples. They exactly match the problem I was describing, and show that it is happening across taxa - not just in plants. Also, @brandonwoo’s example of the MA population that was correctly identified by Computer Vision is interesting - that’s really the argument for what CV has to offer by considering organisms that have not only been seen nearby.

I am with you that it can be tiring to be constantly trying to hold back the wave of identifications showing up in the wrong areas, especially since it seems, at least for my beloved Oxalis, that the problem has gotten worse. I would like to be spending my time on iNaturalist working on identifications that take more skill, like identifying the South American Oxalis whose species haven’t even been added to iNaturalist yet. Or by making range maps that would help people with IDs. Do we want to have lots of taxonomic experts spending hours each week correcting observations with IDs that are so uninformed?

bouteloua · July 15, 2020, 11:33pm

I also think it’s easy to tell people, and I’m guilty of this, that if they’re tired of fixing these misidentifications, just not to do it. But once you’ve sunk this much effort into helping maintain a reasonably accurate database, it’s pretty hard to step away.

upupa-epops · July 15, 2020, 11:35pm

I tend to think that it’s not worth the problems it causes, and perhaps those populations should be identified for the first time by people rather than the computer anyway.

cthawley · July 16, 2020, 12:51pm

I generally agree that the “overidentification” by the AI isn’t worth the problems that it causes. There are definitely some situations in which the AI may correctly ID an out-of-range population or novel introduction of an invasive species or something, and these are definitely cool. If iNat were primarily a tool for finding and tracking introduced species, this would be an important outcome for AI.

But, iNat is really a tool to get people involved with nature and create a community around observing. If the types of AI suggestions we’re discussing are causing IDer burnout or frustration or conflicts between experienced and new users, then I think they may not be worth it (and I say that as an invasion biologist who is personally and professionally interested in species introductions!). Additionally, as an ecologist, I think having more accurate maps for a broad variety of species, even common ones, is more valuable than a better chance of finding out of range populations/individuals.

I think the best solutions will depend on the details of how the AI works, so I don’t know what would be best. However, one suggestion (which may have been raised before) is to change the location info given by the AI. Currently it notes that a species is “seen nearby” which is phrased in a positive way (meaning that it encourages users to pick it). But we could also consider labeling some choices as “not found nearby” or something similar which is more negative or a warning which would say “Hey, be sceptical of this.”
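
To show what I mean by the labelling idea, here is a small Python sketch. It assumes the app already knows which suggested taxa have nearby records; the function name, the label wording, and the example data are all made up.

```python
from typing import List, Set, Tuple

SEEN_LABEL = "Seen nearby"
WARNING_LABEL = "Not found nearby: be sceptical of this"  # hypothetical wording


def label_suggestions(
    suggestions: List[str],
    seen_nearby: Set[str],
) -> List[Tuple[str, str]]:
    """Attach a positive or a warning label to every suggestion, keeping the order."""
    return [
        (taxon, SEEN_LABEL if taxon in seen_nearby else WARNING_LABEL)
        for taxon in suggestions
    ]


# Made-up example: a lizard photo from a place with Sceloporus records but no brown anoles.
labelled = label_suggestions(
    ["Anolis sagrei", "Sceloporus undulatus"],
    seen_nearby={"Sceloporus undulatus"},
)
for taxon, label in labelled:
    print(f"{taxon} [{label}]")
```

Nothing is hidden from the user here; the out-of-range option just carries an explicit warning instead of having no label at all.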
