Species range creep from misidentifications

Oh haha I thought it was just my fat thumbs that misclicked, and I had moved it back to General. I do think it’s a topic that affects all users pretty similarly, whether or not they’re a curator.

1 Like

The CV clean up post you want linked is this one, right?

https://forum.inaturalist.org/t/computer-vision-clean-up-wiki/7281/77

4 Likes

Thanks @malisaspring. I just added a bunch of species to that wiki, as well as an additional “Common Theme”, which is “Two or more species that are visually similar but have non-overlapping ranges, with each suffering misidentifications outside of range”.

It seems like there might be two categories of Computer Vision-mediated mistakes (with a gray area between). In some cases the mistakes were already present on iNaturalist, and Computer Vision is amplifying them. In other cases, especially in the case of non-overlapping ranges, the mistakes weren’t really occurring at all until Computer Vision started making suggestions of look-alike species.

3 Likes

Thanks for pointing out this issue.
Three suggestions, two broad and one specific:

  1. When a user picks a species from the list of CV suggestions that is visually similar but not seen nearby, have a pop-up box that says something like, “This species has not been reported from your area. Are you confident in your ID?”
  2. On the iNaturalist taxon page, have a tab that lists the actual, known range of the species in question. This might not be possible for all taxa, but for NA plants we could use BONAP & I think it would be quite helpful.
  3. For Oxalis drummondii in particular, create a wikipedia page for the “about” section and make sure that it includes information about the range. What you wrote above, Texas, NE Mexico, portions of AZ would be helpful for the casual user to know.

Oh, and congrats on your baby!

9 Likes

There are similar problems with some bees - but I’m curious about specific places where things co-exist and are hard to differentiate? I spent the weekend cleaning up carpenter bees in Texas…

5 Likes

In such cases a knowledgeable person could disagree back to the genus level with a comment like, “it could be x or y but we’d need to see traits a and b to decide”. Genus can still get to research grade so it’s not a “loss” on that front.

1 Like

That’s exactly what I did :)

5 Likes

I have the same experience/issue with several species of Rhododendron being reported out of range, and with cultivars, which are abundant in urban areas worldwide, being identified as species.
For west coast North America species this includes:
R macrophyllum (being identified on east coast N. America and Europe; and in urban settings where it is rarely found, or grown even in/near its range).
R albiflorum (being misidentified out of range)
R menziesii (being misidentified on east coast N America with the native R pilosum)
R occidentale (doesn’t grow east of its natural range).
Since there are a bazillion cultivars, and many if not most people grow them and not species rhododendrons, is it possible to have a pop-up box ask for rhododendrons, “Is this plant wild, or growing in a garden?”
When I go thru the identifications for the few rhododendrons I am most interested in and see a house, wall, mown grass, or other human construction I think the plant is a cultivar.
Thanks
David

5 Likes

Anecdotally, I see this happening with lizards. I’m seeing more and more Sceloporus mis-IDed as brown anoles, seemingly because of AI suggestions. These aren’t too hard for humans to tell apart, and the mis-IDs seem to be mostly newer users (or at least users with fewer observations) who may be more likely to use AI. In my anecdotal case, I think most of these are fixed, but I am definitely concerned about cycles of reinforcement, where bad IDs can proliferate across the board.
In taxa that get a lot of attention (lizards being one), these are probably mostly caught. However, in taxa with fewer experts on iNat, who can’t go through a large proportion of observations, this can be quite difficult to deal with.
For what it’s worth, I think this potential issue is “philosophically” linked to the “agree reflex” issue, where some users reflexively agree to another user’s ID based only on the perception of the other user’s expertise (and not on their own). When these two issues come together, it isn’t hard to envision how you could have a lot of incorrect IDs resulting from humans using incorrect AI suggestions, and then other humans confirming those. It’s a tough issue to address, I think.

7 Likes

I’ll just chime in here to say that this is certainly a problem with taxa that have fewer experts, especially in insects. Within my precious Orthoptera, there are several commonly mis-identified (by AI) taxa that I commonly go through and correct - but if I leave it for even just a few weeks, they pile up like crazy and appear to show dramatic range expansions. An example is the genus Dichromorpha, which is a New World group. I just checked and there are over 70 observations (several of them RG) from all across Europe, Africa, and Asia. And the more that pile up in a given place, the more tend to accumulate right around that place:

That example is fairly straightforward since I can just go through all of those observations from outside the New World and knock them down to family or whatnot (some of them aren’t even grasshoppers!). I think the problem becomes more complicated with taxa that actually do have an introduced range. The genus Phaneroptera is a good example of this. This genus is Old World but has a couple of introduced populations in five U.S. states. The populations in CA and NY are very well-photographed, and I think as a result of overconfident AI, it always appears as though their introduced range is expanding when it mostly is not. Once in a while the AI actually does detect a real range expansion - the MA population was apparently unnoticed until I looked at the observations and saw they were actually what they claimed to be (which is super cool!):

I basically agree with the points made above (and elsewhere) that it should be more stressed to new users that AI can be incorrect and you must check the range, among other things. I would say that AI should not suggest a taxon unless it has been reported at least once before in the country (or state/province even). Additional people IDing in the harder groups would certainly help, but that seems more like trying to fix a problem instead of preventing it from happening in the first place. It’s also somewhat tiring to constantly be having to knock things back down to family, especially the ones in parts of the world where I’m not really familiar with the fauna - I just know that it sure ain’t the species suggested. I would rather spend my time concentrating on IDing within the regions I know best, because I can more often give species IDs instead of just giving the order or family.
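The "reported at least once before in the country (or state/province)" rule could be sketched roughly like this. It is only an illustration: the `Suggestion` type, the `filter_by_region` helper, and the region records are all made up for the example and say nothing about how iNaturalist's Computer Vision actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    taxon: str    # name the model proposes
    score: float  # visual-similarity score (hypothetical scale)


def filter_by_region(suggestions, observed_regions, region):
    """Keep only suggestions whose taxon already has a record in `region`."""
    return [
        s for s in suggestions
        if region in observed_regions.get(s.taxon, set())
    ]


# Hypothetical prior records: taxon -> country codes with at least one report.
observed_regions = {
    "Dichromorpha": {"US", "MX", "BR"},
    "Phaneroptera": {"FR", "ZA", "US"},
}

suggestions = [
    Suggestion("Dichromorpha", 0.81),
    Suggestion("Phaneroptera", 0.64),
]

# An observation made in France: the New World genus is dropped even
# though it has the higher visual score.
kept = filter_by_region(suggestions, observed_regions, "FR")
print([s.taxon for s in kept])
```

The size of the region matters. Applied at the state level, the same rule would also have hidden a genuinely new population like the MA one until a person reported it first.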

11 Likes

This would help enormously. “Seen nearby” should not be constrained by season (as it is now), but ideally options that have not been Seen Nearby should either not be offered or it should be more clearly highlighted to novice users that these are unlikely and shouldn’t be selected unthinkingly.

This seems a real problem with all of the cases identified in the clean-up wiki, and many that have not been included there. Could one solution to this be to have the ability to flag taxa that should always (at least by novice users) be identified at a higher level? This could involve a vote on taxon pages - like IDs in that at least two and a clear majority of users would need to vote for what ID should be offered if the AI “thinks” it’s identified something. For example, “Trombidium holosericeum” should never be suggested, but if the AI lands on that species, it should offer the Family, for example.

This would require a second database of “safe IDs” to be linked to the taxonomic database used by the AI, but as the AI is not continuously updated, and as this would likely affect only perhaps a few thousand species, it doesn’t seem like it would be too onerous to implement. The AI would be trained exactly as it is now, but instead of offering species-level IDs for cases highlighted and agreed by the community, it would offer a “safe ID”. What do others think?
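As a rough sketch of the "safe ID" idea, assuming the second database is just a lookup from a flagged species to the higher taxon the community voted for (the `SAFE_IDS` table and `apply_safe_ids` function are hypothetical names, and the single entry is only the example from above):

```python
# Hypothetical community-agreed table: flagged species -> taxon to offer instead.
SAFE_IDS = {
    "Trombidium holosericeum": "Trombidiidae",
}


def apply_safe_ids(ranked_taxa, safe_ids):
    """Swap flagged species for their safe ID, keeping the model's order.

    The model's output is left untouched; only what is shown to the user
    changes. Duplicates created by the swap are dropped.
    """
    seen = set()
    shown = []
    for taxon in ranked_taxa:
        display = safe_ids.get(taxon, taxon)
        if display not in seen:
            seen.add(display)
            shown.append(display)
    return shown


model_output = ["Trombidium holosericeum", "Trombidiidae", "Oxalis drummondii"]
print(apply_safe_ids(model_output, SAFE_IDS))
```

Because the swap happens after the model has run, the AI would not need retraining when the community adds or removes a flagged species.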

This was the main one I was thinking of: “Better use of location in Computer Vision suggestions”.

2 Likes

I have knocked back that South African one

1 Like

@brandonwoo, @geographerdave, @cthawley, and @deboas, thanks for sharing these great examples. They exactly match the problem I was describing, and show that it is happening across taxa - not just in plants. Also, @brandonwoo’s example of the MA population that was correctly identified by Computer Vision is interesting - that’s really the argument for what CV has to offer by not limiting its suggestions to organisms that have been seen nearby.

I am with you that it can be tiring to be constantly trying to hold back the wave of identifications showing up in the wrong areas, especially since it seems, at least for my beloved Oxalis, that the problem has gotten worse. I would like to be spending my time on iNaturalist working on identifications that take more skill, like identifying the South American Oxalis whose species haven’t even been added to iNaturalist yet. Or by making range maps that would help people with IDs. Do we want to have lots of taxonomic experts spending hours each week correcting observations with IDs that are so uninformed?

8 Likes

I also think it’s easy to tell people, and I’m guilty of this, that if they’re tired of fixing these misidentifications, just not to do it. But once you’ve sunk this much effort into helping maintain a reasonably accurate database, it’s pretty hard to step away.

7 Likes

I tend to think that it’s not worth the problems it causes, and perhaps those populations should be identified for the first time by people rather than the computer anyway.

2 Likes

I generally agree that the “overidentification” by the AI isn’t worth the problems that it causes. There are definitely some situations in which the AI may correctly ID an out-of-range population or novel introduction of an invasive species or something, and these are definitely cool. If iNat were primarily a tool for finding and tracking introduced species, this would be an important outcome for AI.

But, iNat is really a tool to get people involved with nature and create a community around observing. If the types of AI suggestions we’re discussing are causing IDer burnout or frustration or conflicts between experienced and new users, then I think they may not be worth it (and I say that as an invasion biologist who is personally and professionally interested in species introductions!). Additionally, as an ecologist, I think having more accurate maps for a broad variety of species, even common ones, is more valuable than a better chance of finding out of range populations/individuals.

I think the best solutions will depend on the details of how the AI works, so I don’t know what would be best. However, one suggestion (which may have been raised before) is to change the location info given by the AI. Currently it notes that a species is “seen nearby” which is phrased in a positive way (meaning that it encourages users to pick it). But we could also consider labeling some choices as “not found nearby” or something similar which is more negative or a warning which would say “Hey, be sceptical of this.”
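To make the labeling idea concrete, here is a minimal sketch. The label wording, the `label_suggestions` helper, and the example taxa are assumptions for illustration only, not the current iNaturalist interface.

```python
def label_suggestions(taxa, nearby_taxa):
    """Attach a location label to each suggestion and list nearby taxa first.

    `taxa` is the model's ranked list; `nearby_taxa` is the set of taxa
    with records near the observation. Order within each group is kept.
    """
    # sorted() is stable, and False sorts before True, so nearby taxa
    # move to the front without reshuffling the model's ranking.
    ordered = sorted(taxa, key=lambda t: t not in nearby_taxa)
    return [
        (t, "Seen nearby" if t in nearby_taxa else "Not found nearby: be sceptical")
        for t in ordered
    ]


# Hypothetical example: an Oxalis photographed far from Texas.
ranked = ["Oxalis drummondii", "Oxalis violacea"]
nearby = {"Oxalis violacea"}
for taxon, label in label_suggestions(ranked, nearby):
    print(f"{taxon}: {label}")
```

Unlike hiding the suggestion outright, this keeps out-of-range taxa available for the rare real range expansion while still warning the user.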

6 Likes

I suggested these two interventions on the location data thread:

  1. Require location data to be entered before ID. When uploading via the desktop site, ID is the first field and location the third. Therefore AI can’t offer “Seen nearby” because it doesn’t know where the observation is.

I became aware of this when I got a new camera that didn’t have GPS and now have to enter my positions manually. Previously, the location metadata in the photo was entered automatically before the ID.

  2. AI should offer suggestions preferably to family, or genus at the most. It should not offer species-level suggestions.

5 Likes

@jane_trembath, good thinking with the location-before-ID suggestion! That might be why, in some of the observations I was looking at, Oxalis drummondii was no longer being suggested by Computer Vision.

BTW I’m realizing now that I should have taken a screenshot of the Oxalis drummondii observations I was seeing proliferate globally like @brandonwoo did - it appears some kind-hearted soul or souls has gone about fixing them. But now my original post doesn’t make as much sense.

1 Like
