Species Suggestions for the Wrong Continent

Since the implementation of the suggested species list when making an ID, the number of submissions of species not found on the continent has climbed dramatically. It’s pretty clear that some users simply choose the top species in the list, or choose a species based on its common name, without a quick check that the chosen species even occurs on the same continent. I don’t know how many North American “blue-tailed damselflies” and “common blue damselflies” I have had to correct over the last couple of years, and I have gotten into the habit of seeking these out, which takes more of my time.

Geography doesn’t seem to be a factor in the list of suggested species, at least not to a degree that would prevent errors like these. Is this something that the machine-learning doohickey should pick up on over time?

I also wonder if it would be helpful to show the user an “Are you sure?” message when they choose a species that is nowhere close to the location (e.g., picking a European species that has not been recorded in North America, or vice versa).
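A minimal sketch of what such an “Are you sure?” check could look like, working at the continent level as described above. It assumes a hypothetical lookup of the continents where each taxon has accepted records; the table, taxon names, and function name are placeholders, not real range data and not how iNaturalist currently works.

```python
from typing import Optional

# Hypothetical range table: continents where each taxon has accepted records.
# Placeholder names and values for illustration only, not real distribution data.
RECORDED_CONTINENTS: dict[str, set[str]] = {
    "Example damselfly A": {"Europe", "Asia"},
    "Example damselfly B": {"North America"},
}


def range_warning(taxon: str, continent: str) -> Optional[str]:
    """Return an "Are you sure?" prompt if the chosen taxon has no records on
    the observation's continent; return None if no prompt is needed."""
    known = RECORDED_CONTINENTS.get(taxon)
    if known is None or continent in known:
        # Either the taxon is recorded here, or there is no range data to judge by.
        return None
    where = ", ".join(sorted(known))
    return (f"Are you sure? {taxon} has no records from {continent} "
            f"(recorded from: {where}).")


if __name__ == "__main__":
    print(range_warning("Example damselfly A", "North America"))
    print(range_warning("Example damselfly B", "North America"))
```

Staying silent when there is no range data avoids nagging people about poorly documented taxa, and a prompt rather than a hard block still leaves room for genuine range extensions.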

23 Likes

I guess the worst problem is when the algorithm says it is “very sure” about the ID and the species doesn’t exist on that continent or in that region.

11 Likes

Particularly bad with bivalves; in New England most of the uninitiated are currently identifying local bivalves as Pacific species. I guess one mussel looks much the same as another to the software.

And gulls of the genus Larus: a lot of people are identifying them as anything but native species.

2 Likes

Welcome to the iNaturalist Forums @jimjohnson @mirmeleon @jlayman! Thank you for being part of our community.

FYI there is a related discussion here. Definitely an issue of concern.

3 Likes

I think this is a major issue, especially considering it takes only two of those incorrect IDs for an observation to go into GBIF. There is a real possibility that bad IDs and a bad suggestion system are introducing errors into data that scientists rely on.

The problem is particularly serious with invertebrates. Many of the insect species I work with are completely unidentifiable to species from the majority of the photos here. The issue is just as bad with species from disparate geographic areas within a continent. Usually one species per genus gets suggested, and everyone selects it. I often get people questioning my ID because iNaturalist says the species’ range covers a certain area, but that range is based on the community IDs, creating a negative feedback loop.

14 Likes

Yes, it’s a huge problem in insects, but also in many plants, particularly New Zealand endemics. I have made a list of vulnerable taxa (Computer Vision Traps) that I am trying to check regularly to prevent the accumulation of too many misidentified Research Grade observations. But I’m afraid it’s a bit of a fool’s errand.

10 Likes

Yes, absolutely, this is also a really significant problem with Mollusca worldwide. And mollusks are a really huge phylum: gastropods alone are second only to insects in number of species! This is a major problem!

The Computer Vision is ID-ing every cockle in the world as if it were one California species, every small mussel as a New Zealand species, and so on, for thousands of species.

The AI suggestions are not even labelled as being a Beta Version. Newbies assume the combined power of all of iNat is behind these guesses.

I think the Computer Vision should be filtered to allow its suggestions to be presented only for those areas and those taxa for which it is competent.

17 Likes

As insects have been mentioned, one example that I’ve been tracking over the past year and change is Xanthocryptus novozealandicus. Because this species has numerous observations in New Zealand (where it’s endemic), it frequently pops up in the Computer Vision suggestions for the US (where we have more than 10 superficially similar species with a decent number of observations). This is a rather specific example, but we really do need a way to prevent species that are endemic to one region from showing up in ID suggestions everywhere else.

I think part of the problem is not just how Computer Vision operates but also how users have historically made identifications. On the CV side, it seems to factor in the number of positive IDs globally (as in the above example, as well as other New Zealand endemics). One region with a single, easily identifiable species can accumulate hundreds of correct IDs, but these then get analyzed globally without really accounting for disagreeing IDs (say, every North American identification of X. novozealandicus being disagreed with at the genus level). If atlasing could be used to restrict species suggestions, that could help. The issue also seems to get really “fun” after a single ID is given within a new range: one premature ID there, even a historical one, seems to sway future IDs. Many of these traps, particularly NZ endemics, are avoidable with a simple Google search or even just a click on the taxon page.

2 Likes

We see the same issues with lizards. Sometimes the AI-suggested IDs are right at the genus level, but the suggested species can come from anywhere in the Western Hemisphere. They’re usually fairly obvious to correct, but the time spent doing so adds up.

1 Like

I agree this is an issue, and especially that there’s both a CV component and a human psychology component. I am an iNat newbie, and when going through many observations, the temptation is to use the top suggestion. Personally, I do click through to compare features, and if I’m not sure I go for genus or higher, but I don’t always remember to check geography. It would be nice if range information showed up on the ID screen, but I don’t know if that’s feasible. Although it won’t solve the problem, it might help to require 3 agreeing IDs. At least then RG wouldn’t be granted by the first and only person to agree with the observer’s ID.

Just stray thoughts. Probably obvious :-)
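To illustrate the difference between 2 and 3 agreeing IDs, here is a deliberately simplified model of the Research Grade threshold. It assumes the leading taxon needs more than 2/3 of all identifications plus a minimum number of agreeing identifications (`min_agreeing`, a hypothetical parameter); the real rules also weigh ancestor/descendant taxa and other data-quality criteria that are ignored here.

```python
from collections import Counter


def is_research_grade(ids: list[str], min_agreeing: int = 2) -> bool:
    """Simplified model: the most common taxon must have at least
    `min_agreeing` identifications and more than 2/3 of all identifications."""
    if not ids:
        return False
    _taxon, count = Counter(ids).most_common(1)[0]
    # Integer comparison avoids floating-point issues with the 2/3 ratio.
    return count >= min_agreeing and 3 * count > 2 * len(ids)


if __name__ == "__main__":
    observer_plus_one = ["Taxon X", "Taxon X"]
    print(is_research_grade(observer_plus_one))                  # True
    print(is_research_grade(observer_plus_one, min_agreeing=3))  # False
```

Under the stricter setting, the observer's ID plus one agreement is no longer enough, so a single hasty agreement cannot push an out-of-range species into Research Grade on its own.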

4 Likes

Would atlases help with any of these? The main atlas page obviously isn’t really engineered to cope with 129 pages of atlases with out-of-range observations, but it is nice to be able to pull up a list of Probably Wrong observations at a click.

3 Likes

I am also a relatively new user, though I have used iNat a great deal since I started. I have also run into this problem and, I will admit, have made this error myself, picking a species far out of its range based on a suggestion. I can only imagine how frustrating it must be for experienced users and experts.

I think a warning, if nothing else, is a very good idea. To be honest, from a user experience design standpoint, I would expect suggestions to be filtered by location by default. Indeed, it was a few months in before I fully realized that they were NOT. Also, from the standpoint of encouraging citizen scientists and supporting an enthusiastic user base, it doesn’t feel great when you realize you were so far off base in something you suggested. Educating the user base is certainly one of the goals of such an endeavor, and hopefully this can be done in a way that educates ahead of time rather than embarrasses after the fact.

iNat has been an amazing find for me, and I believe it has the power to do much good and make a real contribution, but especially given that many users are well-meaning but not experts, the AI system itself needs to be tuned in a way that helps prevent misIDs.

12 Likes

Not only the wrong continents: I am always bemused to see suggested taxa belonging to four, five, or even more plant families. The interface must make very clear that a suggestion is only a suggestion, to be checked critically against nearby observations. Possibly add the “Compare” button here?

3 Likes

I still wonder if the algorithm is intended to take geography into account, perhaps over time as we continue to make corrections, or whether that component is just not there. Maybe one of the staff or someone else who knows can chime in on that question.

From @tiwane in 2018 on the Google Group:

When you submit an observation to computer vision, it takes the first photo and runs it through the model and spits out visually similar results. If something is “Seen nearby” that will affect the ranking of the results. Improvements to ranking and displaying results is definitely something we are looking into. That, along with a way to train the model on not just species but higher-level taxa will be really helpful, I think. But that will take time, it’s not easy.

and

“Seen Nearby” searches for Research Grade observations - excepting the observation in question - within 100km of the requested coordinates and 45 days before and after the date specified. [not just that year, but any year]
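The “Seen Nearby” rule quoted above can be sketched as follows. The `Obs` record and function names are hypothetical and this is not iNaturalist's actual code; the 45-day window is approximated by comparing day-of-year positions on a 365-day calendar, so it matches across any year and wraps around New Year.

```python
import math
from dataclasses import dataclass
from datetime import date

EARTH_RADIUS_KM = 6371.0


@dataclass
class Obs:
    """Hypothetical minimal observation record."""
    id: int
    taxon: str
    lat: float
    lon: float
    observed_on: date
    research_grade: bool


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def seasonal_gap_days(d1: date, d2: date) -> int:
    """Calendar distance between two dates in any years, wrapping at New Year.
    Approximation: every year is treated as 365 days long."""
    diff = abs(d1.timetuple().tm_yday - d2.timetuple().tm_yday)
    return min(diff, 365 - diff)


def seen_nearby(target: Obs, others: list[Obs],
                radius_km: float = 100.0, window_days: int = 45) -> set[str]:
    """Taxa with Research Grade observations (other than `target` itself)
    within `radius_km` and within `window_days` of the target date, any year."""
    taxa: set[str] = set()
    for o in others:
        if o.id == target.id or not o.research_grade:
            continue
        if haversine_km(target.lat, target.lon, o.lat, o.lon) > radius_km:
            continue
        if seasonal_gap_days(target.observed_on, o.observed_on) > window_days:
            continue
        taxa.add(o.taxon)
    return taxa
```

Written out this way, the rule only ranks taxa that already have nearby Research Grade records; it does nothing to remove visually similar taxa that have none, which is the gap discussed throughout this thread.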

2 Likes

In Central Texas, and probably across the South, we see this consistently with live oaks, which are probably our most common native oaks. Invariably the first suggestion is Quercus agrifolia, which occurs only west of the Sierra Nevada. Its common name, coast live oak, is no clue to newbies, as our live oaks are found all along the Gulf and mid-Atlantic coasts. I would like to see these features added:

  • Build a geographic factor into the suggestion algorithm so species outside their known distribution are excluded from the initial list.

  • Also check against regional alert lists. For example, in much of the U.S., kudzu, zebra mussels, or emerald ash borer would be on such a list. If there is a close match on that list, add it to the initial list and mark it prominently as an invasive species of concern.

  • At the bottom of the initial list, add a link that reads something like “Other possibilities.” When the user clicks that, widen the search to include other species that would be outside their known range.

  • Add the ability to filter for IDs of species on an alert list and, separately, for IDs of species outside their known range.
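One way to combine the first three features is sketched below. Everything here is an assumption for illustration: `cv_scores` stands in for the model's raw visual-similarity scores, `in_range` for whatever range data the site might use, `alert_list` for a regional invasive-species list, and `alert_min_score` for an arbitrary cutoff defining a “close match”.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    taxon: str
    score: float
    alert: bool = False  # True = show prominently as an invasive species of concern


def build_suggestions(cv_scores: dict[str, float], in_range: set[str],
                      alert_list: set[str], alert_min_score: float = 0.2,
                      widen: bool = False) -> list[Suggestion]:
    """Initial list: in-range taxa plus close alert-list matches, best score first.
    With widen=True ("Other possibilities"), include every taxon the model matched."""
    ranked = sorted(cv_scores.items(), key=lambda kv: kv[1], reverse=True)
    result: list[Suggestion] = []
    for taxon, score in ranked:
        on_alert = taxon in alert_list
        if widen or taxon in in_range:
            result.append(Suggestion(taxon, score, on_alert))
        elif on_alert and score >= alert_min_score:
            # Close match on the regional alert list: keep it and flag it.
            result.append(Suggestion(taxon, score, True))
    return result
```

In the Central Texas example, Quercus agrifolia would drop out of the initial list but still appear behind the “Other possibilities” link, so a planted or genuinely out-of-range tree could still be identified.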

Perhaps there are other ways to accomplish the same goals. These are the key features in any solution:

  • First, present a short list of the most likely possibilities in that locale plus “alert list” lookalikes, if any.

  • Distinguish clearly between natives and introduced species.

  • Let the user choose whether to widen the search.

  • Make it easy to find recent IDs of species on alert lists.

  • Make it easy to find IDs in a locale of interest to me that are worth reviewing because you wouldn’t expect to find them there.

7 Likes

Hi @baldeagle! I hope you don’t mind, I moved this post into this thread since there’s been a lot more discussion of the very same topics here. Though if you decide to branch off some of your other points as separate threads or feature requests, that is fine too!

1 Like

@treegrow and @bouteloua have caught a bunch of my errors with this… Nice people helping me learn are the main reason I started to understand the functions and limitations of AI/CV (I’m never sure which is the appropriate term). So, thank you. There is a wealth of dialogue on this and similar topics on the Google Group, but I’m hesitant to pull in my own or others’ relevant comments and make things too wordy here.

3 Likes

I agree with the concerns raised here, and have submitted a feature request for this issue, linking to some of the relevant discussions on the Google Group. Please do vote for the feature request to move it up the list of priorities.

A good approach could be to brainstorm ideas for how to fix it here on this thread, and then people can add those to the feature request thread when there seems to be some agreement on a way forward?

5 Likes

For me this is the key thing that’s needed. A wider net than 100 km would be appropriate in many parts of the world. A radius of 500 or 1000 km for filtering suggestions should mostly fix the problem of Californian plants being suggested in South Africa, or Pacific molluscs in the Atlantic. The “Seen Nearby” text could still be included for taxa seen within 100 km (though I think the 45-day window is too restrictive in many places: many taxa can be found year-round in the tropics but have few records).
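A sketch of this two-radius idea: filter suggestions to taxa with a record within a wide radius (500 km by default here), keep the “Seen Nearby” label for records within 100 km, and apply no date window at all. The record format and function names are hypothetical, and a real system would probably need both radii tuned per region.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def filter_by_radius(candidates: list[str], records: list[tuple[str, float, float]],
                     lat: float, lon: float, filter_km: float = 500.0,
                     nearby_km: float = 100.0) -> list[tuple[str, bool]]:
    """Keep candidates (in their original order) that have at least one record
    within `filter_km`; pair each with a "Seen Nearby" flag for `nearby_km`."""
    nearest: dict[str, float] = {}
    for taxon, rec_lat, rec_lon in records:
        d = haversine_km(lat, lon, rec_lat, rec_lon)
        nearest[taxon] = min(d, nearest.get(taxon, math.inf))
    return [(t, nearest[t] <= nearby_km)
            for t in candidates if nearest.get(t, math.inf) <= filter_km]
```

With this filter, a Californian plant whose nearest record is thousands of kilometres away never reaches the list for an observation in South Africa, while a regional species recorded a few hundred kilometres away is still offered, just without the “Seen Nearby” label.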

I also like this suggestion. At least for taxa with complete taxonomies, the AI should “know” what it doesn’t know, i.e. it might be aware that it has a sufficient training set of photos for only 5 out of 12 members of a genus, and downgrade its certainty accordingly if it matches a photo to a member of that genus. Exactly how to program this seems like it could be challenging though…
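The “know what it doesn’t know” idea could be as simple as scaling the model’s species-level score by the share of the genus it was actually trained on. This is a hypothetical heuristic, not something iNaturalist is known to do, and it assumes a complete species list for the genus is available.

```python
from dataclasses import dataclass


@dataclass
class GenusCoverage:
    trained_species: int  # species in the genus with enough training photos
    total_species: int    # all species in the genus, from a complete taxonomy


def adjusted_confidence(raw_score: float, coverage: GenusCoverage) -> float:
    """Scale a species-level score by the fraction of the genus the model knows."""
    if coverage.total_species <= 0:
        raise ValueError("total_species must be positive")
    fraction = min(coverage.trained_species / coverage.total_species, 1.0)
    return raw_score * fraction


if __name__ == "__main__":
    # The example above: training data for only 5 of 12 species in the genus.
    print(adjusted_confidence(0.9, GenusCoverage(trained_species=5, total_species=12)))
```

With training data for 5 of 12 species, a raw score of 0.9 drops to about 0.375, which would argue for suggesting the genus rather than the species. A linear scaling is only the simplest choice; it ignores, for example, whether the untrained species even occur near the observation.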

5 Likes