Species Suggestions for the Wrong Continent

A bit late to the discussion, but I face this issue constantly when working with myriapods (millipedes and centipedes): No matter where on Earth the observation was taken, the algorithm reliably suggests almost any dark millipede with yellow spots as Harpaphe haydeniana (a species in a genus restricted to Western North America), any long, banded millipede as either Narceus (Central-Eastern North America) or Paeromopus (California endemic), and any pale millipede (or frequently, beetle grubs) as the California endemic Xystocheir dissecta. I spend a lot of time correcting these frequent mis-IDs: Like many invertebrate groups, myriapods at the species level are poorly known among the general public (or even biologists), and misidentification is rampant online. I think the AI is clearly biased by its historic and current user base (California, North America). My question is: how do we bap the AI on the nose and say “NO!” (or how do we boost the “I” in AI)? Does setting Establishment statuses (e.g. “endemic to California”, “Native to the United States”) lower the probability of that taxon (and/or its constituent taxa) being suggested in Asia or anywhere else? Maybe if an out-of-range suggestion is chosen by a user, a checkpoint message could be raised, something to the effect of “this taxon doesn’t appear to be known from this area, do you wish to proceed?”

Update (May 12): After poking around in some other observations I admit that I may be seeing a biased assortment myself, seeing only the obviously wrong IDs, and missing ones that don’t cross my radar because they were correctly suggested (either by AI or the user). I don’t have access to all the data, but I do see that the AI ranks certain suggestions higher or lower based on geography, contrary to my hyperbolic claim of “No matter where on Earth the observation was taken…”. I don’t think the AI is fatally flawed, but still would like to see some more cooperation between people and machines, e.g. maybe in the form of a checkpoint when choosing a taxon not known from the continent, as described above.
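To make the checkpoint idea concrete, here is a rough Python sketch. Everything in it is hypothetical: the `KNOWN_CONTINENTS` table, the `checkpoint_message` function, and the continent names are made up for illustration and are not part of iNaturalist's code or API.

```python
# Hypothetical sketch: none of these names come from iNaturalist's code or API.
# Assumed data: a mapping from taxon name to the continents it is known from.
KNOWN_CONTINENTS = {
    "Harpaphe haydeniana": {"North America"},
    "Xystocheir dissecta": {"North America"},
}


def checkpoint_message(taxon, observation_continent, known=KNOWN_CONTINENTS):
    """Return a warning string if the taxon is not known from the continent, else None."""
    continents = known.get(taxon)
    if continents is None:
        # No range data for this taxon: stay silent rather than guess.
        return None
    if observation_continent in continents:
        return None
    return (
        f"{taxon} doesn't appear to be known from {observation_continent}. "
        "Do you wish to proceed?"
    )


warning = checkpoint_message("Harpaphe haydeniana", "Asia")
```

The user can still proceed after the warning, so a genuine new record (or an introduced population) is not blocked, only questioned.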

6 Likes

Under the current design, setting the endemic status or a range will have no impact on the suggestions. It may have other or future benefits, but nothing here. Maybe 35 species of millipede have enough records to have gone through the vision training. Maybe 3 of these are black and yellow in colour. If it gets sent a black and yellow millipede, the suggestions will pretty much always include these, especially given that the results list proposes 8 (or is it 10? I can’t remember) options.
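A toy Python illustration of that point, assuming the model simply returns its highest-scoring trained labels. The function name `top_k_suggestions` and all the scores are invented; this is not iNaturalist output or code.

```python
# Hypothetical illustration; labels and scores are made up, not iNaturalist output.
def top_k_suggestions(scores, k=8):
    """Return the k highest-scoring trained labels, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented visual scores for a black-and-yellow millipede photographed in Asia.
# Only labels the model was trained on can appear here, wherever the photo
# was taken, so the true (untrained) species can never be suggested.
scores = {
    "Harpaphe haydeniana": 0.61,
    "Narceus": 0.07,
    "Xystocheir dissecta": 0.03,
}
suggestions = top_k_suggestions(scores, k=2)
```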

Well then, that there’s a big problem. The problem with machines (even computers) is that they are dumb: they cannot think and reason like people. It sounds like the only way to train the Computer Vision AI to be smarter (offer better suggestions) is to feed it more Research Grade observations (positive reinforcement). I wish there was a way to add negative reinforcement, otherwise it will keep making the same bad suggestions. In the meantime, educating people about what does occur in their area might help.

3 Likes

oh I dunno, people can be pretty dumb too, it’s not fair to single out the machines!

3 Likes

There has been plenty of discussion, as well as an ongoing review by the developers, about how it can be improved.

It is far from a simple question. How to determine what lives where (at both macro and micro geographic levels - saying something is found in Canada is useless to me if it is found 5,000 kilometers away from where I saw it), track it and be able to correlate that to an iNat record is not a simple process.

It is also not always the machine that is wrong: there are many cases where the vision suggestion actually proposes the right thing as the top choice, and the user selects something else from the list for some reason.

3 Likes

Hi, I’m from the US, and currently in Ladakh, India (studying plants, etc.). Two brand new users are uploading observations, and the IDs are all from other continents. Most have been US species suggestions, but there was a New Zealand one as well. There are very few observations in Ladakh, which may be part of the computer algorithm problem, but shouldn’t the app default to lizards, for example, rather than suggesting a US lizard? I know that the local lizards are recorded, because I’ve posted a few of them.

Secondarily, I’ve suggested to one new user that the app is wrong, but if he doesn’t look at the web interface, I’m not sure he’s really seeing my comments. I want to be encouraging, but with all the wrong app IDs I’m not sure it is coming across.

3 Likes

There have been many discussions around factoring location into the AI suggestions. If you go to the topic lists you will be able to find them from the topic titles.

The best thing you can do is have a “cut and paste” message you can put in comments on those observations, basically asking observers to choose AI suggestions carefully, or better still to just put high level taxa as a starting point, e.g. Plantae for plants or Araneae for spiders. Then try to enlist expertise on the local fauna/flora to help get those observations ID’d as accurately as possible, and then there will be that base of IDs to train the AI.

Ok, will do. Thanks.

The AI has not yet been taught to factor in location, and it also is only just barely starting to learn to default to higher taxa.

As a result, the AI offers species IDs for organisms all over the world, very often ID-ing them as species from California or NZ. This is very unfortunate.

And yes, if the new users are only using the phone app, and are not going online looking at the website, they will not see your comments.

New users tend to imagine that the AI is completely reliable in its suggestions. Of course they would like to be able to magically get a species-level ID from the AI, no matter where they are in the world, but at this point in time it is better that they just type in “plant” or “bird” or “mollusc” etc.

3 Likes

To be 100 percent accurate (and no, I don’t want to be THAT GUY), it does factor in location in that it will explicitly tell you if the species suggested has been seen near where your observation was. What it does not do is exclude things which have not.

Because visual match is weighted more heavily than “seen nearby”, the suggestions can include species that are not viable based on range.
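A small Python sketch of how that weighting plays out. The `rank` function, the weights, and the scores are all invented for illustration; I don't know the actual formula or values iNaturalist uses.

```python
# Hypothetical scoring sketch; the weights and scores are invented for
# illustration and are not iNaturalist's actual values or algorithm.
def rank(candidates, visual_weight=0.9, nearby_weight=0.1):
    """Sort (name, visual_score, seen_nearby) tuples by a weighted combined score."""
    def combined(candidate):
        _, visual, seen_nearby = candidate
        return visual_weight * visual + nearby_weight * (1.0 if seen_nearby else 0.0)

    return [name for name, _, _ in sorted(candidates, key=combined, reverse=True)]


candidates = [
    ("out-of-range lookalike", 0.80, False),
    ("local species", 0.60, True),
]

# Visual match dominates: the out-of-range lookalike still comes first.
visual_heavy = rank(candidates)
# Giving location equal weight moves the local species to the top.
location_heavy = rank(candidates, visual_weight=0.5, nearby_weight=0.5)
```

With the visual-heavy weights, the "seen nearby" bonus is too small to overcome a better visual match, which is the behaviour described above.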

3 Likes

See: https://forum.inaturalist.org/t/better-use-of-location-in-computer-vision-suggestions/915/2

I think iNat is making a mistake with using AI to suggest species as it is currently implemented. For trees we are getting a huge number of false suggested IDs on species which I attempt to curate like Ulmus thomasii and Picea rubens, and I cannot easily keep up with the flood of inane uploads of these species which frequently look nothing like the suggestion, and take no account of location (e.g. being present in tropical Asia). I can only imagine what is happening across the board; it might be a nightmare. It seems to me that things worked better before the automated suggestions era. Or why not have suggestions to just genus/family instead of species to get the ball rolling for inexperienced IDers?

6 Likes

This is so true, and these “false friends” (as linguists say) leave many new users confused.

In places with few observations it gets particularly annoying, since the suggested species, as far as I can tell, are based on the number of observations: the greater it is, the more frequently the species shows up.

I presume with Animalia it can be done more easily: separation by geography (at least by continent). But with plants it can get a little tricky, because for instance here in Almaty, Kazakhstan there are many acclimatized species from North America thanks to the Botanical Garden. I’ve encountered several Nootka cypresses and Pseudotsuga sp. throughout the city.

6 Likes

Good points about the number of observations, Kastani.

I think “rare” species with numerous “good” observations are subsequently vulnerable to a flood of incorrect observations based on the current AI approach: I give an example from my experience below.

For Ulmus thomasii as an example, I have made a serious effort over years to document and upload this rare species to iNaturalist, as I think it is likely threatened/endangered, and historically good data has been lacking to draw quantitative conclusions regarding the population. It appears to me that due to this species having received “attention” and being uploaded to iNat much more than one would expect given a relatively small/declining wild population (i.e., it is “over-represented” as an uploaded tree species for Ontario/Quebec), it is now frequently being suggested by AI for anything that looks somewhat similar (e.g. a photo of what is actually Celtis occidentalis may be AI-suggested to be Ulmus thomasii). So, ignoring the obviously incorrect tropical Asia uploads, many of the problematic incorrect uploads for this species are occurring in or near its natural range, so the location may be plausible or vaguely realistic for a newly discovered population, but the actual ID based on photo evidence may be completely different from the AI suggestion to the trained eye with expertise/experience. Without constant dedicated curation of this and many, many other species, I fear that “rare” species which have a critical mass of good uploads to iNat will subsequently get swamped by a large number of incorrect observations based on AI suggestions.

10 Likes

I was taught at medical school that only three groups of people can correctly use “we” in this way: heads of state, editors of newspapers and people suffering from worms.

4 Likes

That’s ok, I do correctly not very much :)

Oh, talking to me not you were…

1 Like

Can it not make a note about not seen here (say country or province level) previously?

1 Like

From https://forum.inaturalist.org/t/better-use-of-location-in-computer-vision-suggestions/915

1 Like

Has anyone suggested adding a “frequently misidentified” message for cases like Ulmus thomasii—that is, where the ID has been changed to another species more frequently than not? Or perhaps more frequently than some threshold, such as 20 percent of the time? The message could be displayed with the suggestion, as is “found nearby,” or it could be displayed next to the species name in the observation, as is “introduced.” Or both. :thinking:
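A rough Python sketch of how such a flag might be computed, using the 20 percent threshold from above. The function `frequently_misidentified` and its inputs (parallel lists of initial and final IDs) are assumptions made up for illustration; this is not an existing iNaturalist feature, and the sample data is invented.

```python
# Hypothetical sketch; the function and its inputs are assumptions,
# not an existing iNaturalist feature.
def frequently_misidentified(initial_ids, final_ids, taxon, threshold=0.20):
    """True if observations first identified as `taxon` ended up as a
    different taxon in more than `threshold` of cases."""
    outcomes = [
        final for first, final in zip(initial_ids, final_ids) if first == taxon
    ]
    if not outcomes:
        return False  # no observations started as this taxon: no evidence
    changed = sum(1 for final in outcomes if final != taxon)
    return changed / len(outcomes) > threshold


# Invented sample: 5 observations first identified as Ulmus thomasii, 2 later changed.
initial = ["Ulmus thomasii"] * 5
final = [
    "Ulmus thomasii",
    "Celtis occidentalis",
    "Ulmus thomasii",
    "Celtis occidentalis",
    "Ulmus thomasii",
]
flag = frequently_misidentified(initial, final, "Ulmus thomasii")
```

Here 2 of 5 IDs were changed (40 percent), so the flag is raised at the 20 percent threshold but would not be under the stricter "more frequently than not" rule.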