Species Suggestions for the Wrong Continent


A bit late to the discussion, but I face this issue constantly when working with myriapods (millipedes and centipedes): no matter where on Earth the observation was taken, the algorithm reliably suggests almost any dark millipede with yellow spots as Harpaphe haydeniana (a species in a genus restricted to Western North America), any long, banded millipede as either Narceus (Central-Eastern North America) or Paeromopus (California endemic), and any pale millipede (or frequently, beetle grubs) as the California endemic Xystocheir dissecta. I spend a lot of time correcting these frequent mis-IDs. Like many invertebrate groups, myriapods at the species level are poorly known among the general public (or even biologists), and misidentification is rampant online. I think the AI is clearly biased by the historic and current user base (California, North America). My question is: how do we bap the AI on the nose and say “NO!” (or how do we boost the “I” in AI)? Does setting Establishment statuses (e.g. “endemic to California”, “Native to the United States”) lower the probability of that taxon (and/or its constituent taxa) being suggested in Asia or anywhere else? Maybe if an out-of-range suggestion is chosen by a user, a checkpoint message could be raised, something to the effect of “this taxon doesn’t appear to be known from this area, do you wish to proceed?”

Update (May 12): After poking around in some other observations I admit that I may be seeing a biased assortment myself, seeing only the obviously wrong IDs, and missing ones that don’t cross my radar because they were correctly suggested (either by AI or the user). I don’t have access to all the data, but I do see that the AI ranks certain suggestions higher or lower based on geography, contrary to my hyperbolic claim of “No matter where on Earth the observation was taken…”. I don’t think the AI is fatally flawed, but still would like to see some more cooperation between people and machines, e.g. maybe in the form of a checkpoint when choosing a taxon not known from the continent, as described above.
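
To make the checkpoint idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Taxon` record, the `known_continents` field, and the two functions are hypothetical and are not iNaturalist's data model or API. It only shows the logic I have in mind, which is to warn when the chosen taxon has no known records on the observation's continent, and to stay quiet when there is no range data at all.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    # Hypothetical record for illustration; not iNaturalist's actual data model.
    name: str
    known_continents: set = field(default_factory=set)


def needs_checkpoint(taxon, observation_continent):
    """Return True when the chosen taxon has no known records on this continent."""
    # With no range data at all we cannot tell, so we do not warn.
    if not taxon.known_continents:
        return False
    return observation_continent not in taxon.known_continents


def checkpoint_message(taxon, observation_continent):
    """Return a warning string, or None if no checkpoint is needed."""
    if needs_checkpoint(taxon, observation_continent):
        return (f"{taxon.name} doesn't appear to be known from "
                f"{observation_continent}. Do you wish to proceed?")
    return None


harpaphe = Taxon("Harpaphe haydeniana", {"North America"})
print(checkpoint_message(harpaphe, "Asia"))           # warning text
print(checkpoint_message(harpaphe, "North America"))  # None
```

The user can still proceed after the warning, so a genuine range extension would not be blocked, only double-checked.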


Under the current design, setting the endemic status or a range will have no impact on the suggestions. It may have other or future benefits, but nothing here. Maybe 35 species of millipede have enough records to have gone through the vision training. Maybe 3 of these are black and yellow in colour. If it gets sent a black and yellow millipede, the suggestions will pretty much always include these, especially given that the results list proposes 8 (or is it 10? I can’t remember) options.


Well then, that there’s a big problem. The problem with machines (even computers) is that they are dumb: they cannot think and reason like people. It sounds like the only way to train the Computer Vision AI to be smarter (offer better suggestions) is to feed it more Research Grade observations (positive reinforcement). I wish there was a way to add negative reinforcement, otherwise it will keep making the same bad suggestions. In the meantime, educating people about what does occur in their area might help.


oh I dunno, people can be pretty dumb too, it’s not fair to single out the machines!


There has been plenty of discussion, as well as an ongoing review by the developers about how it can be improved.

It is far from a simple question. Determining what lives where (at both macro and micro geographic levels - saying something is found in Canada is useless to me if it is found 5,000 kilometers away from where I saw it), tracking it, and correlating that to an iNat record is not a simple process.

It is also not always the machine that is wrong; there are many cases where the vision suggestion actually proposes the right thing as the top choice, and the user selects something else from the list for some reason.


Hi, I’m from the US, and currently in Ladakh, India (studying plants, etc.). Two brand new users are uploading observations, and the IDs are all from other continents. Most have been US species suggestions, but there was a New Zealand one as well. There are very few observations in Ladakh, which may be part of the computer algorithm problem, but shouldn’t the app default to lizards, for example, rather than suggesting a US lizard? I know that the local lizards are recorded, because I’ve posted a few of them.

Secondarily, I’ve suggested to one new user that the app is wrong, but if he doesn’t look at the web interface, I’m not sure he’s really seeing my comments. I want to be encouraging, but with all the wrong app IDs I’m not sure it is coming across.


There have been many discussions around factoring location into the AI suggestions. If you go to the topic lists you will be able to find them from the topic titles.

The best thing you can do is have a “cut and paste” message you can put in comments on those observations, basically asking observers to choose AI suggestions carefully, or better still to just put high level taxa as a starting point, e.g. Plantae for plants or Arachnida for arachnids. Then try to enlist expertise on the local fauna/flora to help get those observations ID’d as accurately as possible, and then there will be that base of IDs to train the AI.


Ok, will do. Thanks.


The AI has not yet been taught to factor in location, and it also is only just barely starting to learn to default to higher taxa.

As a result, the AI offers species IDs for organisms all over the world, very often ID-ing them as species from California or NZ. This is very unfortunate.

And yes, if the new users are only using the phone app, and are not going online looking at the website, they will not see your comments.

New users tend to imagine that the AI is completely reliable in its suggestions. Of course they would like to be able to magically get a species-level ID from the AI, no matter where they are in the world, but at this point in time it is better that they just type in “plant” or “bird” or “mollusc” etc.


To be 100 percent accurate (and no, I don’t want to be THAT GUY), it does factor in location, in that it will explicitly tell you if the suggested species has been seen near where your observation was made. What it does not do is exclude things which have not.

Because visual match is weighted more heavily than being seen nearby, suggestions that are not viable based on range still make it into the list.
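
A toy illustration of that weighting effect, in Python. This is only a sketch under made-up assumptions: the `rank` function, the weights, the scores, and the "local look-alike" candidate are all hypothetical, and I do not know how the real system actually combines visual match with "seen nearby". It just shows how a strong visual match can outrank a nearby species when the visual weight dominates.

```python
def rank(candidates, visual_weight=0.9, nearby_weight=0.1):
    """Sort candidates by a weighted sum of visual score and a seen-nearby flag.

    The weights are invented for illustration, not taken from iNaturalist.
    """
    def combined(candidate):
        nearby_score = 1.0 if candidate["nearby"] else 0.0
        return visual_weight * candidate["visual"] + nearby_weight * nearby_score

    return sorted(candidates, key=combined, reverse=True)


# Hypothetical scores for a black and yellow millipede photographed in Asia.
candidates = [
    {"name": "Harpaphe haydeniana", "visual": 0.80, "nearby": False},
    {"name": "local look-alike", "visual": 0.60, "nearby": True},
]

# Visual match dominates: the out-of-range species comes first.
print([c["name"] for c in rank(candidates)])

# Equal weights: the species seen nearby comes first.
print([c["name"] for c in rank(candidates, visual_weight=0.5, nearby_weight=0.5)])
```

Note that neither weighting excludes the out-of-range species; it only moves up or down the list, which matches the behaviour described above.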


See: https://forum.inaturalist.org/t/better-use-of-location-in-computer-vision-suggestions/915/2