Species Suggestions for the Wrong Continent

robotpie · March 15, 2019, 2:54am

Speaking from the perspective of the Hong Kong uploads, I have noticed that a good portion of the Lepidoptera uploads (mostly excluding Papilionoidea) contain IDs for species not found in Hong Kong. Usually they are some moth found in the USA. This issue typically seems to arise from the iNat AI, in the “visually similar” part of “Suggest an Identification”. A typical red flag for me is just one location pin in Hong Kong and a bunch of other pins elsewhere (though one has to take into account how cosmopolitan the species is). Usually I send a comment along the lines of “when making an ID please consider the distribution of each species”, but likely to no avail.

For something like moths, where many species look so similar in their morphology, I wonder how useful species suggestions are if the underlying algorithm is not optimised enough. In that case the issue may stem from the coding side (I can’t think of a better word), in that the code has not factored in species distribution to make a more informed ID suggestion. Then again, coding this would be extremely challenging.

On the other hand, this may be an issue of how diligent users are in ID’ing their species. Sure, if we want to prevent this from happening we could make the user experience more restrictive (e.g. make location mandatory so that the algorithm can then restrict itself to providing ID suggestions for that region), but I wonder how effective that would be in the long run.

Edit: just found an example of what I meant.

mreith · March 17, 2019, 3:55am

Uploading files from a developing country, where the internet is at times slow and patchy, is sometimes a really slow process … I think this might be largely due to functions like the picture-based species suggestion process. Example: I tried to upload pictures of Consolea (Cactaceae) lately … typing Consolea in the name field … didn’t stick … again … again … after 15 minutes I get the message from the website … we are sure this is genus Opuntia … and yes, they look very similar, but a Consolea is (currently) not an Opuntia. So the computer-based guess cost me 15 minutes for a single observation upload and it was worse than my own guess. I have more examples like this … but it’s maybe boring to read. I would love to have an upload site where this data-eating nonsense is minimized and where I can just upload stuff!!

charlie · March 17, 2019, 1:36pm

Turn off auto sync. The auto sync feature of the app breaks the app for anything other than the most casual use. If you go to settings and turn it off, you can then upload when you have a wifi connection: tell the app to upload all and then go do something else.

bouteloua · March 17, 2019, 2:19pm

I think @mreith is referring to the website. I started a feature request here:
https://forum.inaturalist.org/t/toggle-computer-vision-suggestions-on-off-on-website/1117

charlie · March 17, 2019, 2:21pm

ah, i never really noticed it being slow on the website, i think because i usually add things via the app. Things have been slow lately though… that makes sense.

mreith · March 17, 2019, 11:31pm

thanks a lot! will try this!

robotpie · March 21, 2019, 5:06am

OK, to add on to my previous post here: I think it may be beneficial, at this moment, to suppress the “visually similar” feature in the “Suggest an Identification” section. I feel it is much better (please voice your opinions) if one keeps the ID to family level, or to class etc., rather than using that feature and keying in a wrong species ID. Especially with the CNC coming up in a month’s time, I feel it really hinders the IDing workflow, and one might end up with fallacious results.

EDIT: Well, wrong ID is still a wrong ID…

cosmophasis · March 21, 2019, 10:50am

I don’t think taking away the feature is a good idea, as many people are attracted to it, though I don’t have a reasonable alternative for fixing this problem. However, I agree that this can become quite a big problem during events such as the CNC (both the international one and a local one we just had in HK). There are so many mindless IDs, such as reptiles being IDed as mammals, moths as beetles etc. Many of these observations with bad IDs are lost in the enormous surge of observations, where they will never be corrected, and many “unknowns” are also lost. I don’t know if this problem will ever be solved, but I would also like to raise awareness of this issue, as we all know iNat can be an invaluable tool in providing scientific data.

charlie · March 21, 2019, 11:55am

At this point I don’t think we are going to get rid of the algorithm, nor should we. It’s fully integrated in the site and it’s getting better over time. Deleting it will just piss people off. Before the algorithm, people would just add random nonsense IDs on their own anyway.

sullivanribbit · March 21, 2019, 1:38pm

I have lost track of whether this was already brought up, but one possibility is to tweak the algorithm/scoring system such that IDs chosen from the computer vision “visually similar” suggestions require a higher standard of confirmation.
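
A minimal sketch of what such a tweak could look like, purely as an illustration. Everything here is an assumption: the function name, the 0.5 weight for IDs picked from the “visually similar” list, and the threshold of 2.0 are hypothetical and are not how iNat actually scores identifications.

```python
# Hypothetical weights: an ID picked from the computer vision suggestions
# counts for less than an ID typed in by a person.
CV_WEIGHT = 0.5
HUMAN_WEIGHT = 1.0
REQUIRED_SUPPORT = 2.0  # assumed threshold, not iNat's real rule


def is_confirmed(agreeing_ids):
    """Return True if the agreeing IDs carry enough combined weight.

    agreeing_ids is a list of booleans, one per agreeing identification:
    True if the ID was chosen from the CV suggestions, False otherwise.
    """
    support = sum(CV_WEIGHT if from_cv else HUMAN_WEIGHT
                  for from_cv in agreeing_ids)
    return support >= REQUIRED_SUPPORT


two_cv_clicks = is_confirmed([True, True])        # 0.5 + 0.5
two_human_ids = is_confirmed([False, False])      # 1.0 + 1.0
mixed_ids = is_confirmed([True, True, False])     # 0.5 + 0.5 + 1.0
```

Under these assumed numbers, two people both clicking the top CV suggestion would not be enough on their own; at least one more identification would be needed.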

andrewgillespie · March 21, 2019, 5:10pm

Going back to the title of the post, is it really logically correct to say that there is a wrong continent? You can’t say that because nobody has observed a species in a place, it will never be observed there. Besides species that are self-mobile, like sunfish or birds, humans move things all over the globe. Sometimes they purposefully move species, pangolins for example, but some species hitch a ride. And then there is the possibility of climate change causing species to change their natural range, like polar bears.

charlie · March 21, 2019, 5:15pm

Yes, in a sense, but also, a new discovery on a new continent is very rare and requires extraordinary evidence. At the least, I think it is reasonable to require more than an algorithm ID at this point in time. The chance of the algorithm misidentifying something out of range is much, much higher (it has happened 1000s of times already) than the chance of the algorithm nailing a new species introduction on a different continent (maybe that has never happened yet?). It’s not unreasonable to require a verified community ID to add such a discovery.

kiwifergus · March 21, 2019, 5:27pm

Also, the algorithm is basing its ID on a data pool containing mis-identified observations!

Perhaps rather than limiting suggestions to geographically present taxa, a better option might be to not make suggestions for locations that don’t have a reasonable amount of observations. When a location reaches a determined number of (human) identified observations, then the CV/AI can become active as it will have a good stock of accurately IDd observations to work from. Or, and probably better still, it could start in a new area with only higher level taxa suggestions. Maybe each taxonomic level could have a threshold of how many observations are near before it can be allowed to suggest them…
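
A rough sketch of the per-level threshold idea, under stated assumptions. The threshold numbers, the function name and the example counts below are all made up for illustration; nothing here reflects how the real CV system works.

```python
# Hypothetical thresholds: how many human-identified observations of a taxon
# must exist nearby before the CV is allowed to suggest it at that rank.
RANK_THRESHOLDS = {
    "class": 0,
    "order": 20,
    "family": 50,
    "genus": 200,
    "species": 500,
}


def coarsen_suggestion(lineage, thresholds=RANK_THRESHOLDS):
    """Walk a lineage from coarse to fine and stop at the first rank
    whose nearby observation count is below its threshold.

    lineage is a list of (rank, name, nearby_count) tuples ordered from
    the coarsest rank to the finest. Returns the most specific
    (rank, name) that is still allowed, or None if even the coarsest
    rank fails.
    """
    allowed = None
    for rank, name, nearby_count in lineage:
        if nearby_count < thresholds[rank]:
            break
        allowed = (rank, name)
    return allowed


# Made-up counts for a location where the CV's top species has no
# nearby human-identified observations at all.
lineage = [
    ("class", "Insecta", 5000),
    ("order", "Lepidoptera", 900),
    ("family", "Erebidae", 120),
    ("genus", "Hypena", 30),
    ("species", "Hypena scabra", 0),
]
suggestion = coarsen_suggestion(lineage)
```

With these example numbers the suggestion falls back to family level, and in an area with no observations at all it would fall back to class.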

andrewgillespie · March 21, 2019, 5:29pm

That is very reasonable. I suppose that location does affect probability.

jdmore · March 21, 2019, 7:25pm

Yes, however we get there algorithmically (spell check doesn’t like that one…), I think a well-functioning CV system should not be offering highly improbable suggestions, ones that much more often than not would lead users down the wrong path to an ID.

jimjohnson · March 22, 2019, 10:54pm

I don’t think anyone has said that. Of course it is possible for a species to genuinely appear on a “wrong” continent, for whatever reason: it has happened before, and it will happen again. But it is rare, and every case I have looked at among my areas of expertise was incorrectly identified, and those were nearly all based on computer vision suggestions. I spend a lot of time chasing those down in the hope that they won’t contribute to more incorrect (or we can call them highly unlikely, if you prefer) suggestions in the future.

andrewgillespie · March 25, 2019, 8:00am

Not explicitly, but it is implied. However, I think my question has been answered. The rarity of the event means that the algorithm should give a lower probability. Or in more plain language: it should appear lower down the list of suggestions.
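
To make “lower down the list” concrete, here is a minimal sketch that re-ranks visual scores by how often each taxon has been observed nearby. It assumes a simple smoothed weighting; the function name, the `smoothing` parameter and the example scores are hypothetical, and this is not a description of what iNat does.

```python
def rerank(visual_scores, nearby_counts, smoothing=1.0):
    """Combine visual similarity scores with a location-based weight.

    visual_scores maps taxon name -> score from the image model.
    nearby_counts maps taxon name -> number of nearby observations.
    The smoothing term keeps never-observed taxa at a small but
    non-zero probability, since rare is not the same as impossible.
    Returns (taxon, probability) pairs, most likely first.
    """
    weighted = {
        taxon: score * (nearby_counts.get(taxon, 0) + smoothing)
        for taxon, score in visual_scores.items()
    }
    total = sum(weighted.values())
    if total == 0:
        return []
    ranked = [(taxon, weight / total) for taxon, weight in weighted.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up example: the visually closest match has never been seen nearby.
visual_scores = {"out-of-range moth": 0.6, "local moth": 0.4}
nearby_counts = {"local moth": 9}
ranked = rerank(visual_scores, nearby_counts)
```

In this example the local species moves to the top of the list, while the out-of-range species still appears, just further down.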

annemirdl · March 25, 2019, 11:06am

Computer vision is not the only way observations end up on the wrong continent. Take, for example, Hylesia nigricans in Mexico (North America). This is a moth endemic to Argentina and Brazil (South America), but there is an error on the Spanish Wikipedia page for this moth, which shows a totally wrong picture of the caterpillar in Mexico. Whenever someone searches the internet for Azotador, the search comes up with Hylesia nigricans.

susanhewitt · March 25, 2019, 12:10pm

I very much agree with robotpie’s idea that with the CNC coming up, if a lot of the suggested IDs from CV/AI could be limited to family level, or even in a lot of more difficult cases to class level, that would really help a lot. However, I don’t think the AI/CV is currently capable of giving out higher-level taxon IDs like that. At the present time, it has not been trained to do that.

It is a common error in inexperienced humans to think that it should be relatively easy to ID any organism all the way down to the species level, even from a photograph. More experienced taxonomists know that it is often the case that you cannot go beyond the genus or family level.

Because the AI is currently mostly delivering species level IDs, it tends to reinforce this human prejudice that anything can be ID’ed to species level with relative ease, so beginner users are naive about the suggestions offered – they do not think they need to stop and check each one to see what is reasonable – they expect to be able to just use the top suggestion.

I also really don’t like that the AI intro says “We think this is…” because new users imagine that the combined intelligence and knowledge of the entire corpus of iNat staff and users has gone into the selection of those suggested IDs. Let’s face it – the AI results are currently not reliable for most of the world and for most invertebrates.

It would be nice if the AI could say fairly often, “I don’t know” or “I have not yet had sufficient training on this group”.

Why does it say “we” anyway? It does not speak on behalf of all of us.

kiwifergus · March 26, 2019, 5:57am

I think part of the problem is that the CV/AI can only recommend the best matches based on past identified images, and if those previous IDs were wrong, then it will be wrong again!
