Problems with Computer Vision and new/inexperienced users

Why? It suggests tons of correct species.

4 Likes

Of the few hundred UK Diptera I checked yesterday, almost all incorrect CV use was of species-level IDs. Most family-level or genus-level IDs were fine. The CV can place things in the right ballpark, but it can’t reliably place to species in most taxa, as it just doesn’t have sufficient training data.

So I agree with @stu_crawford - this ID shouldn’t be used to contribute to RG data.
It could be used as a powerless placeholder… but it shouldn’t be a contributing factor as it is atm.

It would be throwing the baby out with the bathwater not to have these suggestions visible somewhere.
They just shouldn’t be encouraged to be used blindly, as the current user interface does, whilst the CV IDs retain the power they do. For me, as long as that remains the case, the autosuggest should only ever present “pretty sure” suggestions… and ideally even then these would be restricted to genus or higher. The species-level IDs should be less accessible, hidden from the initial upload view. Especially in an app(!) You cannot evaluate species-level IDs in complex taxa whilst on a phone.

It’s also significantly more work for identifiers to fix incorrect initial species-level IDs than it is to narrow down a coarse ID. In addition, I have the impression it’s frustrating and confusing for newer observers to have their initial ID “disagreed” with so much - it’s a negative interaction. Finally, as an identifier it’s often more difficult to disagree with species-level IDs, as they require more research or confidence - leaving many species-level CV IDs simply with no further interaction whatsoever - which is, again, not ideal from the observer’s POV.
It’s a lose-lose situation.

An interface which promotes and encourages coarser, safer initial IDs would be a win-win.

5 Likes

I think the problem here is that it would still unfairly punish the people who use CV suggestions to speed up their IDs. I otherwise quite like the idea though. Could we do something like this, but in such a way that there was some setting to turn it back on? Preferably not something too easy to find (or with a large disclaimer if you want to re-activate it), but something so that the people who use the CV in a desirable way can continue to do so.

2 Likes

I’m in the same boat as @fffffffff. I use CV suggestions to quickly add a (correct) ID to something I’m familiar with. Saves a lot of typing. On the flipside, I’ve built up such a strong aversion to accepting IDs for species groups I’m not familiar with that just the contemplation of accepting a CV suggestion usually sends me off into a deep dive into the literature. And being retired, I have the luxury of the time (and sometimes the interest) to do just that. But perhaps 99% of iNat users are not in my situation.

That said (and I haven’t yet waded through all 125 posts here), my strong suggestion would be to constrain CV suggestions to just the genus or even family level except for the most common, most easily identified species (Monarch, Striped Skunk, etc.). The latter criterion could be established by some type of 99%+ rule of prior CV performance. In other words, don’t let CV guess. Don’t even show suggestions unless they are absolutely solid. And let “I don’t know!” be a common and unapologetic part of the response set. Perhaps when a family or genus is offered, some algorithm might add a short list like, “The following are the 5 most common species in this family/genus in this area/state/province:”.
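To make the “99%+ rule” concrete, here is a minimal sketch of how such a filter might work. It is only an illustration, not how iNaturalist's CV actually behaves: the function name, the `past_accuracy` table, the placeholder taxa and every accuracy figure are invented for the example.

```python
# Ranks to try, finest first.
RANKS = ["species", "genus", "family"]


def finest_trusted_suggestion(lineage, past_accuracy, threshold=0.99):
    """Return (rank, taxon) for the finest rank whose past accuracy meets
    the threshold, or None to signal "I don't know"."""
    for rank in RANKS:
        taxon = lineage.get(rank)
        if taxon is not None and past_accuracy.get(taxon, 0.0) >= threshold:
            return rank, taxon
    return None


# Hypothetical record of how often past CV suggestions of each taxon
# turned out to be right. All numbers are invented for illustration.
past_accuracy = {
    "Danaus plexippus": 0.995,  # Monarch: common and easily identified
    "Fly species A": 0.60,
    "Fly genus A": 0.85,
    "Fly family A": 0.99,
    "Beetle species B": 0.40,
    "Beetle genus B": 0.70,
    "Beetle family B": 0.90,
}

monarch = {"species": "Danaus plexippus", "genus": "Danaus", "family": "Nymphalidae"}
fly = {"species": "Fly species A", "genus": "Fly genus A", "family": "Fly family A"}
beetle = {"species": "Beetle species B", "genus": "Beetle genus B", "family": "Beetle family B"}

print(finest_trusted_suggestion(monarch, past_accuracy))  # ('species', 'Danaus plexippus')
print(finest_trusted_suggestion(fly, past_accuracy))      # ('family', 'Fly family A')
print(finest_trusted_suggestion(beetle, past_accuracy))   # None, i.e. "I don't know"
```

With a rule like this, the Monarch still gets a one-click species ID, the difficult fly is offered only at family level, and the beetle gets no suggestion at all.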

9 Likes

The bottom suggestion is a leafhopper, but the rest are all flies (all in zoosubsection Acalyptratae actually). 2 are in family Lauxaniidae, the other 5 are in superfamily Tephritoidea. Visually I can see that it’s pretty clearly Tephritoidea and probably Tephritidae just intuitively from having seen images of lots of them. I know the CV is able to suggest family-level, but doesn’t as often as it should. I don’t know if it knows about those higher levels. It’s good that it says it’s not confident enough to take it to species, but it really should be able to confidently take it to a higher level here I think.

4 Likes

What platform did you run it on? iOS? I don’t appear to get the same suggestion list. First off, I only get 6 suggestions (5 species and 1 genus), not 8. While I do get a leafhopper, it is not last in the list but rather 4th.

Oh, I had the “Include suggestions not seen nearby” option on. Is that choice sticky? The nearby suggestions match what you described, and are less clear (although still mostly Tephritoidea).

1 Like

Yes it is.

2 Likes

So last night I was reviewing some observations in needs ID and by chance I happened to catch a specific wild onion species ID’d as a knapweed, which it looks absolutely nothing like; ‘purplish flowers’ is about the extent of the similarity. I know the user didn’t use computer vision, 1.) because there was no sparkling shield icon and 2.) because if they had used CV it would have ID’d it to a specific quite similar looking onion species. I know that because the correct species is not in the CV model yet, but thanks to a concerted effort to clear mis-IDs out of the incorrect species, the correct species now has enough RG observations to be included in the next training run.

Had the user just clicked the incorrect CV suggestion, it would have quickly been knocked back to genus by one of the users monitoring the taxon for that exact error, and then likely relatively quickly got to the correct species, whether or not the user agreed to the correct ID. Instead, when I entered the correct ID it knocked it all the way back to Angiospermae. Now, unless I tag in someone or the user changes their ID, it will probably have to wait for two people combing through angiosperms who don’t know the species but are confident enough to ID to genus Allium, then two more people who know the species agreeing to get it to RG on the correct species.

In other words, by not clicking the wrong CV ID and instead entering something totally unrelated, the user has likely made it take 5 correct IDs to get to RG instead of 3, 2 of them slow-to-find genus-level IDs from Angiospermae (unless they withdraw their ID). So it would have been preferable if they had chosen the incorrect but close CV ID.
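The counting for the onion case can be sketched roughly as follows. This is a simplified model, not iNaturalist's real implementation. It assumes that the community ID is the finest taxon backed by at least two IDs and by more than two-thirds of the non-ancestor IDs, and it ignores how the real system treats explicit disagreements from coarser IDs. The lineages are abbreviated and the species names are placeholders.

```python
from fractions import Fraction

# Simplified lineages (root first); only the ranks needed for this example.
LINEAGES = {
    "knapweed sp.": ["Angiospermae", "Asteraceae", "Centaurea", "knapweed sp."],
    "Allium": ["Angiospermae", "Amaryllidaceae", "Allium"],
    "onion sp.": ["Angiospermae", "Amaryllidaceae", "Allium", "onion sp."],
}


def community_taxon(ids):
    """Return the finest taxon with at least two supporting IDs and more
    than two-thirds support (assumed rule, see the note above)."""
    best, best_depth = None, -1
    for id_taxon in ids:
        lineage = LINEAGES[id_taxon]
        for depth, taxon in enumerate(lineage):
            # IDs of this taxon or of any of its descendants agree with it.
            agree = sum(taxon in LINEAGES[i] for i in ids)
            # IDs on a different branch conflict; IDs of an ancestor
            # (coarser but compatible) count neither for nor against.
            conflict = sum(
                taxon not in LINEAGES[i] and i not in lineage[:depth]
                for i in ids
            )
            score = Fraction(agree, agree + conflict)
            if agree >= 2 and score > Fraction(2, 3) and depth > best_depth:
                best, best_depth = taxon, depth
    return best


ids = ["knapweed sp.", "onion sp."]   # the observer's guess plus the correction
print(community_taxon(ids))           # Angiospermae

ids += ["Allium", "Allium"]           # two genus-level IDs from generalists
print(community_taxon(ids))           # Allium

ids += ["onion sp.", "onion sp."]     # two more identifiers who know the species
print(community_taxon(ids))           # onion sp.
```

Under these assumptions the observation passes through Angiospermae and Allium before reaching the species, using the five correct IDs described above.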

9 Likes

This only works because there are identifiers like you to monitor those taxa though. For all the groups that don’t have identifiers, there’s nobody around to correct these mistakes and the observation will sit uncorrected until someone realises there’s a mistake. An incorrect ID, whether it’s something slightly incorrect from the CV or something hugely incorrect from a user guessing, is still going to sit there uncorrected until someone reviews it. And if nobody has enough knowledge to know that the ID is wrong, then it will sit there forever. So in a case like that it would be better to have something wildly incorrect, because then someone is more likely to notice that it’s incorrect and bump it back up to a higher level that’s (hopefully) correct. Yes, it’s annoying when you need to have 5 IDs to get something to Research Grade, but in my opinion at least it’s more detrimental to have thousands of observations sitting uncorrected for ages and ages with nobody to know that the CV is wrong. There’s always going to be a similar problem with users guessing IDs and getting it wrong, but we can’t really hope to change human behaviour much, and when users guess an ID there’s no reinforcement there telling them that they’re correct. People trust the AI to be correct, and it often is correct, but new users especially put far too much faith in it being correct, and that’s something that we can easily modify by changing the wording or interface.

You can also follow an observation without IDing by using the tab shown in this screenshot.

4 Likes

This is especially problematic with tropical species, where the majority of species don’t have CV images and are typically quite difficult to identify, causing users to resort to what the CV suggests because they think it’s better than leaving it as “unknown”.

I think the solution would be to have a prompt that encourages users to prioritise giving an ID that’s coarse (e.g. to family or order) over whatever the CV suggests (CV suggestions should be low on priority imo).

6 Likes

Yes, the more I think about it the more I think that this sort of thing is going to be the only reasonable solution. Because even if the AI is correct in most cases, we still have situations like the last one in my original post where the user knew what the correct ID was and still selected the incorrect CV option.

1 Like

I have some very bad news about the global socioeconomic system and the academic system.

8 Likes
