The present topic is a spin-off of my much (dis)cussed topic here regarding the interplay of CV suggestions and cryptic species. Here I want to focus on (read: complain about) the apparent behavior of the community with respect to the user interface for CV suggestions.

Aside from difficult submittals, which get a “not confident” admission from CV, most CV suggestions currently come in the form of a “pretty sure” taxon followed by a list of “top suggestions”. CV may list anywhere from 2 to 8+ “top suggestions”; I don’t know what the limit is. The problem is that there is no easy way to see how strong or weak the individual suggestions are in CV’s calculation. @pisum and others have graciously tried to show me how to dig into the CV output to examine probabilities of suggested taxa, but that is waaaay beyond the common UI desire/capabilities.

The main issue arises because some portion of the user community is in the habit of choosing the first top-suggested taxon without investigating the likelihood of that or any other taxon being correct. This leads to false concurrences and, through subsequent community behavior, false RG observations. I view these false IDs as a major detraction from the iNaturalist platform as a source of valid natural history information.
Elsewhere I have suggested a couple of minor adjustments to help curb the false ID problem at the CV interface. (1) If CV is only “pretty sure” at the genus or higher level, then it should not even attempt to offer lower-level (e.g., species-level) “top suggestions”; that is, there should be some type of muzzle or confidence threshold below which species IDs do not appear. Alternatively, (2) if CV is only confident at a higher taxon level, any pointers to lower-level taxa should come with a severe warning about the potential for error/misuse and/or explicitly include probabilities for the suggested taxa. This should be overt and stark, equivalent to a “Surgeon General’s Warning” on dangerous products in the U.S. consumer market.
I’m interested in other opinions about this CV/UI problem.
Have you tried the new iNat Next app? It has this feature, in the form of a “dot” rating for each suggestion under the names. Here’s the list of suggestions for one of my pictures, showing that one species suggestion has a 5/5 rating and the others have a 1/5 rating.
There is an extension, available (only?) on Chrome, that adds some color-coding to the suggestions. It’s called “iNaturalist enhancement suite”, and it seems pretty slick, but I don’t know precisely what it is doing.
I can say from some preliminary experiments that even when the top suggestion is bright green and everything under it is bright red, the top suggestion can still be wrong, so it would be a mistake to ID solely based on that. But it does seem to add some clarity to what the computer is actually suggesting.
I downloaded and installed the color-coding extension for Chrome. It is an interesting addition, but like you, I don’t know what it is doing. And therein lies the problem. Depending on what it is color-coding, this could mislead users or identifiers into overconfidently accepting erroneous suggestions from CV. I’m already seeing it in action: for my above-referenced two cryptic moth species, the extension offers a green bar for one of the two and a red bar for the other…even though I’m quite certain that the potential for separating these two is extremely low. As we currently understand the distinctions between the two cryptic moth species in the area of their overlapping ranges, at best CV should only be offering a 50-50 guess, or perhaps 55-45 based on other confirmed IDs in the area of overlap (see the previous thread for the details on this framework). A green bar adjacent to a species “top suggestion” invites overconfident acceptance by the user/identifier.
By contrast, the dot rating mentioned above seems a little more “honest” in that it may not offer five dots for even the “pretty sure” suggested ID, much less for any of the “top suggestions”. I noted one case where the “pretty sure” ID got only 3 dots as did the top suggested species. Would the color-coding extension paint those with green bars because they were the highest rated offerings? Thus the actual behavior behind the color-coding is important to understand.
For CV suggested IDs or the dot system or the color-coding extension, one concern I have is that the CV probabilities of displayed ID suggestions are probably not summing to 100%. Perhaps a five-dot rating or a green bar indicates a near-100% likelihood of a correct ID suggestion. But what if that likelihood is only 90% or 75% or 51%, how are those coded? Or what if, in a large genus with many difficult-to-ID species, the probability for the most likely ID is just, say, 10% and 18 other species are just 5% likely to be correct? Does the ten-percenter get a green bar? Five dots or a green bar offer an alluring bit of nuance, but the potential for misunderstanding and misapplication seems high.
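To make the concern concrete, here is a small hypothetical sketch (not iNaturalist’s or the extension’s actual code; the threshold values and scoring scheme are assumptions). It shows how a naive absolute-threshold coloring rule behaves very differently when one suggestion dominates versus when the score mass is spread thinly across a large, difficult genus:

```python
# Hypothetical illustration: how a color/threshold rule can mislead
# when CV scores are spread thin. Scores are assumed raw per-taxon
# values on a 0-100 scale; thresholds are invented for illustration.

def color_for(score, green_at=70, yellow_at=40):
    """Naive absolute-threshold coloring -- one assumed scheme among many."""
    if score >= green_at:
        return "green"
    if score >= yellow_at:
        return "yellow"
    return "red"

# Case 1: one dominant suggestion -- a green bar is plausibly earned.
dominant = {"Species A": 92, "Species B": 4, "Species C": 2}

# Case 2: a large, difficult genus -- the "best" score is only 10 and
# 18 others sit at 5, so no suggestion deserves visual emphasis.
diffuse = {f"Species {i}": (10 if i == 0 else 5) for i in range(19)}

for name, scores in [("dominant", dominant), ("diffuse", diffuse)]:
    top_taxon = max(scores, key=scores.get)
    share = scores[top_taxon] / sum(scores.values())  # share of listed mass
    print(name, top_taxon, color_for(scores[top_taxon]), round(share, 2))
```

Under these assumed thresholds the 10% front-runner in the diffuse case would correctly be painted red, but a rule that instead colored the top-ranked suggestion green would hand it an unearned green bar, which is exactly the misapplication worried about above.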
Yes, the question is: Is the extension exaggerating the computer’s confidence, or is the computer exaggerating the computer’s confidence? If it’s the computer doing it, then that’s a CV problem, not something that the interface can correct.
FWIW, for sure it’s not just that the one with the highest confidence is green. There are plenty of cases where all of the suggestions are red. There are also cases where a couple of suggestions are green, or slightly different shades of green, so I think the color value is assigned based on the CV score, independent of the other suggestions.
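That observation can be tested against two candidate mappings. The sketch below is purely hypothetical (neither function is confirmed to be what the real extension does): an absolute mapping colors each suggestion from its own score, while a rank-based mapping always paints the top-ranked suggestion green. Only the absolute mapping can produce the all-red lists described above:

```python
# Hypothetical sketch contrasting two ways an extension could color
# suggestions. Thresholds and both schemes are assumptions.

def absolute_colors(scores, green_at=70, red_below=40):
    """Color each suggestion from its own score, ignoring the others.
    Under this scheme every suggestion can be red at once."""
    out = []
    for s in scores:
        if s >= green_at:
            out.append("green")
        elif s >= red_below:
            out.append("yellow")
        else:
            out.append("red")
    return out

def rank_colors(scores):
    """Color the top-ranked suggestion green regardless of its score.
    Under this scheme something is always green, which the observed
    all-red lists rule out."""
    top = max(scores)
    return ["green" if s == top else "red" for s in scores]

print(absolute_colors([12, 8, 5]))  # all red: consistent with observation
print(rank_colors([12, 8, 5]))      # always one green: inconsistent
```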
“For example, computer vision is pretty sure that this golden paper wasp is in fact a Syngamia moth, and the UI is already heavily influencing people to choose that ID. What the colors would add is an additional push to select the species Syngamia florella.”
Ironically, this spells out precisely why I think the “top suggestions” should be severely constrained with warnings attached. IMHO, since those top suggestions are frequently iffy or flat-out incorrect (i.e., <<90% confident, even if they get coded green), I would like to see these suggestions de-emphasized or eliminated from view, not “pushed”. Given the present state of CV’s development, including the integration of geomodels, I think the general user community should only be exposed to the “pretty sure” IDs. I’m sensing too much personal or community inertia towards must-have species IDs such that some types of constraints seem to be in order for the time being. As the training sets enlarge and the geomodel improves, my nervousness about the species-level ID inertia will hopefully be lessened.
“i.e., <<90% confident, even if they get coded green”
Inasmuch as a high CV score constitutes “confidence”, suggestions colored green by the extension are confident, which does not mean they are correct. The moth example (which is four years old now; the current CV model correctly suggests the paper wasp subfamily, with high confidence) was an issue with the CV model itself, not with the UI.
“I think the general user community should only be exposed to the ‘pretty sure’ IDs.”
The extension simply provides a relative visualization of “confidence” scores, which are otherwise generally hidden from the user. In your case, you might find it useful to know that the model is not particularly confident about any of its suggestions for a given observation, e.g. https://www.inaturalist.org/observations/147931454
Could this be counter-productive? The list of suggestions can also function as a list of similar species to rule out. If there is a list of possibilities and one of them is more confident than the others, I’ll probably pick the more confident one, but at least I am being made aware that the others exist.
In cases where the correct ID is not one of the most confident suggestions, removing it from the list would make it even less likely that people will pick it, because they would have to already know the name of the species in order to manually enter it.
Obviously the computer cannot be blamed for being confused by swans doing yoga poses. But for a human who knows just a little about birds, I think it’s fairly obvious that Mute Swan is the correct ID, even though it appears orange and Trumpeter Swan is yellow-green. (It’s the bill. The bill is orange. This is the most noticeable feature of the bird, and you can see it even in the tiny thumbnails.)
So to me this is a clear case where having the “bad” suggestions in the drop-down is beneficial. Someone who did not know that Mute Swans exist could still figure out that’s a species they should at least consider just by looking at this list.
…and obviously we need more photos of swans doing yoga, in order to train the CV.
@taylorse, I think you are absolutely right in this characterization. It’s a knotty problem in user behavior. Where conscientious posters and identifiers will use the suggestions as a study list, more casual users just pick the top suggestion, thereby frequently leading an ID off in the wrong direction. I don’t see any easy or comfortable solution to this dilemma, even though my own suggestions have come down on the restrictive side. I would truly rather see the “top suggestions” list be used by all users as a study guide, but that’s just not happening and it’s probably not realistic in our very diverse community.
If people don’t know a species exists, they probably shouldn’t be adding a species-level ID without significant research in any taxon…at the very least in taxa more complex than birds.
If people are going to do said research to ID any observation, it is more likely to happen outside of the upload screen which is more of a transient space. There is no huge benefit to displaying these autosuggests on upload…they would be used more responsibly if access was limited to elsewhere on the platform.
This is good. But the new app fails miserably to address the issue with autosuggest when there is no “pretty sure” suggestion: the new interface design makes it even more likely that users will pick a random option from the dropdown than the previous design or the browser did, because to choose a broader, coarser level you have to go to a separate screen via the magnifying glass. It would be better to retain the species lookup as in the previous app.
This gets into a question of what, fundamentally, iNaturalist is for.
On the “about” page it says:
“Mission: iNaturalist’s mission is to connect people to nature and advance biodiversity science and conservation.”
Part of how iNaturalist connects people to nature is by empowering average people to identify things. If you have to do significant research, most people won’t identify anything. (And we don’t need iNaturalist for that. Most people were already not identifying anything.)
That is a lot of work to put into developing the computer vision if its primary purpose is to support people who have already done significant research. The person in your scenario does not need a list of suggestions. But for someone who is trying to learn, it’s a powerful tool. It can be made an even more powerful tool, and I’m much more interested in working on that than in limiting who is allowed to use it.
Edit: Personally, I will not make a confirming ID to take an observation to research grade unless I understand how that thing is identified. I think this is the best approach for agreeing with IDs. I won’t use the CV for that. However, for making a preliminary ID, it can be very useful.
The alternative is to leave everything you don’t know about in “Unknown”, which is a great way to make sure that nobody who knows more than you ever sees it. Putting in an incorrect ID that is at least in the right ballpark gets it in front of people who can help you.
To me, it’s much more important that we work together to get things identified than that every single individual identification is correct. People make mistakes, and iNaturalist’s system for finding and correcting those mistakes is one of its great strengths. Being afraid of making mistakes is a barrier to learning.
In case that was too long for anybody to read, here is the short version:
I think that the goal is not so much to prevent people from making mistakes as to help people to make more useful mistakes.
If somebody uses the computer vision to make an initial ID which is incorrect, what we want is for that ID to be close enough that somebody who can correct it can find it.