Computer Vision should tell us how sure it is of its suggestions

even if we don’t call it confidence or probability, i think it would still be useful to see the “score”.

6 Likes

Presumably one could make a relative scale, where the first choice would be one (or zero), and the following ones would be fractions of one (or negative numbers). This could illustrate the differences between the 10 options. Does one stand out, or are they basically all equally likely?
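A relative scale like that could be computed client-side from whatever raw scores are available. A minimal sketch, assuming the scores arrive as a plain array of numbers (the function name and sample values are illustrative, not from iNaturalist):

```javascript
// Normalize raw CV scores so the top suggestion maps to 1.0 and the
// rest become fractions of it, making the relative gaps easy to compare.
function relativeScores(scores) {
  const top = Math.max(...scores);
  if (top === 0) return scores.map(() => 0);
  return scores.map((s) => s / top);
}

// A standout first choice vs. a near-tie look very different:
relativeScores([61.3, 7.3, 1.8]);   // first choice dominates
relativeScores([0.95, 0.93, 0.65]); // top two nearly tied
```

Whether one choice "stands out" is then visible at a glance: a dominant first choice leaves everything else near zero, while near-ties cluster near one.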

1 Like

I would be concerned that many users, especially new users, would misinterpret any displayed numeric value as some level of relative confidence. I concur with @kueda that some would overly rely on the “cryptic opinion of this black box” and not their own judgement. Student users would be one group of new users who might overly rely on displayed values.

4 Likes

Just to argue with myself, maybe including such numbers would allow people to make better choices among the black box outputs. Counter counter argument: the problem is over-confidence in a numerical rating when the right answer isn’t even on the list. Counter counter counter argument: that’s already a problem. Counter counter counter counter argument: we’re all doomed and should be spending our time learning post-apocalyptic survival skills.

16 Likes

On it: https://www.inaturalist.org/projects/search?q=edible

5 Likes

i think the scores are basically a measure of visual match and possibly some other factors like presence of observations of a taxon nearby. i think that’s why iNat staff don’t want people thinking of it as a probability or confidence.

so, for example, suppose you have 3 brothers A, B, and C. A & B are identical twins. suppose you take a picture of A and run it through a computer vision algorithm similar to iNaturalist’s. i would expect that CV to return scores that might be like this:
F: 0.97 – the family of A, B, & C
A: 0.95
B: 0.93
C: 0.65
D: 0.35 – D is the boy who lives next door.

so obviously, the CV couldn’t be both 95% sure the photo was of A and 93% sure it was of B, nor would it make sense to assign a 95% probability to A and simultaneously a 93% probability to B. but by seeing the relative scores, you could tell that the CV was saying A and B were way better potential matches than C or D.

5 Likes

Any interest in a browser extension which translates the computer vision scores into a red-green scale? I slapped together a prototype today after reading this thread; here are some examples:

(Obviously) I am not a designer, but it does give a rough idea of what is possible with a purely client-side solution. If there’s interest, I’ll publish it and put the code on GitHub so others can help refine it.

6 Likes

What would be the thresholds for each color (is green 80% - 95%, or 70% to 95%)?

Does red sort of imply wrong so implicitly in our current visual lexicon that people don’t even consider the other suggestions (sometimes the right suggestion is further down the list)? I’d assumed that’s why explicit disagreement is an orange button instead of a red one, but I could be reading too much into that.

Would the colors be adjusted by the user for accessibility (r/g color blindness, etc)?

I’m not saying it’s a bad idea; these are just the first things that popped into my head when I looked at the pics (especially red = stop).

5 Likes

What would be the thresholds for each color (is green 80% - 95%, or 70% to 95%)?

It’s just the computer vision score, which is [0,100], scaled to [0,120] and used as the hue in HSL.
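In other words, it is a simple linear rescale. A sketch of that mapping as described (my reconstruction, not the extension’s actual source):

```javascript
// Map a CV score in [0, 100] linearly onto hue [0, 120] and return
// an HSL color string: 0 -> red, 50 -> yellow-ish, 100 -> green.
function scoreToColor(score) {
  const hue = (score / 100) * 120;
  return `hsl(${hue}, 100%, 50%)`;
}

scoreToColor(0);   // "hsl(0, 100%, 50%)"   (red)
scoreToColor(100); // "hsl(120, 100%, 50%)" (green)
```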

Does red sort of imply wrong so implicitly in our current visual lexicon that people don’t even consider the other suggestions (sometimes the right suggestion is further down the list)?

Red-green was just the first gradient that occurred to me. I don’t have an informed opinion about whether red in particular imparts a bias or, if so, whether that bias is relevant given that the extension would be opt-in.

Would the colors be adjusted by the user for accessibility (r/g color blindness, etc)?

This is a good point, and taken with your second one argues for a different color gradient, or at least a color-blindness mode.

In general, whether it’s color-coding or just exposing the raw number, as OP requested, I think it’s nice to be able to distinguish between the different cases illustrated by my examples. Then again, I confess that in addition to not being a designer, I’m also not a statistician, and thus @kueda’s comment above is a bit lost on me:

I’m told the score should not be considered a metric of “confidence” or “probability” and it should mainly be used for ordering outputs

i.e. I’m unclear on whether the magnitude of the differences in scores is relevant in any way.

2 Likes

i think this is great as a proof of concept. i wonder what this would look like implemented as a color gradient that transitions to pure white by 70% of the width of the box (or maybe hsla that drops the alpha to 0 by 70% the width of the box)? or maybe more simply as a colored left border (or maybe left and bottom border)? maybe a gradient with variable points at which it transitioned to white (effectively creating something like a bar chart) could solve the color blindness problem?
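The fade-to-white idea could be done with a CSS gradient. A rough sketch under those assumptions (the 70% stop comes from the suggestion above; the function name and the 85% lightness are my choices, not the extension’s):

```javascript
// Build a CSS background for a suggestion row: the score's hue on the
// left, fading to white by `fadeStop` (a fraction of the row's width).
function fadeBackground(score, fadeStop = 0.7) {
  const hue = (score / 100) * 120; // same 0->120 red-to-green hue scale
  return `linear-gradient(to right, hsl(${hue}, 100%, 85%), white ${fadeStop * 100}%)`;
}

fadeBackground(100);     // green fading to white at 70% of the width
fadeBackground(20, 0.9); // red-orange fading to white at 90%
```

Varying `fadeStop` per row is what would produce the bar-chart effect, since the colored region’s length would then encode the score independently of its hue.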

then the main unresolved problem would be the problem of the situation where the real answer isn’t actually in the list. hopefully those kinds of situations would more often look more like your bottom example with lots of red than your top example above with bright green choices…

i think 0-120 hues on hsl is a good choice for color scales. on a scale of 0 to 1 then, 0 would be red, 0.25 would be orange, 0.5 would be yellow, 0.75 would be yellow-green, and 1 would be green. (i’m a little surprised to see so much red in the examples above, but maybe i shouldn’t be. this is exactly the kind of truth i was hoping could be revealed by something like this though.)

3 Likes

Great ideas. Here are a few examples based on them. Transition to white at 70% width:


At 90%:


At 100%:


Left border bar:

then the main unresolved problem would be the problem of the situation where the real answer isn’t actually in the list. hopefully those kinds of situations would more often look more like your bottom example with lots of red than your top example above with bright green choices…

I think this is already a problem, given the “We’re pretty sure…” text (in fact, I don’t understand why that feature exists at all if what @kueda said above about the score only being good for ordinal ranking among suggestions is true). For example, computer vision is pretty sure that this golden paper wasp is in fact a Syngamia moth, and the UI is already heavily influencing people to choose that ID. What the colors would add is an additional push to select the species Syngamia florella:


I don’t know how much that matters.

5 Likes

Then the left border bar option: I prefer to be able to read all the text on white, not against colour. The species names and Nearby need to be obvious.

Or with the colour gradient on the far right, where it is simply View all the way down.

4 Likes

I also like the left side bar… and then all that extra blank space between the end of the taxon name and the “View” could hold a probability, as in @psium’s previous post.

4 Likes

The Chrome extension is published, and the code is on GitHub. You can choose between the two different display modes (sidebar vs. gradient), as well as a color-blind mode that changes the range from 0->120 to 240->120 on the hue spectrum.
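For reference, the hue ranges for the two modes could look roughly like this (my sketch of the described behavior, not the published source):

```javascript
// Default mode: score 0 -> hue 0 (red), score 100 -> hue 120 (green).
// Color-blind mode: score 0 -> hue 240 (blue), score 100 -> hue 120 (green),
// avoiding the red-green axis entirely.
function scoreToHue(score, colorBlind = false) {
  const [from, to] = colorBlind ? [240, 120] : [0, 120];
  return from + (score / 100) * (to - from);
}

scoreToHue(0);         // 0   (red)
scoreToHue(100);       // 120 (green)
scoreToHue(0, true);   // 240 (blue)
scoreToHue(100, true); // 120 (green)
```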

I’ll work on a Firefox version as well.

8 Likes

Thank you - I will try it.

I like what @psium suggested. Perhaps it could be a % similarity [to other photos of observations which are RG]

e.g.
98% similar to [photos of] genus A
92% similar to [photos of] species X
91% similar to [photos of] species Y
75% similar to [photos of] species z
etc.

With an option to view the top ~10 ranking IDs in the Identotron on a separate tab, this will also show where the nearest observations have been to try and eliminate Australian species suggestions for African observations.

Identotron: https://www.inaturalist.org/observations/identotron?observation_id=49603263&taxon=51468#establishment_means=&order=&place=97392&taxon=51468

I agree, people would tend to read the higher “match” percentage the wrong way and jump on the identification without really looking closely at it.

This is already a bit of an issue (sometimes I even have to stop myself from doing it).

I think it’s best if the AI suggestions are left general. I like the “pretty sure” wording as well, that reminds people that it’s not a certain ID.

3 Likes

If you need some book references, hit me up…

3 Likes

I’m new to this community, so I won’t weigh in much here, but I think this might be a bad idea.
The Formicidae screenshots above are a good example of why.

Currently the CV is very bad at suggesting and identifying ants. I don’t recall the algorithm ever suggesting the right species as its top choice. I’ve even had “we are pretty sure it is a [genus]” on an insect from a totally different order. Also, almost every black ant is suggested as “Camponotus”, because they are all black: 6 legs, 2 antennae, 1 head, 1 thorax, one “wasp waist” and a big gaster. Most don’t have striking patterns like butterflies or birds can have, for example.

Such a feature would give a false sense of confidence, as stated by other members. But that is already the case with “We are pretty sure it is”.

But worse, a red highlight would give the impression that it is unlikely the correct ID is actually among these “uncertain” suggestions.

Simply put: it’s a confusing system, especially for beginners.

3 Likes

yes. i think that’s exactly why showing the underlying scores would be enlightening – because the #1 choice in a list of bad choices is not the same as the #1 choice in a list of good choices. so if you can see the actual scores, you have better insight into whether you’re being presented good choices or if you’re being presented bad choices.

just for example, here are 3 of my own ant observations, along with the actual computer vision scores:

  • https://www.inaturalist.org/observations/43079992 (needs ID as Myrmicine Ants, possibly Bicolored Pennant Ant):

    we’re pretty sure it’s in this family:

    • (99.9) Ants (Formicidae)

    our top suggestions (combined score, vision score):

    • (40.2, 53.4) Dolichoderus genus
    • (10.2, 4.5) Bicolored Pennant Ant (Tetramorium bicarinatum)
    • (5.6, 7.4) Acorn Ants (Temnothorax genus)
    • (4.6, 6.1) African Big-headed Ant (Pheidole megacephala)
    • (4.4, 1.9) Graceful Twig Ant (Pseudomyrmex gracilis)
    • (4.2, 1.8) Florida Carpenter Ant (Camponotus floridanus)
    • (3.2, 4.3) Arboreal Bicolored Slender Ant (Tetraponera rufonigra)
    • (3.2, 1.4) Crematogaster laeviuscula
  • https://www.inaturalist.org/observations/20826798 (research grade Red Imported Fire Ant / Solenopsis invicta)

    we’re pretty sure it’s in this family:

    • (100) Ants (Formicidae)

    our top suggestions (combined score, vision score):

    • (50.0, 30.5) Forelius genus
    • (8.3, 15.4) Asian Weaver Ant (Oecophylla smaragdina)
    • (6.0, 11.1) Mediterranean Acrobat Ant (Crematogaster scutellaris)
    • (5.7, 3.5) Argentine Ant (Linepithema humile)
    • (5.5, 10.1) Azteca genus
    • (3.5, 2.1) Crematogaster laeviuscula
    • (2.6, 4.9) Tropical Fire Ant (Solenopsis geminata)
    • (2.6, 4.8) Yellow Crazy Ant (Anoplolepis gracilipes)
  • https://www.inaturalist.org/observations/44126816 (research grade as Eastern Black Carpenter Ant / Camponotus pennsylvanicus):

    we’re pretty sure it’s in this genus:

    • (77.9) Carpenter Ants (Camponotus)

    our top suggestions (combined score, vision score):

    • (64.3, 61.3) Shimmering Golden Sugar Ant (C. sericeiventris)
    • (7.0, 1.8) Eastern Black Carpenter Ant (C. pennsylvanicus)
    • (6.9, 7.3) Giant Turtle Ant (Cephalotes atratus)
    • (4.0, 4.3) Eciton genus
    • (2.6, 2.8) Bullet Ant (Paraponera clavata)
    • (2.0, 2.2) Diacamma genus
    • (1.9, 2.0) Giant Forest Ant (Dinomyrmex gigas)
    • (1.8, 1.9) Hairy Panther Ant (Neoponera villosa)

if using sessilefielder’s red-to-green gradient, remember the hue mapping described above (0 = red, 0.25 = orange, 0.5 = yellow, 0.75 = yellow-green, 1 = green).

so most of the “top suggestions” above would be red to yellow, whereas the “we’re pretty sure” suggestion would be more green. hopefully in such cases, that would push most folks to select the green rather than the yellow or orange, if they were simply choosing blindly based on the system’s suggestions.

i also think if people could see that, say, bird suggestions tend to be very green, while, say, spider suggestions tend to be very red, then they would also be much more careful about relying on the computer vision for spiders.

or if they see two equally green bird suggestions, they might pause for a moment to consider why both are equally green before just blindly selecting the first choice.

of course, computer vision suggestions will never be perfect. there will always be mistakes, but i think showing the computer vision scores will help reduce (rather than increase) the likelihood that the community will adopt those mistakes.

8 Likes