Computer Vision should tell us how sure it is of its suggestions

Thank you! Even someone else’s guess helps me have a perspective.
Next “easy” question…if I don’t want to immediately provide my ID, I think there is a keyword I can use that means I’ll come back to it. I saw something to this effect when I was reading some instructions about offline mode, but I can’t find that material again…so what do I enter to submit an observation when I am not yet ready to even give an ID guess?

And what about the observations with multiple pictures? I’m assuming you mean the confidence score to be shown for the primary photo. When I have multiple photos, I usually check by making each one the primary photo and seeing what the computer vision says. I look for repeats to help narrow it down. But in the end, I try to pick the photo that shows most of the plant, as that is what will be the thumbnail. Close-ups of plants seem to work better for the accuracy of the computer vision, but some humans don’t want to look at just a close-up (usually of a flower) and nothing else. I could be wrong, but I know I get frustrated at looking at observation after observation of only the flower (which, for Ranunculus, is not very helpful), so sometimes I will skip over those. So, the point is that the CV may be only 50% sure for the primary photo, but 75% for a close-up of the leaves (and maybe only 10% for a stem shot). The data may not be as useful as one would think.

3 Likes

@Rboinco, there is a thing called “placeholder text,” which you can write in and which will not count as an ID. However, I think that puts things into the “unknown” bin, which is not very useful. I would put in a guess that you KNOW to be true…like insect, fungi, mammal, plant, etc. Then others can help narrow it down. If you have a guess or guesses, put them as a comment.

4 Likes

should be a new topic :)

You could put a tag, something like “revisit”, then when you are ready you can search for your observations that have that tag. I use a Chrome bookmarks folder to manage my “come back to this later” ones.

And there is nothing inherently wrong with letting the community have a go in the meantime. Sometimes IDs are easy for some people who deal with those things all the time. As an example, I do my identifying in two passes… the first pass is as they turn up in the “needs ID” pool, and I only ID those I can be reasonably sure of from the photo without having to pick up a book/reference. Then, when I have time, I go back through old observations and look to ID from resources/books. There is no point in spending 10-20 minutes hunting through books if someone else can rattle off the ID in seconds. After an observation has been in the needs-ID pool for longer than a week, it is more likely to be one that needs looking up (depending on the taxa/difficulty, of course).

1 Like

Hopefully without straying too much off topic, what about the ability to add an (optional) bounding box to photos? In theory, this could be used to restrict AI training to a certain part of the image, and could be visible to help identifiers work out what to identify.

I’ve often used MS Paint to circle an organism in an image - being able to do this natively would be useful.

4 Likes

see https://forum.inaturalist.org/t/draw-bounding-box-on-photo-for-computer-vision/2554

2 Likes

And remember placeholder text disappears when an ID is added. Better to leave the text in a comment.

3 Likes

One simple and unobtrusive way you could make the confidence data available would be to add it as a title attribute to the “Visually similar” span in the suggestion interface. That way you could see the confidence score by hovering over the text. It would be a subtle addition, but at least it would make it available for folks that were interested.
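
As a rough sketch of what I mean (the class name and the shape of the suggestion data here are just assumptions, not iNaturalist’s actual markup or API):

```typescript
// Hypothetical sketch: attach the raw vision score as a tooltip on the
// "Visually similar" label. The selector and the score field are assumptions.
function addScoreTooltips(suggestions: { element: HTMLElement; visionScore: number }[]): void {
  for (const s of suggestions) {
    const label = s.element.querySelector<HTMLElement>(".visually-similar"); // assumed class name
    if (label) {
      // Browsers show the title attribute on hover, so the score stays unobtrusive.
      label.title = `Vision score: ${s.visionScore.toFixed(1)}`;
    }
  }
}
```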

3 Likes

It’s the words “pretty sure” that add false confidence to the Algorithm ID.

I particularly like this “fig”

[image: Elephant fig]

6 Likes

Some relevant comments from Ken-ichi:

6 Likes

even if we don’t call it confidence or probability, i think it would still be useful to see the “score”.

6 Likes

Presumably one could make a relative scale, where the first choice would be one (or zero), and the following ones would be fractions of one (or negative numbers). This could illustrate the differences between the 10 options. Does one stand out, or are they basically all equally likely?
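
Something like this, say (just a sketch of the idea, with made-up scores):

```typescript
// Rescale raw suggestion scores so the top choice is 1 and the rest are
// fractions of it. The input numbers are made up for illustration.
function toRelativeScale(scores: number[]): number[] {
  const top = Math.max(...scores);
  return scores.map((s) => s / top);
}

console.log(toRelativeScale([95, 93, 65, 35]).map((x) => x.toFixed(2)));
// [ "1.00", "0.98", "0.68", "0.37" ] -- the first two are nearly tied,
// the rest are clearly behind.
```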

1 Like

I would be concerned that many users, especially new users, would misinterpret any displayed numeric value as some level of relative confidence. I concur with @kueda that some would overly rely on the “cryptic opinion of this black box” and not their own judgement. Student users would be one group of new users who might overly rely on displayed values.

4 Likes

Just to argue with myself, maybe including such numbers would allow people to make better choices among the black box outputs. Counter counter argument: the problem is over-confidence in a numerical rating when the right answer isn’t even on the list. Counter counter counter argument: that’s already a problem. Counter counter counter counter argument: we’re all doomed and should be spending our time learning post-apocalyptic survival skills.

16 Likes

On it: https://www.inaturalist.org/projects/search?q=edible

5 Likes

i think the scores are basically a measure of visual match and possibly some other factors like presence of observations of a taxon nearby. i think that’s why iNat staff don’t want people thinking of it as a probability or confidence.

so, for example, suppose you have 3 brothers A, B, and C. A & B are identical twins. suppose you take a picture of A and run it through a computer vision algorithm similar to iNaturalist’s. i would expect that CV to return scores that might be like this:
F: 0.97 – the family of A, B, & C
A: 0.95
B: 0.93
C: 0.65
D: 0.35 – D is the boy who lives next door.

so obviously, the CV couldn’t be both 95% sure the photo was of A and also 93% sure that it was of B, nor would it make sense to assign 95% probability of A and at the same time assign 93% probability of B, but by seeing the relative scores, you could see that the CV was saying that A and B were way better potential matches than C or D.
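
to make the same point with a little arithmetic (using the made-up scores above):

```typescript
// The leaf-taxon scores from the example above. They sum to far more than 1,
// so they can't be read as probabilities, but sorting by them still ranks
// A and B well ahead of C and D.
const scores: Record<string, number> = { A: 0.95, B: 0.93, C: 0.65, D: 0.35 };

const total = Object.values(scores).reduce((sum, s) => sum + s, 0);
console.log(total.toFixed(2)); // "2.88" -- not a probability distribution

const ranked = Object.entries(scores).sort(([, a], [, b]) => b - a);
console.log(ranked.map(([name, s]) => `${name}: ${s}`));
// [ "A: 0.95", "B: 0.93", "C: 0.65", "D: 0.35" ]
```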

5 Likes

Any interest in a browser extension which translates the computer vision scores into a red-green scale? I slapped together a prototype today after reading this thread; here are some examples:

(Obviously) I am not a designer, but it does give a rough idea of what is possible with a purely client-side solution. If there’s interest, I’ll publish it and put the code on GitHub so others can help refine it.

6 Likes

What would be the thresholds for each color (is green 80% - 95%, or 70% to 95%)?

Does red sort of imply wrong so implicitly in our current visual lexicon that people don’t even consider the other suggestions (sometimes the right suggestion is further down the list)? I’d assumed that’s why explicit disagreement is an orange button instead of a red one, but I could be reading too much into that.

Would the colors be adjusted by the user for accessibility (r/g color blindness, etc)?

I’m not saying it’s a bad idea; these are just the first things that popped into my head when I looked at the pics (especially red = stop).

5 Likes

What would be the thresholds for each color (is green 80% - 95%, or 70% to 95%)?

It’s just the computer vision score, which is [0,100], scaled to [0,120] and used as the hue in HSL.
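
In code, the mapping is roughly this (the saturation and lightness values are just what I happened to pick):

```typescript
// Map a computer vision score in [0, 100] to a hue in [0, 120]
// (0 = red, 120 = green) and return an HSL background color.
function scoreToColor(score: number): string {
  const clamped = Math.min(Math.max(score, 0), 100);
  const hue = (clamped / 100) * 120;
  return `hsl(${hue}, 90%, 80%)`; // saturation/lightness are arbitrary choices of mine
}
```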

Does red sort of imply wrong so implicitly in our current visual lexicon that people don’t even consider the other suggestions (sometimes the right suggestion is further down the list)?

Red-green was just the first gradient that occurred to me. I don’t have an informed opinion about whether red in particular imparts a bias or, if so, whether that bias is relevant given that the extension would be opt-in.

Would the colors be adjusted by the user for accessibility (r/g color blindness, etc)?

This is a good point, and taken with your second one argues for a different color gradient, or at least a color-blindness mode.

In general, whether it’s color-coding or just exposing the raw number, as OP requested, I think it’s nice to be able to distinguish between the different cases illustrated by my examples. Then again, I confess that in addition to not being a designer, I’m also not a statistician, and thus @kueda’s comment above is a bit lost on me:

I’m told the score should not be considered a metric of “confidence” or “probability” and it should mainly be used for ordering outputs

i.e. I’m unclear on whether the magnitude of the differences in scores is relevant in any way.

2 Likes

i think this is great as a proof of concept. i wonder what this would look like implemented as a color gradient that transitions to pure white by 70% of the width of the box (or maybe hsla that drops the alpha to 0 by 70% the width of the box)? or maybe more simply as a colored left border (or maybe left and bottom border)? maybe a gradient with variable points at which it transitioned to white (effectively creating something like a bar chart) could solve the color blindness problem?
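
something like this is what i’m picturing (just my own rough interpretation, not the extension’s actual code):

```typescript
// Build a background where the colored part ends at a point proportional to
// the score and the rest fades to white -- effectively a small bar chart, so
// it no longer relies on hue alone. The 70%-max width and the saturation/
// lightness values are arbitrary choices.
function scoreToBarBackground(score: number): string {
  const hue = (score / 100) * 120;        // same 0-120 hue scale as above
  const stop = Math.round(score * 0.7);   // colored region ends by 70% of the box at most
  return `linear-gradient(to right, hsl(${hue}, 90%, 80%), white ${stop}%, white 100%)`;
}

// e.g. suggestionRow.style.background = scoreToBarBackground(85);
```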

then the main unresolved problem would be the problem of the situation where the real answer isn’t actually in the list. hopefully those kinds of situations would more often look more like your bottom example with lots of red than your top example above with bright green choices…

i think 0-120 hues on hsl is a good choice for color scales. on a scale of 0 to 1 then, 0 would be red, 0.25 would be orange, 0.5 would be yellow, 0.75 would be yellow-green, and 1 would be green. (i’m a little surprised to see so much red in the examples above, but maybe i shouldn’t be. this is exactly the kind of truth i was hoping could be revealed by something like this though.)

3 Likes