Add a "visually identifiable" scale parameter to set user expectations on confidence of computer vision IDs

AI suggests this is XY

We need to let new people know not to auto-agree
(do you know this is XY? Then click agree)

iNat is pretty sure - sounds definite and unambiguous. This IS what it is, no doubt about it!

4 Likes

Right - I’m not saying to prevent selecting it. I was really thinking along the lines of “setting expectations with the observer” to help them understand when a visual ID is actually quite difficult and unlikely unless they are an expert. I’m not a UI/UX person, but I’m confident appropriate steering/wording could be done.

I agree that this would be a challenge. One example I read about recently is leaf miners, which are often easier to identify by the host plant and mining pattern than by looking at the insect directly.

But, interestingly, it seems like the AI determines if it thinks an insect is a nymph or adult when making the suggestion. It would be good to surface those assumptions, like “iNat thinks this is a nymph Milkweed Beetle”. That would provide users with more insight into the suggested ID and hopefully give them pause when it is clear that the observation is not a nymph, for example.

A possible solution would be to seed a tuple data structure, i.e. {taxonomic_level, life_stage}, leave unseeded tuples working as they do today, and provide the extra insight only on seeded tuples. This gets a little more complex from a database perspective, so another idea would be to seed only in cases where the ID challenge is consistent across all life stages, i.e. take a minimally viable starting point that can then be improved with future augmentations to the feature.
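To make the seeding idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `SEEDED_NOTES` table, the `difficulty_note` function, and the note wording are hypothetical and not part of iNaturalist. A life stage of `None` stands for "the difficulty is the same across all life stages", which is the simpler starting point mentioned above.

```python
from typing import Dict, Optional, Tuple

# Hypothetical seed table keyed by (taxon, life_stage).
# life_stage=None means the note applies to every life stage.
# The entries and wording below are illustrative only.
SEEDED_NOTES: Dict[Tuple[str, Optional[str]], str] = {
    ("Sarcophaga", None): "Species-level ID from photos is often not possible.",
    ("Milkweed Beetle", "nymph"): "This suggestion assumes the photo shows a nymph.",
}


def difficulty_note(taxon: str, life_stage: Optional[str] = None) -> Optional[str]:
    """Return a seeded note, or None so unseeded tuples behave as they do today."""
    # Prefer an exact (taxon, life_stage) match.
    exact = SEEDED_NOTES.get((taxon, life_stage))
    if exact is not None:
        return exact
    # Fall back to a note seeded for all life stages of the taxon.
    return SEEDED_NOTES.get((taxon, None))
```

A lookup that finds nothing returns `None`, so the suggestion would be displayed exactly as it is now and only seeded tuples would gain the extra message.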

It doesn’t do that. It compares the photo to a set of photos which include larvae etc. and determines this looks like x.

It doesn’t say this is a larva, let’s go compare against all the larva photos I have access to and limit my analysis to those.

3 Likes

I agree that it would be hard to do with everything all at once. A good approach is to start simple. Maybe pick a few families, run an experiment, see if it can scale and be maintained. My view is maybe it never has to be completed. If it can simply be improved in some useful way, say, for some problematic taxa, that might still be a win.

My secondary thought is that with the idea of seeding, it ties in with the entire concept of machine learning. ML needs humans to seed the knowledge. By seeding, over time, ML would learn and could inform the AI which scenarios are not identifiable, filling in the gaps. It might take years, but that’s pretty much how it works.

1 Like

I like this idea insofar as I think it’s really helpful just to forewarn newer users or those unfamiliar with a taxon that there might be an issue. We have this in place in the UK platform iRecord. It automatically alerts you when you ID a species which is out of typical range or not usually identifiable from photos. I think it works well. I think it would also save identifiers from having to state all the time… “not identifiable from photos”. Formalising this issue helps it to remain impersonal as well, which could also save time.

I don’t feel this needs to be connected to the CV suggestions though? Why not simply initiate the notification when anyone IDs the taxa in question?

As this is already implemented in iRecord, I can’t imagine it’s totally unworkable. But to avoid adding extra effort to create it, perhaps it could use some sort of consensus voting mechanism to initiate… so the development side is just something simple, generic and embedded in all taxa. Then the rest could be crowdsourced. This sort of range of input would inevitably create a sliding scale of opinions, which could represent how tough it is to ID from photos alone. I’m no expert, but I know a whole bunch of Diptera not to even try to go to species on. I think it wouldn’t hurt to pass on that information in the form of a vote. Plus the output doesn’t have to be anything rigid, complex or formal, I think, in comparison with taxonomic structure. It can be just a “might be a tough one FYI” kinda thing like @Star3 suggests:

I also like the symbols options suggested by @lm77. My worry though would be that this might get lost in the existing interface. Maybe in addition to a notification somewhere?

As a general rule of thumb, isn’t it just always harder to ID juveniles, nymphs, bare trees, etc? Wouldn’t it still be helpful to flag the taxa for users, even if only applied to those which are well known to be difficult as adults, or when in leaf?
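The consensus-voting idea mentioned earlier in this post could be sketched roughly like this. It is only an illustration under stated assumptions: the 1 to 5 difficulty scale, the minimum vote count, the threshold, and the function names are all made up here, and none of it reflects how iNaturalist or iRecord actually work.

```python
from statistics import median
from typing import Optional, Sequence

# Assumed scale: each identifier votes 1 (easy from photos) to 5 (very hard).
MIN_VOTES = 5        # hypothetical: ignore taxa with too few votes
HARD_THRESHOLD = 4   # hypothetical: median at or above this triggers the advisory


def difficulty_score(votes: Sequence[int], min_votes: int = MIN_VOTES) -> Optional[float]:
    """Return the median vote, or None when there are not enough votes yet."""
    if len(votes) < min_votes:
        return None
    return median(votes)


def advisory(taxon: str, votes: Sequence[int], threshold: float = HARD_THRESHOLD) -> Optional[str]:
    """Return an informal heads-up for taxa the crowd rates as hard to ID."""
    score = difficulty_score(votes)
    if score is not None and score >= threshold:
        return f"FYI: {taxon} might be a tough one to ID from photos alone."
    return None
```

Using the median rather than the mean keeps one or two outlier votes from tipping a taxon over the threshold, and the minimum vote count means nothing shows up until enough identifiers have weighed in.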

2 Likes

Tangential to the Feature Request discussion here, but I’ve often wished iNat supported a taxon info page along the lines of Bugguide’s. Or a public-facing general-info flag system. So when I go to a species info page, I could see notes like “hey this whole family requires microscopy” or “here’s a useful Guide or key” or info on distinguishing commonly mistaken species.

1 Like

There was this topic: https://forum.inaturalist.org/t/add-comments-or-wiki-like-functionality-on-taxa-pages-to-discuss-identification-and-other-relevant-issues/91

Is iRecord for the UK only or for all territories? Creating such a thing for one country is already a very big job, but doing it for all species of the world sounds undoable in the short term.

UK I think.

Short term, even if you could just warn people going to species on things like Sarcophaga and Psychodidae, that could be helpful. The most common ones are the ones which take the most energy.

Perhaps the flags could just be raised on these repeat offenders, where there is the most need… rather than even trying to conquer all species in all locations.

1 Like

This is really only partially true at best. The out of range alerts cause a lot of false positives because they’re solely based on existing records that have been accepted by the NBN Atlas. This is usually out of date, and has a large number of gaps and omissions - so the information it provides is often quite misleading. The ID difficulty ratings are more useful, but they only cover a limited subset of UK species. The ratings are provided by the various taxon-specific recording schemes in the UK and are quite narrow in scope. They only offer brief advisory messages about what kinds of evidence the schemes require, and don’t usually offer any details about why the identification might be difficult. Here’s an example:

screenshot-plbrQl

These messages can be useful if you already have some recording experience, but it is very common for new users to be confused by them. There are many posts on the iRecord forum from people thinking the messages mean their records have been invalidated in some way. Ironically, the usual advice is to simply ignore them.

I don’t wish to appear too negative about iRecord (which I also use myself), but I’m afraid your assessment here may be somewhat inaccurate. :upside_down_face:

1 Like

Ok nice! Good to have some finer grain to my limited awareness of the iRecord system :slightly_smiling_face:

The negative time costs resulting from any automation are a good point.
Any negative time costs to this action should be weighed against positive time gains though.

At present, on Sarcophaga photos ID’d to species, most identifiers might write a personalised note, explaining the difficulty. Every time an identifier writes a new note, it adds a considerable net cost in identifier time to the overall process.

Let’s say this is semi-automated and there is a button added for identifiers which, when pressed, gives the user a more formal note explaining the difficulty. Now identifiers only have to press a button instead of writing it out… but it’s doing pretty much the same thing. Would this be a worthwhile change in terms of time cost/gain?

This would also enable the logging of the button presses. So then let’s say after a period of time, the buttons for genera which are just being pressed over and over again are just pressed in advance, forewarning the user automatically as @star3 suggested… or making a little symbol show up on the taxa in question, as @lm77 suggested. More time saved! Would this second change be worthwhile in terms of time cost/gain?
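As a rough sketch of that second step, logging presses and then forewarning automatically could look something like the Python below. The `PressLog` class, its method names, and the threshold value are all hypothetical, just a way to show the shape of the idea.

```python
from collections import Counter


class PressLog:
    """Counts how often identifiers press the 'hard to ID' button per genus."""

    def __init__(self, auto_threshold: int = 100) -> None:
        # Hypothetical cut-off: after this many presses the warning is shown
        # up front instead of waiting for an identifier to press the button.
        self.auto_threshold = auto_threshold
        self.presses: Counter = Counter()

    def record_press(self, genus: str) -> None:
        """Log one button press for the given genus."""
        self.presses[genus] += 1

    def should_forewarn(self, genus: str) -> bool:
        """True once a genus has been flagged often enough to warn in advance."""
        return self.presses[genus] >= self.auto_threshold
```

A genus that nobody has flagged simply counts as zero presses, so nothing changes for it until identifiers start using the button.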

I would imagine the queries from users who receive an automated suggestion won’t be more significant than the existing queries identifiers have to grapple with person to person. In addition, the response to this negative reaction could itself be generic, formalised and minimised… with FAQs, “what’s this?” buttons, etc…

The cost for the developers to implement though… is a whole other negative cost to factor in…

2 Likes

IDers should save those messages; I’d die if I had to type a polite message each time someone didn’t mark a cultivated plant or forgot to add an ID.

4 Likes

Sure. I know some do. I need to get round to it! I’m less organised :)
What’s the tool for it again?

In any case though… if some are using an automated message of their own and some not… this is just more evidence to build it in to the system in my book. Especially if iNaturalist wants to make the on-ramp for identifiers as smooth as possible…

1 Like

…but maybe I’m the one suffering from confirmation bias here… :crazy_face:

I saved it on another platform and copy it when I start IDing; since new people make common mistakes, this copied message goes out many times in a row. I would like to have an option for saved messages, but not if they are pre-made by someone else.

3 Likes

For me, I would rather it was just generic.
Less personal means less chance of it being taken personally.
Plus, I’m not diplomatic enough to get the tone right so would rather it was written by a pro :slightly_smiling_face:
I get that that makes it more impersonal though… which is another “cost” in the equation I guess…

Potentially, saved messages could still be logged in some way though… like button presses. This segues into other ideas floated on other threads too about saving the ID features folks mention - which is already in the pipeline by the look of it:

In the online world, it isn’t too hard to take a step-by-step approach. Features can be rolled out by taxa, by region, etc. Perhaps boiling the ocean with all regions/taxa is undoable, but starting small and learning what works and what doesn’t work seems like a good approach. The other way to look at it is “does it improve data accuracy”? That helps reduce the pressure for perfection.

1 Like

it’s actually better to anticipate many of the potential issues upfront so that you can create a foundation that will at least be flexible enough to handle things that will inevitably come up. just slapping something together for one place or taxon at a time and then adding as you go seems like a good way to create a lot of major rework, which is not a good thing.

2 Likes

Unless I feel like personalizing it, I just copy & paste from the frequent responses page (that’s what it’s there for, after all):
https://www.inaturalist.org/pages/responses

I have personalized ones saved to an unpublished journal entry, so I have access to them on any device as long as I am logged into iNat.

4 Likes