In cases where the CV is not "pretty sure" of anything, offer a suggestion of a higher taxon

Not sure if this should be described as a bug or a feature request… but it always seems strange to me that when the CV is not “pretty sure” of anything, it will nevertheless suggest something as specific as a species-level ID.

E.g.
Here it isn’t pretty sure of anything…yet the top two suggestions are species level and both incorrect.

Here it is pretty sure it’s something, and that’s only a superfamily, but that’s correct.

Wouldn’t it be better if it always showed the lowest common denominator of the top suggestions (that is, the lowest taxon they all share)? Even if that ends up being a “Pretty sure it’s a fly!”, if it comes to it, rather than autosuggesting something species-level but uncertain?
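To make the idea concrete, here is a minimal sketch of finding the lowest taxon shared by the top suggestions. It assumes each suggestion comes with its full lineage from kingdom down to the suggested taxon; the function name `common_ancestor` and the example lineages are illustrative only and say nothing about how iNaturalist actually stores or computes this.

```python
from typing import List, Sequence


def common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the part of the lineage that all suggestions share.

    Each lineage is ordered from the broadest taxon to the suggested one.
    (Hypothetical data layout, chosen for illustration.)
    """
    shared: List[str] = []
    # zip(*lineages) walks all lineages rank by rank, stopping at the shortest.
    for names_at_rank in zip(*lineages):
        if all(name == names_at_rank[0] for name in names_at_rank):
            shared.append(names_at_rank[0])
        else:
            break  # the suggestions diverge below this point
    return shared


# Two species-level suggestions that only agree down to the order.
top_suggestions = [
    ["Animalia", "Arthropoda", "Insecta", "Diptera", "Syrphidae", "Eristalis tenax"],
    ["Animalia", "Arthropoda", "Insecta", "Diptera", "Muscidae", "Musca domestica"],
]

shared = common_ancestor(top_suggestions)
print(shared[-1])  # Diptera
```

With these two suggestions the shared lineage ends at Diptera, so the top line would read "Pretty sure it's a fly" rather than naming either species.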

As with human identifiers, and the suggested “Identification etiquette”, ideally, I think the top AI autosuggest should represent the level it has an actual degree of certainty about. My hunch being that users, like myself at times, will just select the option at the top of the list if they have no idea, to keep things quick.

Species level IDs with very little certainty increase the chance of incorrect RG observations slipping through the net, creating the kind of feedback loops we see noted in the CV clean-up wiki.

It’s also more work for identifiers to overturn an ID on a conflicting branch than it is to advance a coarser ID to a lower rank.

Alternatively
I think I noticed that in this situation the app actually states “we are not certain of anything”, or something along those lines. I think it would be good to at least have this kind of wording on the website as well as in the app, if the above is not possible or desired.

Yes. There needs to be a significant review of this functionality. Novice users have no idea how untrustworthy these AI suggestions are. Species and genus-level IDs should be reserved for instances where there is strong evidence (something akin to a 95% confidence level). Otherwise, the standard should be to offer IDs at a much higher taxonomic level, or perhaps none at all if there is no obvious suggestion.
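One way to picture such a rule, as a rough sketch only: add up the scores of the species-level suggestions at every ancestor, then offer the most specific taxon whose combined score clears the threshold. The scores, the lineage format, and the function name `most_specific_confident_taxon` are all assumptions made up for this example; the real CV model's scores may not behave like probabilities at all.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def most_specific_confident_taxon(
    scored_lineages: Sequence[Tuple[List[str], float]],
    threshold: float = 0.95,
) -> Optional[str]:
    """Pick the lowest taxon whose summed score reaches the threshold.

    scored_lineages holds (lineage, score) pairs, each lineage ordered from
    the broadest taxon down to the suggested one. Hypothetical format.
    """
    totals: Dict[str, float] = defaultdict(float)
    depth: Dict[str, int] = {}
    for lineage, score in scored_lineages:
        for level, name in enumerate(lineage):
            totals[name] += score  # every ancestor inherits the score
            depth[name] = level
    confident = [name for name, total in totals.items() if total >= threshold]
    if not confident:
        return None  # nothing is certain enough: offer no suggestion
    return max(confident, key=lambda name: depth[name])


fly = ["Animalia", "Arthropoda", "Insecta", "Diptera"]
suggestions = [
    (fly + ["Syrphidae", "Eristalis", "Eristalis tenax"], 0.50),
    (fly + ["Syrphidae", "Eristalis", "Eristalis arbustorum"], 0.30),
    (fly + ["Muscidae", "Musca", "Musca domestica"], 0.17),
]

print(most_specific_confident_taxon(suggestions, threshold=0.95))  # Diptera
print(most_specific_confident_taxon(suggestions, threshold=0.75))  # Eristalis
```

In this made-up case no single species gets past 0.5, yet the suggestions together put 0.97 on Diptera, so a strict threshold yields the order and a looser one yields the genus.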

I’d also argue for AI identifications not being factored into Research Grade observations, at least for those that fall below a certain confidence threshold.

Yeah I wondered about that as well… this is presumably another factor in the kind of things we see making it into the CV clean-up wiki…creating a kind of confirmation bias in the system…

Another, even more extreme example: two autosuggested IDs taking it to RG! (Incorrectly, I would guess, judging from the blurry photo and the fact that both users seem to be new to the platform.)

I disagree with the principle, because I may submit an ID using the AI after having verified thoroughly that this is a “good” ID (checking the details visible on the photo and comparing with the taxon page, checking the other species in the same genus already identified in a large region surrounding the observation, checking the distribution map provided on POWO).

Using the AI to find and select an ID to submit does not imply that the act of identifying is automated and of poor value.

In theory though, if autosuggested IDs were not factored in, that wouldn’t prevent you from then making the same ID again on the observation without use of autosuggest in order for it to have full power.

To my mind, the point of this option would be that it would empower identifiers over less experienced observers. As an observer who also uses the autosuggest, I wouldn’t be opposed to the onus being more on me to (for example) resubmit an ID than on identifiers to overcome my incorrect autosuggested IDs.

Ideally though there must be a way of creating an interface that supports both users who simply want to drop an observation into the correct ball park…as well as users who are using AI with care / as an autofill. Something like a toggle during upload / account preferences would also work in that regard perhaps.

There have since been a few threads. Many of us use CV and autosuggest as the quickest - least clicks and RSI - way to get to the ID we know. No reason to reward I Can Typing over efficiency.

But I hope, all these years later - we are closer to being offered a broadly right taxon over narrowly wrong ones. Or a display of the taxon levels to choose (up) from.

And another year has passed. Still ‘under review’

Not sure why this hasn’t been addressed. Technically this was introduced with iNat Next, but it also potentially went backwards at the same time: the common ancestor can now be a higher rank, but if the CV is confident enough, a common ancestor will not be listed.


I completely agree with this being done! So many times I have to select each suggested ID individually (if I’m not knowledgeable about the taxon) and follow its taxonomic trail to find the last taxon the suggestions have in common, or even just the top one.

Even if it was just “plants”, it would put the observation in the correct category, so that people who know the group can take the ID further down the chain.

I’ve found that this creates a lot of “unknown” observations, and incorrect taxa even at a broad scale, which then puts observations in the wrong categories and messes up the data for those taxa.

I’m glad you’re doing the work to put accurate IDs on your observations, even though it’s a clunky workflow that needs a better system available. I’ve had to do the same thing sometimes. There is a way to speed up that process quite a bit, which is available on the unofficial iNat Discord server. I can provide an invite link if anyone is interested in joining.

(For some reason that reads like an advertisement. Don’t worry, there are no purchases involved. It’s kind of like this forum except in a very different format.)

You might be interested in using this tool for identifying, allowing you to select any higher taxon (and also to browse lower taxa, if you want to select a subspecies).

This tool is a Windows desktop application that communicates with the iNat server. It has evolved since the previous presentations on the forum. At present:

There are 2 modes for using the program for direct identifications:

  • Option OnlyCacheReview: the program loads a [huge] local cache of observations (filled earlier when populating the Phylogenetic Projects), then filters the cache by applying the CV-based filter you choose. It then checks the filtered observations online, both to get the current “Community ID” and to filter out observations already reviewed or identified by you. Only then does it download the photos and display the observations. In this mode no requests for CV suggestions are sent to the iNat server, so this mode is the most sparing of server resources.

  • Option ReviewGenera: the program downloads observations (basic data + CV suggestions) from the iNat server based on a predefined URL filter (consisting of one or several taxa of your choice, for instance “Crassulaceae”), stores all data in a separate local cache (so that you can close/reopen the program without data loss), then filters the observations by applying the CV-based filter you choose, then displays the observations. This mode is named ReviewGenera because the program downloads only observations currently identified at genus rank.
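For readers who want a feel for what a CV-based filter over a local cache might look like, here is a rough sketch. It is not the tool's actual code: the `CachedObservation` fields, the function `select_for_review`, and the sample records are all hypothetical, and no request is made to the iNat server.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class CachedObservation:
    # Hypothetical cache record; field names are invented for this sketch.
    obs_id: int
    current_rank: str    # rank of the observation's current ID
    cv_top_taxon: str    # top CV suggestion stored with the observation
    cv_top_score: float  # score of that suggestion, assumed to be in [0, 1]


def select_for_review(
    cache: Iterable[CachedObservation],
    target_taxon: str,
    min_score: float,
    already_reviewed: Set[int],
) -> List[CachedObservation]:
    """Keep genus-level observations whose stored CV suggestion matches."""
    return [
        obs
        for obs in cache
        if obs.current_rank == "genus"
        and obs.cv_top_taxon == target_taxon
        and obs.cv_top_score >= min_score
        and obs.obs_id not in already_reviewed
    ]


cache = [
    CachedObservation(1, "genus", "Sedum album", 0.91),
    CachedObservation(2, "genus", "Sedum album", 0.40),    # score too low
    CachedObservation(3, "species", "Sedum album", 0.95),  # already at species
    CachedObservation(4, "genus", "Sedum acre", 0.88),     # different taxon
    CachedObservation(5, "genus", "Sedum album", 0.97),    # already reviewed
]

to_review = select_for_review(cache, "Sedum album", 0.8, already_reviewed={5})
print([obs.obs_id for obs in to_review])  # [1]
```

Because the filter runs on data that is already stored locally, narrowing or widening it costs no extra server requests.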

I like both modes; they are complementary in the way they target observations. Using CV-based filters at species rank, I can ID 100+ observations of the same species in a few minutes, just checking the photos and clicking on “Submit ID”.

In short, in comparison to identifying from the iNat web site, the extra features are:

  • suggestion of a higher taxon (this discussion in the forum), when the CV is not “pretty sure”,
  • quick access to all higher taxa, in order to submit any higher taxon as your ID,
  • quick access to lower taxa (by clicking a “Children” button), including subspecies and hybrids (never proposed by the iNat web site),
  • filtering of the observations to review based on their Computer Vision data (in addition to the URL-based filter, targeting for instance the current observation ID and/or a place),
  • local cache of preloaded observation data (+ preloaded photos of the next 50 observations to review), making the review/identification process much faster.

If you are interested, we need to discuss further so that I understand your habits/targets, preconfigure the tool for you and preload some data, so that you are ready to identify and have a better first experience with the tool.
