List of "Similar species" could include mismatches between AI-generated IDs and human-made IDs

mferreira · July 5, 2024, 9:30pm

Platform(s), such as mobile, website, API, other:
Website

URLs (aka web addresses) of any pages, if relevant:
https://www.biodiversity4all.org/taxa/1158203-Ulex-jussiaei
https://www.biodiversity4all.org/observations/227284318
https://www.biodiversity4all.org/observations/227285335
https://forum.inaturalist.org/t/expand-the-similar-species-tab-into-an-editable-identification-guide/13890

Description of need:
The “Similar species” tab is a valuable resource when attempting to identify a less-known species. So far it seems to be based only on pending ID mismatches among users. It could be expanded in several ways:

A species once recognized as “similar” because of an ID mismatch should remain in the “Similar species” list even after the ID mismatch is resolved for all observations, because that confusion may well happen again in the future. The number of pending mismatches (shown in the upper right corner of the image of that similar species) would then become 0 but the species would still appear in the list.
Whenever a user rejects the topmost AI-generated IDs and provides an alternative ID instead (selected further down the list or manually inserted), all species that appeared above in the list of AI-generated suggestions should be considered “similar species” and that information should be reflected in the “Similar species” tabs of all species involved.
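
To make the second proposal concrete, here is a minimal sketch of the rule, not anything taken from the iNaturalist code base. The function names, the `similar` mapping and the exact order of the suggestions are hypothetical, and treating a manually typed ID as a rejection of the whole suggestion list is one possible reading of the proposal.

```python
from collections import defaultdict


def rejected_suggestions(ranked_suggestions, chosen_taxon):
    """Return the CV suggestions ranked above the taxon the user chose.

    Assumption: if the chosen taxon was typed in manually and is not in
    the list at all, every suggestion counts as rejected.
    """
    if chosen_taxon in ranked_suggestions:
        cutoff = ranked_suggestions.index(chosen_taxon)
        return ranked_suggestions[:cutoff]
    return list(ranked_suggestions)


def record_similar(similar, ranked_suggestions, chosen_taxon):
    """Mark each rejected suggestion as similar to the chosen taxon, both ways."""
    for taxon in rejected_suggestions(ranked_suggestions, chosen_taxon):
        similar[chosen_taxon].add(taxon)
        similar[taxon].add(chosen_taxon)


# Hypothetical suggestion order for an observation of Ulex jussiaei.
similar = defaultdict(set)
record_similar(
    similar,
    ["Ulex minor", "Ulex europaeus", "Ulex parviflorus", "Ulex jussiaei"],
    "Ulex jussiaei",
)
print(sorted(similar["Ulex jussiaei"]))
```

Under these assumptions, the three species ranked above Ulex jussiaei would each be listed as similar to it, and Ulex jussiaei would be listed as similar to each of them.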

Feature request details:
Consider these observations, for example:
https://www.biodiversity4all.org/observations/227284318
https://www.biodiversity4all.org/observations/227285335
When I uploaded these observations, the CV algorithm ranked Ulex minor, Ulex europaeus and Ulex parviflorus as equally or more probable than Ulex jussiaei (the correct ID). However, in the “Similar species” tab for Ulex jussiaei
https://www.biodiversity4all.org/taxa/1158203-Ulex-jussiaei
only Ulex europaeus is mentioned. That’s not because the other species are not prone to confusion with Ulex jussiaei: that’s simply because all past ID mismatches between Ulex jussiaei and the other species have already been sorted out. However, the risk of confusion remains for the future, and it is important to see that information in the “Similar species” tab: in that way, any user who is about to identify a plant as Ulex minor or Ulex parviflorus will become aware that the true ID might be Ulex jussiaei as long as he or she remembers to visit the “Similar species” tab for any of those species.

jeanphilippeb · July 6, 2024, 8:16pm

I am not sure I agree with this point, because this feature would introduce a bias with regard to species not yet in the Computer Vision model (all observations of these species have wrong topmost AI-generated IDs, and there is no general reason for these IDs to deserve a “Similar species” status). In other words, the unavoidable confusion of the AI with regard to observations of species not yet in the CV model does not necessarily correspond to an equivalent human confusion. This AI confusion does not necessarily provide useful information for human identifiers.

optilete · July 7, 2024, 6:16pm

This information could become outdated with the next CV release. I would not implement it.

lj_l · July 7, 2024, 8:38pm

This is already the case. See the Similar Species page for the often overlooked Anadara chemnitzii, for example:

Corrected (RG) observations are still shown. If a taxon has very few species on the “Similar Species” tab, that means the taxon doesn’t have many misidentifications.

DianaStuder · July 7, 2024, 9:01pm

If we delete our ‘wrong’ ID it won’t appear as a Similar Species.
If mine was a mis-click and hasn’t yet attracted another ID - I will delete. But if I feel the two can be confused, were confused by me - then I Withdraw to trigger Similar Species.

mferreira · July 10, 2024, 11:59pm

This doesn’t seem to be the “similar species” tab that I know. This page shows observations, not a list of similar species. What I mean is the 5th (the rightmost) tab in a page such as this: https://www.biodiversity4all.org/taxa/60476-Armeria-maritima#similar-tab

OK, got it now. You were one step ahead, you opened the list of conflicting IDs. Those RG IDs have not been corrected, they have been superseded by a sufficient number of correct IDs. There’s still a maverick ID in each of them, for example https://www.inaturalist.org/observations/197554896 . Once that user corrects his/her ID, that observation will no longer be counted as an ID mismatch. If all such IDs are corrected then that species will no longer appear in the list of similar species: the confusion between the two species will be “forgotten” by the system (it will no longer be mentioned in the list of “similar species”) until there’s another ID conflict between those same two species. I witnessed this with species of Myosotis and Rumex which I once reviewed.

lj_l · July 11, 2024, 1:30am

Oh, by “corrected” do you mean the incorrect ID is withdrawn? If so, I’m not sure what happens then.

pisum · July 11, 2024, 1:34am

yes, there are many cases where the computer vision returns suggestions which, if not chosen, do not necessarily reflect “similar species”. for example:

i have a photo of a plant and an insect on that plant. i want to ID the plant, but when there is no existing identification, the computer vision typically prioritizes animals over plants. so if i ignore an insect suggestion, it’s not necessarily because it’s a misidentification of the insect, it’s because i wanted to identify the plant instead.
cryptic animals sometimes are hard for the computer vision to evaluate properly. it sometimes will suggest things that it associates with noise or blurry photos. in these cases, the computer vision just doesn’t have good suggestions in the first place. so it doesn’t follow that those bad suggestions were even close to the right identification.
blurry photos or distant photos or odd crops, etc. can all produce unexpected / bad suggestions. as in the point above, it doesn’t follow that bad suggestions were even close to the right identification.

moreover, there’s no simple way to merge the existing algorithm with the proposed AI suggestions comparisons. they’re apples and oranges. given these two methods, you have to choose between one or the other, not both.

too complicated. you’ll never know if someone withdrew an ID because they misclicked and picked the wrong ID originally or if they actually thought that original ID was legitimate. things get sketchier when taxon changes are involved, too. it’s better to rely on active disagreements.

mferreira · July 12, 2024, 9:47am

Withdrawn or replaced, yes. I know what happens then because I saw it in several cases: first the number of mismatches/conflicts between the two species involved is recalculated, and once it goes down to zero, each species ceases to be shown as similar to the other.
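
The behaviour described here can be sketched as a toy model. It is only an illustration of what was observed, not the platform's actual algorithm: each observation is reduced to a list of its active (not withdrawn) species-level IDs, and the names `mismatch_counts` and `similar_species` are hypothetical.

```python
from collections import Counter


def mismatch_counts(observations, taxon):
    """Count, per other taxon, the observations where it conflicts with `taxon`.

    Each observation is a list of its active IDs; a conflict is assumed to
    exist when `taxon` and another taxon are both among them.
    """
    counts = Counter()
    for active_ids in observations:
        ids = set(active_ids)
        if taxon in ids:
            for other in ids - {taxon}:
                counts[other] += 1
    return counts


def similar_species(observations, taxon):
    """List the taxa that currently have at least one conflict with `taxon`."""
    return sorted(mismatch_counts(observations, taxon))


# Two conflicting observations, then the same data after one ID is corrected.
before = [["Ulex jussiaei", "Ulex minor"], ["Ulex jussiaei", "Ulex europaeus"]]
after = [["Ulex jussiaei"], ["Ulex jussiaei", "Ulex europaeus"]]

print(similar_species(before, "Ulex jussiaei"))
print(similar_species(after, "Ulex jussiaei"))
```

In this model, once the last conflicting ID for Ulex minor is withdrawn or replaced, its count is no longer recorded and it drops out of the list, while Ulex europaeus stays because one conflict is still active.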

mferreira · July 12, 2024, 9:51am

Agreed, @pisum. Your examples convince me that my suggestion would not work as expected. Perhaps there’s “food for thought” for the programmers (that was the intention) but the idea would have to be carefully matured before being implemented.

