I had raised this issue in the model thread here:
https://forum.inaturalist.org/t/new-computer-vision-model-released-august-2022/34588/6
And I speculated there on whether strategies could be employed during the model development process to help reduce such outliers.
In terms of the user experience, I think part of the issue is that no confidence ratings are displayed alongside the suggestions. It appears that what we are seeing in many cases is a very sudden drop-off in confidence when there are no other options available to display, and you end up down in the “noise”, so to speak. Now, it must be that iNat already excludes results below a certain confidence threshold, since in some cases you only get shown a few options. So one option would be for them to simply raise that threshold. However, that also risks excluding useful suggestions for species that the model is simply less capable with. (And similarly for more complicated options, such as adjusting the confidence threshold relative to the number of results returned – it is difficult to get a “one size fits all” solution.)
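Just to illustrate the two options I mean, here is a rough Python sketch (not iNat's actual code – I have no idea how their filtering is implemented, and the scores and species names are made up) comparing a fixed cut-off with one that adapts to the top score:

```python
# Rough sketch only: each suggestion is assumed to be a (taxon, score) pair,
# with scores normalised so they sum to 1. Neither function reflects what
# iNat actually does; they just illustrate the two strategies discussed above.

def filter_absolute(suggestions, min_score=0.05):
    """Keep only suggestions whose score clears a fixed threshold."""
    return [(taxon, s) for taxon, s in suggestions if s >= min_score]

def filter_relative(suggestions, fraction_of_top=0.1):
    """Keep suggestions scoring at least some fraction of the best score,
    so the cut-off adapts to how confident the model is overall."""
    if not suggestions:
        return []
    top = max(s for _, s in suggestions)
    return [(taxon, s) for taxon, s in suggestions if s >= top * fraction_of_top]

# Hypothetical example scores
suggestions = [("Danaus plexippus", 0.62), ("Danaus gilippus", 0.21),
               ("Limenitis archippus", 0.09), ("Junonia coenia", 0.002)]
print(filter_absolute(suggestions))   # drops only the 0.002 "noise" entry
print(filter_relative(suggestions))   # here drops anything below 0.062
```

Either way there is a trade-off, which is why I think showing the numbers to the user is the more flexible route.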
As an example, BirdNET shows a bar chart of the probabilities for the different candidate species in its bird call identifier (it would be great if that could be integrated with iNat, BTW!):
https://birdnet.cornell.edu/api/
Though I think a simple numeric value (e.g. a percentage) would be easier to implement and to understand as part of the iNat suggestions list.
Displaying these values would allow the user to ignore the lower-rated suggestions in cases where there are already clearly good ones to consider, while still being able to view them in cases where none of the higher-rated suggestions provide a good match. Often when I am submitting a species I am not overly familiar with, I will look at a number of the top suggestions and their taxonomic trees to try to get a feel for what might be a reasonable first stab at, say, tribe or genus level at least. It would be helpful to be able to see at what point in the list the confidence rating starts to drop away more steeply, to help decide how many of the suggestions are really worth a closer look.
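To show what I mean by spotting the drop-off, something along these lines could even be flagged automatically (again just an illustrative Python sketch with made-up scores, not a suggestion that iNat works this way):

```python
# Illustrative only: mark the first point in a descending list of scores
# where the score falls by more than some factor compared to the previous one.
# Everything above that point is probably "worth a closer look".

def drop_off_index(scores, factor=3.0):
    """Return the index after which scores fall off sharply,
    or len(scores) if no drop larger than `factor` is found."""
    for i in range(len(scores) - 1):
        if scores[i + 1] > 0 and scores[i] / scores[i + 1] >= factor:
            return i + 1
    return len(scores)

scores = [0.48, 0.31, 0.11, 0.02, 0.01]   # hypothetical suggestion scores
cutoff = drop_off_index(scores)
print(f"Worth a closer look: top {cutoff} suggestions")  # -> top 3 here
```

But even without anything that fancy, just seeing the raw percentages next to each suggestion would let me make that judgement myself.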