Suggestions and CV omitting subgenus or tribe IDs

Platform: Website

Browser: Chrome

URLs of relevant observations: https://www.inaturalist.org/observations/92972564

Screenshots of what you are seeing:

Photos which are (and are most commonly IDed as) Dialictus:


  • Dialictus wasn’t suggested on either page.


  • Dialictus wasn’t suggested on either page.

Photo which is (and is most commonly IDed as) Augochlorini (although species ID may also be possible):

  • Augochlorini is suggested (correctly).

  • Augochlorini isn’t suggested.

Description of problem:

  1. The Observation page CV and the Identify page Suggestions (when set to resemblance) at least sometimes give very different suggestions, such as a different genus or family. Ideally these would behave identically; if not, iNat could at least add an option to use CV in Identify, or vice versa.

  2. In the eastern US (and globally), most Dialictus are IDed to subgenus or genus, and very few can be IDed to species. So it would be best if CV/Suggestions offered subgenus and genus as the first suggestions, but they only occasionally seem to. They seem to suggest species too often, and tribe, genus, or subgenus not often enough. This can be replicated by viewing suggestions for many Augochlorini and Dialictus observations. For Agapostemon, it would likewise be best to suggest the nominate subgenus (rather than the genus) in the eastern US, where only that subgenus occurs. In summary, the recommendation is for CV and Suggestions to show the most commonly used IDs, whether that is calculated globally (which would make no difference for Dialictus) or by location.

An additional request is for Place checklists to also show subgenera, genera, or tribes where applicable when Suggestions are set to checklists. I don’t mean species should be excluded, only that tribe and subgenus should also be shown when applicable.

  1. The Observation page CV and the Identify page Suggestions (when set to resemblance) at least sometimes give very different suggestions, such as a different genus or family. Ideally these would behave identically; if not, iNat could at least add an option to use CV in Identify, or vice versa.

I have noticed this. It may be because the Identify suggestions are filtered by the current observation taxon, whereas the ones on the observation page are filtered only by the Iconic Taxon (plants, birds, insects, etc.) of the observation taxon. I’m not sure that explains all of the discrepancy, though.

  2. In the eastern US (and globally), most Dialictus photos are IDed to subgenus or genus, and very few can be IDed to species. So it would be best if CV/Suggestions offered subgenus and genus as the first suggestions, but they only occasionally seem to suggest them. They seem to suggest species too often, and tribe, genus, or subgenus not often enough. These issues can be replicated by viewing suggestions on many Augochlorini and Dialictus observations. For Agapostemon, it would likewise be best for suggestions to show the nominate subgenus (rather than the genus) in the eastern US, where only that subgenus occurs. In summary, the recommendation is for CV and Suggestions to show the most commonly used IDs, whether that is calculated globally (which would make no difference for Dialictus) or by location.

I’m a little unsure what you’re suggesting here, but I don’t think it’s possible with how the suggestions work now.

There are two almost entirely separate parts to the “suggestion” algorithm. First, the Computer Vision model tries to match a photo to a species or other taxon that was in its dataset of training photos. It works on a single photo at a time and considers only what the photo looks like, with no other information. It returns a list of candidate IDs, scored by how well the photo matches them.

Then, the website has its own separate algorithm that determines how these suggestions are displayed. It’s at this point that phenology and geography are considered, to mark things as “Seen Nearby” or not. Taxa seen nearby are given a little boost in the ranking. It’s also at this point that the suggestion algorithm decides if there is enough cumulative support for any taxon, to suggest it as “Pretty Sure.” This may be the genus of the single, highly supported CV suggestion, or it may be the family that the first few suggestions all belong to. The “Pretty Sure” suggestion is never a species (and, maybe never below genus, I’m not sure).
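
To make that second step concrete, here is a rough Python sketch of how I picture it. This is not iNat’s actual code: the `Candidate` class, the `NEARBY_BOOST` multiplier, the `PRETTY_SURE` threshold, and the scores are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str          # taxon returned by the vision model
    score: float       # visual match score (invented numbers below)
    lineage: tuple     # ancestors, broad to narrow, e.g. ("Halictidae", "Lasioglossum")
    seen_nearby: bool  # whether the taxon has nearby observations

NEARBY_BOOST = 1.2     # hypothetical multiplier for "Seen Nearby" taxa
PRETTY_SURE = 0.8      # hypothetical share of total score needed for "Pretty Sure"

def rank(candidates):
    """Re-rank the vision scores, giving taxa seen nearby a small boost."""
    def boosted(c):
        return c.score * (NEARBY_BOOST if c.seen_nearby else 1.0)
    return sorted(candidates, key=boosted, reverse=True)

def pretty_sure(candidates):
    """Return the narrowest ancestor holding enough of the total score, or None."""
    total = sum(c.score for c in candidates)
    if total == 0:
        return None
    support, depth = {}, {}
    for c in candidates:
        for d, ancestor in enumerate(c.lineage):
            support[ancestor] = support.get(ancestor, 0.0) + c.score
            depth[ancestor] = d
    qualifying = [a for a, s in support.items() if s / total >= PRETTY_SURE]
    # Deeper in the lineage means a narrower (lower-rank) taxon.
    return max(qualifying, key=lambda a: depth[a]) if qualifying else None

# Invented scores for three real halictid species.
candidates = [
    Candidate("Lasioglossum pilosum", 0.40, ("Halictidae", "Lasioglossum"), True),
    Candidate("Lasioglossum imitatum", 0.32, ("Halictidae", "Lasioglossum"), False),
    Candidate("Halictus ligatus", 0.28, ("Halictidae", "Halictus"), True),
]

print([c.name for c in rank(candidates)])
print(pretty_sure(candidates))
```

With these numbers the nearby boost moves Halictus ligatus above Lasioglossum imitatum, and no single genus holds 80% of the score, so the “Pretty Sure” taxon falls back to the family. Because the lineage here contains only ancestors, the result can never be a species.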

Your suggestion, not to suggest species when most IDs are at the level of genus or subgenus, is a good one on its face. It’s held back somewhat by the fact that the CV training data doesn’t contain “nested” taxa. Once a species in a genus qualifies for the CV, the genus itself gets booted out of the model. Maybe it could be implemented at the second step (“if the top-rated suggestion has no local IDs but there are many at a higher-level taxon, suggest that instead”), but I bet that will be hard to implement well.
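
A minimal Python sketch of that quoted rule, just to show the idea. The function name, the `min_ids` cutoff, and the ID counts are all hypothetical, and none of this reflects how the site is actually built.

```python
def prefer_commonly_used_rank(lineage, local_id_counts, min_ids=50):
    """Pick which taxon to show first.

    lineage: the top-rated suggestion followed by its ancestors, narrow to
             broad, e.g. [species, subgenus, genus, family].
    local_id_counts: number of local IDs per taxon name (hypothetical data).
    min_ids: hypothetical cutoff for "many" IDs at a higher-level taxon.
    """
    top = lineage[0]
    # Keep the top suggestion if anyone has IDed it locally.
    if local_id_counts.get(top, 0) > 0:
        return top
    # Otherwise walk up and take the first ancestor with many local IDs.
    for ancestor in lineage[1:]:
        if local_id_counts.get(ancestor, 0) >= min_ids:
            return ancestor
    # Nothing better is known, so fall back to the original suggestion.
    return top

lineage = ["Lasioglossum pilosum", "Dialictus", "Lasioglossum", "Halictidae"]
counts = {"Dialictus": 900, "Lasioglossum": 300}  # invented counts
print(prefer_commonly_used_rank(lineage, counts))
```

Even in this toy form the hard parts are visible: what counts as “local,” what cutoff counts as “many,” and whether a handful of species-level IDs should be enough to keep the species on top.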

One of your examples (the one where Halictus was suggested) is just the CV getting it wrong. That’s something different and unavoidable. You don’t want the website to just override the CV and make up a new ID based on what is most common nearby.

I’m asking for the most commonly used IDs (whether species, subgenus, genus, tribe) to be suggested first by Observation page-CV and Identify page-Suggestions set to resemblance (not to necessarily exclude species entirely).

Maybe, but it also reveals the first issue of large discrepancies between the results of CV (Halictus) and Suggestions (Agapostemon or Halictus).

I’m not suggesting this.

This topic seems reasonable to categorize as a bug report, since it describes discrepancies or logically unexpected results. If it is judged not to be one, I would prefer that it be recategorized as a feature request. If the changes recommended in this post would require a site change or update, then that would also be part of my recommendation.

I’m asking for the most commonly used IDs (whether species, subgenus, genus, tribe) to be suggested first by Observation page-CV and Identify page-Suggestions set to resemblance

Maybe I’m misunderstanding what you are suggesting. But based on your description, it seems inevitable that “Honey Bee” would be the top suggestion for any insect (or at least for any Hymenopteran), because that’s the second most observed species (and thus the second most ID’d species) on iNat worldwide.

If you set it to “Visually Similar,” which uses the CV, rather than “Observations,” which does not, it once again thinks this is Halictus. This is just an AI misidentification, not a discrepancy that I can see.

This topic seems reasonable to categorize as a bug report, since it describes discrepancies or logically unexpected results. If it is judged not to be one, I would prefer that it be recategorized as a feature request. If the changes recommended in this post would require a site change or update, then that would also be part of my recommendation.

We’ll see what iNat staff say, but I bet the changes required would be enough to warrant a feature request.

I described two issues:

  1. Discrepancies between CV and Suggestions.
  2. Neither CV nor Suggestions gives subgenus or tribe IDs often enough when those are the most common IDs for what the photo shows (or for what CV or Suggestions think it shows). In the examples, I’m suggesting that Dialictus and Augochlorini be suggested by both CV and Identify Suggestions (specifically when set to “visually similar,” which in my experience doesn’t give results identical to CV). I’m not recommending that species not be suggested, nor that the most common ID among all insects or bees be suggested; I mean the most common ID based on the photo. The Dialictus example deserves the most focus, since many species show up in suggestions but most photos can only be accurately identified to subgenus.

Lastly, if one of the three examples was a mere CV misID, then it at most relates to (1) but not necessarily to (2), so (2) isn’t contradicted and the CV misID can be ignored for the time being.

Anyone else have comments on this?