Computer Vision suggests monotypic genera

Not sure if this is exactly a bug, but the auto-ID sometimes suggests monotypic genera. For example, this observation: https://www.inaturalist.org/observations/24639253 suggests the genus Sanguinaria. There is only one species of Sanguinaria. This seems to imply that it is more confident in the genus than in the species, which makes no sense.

There are no research grade observations at the genus level: https://www.inaturalist.org/observations?lrank=genus&place_id=any&subview=table&taxon_id=51045

It’s to do with the way Computer Vision technologies like this one work. You can only effectively train them at one level. In this case, it’s species level.

Where possible, the system then takes that species data, and mashes it together to try and predict a genus.

So what you’re seeing here is two different processes. The first is the raw confidence data for the observation, displayed as a list of species. The second is the result of a calculation of that data to work out whether it passes the required threshold to predict a genus.

When doing the genus calculation, the system only uses the relatively short list of possible matches (the species list you can see), so it knows there’s one possible match in the Sanguinaria genus, but doesn’t bother to see if there are any that didn’t make the list. Obviously though, it was quite confident on S. canadensis, because that one species was enough to pass the threshold for genus prediction.
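
To make the two processes concrete, here is a minimal Python sketch. It is not iNaturalist's actual code: it assumes the vision model returns a short list of (species, score) pairs and that a genus is suggested when the summed scores of its listed species pass a threshold. The function name `suggest_genus`, the scores, and the 0.8 threshold are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical shortlist of species-level scores (invented numbers).
shortlist = [
    ("Sanguinaria canadensis", 0.85),
    ("Anemone quinquefolia", 0.06),
    ("Anemone canadensis", 0.04),
]

def suggest_genus(shortlist, threshold=0.8):
    """Sum shortlist scores per genus; return the best genus if it passes the threshold."""
    genus_scores = defaultdict(float)
    for species, score in shortlist:
        genus = species.split()[0]  # the genus is the first word of a binomial name
        genus_scores[genus] += score
    if not genus_scores:
        return None
    best_genus, best_score = max(genus_scores.items(), key=lambda kv: kv[1])
    return best_genus if best_score >= threshold else None

print(suggest_genus(shortlist))  # Sanguinaria
```

Note that nothing in this sketch looks at how many species the genus actually contains, which is how a single confident species can end up producing a genus-level suggestion.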

I believe there is a way for curators to mark any taxon as “complete” (containing all of its descendants) in iNaturalist. Seems like it wouldn’t be too hard to have the system check for this status, check the number of descendant species in the system, and always provide the species taxon if the suggested taxon is complete and monospecific.
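
The check described above could look roughly like this. It is only a sketch: the `Taxon` class, its `complete` flag, and `refine_suggestion` are hypothetical names, not iNaturalist's data model or API.

```python
from dataclasses import dataclass, field

@dataclass
class Taxon:
    # Hypothetical, simplified taxon record.
    name: str
    rank: str
    complete: bool = False  # assumed curator flag: all descendants are in the system
    children: list = field(default_factory=list)

def refine_suggestion(taxon):
    """If a genus is marked complete and has exactly one species, suggest that species."""
    if taxon.rank == "genus" and taxon.complete:
        species = [c for c in taxon.children if c.rank == "species"]
        if len(species) == 1:
            return species[0]
    return taxon

sanguinaria = Taxon(
    "Sanguinaria", "genus", complete=True,
    children=[Taxon("Sanguinaria canadensis", "species")],
)
print(refine_suggestion(sanguinaria).name)  # Sanguinaria canadensis
```

A genus that is not marked complete, or that has more than one species, is returned unchanged.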

I think that is making it unnecessarily complicated. It gives both genus and species, states it has high confidence in the genus and lists species as the first option in the list. To me, that’s a win!

If an identifier doesn’t happen to be aware that the genus is monospecific, however, and decides to be prudent and choose genus, then that still requires an extra ID to move it along to species level later. Might just save some superfluous ID steps…

This does seem like a bug to me. The list of suggestions strongly implies that an uncertain user should choose the monotypic genus. But choosing the species in that genus is always better. I understand that there is some reason why the computer vision algorithm happens to work this way, but that doesn’t make the end result any less of a mistake. Also, it seems straightforward to add post-processing to the computer vision algorithm’s current behavior to achieve the clearly better result of offering the species as the “pretty sure” choice.

I don’t think this is a particularly important bug, but I do think it’s clearly a bug from the UI point of view.

Hmm… I guess that makes sense and it isn’t really a bug. Still seems like a strange algorithm. Something more like this seems like it would make more sense:

  1. If the score for the top species is >0.8, display that species.
  2. If the top two taxa both score >0.7, display the common ancestor of those two taxa.
  3. Otherwise, don’t display any confident identification.

Made up numbers but you get the idea. Really, the algorithm shouldn’t ever be saying “We’re pretty sure this is in genus x” if the top two suggestions aren’t both in genus x.
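
A minimal Python sketch of those three rules, under some assumptions of mine: each result carries a root-to-species lineage as a list of names, scores are independent per-taxon confidences (they need not sum to 1), and the cutoffs are the made-up 0.8 and 0.7 from above. The function names are hypothetical.

```python
def common_ancestor(lineage_a, lineage_b):
    """Return the deepest taxon shared by two root-to-tip lineages, or None."""
    shared = None
    for a, b in zip(lineage_a, lineage_b):
        if a != b:
            break
        shared = a
    return shared

def confident_suggestion(results, species_cutoff=0.8, pair_cutoff=0.7):
    """results: list of (lineage, score) pairs, sorted by score, highest first."""
    if not results:
        return None
    top_lineage, top_score = results[0]
    if top_score > species_cutoff:          # rule 1: one clear species
        return top_lineage[-1]
    if len(results) > 1:
        second_lineage, second_score = results[1]
        if top_score > pair_cutoff and second_score > pair_cutoff:
            return common_ancestor(top_lineage, second_lineage)  # rule 2
    return None                             # rule 3: nothing confident

clear = [(["Plantae", "Papaveraceae", "Sanguinaria", "Sanguinaria canadensis"], 0.9)]
close = [
    (["Plantae", "Ranunculaceae", "Anemone", "Anemone quinquefolia"], 0.75),
    (["Plantae", "Ranunculaceae", "Anemone", "Anemone canadensis"], 0.72),
]
print(confident_suggestion(clear))  # Sanguinaria canadensis
print(confident_suggestion(close))  # Anemone
```

With this shape, a genus is only ever shown when the top two suggestions share it.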

I was going to argue against this, but after a quick re-read, it made a lot of sense. Even the numbers seem pretty good. I would add the following:

  • Keep the species list underneath, but de-emphasise it (make it smaller, or greyed-out or something). I like the idea that if the AI gets it wrong or isn’t confident, an experienced identifier can still quickly spot the right species in the list and click it (many IDers like to do this rather than typing, for efficiency).
  • The Common Ancestor suggestion should have a ceiling. “We’re pretty sure it’s Life” shouldn’t be a serious prediction (which would happen in your model if a top two of sea sponge and mushroom both scored >0.7). Maybe Class; anything above that should be obvious to almost everyone in most cases.
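
The ceiling could be a simple rank check applied to whatever common ancestor comes out. This is a sketch with a hypothetical, simplified rank list; any rank not in the list (such as the root of the tree) is treated as above the ceiling.

```python
# Hypothetical, simplified rank ordering: larger number = more specific.
RANK_DEPTH = {
    "kingdom": 0, "phylum": 1, "class": 2, "order": 3,
    "family": 4, "genus": 5, "species": 6,
}

def apply_ceiling(taxon_name, taxon_rank, ceiling="class"):
    """Return the suggestion only if its rank is at or below the ceiling rank."""
    depth = RANK_DEPTH.get(taxon_rank)
    if depth is not None and depth >= RANK_DEPTH[ceiling]:
        return taxon_name
    return None  # too broad to be a useful "pretty sure" suggestion

print(apply_ceiling("Anemone", "genus"))   # Anemone
print(apply_ceiling("Fungi", "kingdom"))   # None
```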

One could argue that in general we could turn off genus-level ID for monotypic genera and just have any attempt to use it default to species id.

I can imagine a researcher who is planning on splitting a monotypic genus might prefer the ability to only put it to genus until it gets published, but that seems like a real edge case, whereas the Sanguinaria situation has probably come up many, many times.

Nothing would prevent them from doing that. They just couldn’t use Computer Vision as a shortcut, and would have to type in (part of) the genus ID.

I do find that I am frequently adding IDs to observations where monospecific genera have been specified, just to get them to species level. Would be more efficient if I could just agree to an ID that was already at species level in those cases.

I was interpreting this statement:

“just have any attempt to use it default to species id”

as a suggestion to automatically change the ID to species level site-wide. If it were only on the computer vision part, then I agree, there would be no issue.

It was. But I don’t feel strongly about it.