Don't suggest genus for monotypic genera (computer vision)

Not sure if this is exactly a bug, but the auto-ID sometimes suggests monotypic genera. For example, for this observation: https://www.inaturalist.org/observations/24639253 it suggests the genus Sanguinaria, even though there is only one species of Sanguinaria. This seems to imply that it is more confident in the genus than in the species, which makes no sense.

There are no research grade observations at the genus level: https://www.inaturalist.org/observations?lrank=genus&place_id=any&subview=table&taxon_id=51045

It’s to do with the way computer vision models like this one work. You can only effectively train them at one taxonomic level; in this case, that’s the species level.

Where possible, the system then takes that species data, and mashes it together to try and predict a genus.

So what you’re seeing here is two different processes. The first is the raw confidence data for the observation, displayed as a list of species. The second is a calculation on that data that works out whether the combined score passes the threshold required to predict a genus.

When doing the genus calculation, the system only uses the relatively short list of possible matches (the species list you can see). So it knows there’s one possible match in the genus Sanguinaria, but it doesn’t check whether the genus has any other species that didn’t make the list. Obviously, though, it was quite confident in S. canadensis, because that one species alone was enough to pass the threshold for a genus prediction.
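
The roll-up described above can be sketched roughly like this. It's a minimal illustration, not iNat's actual code: it assumes the genus score is simply the sum of the listed species' scores, and the function names, species list, scores, and 0.8 threshold are all made up. Because only species on the short list are counted, one confident species is enough to produce a genus suggestion.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical short list of (species, genus, score) returned by the model.
# The scores are invented for illustration, not real model output.
TOP_MATCHES: List[Tuple[str, str, float]] = [
    ("Sanguinaria canadensis", "Sanguinaria", 0.86),
    ("Podophyllum peltatum", "Podophyllum", 0.05),
    ("Jeffersonia diphylla", "Jeffersonia", 0.04),
]

GENUS_THRESHOLD = 0.8  # assumed cut-off for a "pretty sure" genus


def genus_scores(matches: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Sum species scores per genus, using only species that made the short list."""
    totals: Dict[str, float] = defaultdict(float)
    for _species, genus, score in matches:
        totals[genus] += score
    return dict(totals)


def pretty_sure_genus(matches, threshold: float = GENUS_THRESHOLD) -> Optional[str]:
    """Return the top genus if its summed score clears the threshold, else None."""
    totals = genus_scores(matches)
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best if totals[best] >= threshold else None


# Sanguinaria clears the bar on S. canadensis alone; nothing here asks
# whether the genus has (or lacks) other species.
print(pretty_sure_genus(TOP_MATCHES))
```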

I believe there is a way for curators to mark any taxon as “complete” (containing all of its descendants) in iNaturalist. Seems like it wouldn’t be too hard to have the system check for this status, check the number of descendant species in the system, and always provide the species taxon if the suggested taxon is complete and monospecific.
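
A rough sketch of that check, assuming a simplified taxon record with a `complete` flag and a list of child taxa. The `Taxon` class and both function names are hypothetical, not iNat's actual schema or API. It only looks at direct children, so a genus divided into subgenera would need a deeper walk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Taxon:
    # Hypothetical, simplified taxon record; not iNaturalist's real data model.
    name: str
    rank: str
    complete: bool = False  # curator has flagged that all descendants are present
    children: List["Taxon"] = field(default_factory=list)


def sole_species(taxon: Taxon) -> Optional[Taxon]:
    """Return the only child species if the taxon is complete and monospecific."""
    if not taxon.complete:
        # One listed species might just mean the others haven't been added yet.
        return None
    species = [child for child in taxon.children if child.rank == "species"]
    return species[0] if len(species) == 1 else None


def refine_suggestion(suggested: Taxon) -> Taxon:
    """Replace a complete, monospecific genus suggestion with its single species."""
    if suggested.rank == "genus":
        only = sole_species(suggested)
        if only is not None:
            return only
    return suggested


canadensis = Taxon("Sanguinaria canadensis", "species")
sanguinaria = Taxon("Sanguinaria", "genus", complete=True, children=[canadensis])
print(refine_suggestion(sanguinaria).name)  # Sanguinaria canadensis
```

Gating on `complete` is what keeps a genus that merely has one species entered from being treated as monotypic.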

I think that is making it unnecessarily complicated. It gives both genus and species, states it has high confidence in the genus and lists species as the first option in the list. To me, that’s a win!

If an identifier doesn’t happen to be aware that the genus is monospecific, however, and decides to be prudent and choose the genus, then that still requires an extra ID to move it to species level later. Fixing this might save some superfluous ID steps…

This does seem like a bug to me. The list of suggestions strongly implies that an uncertain user should choose the monotypic genus. But choosing the species in that genus is always better. I understand that there is some reason why the computer vision algorithm happens to work this way, but that doesn’t make the end result any less of a mistake. Also, it seems straightforward to add post-processing to the computer vision algorithm’s current behavior to achieve the clearly better result of offering the species as the “pretty sure” choice.

I don’t think this is a particularly important bug, but I do think it’s clearly a bug from the UI point of view.

Hmm… I guess that makes sense, and it isn’t really a bug. It still seems like a strange algorithm, though. Something like this would make more sense:

  1. If the score for the top species is >0.8, display that species.
  2. If the top two taxa both score >0.7, display the common ancestor of those two taxa.
  3. Otherwise, don’t display any confident identification.

The numbers are made up, but you get the idea. Really, the algorithm should never say “We’re pretty sure this is in genus x” if the top two suggestions aren’t both in genus x.
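
Those three rules can be sketched as below. This is a minimal illustration under stated assumptions: each suggestion is a hypothetical (lineage, score) pair with a simplified root-to-species lineage, and the function names are made up. One caveat about the numbers: if the scores for a photo sum to 1, two taxa can never both exceed 0.5, so in practice the second threshold would have to be lower than 0.7; the second example passes 0.4.

```python
from typing import List, Optional, Sequence, Tuple

# A suggestion is (lineage, score). The lineage runs from the root down to the
# species; this format is hypothetical and the lineages below are simplified.
Suggestion = Tuple[Sequence[str], float]


def common_ancestor(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest shared prefix of two root-to-species lineages."""
    shared: List[str] = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


def pretty_sure(suggestions: List[Suggestion],
                species_threshold: float = 0.8,
                pair_threshold: float = 0.7) -> Optional[str]:
    ranked = sorted(suggestions, key=lambda s: s[1], reverse=True)
    if not ranked:
        return None
    top_lineage, top_score = ranked[0]
    # Rule 1: one species is clearly ahead, so offer the species itself.
    if top_score > species_threshold:
        return top_lineage[-1]
    # Rule 2: the top two both score well (ranked, so checking #2 covers #1);
    # offer the deepest taxon they share.
    if len(ranked) > 1 and ranked[1][1] > pair_threshold:
        shared = common_ancestor(top_lineage, ranked[1][0])
        return shared[-1] if shared else None
    # Rule 3: no confident suggestion.
    return None


SANG = ["Plantae", "Papaveraceae", "Sanguinaria", "Sanguinaria canadensis"]
ESCH = ["Plantae", "Papaveraceae", "Eschscholzia", "Eschscholzia californica"]
PODO = ["Plantae", "Berberidaceae", "Podophyllum", "Podophyllum peltatum"]

print(pretty_sure([(SANG, 0.86), (PODO, 0.05)]))                      # Sanguinaria canadensis
print(pretty_sure([(SANG, 0.45), (ESCH, 0.42)], pair_threshold=0.4))  # Papaveraceae
print(pretty_sure([(SANG, 0.50), (PODO, 0.20)]))                      # None
```

With species-level suggestions, a monotypic genus can't come out of rule 2, because it has only one species to put near the top.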

I was going to argue against this, but after a quick re-read, it made a lot of sense. Even the numbers seem pretty good. I would add the following:

  • Keep the species list underneath, but de-emphasise it (make it smaller, or greyed-out or something). I like the idea that if the AI gets it wrong or isn’t confident, an experienced identifier can still quickly spot the right species in the list and click it (many IDers prefer this to typing, for efficiency).
  • The common-ancestor suggestion should have a ceiling. “We’re pretty sure it’s Life” shouldn’t be a serious prediction (which would happen in your model if a sea sponge and a mushroom were the top two and both scored >0.7). Maybe cap it at class; anything above that should be obvious to almost everyone in most cases.
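
Adding that ceiling to the common-ancestor step might look like this. It's a minimal sketch where each lineage is a list of hypothetical (rank, name) pairs and anything broader than class is suppressed; the rank list is a simplified subset, and the lineage format, function name, and default cap are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Node = Tuple[str, str]  # (rank, name); hypothetical, simplified format

# Ranks from broadest to narrowest (a simplified subset of real ranks).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def capped_common_ancestor(a: Sequence[Node], b: Sequence[Node],
                           ceiling: str = "class") -> Optional[Node]:
    """Deepest shared taxon of two lineages, or None if it is broader than `ceiling`."""
    shared: List[Node] = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    if not shared:
        return None  # not even the same kingdom
    rank, _name = shared[-1]
    # A lower index in RANKS means a broader rank, so refuse anything above the cap.
    if RANKS.index(rank) < RANKS.index(ceiling):
        return None
    return shared[-1]


BLOODROOT = [("kingdom", "Plantae"), ("phylum", "Tracheophyta"),
             ("class", "Magnoliopsida"), ("order", "Ranunculales"),
             ("family", "Papaveraceae"), ("genus", "Sanguinaria")]
POPPY = [("kingdom", "Plantae"), ("phylum", "Tracheophyta"),
         ("class", "Magnoliopsida"), ("order", "Ranunculales"),
         ("family", "Papaveraceae"), ("genus", "Eschscholzia")]
FERN = [("kingdom", "Plantae"), ("phylum", "Tracheophyta"),
        ("class", "Polypodiopsida")]
SPONGE = [("kingdom", "Animalia"), ("phylum", "Porifera")]
MUSHROOM = [("kingdom", "Fungi"), ("phylum", "Basidiomycota")]

print(capped_common_ancestor(BLOODROOT, POPPY))   # ('family', 'Papaveraceae')
print(capped_common_ancestor(BLOODROOT, FERN))    # None: only a phylum is shared
print(capped_common_ancestor(SPONGE, MUSHROOM))   # None: no shared kingdom
```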

One could argue that, in general, we could turn off genus-level IDs for monotypic genera and just have any attempt to use one default to a species ID.

I can imagine that a researcher who is planning to split a monotypic genus might prefer being able to ID only to genus until the split is published, but that seems like a real edge case, whereas the Sanguinaria situation has probably come up many, many times.

Nothing would prevent them from doing that. They just couldn’t use Computer Vision as a shortcut, and would have to type in (part of) the genus ID.

I do find that I am frequently adding IDs to observations where monospecific genera have been specified, just to get them to species level. It would be more efficient if I could just agree with an ID that was already at species level in those cases.

I was interpreting this statement:

as a suggestion to automatically change the id to species-level site-wide. If it were only on the computer vision part, then I agree, there would be no issue.

It was. But I don’t feel strongly about it.

I heartily agree with this suggestion. In my opinion, the monotypic species issue is a point of needless complication, both for this specific issue and for people leaving monotypic species at the genus or even the family (or higher) level.

I just ran into this again today when the computer vision suggested the monotypic genus Diadophis. It seems so obviously wrong!

For what it’s worth, I strongly support this idea, though it might take some effort to ensure iNat can correctly tell truly monotypic genera apart from genera that just have only one of their several species listed.

We discussed this and we’re pretty comfortable with it if the genus is marked complete. So it would affect most vertebrates and some other clades.

There are also identifiers who are aware that taxonomy gets revised, and that a monotypic genus may have been split at any point since they last looked it up. Others may not know that a formerly multi-species genus has become monotypic since they last checked. Cladistics seems to have made taxonomy a lot more unstable than it used to be. I sure was surprised to find out that cacao is considered to be in the mallow family now; all these years it was Theobromaceae for what had seemed like good reasons. My awareness of this makes me less confident about identifying something to species.

A lot of plant genera in particular would then probably need updating to account for this. I don’t think many of them are marked “complete”; that tends to happen more often with bird or mammal taxonomy.

This really does need to be addressed. The taxon that comes to mind for me is Malosma laurina, the only member of the SoCal genus Malosma. I can’t tell you how many times I’ve IDed genus-level Malosma observations to the single species, and then had to write a comment saying “this is actually the only species in this genus, so no need to ID to just genus level”. It’s just aching for a fix. However, a forum moderator told me they aren’t taking CV-related feature requests, which seems odd. Maybe it’s a lack of in-house expertise, or concern over having to take it offline for a short period of time to make changes?
