Genus not included in CV model

Please fill out the following sections to the best of your ability; having this information at the outset will help us investigate bugs. Screenshots are especially helpful, so please provide those if you can.

Platform (Android, iOS, Website): website

App version number, if a mobile app issue (shown under Settings or About):

Browser, if a website issue (Firefox, Chrome, etc) : chrome

URLs (aka web addresses) of any relevant observations or pages:

Screenshots of what you are seeing (instructions for taking a screenshot on computers and mobile devices: https://www.take-a-screenshot.org/):

Description of problem (please provide a set of steps we can use to replicate the issue, and make as many as you need.):

This may not be technically a bug, but I consider it a very serious design flaw in the CV model:

I have been identifying crickets, and huge numbers of Gryllus observations are being identified as Neonemobius (which is in a different family) because the CV model frequently suggests Neonemobius. The reason, as it has been explained to me, is that the CV model stops including a genus once one of its species has been included. This is really bad for Gryllus, because most of its species cannot be identified to species from a photo; song recordings are needed. As a result, the vast majority of observations are identified only to genus level. In a given region, the model may therefore have no Gryllus species that appears to occur in the area, so the CV just ends up suggesting Neonemobius. Gryllus is very recognizable from photos, and there are thousands of observations with two or more IDs for the genus, plenty for the CV model to identify it robustly.

This is a truly huge mess. There are thousands of Gryllus observations currently misidentified and it is a huge burden on the identifier to clean things up. The CV model really should be modified to handle this situation correctly.

This is a known issue that has frustrated me for a long time. Earlier this year I saw staff member Tony Iwane say, “If the child of any genus or family, etc, is in the model, then that genus will not be included in the model. Which is not great, but they’re working on functionality that will surface higher-level taxa as suggestions in the future.”

I don’t know how exactly they’ll implement it, or what the timeline will be. But hopefully that will improve this situation.
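
To make the quoted rule concrete, here is a minimal sketch of how it could play out. This is not iNaturalist's actual code: the function name `model_classes`, the `parent` mapping, and the placeholder taxon names are all assumptions made up for illustration.

```python
def model_classes(candidates, parent):
    """Drop every candidate taxon that has a descendant among the candidates.

    candidates: set of taxon names that have enough data to be trained on
                (hypothetical input, not a real iNaturalist structure).
    parent:     dict mapping each taxon to its parent taxon.
    """
    has_descendant = set()
    for taxon in candidates:
        ancestor = parent.get(taxon)
        while ancestor is not None:
            # Every ancestor of a candidate is shadowed by that candidate.
            has_descendant.add(ancestor)
            ancestor = parent.get(ancestor)
    return {t for t in candidates if t not in has_descendant}


# Toy taxonomy with placeholder names.
parent = {"Gryllus species A": "Gryllus", "Gryllus": "Family X"}
candidates = {"Gryllus", "Gryllus species A", "Neonemobius"}
classes = model_classes(candidates, parent)
print(sorted(classes))
```

Under this rule a single Gryllus species entering the model is enough to remove the genus itself as a class, no matter how many observations are identified only to genus.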

Thanks for raising this issue. I agree it is something worth fixing (or at least considering).

+1 for this (it could be a feature request to include parent taxa in the model, even if the children are already included)

Thanks for doing this. I’m sure you’ll find people here happy to help fix it if you can provide some guidance in the Identifriday thread. I agree that making the model suggest coarser taxa would be great.

This is correct.

Currently iNat “walks up” the tree to try and provide a common ancestor option based on suggested species. So while the parent taxon won’t be in the model, iNat will often provide one as a suggestion, eg:

In this case, iNat is taking the scores from the results and under “We’re pretty sure it’s in the genus ” it’s suggesting the most likely coarser taxon based on the scores.
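
The "walking up" described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not iNat's real implementation: the toy taxonomy, the placeholder species names, the function names, and the 0.8 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy taxonomy (child -> parent) with placeholder names.
PARENT = {
    "Gryllus species A": "Gryllus",
    "Gryllus species B": "Gryllus",
    "Neonemobius species A": "Neonemobius",
    "Gryllus": "Family X",
    "Neonemobius": "Family Y",
    "Family X": "Orthoptera",
    "Family Y": "Orthoptera",
}


def lineage(taxon, parent):
    """Return [taxon, parent, grandparent, ...] up to the root."""
    chain = [taxon]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def common_ancestor(species_scores, parent, threshold=0.8):
    """Pick the most specific ancestor whose summed species scores reach the threshold."""
    totals = defaultdict(float)
    for species, score in species_scores.items():
        for taxon in lineage(species, parent)[1:]:  # ancestors only
            totals[taxon] += score
    candidates = [t for t, total in totals.items() if total >= threshold]
    if not candidates:
        return None
    # The taxon with the longest lineage is the deepest, i.e. the most specific.
    return max(candidates, key=lambda t: len(lineage(t, parent)))


# Two Gryllus species in the model share most of the score.
spread = {"Gryllus species A": 0.5, "Gryllus species B": 0.35, "Neonemobius species A": 0.1}
print(common_ancestor(spread, PARENT))

# No Gryllus species scores well, so the walk can only end up elsewhere.
skewed = {"Neonemobius species A": 0.9}
print(common_ancestor(skewed, PARENT))
```

The second call shows the failure mode discussed in this thread: a coarser taxon can only surface if some of its descendants are in the model and receive scores.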

I can see this working fine when many of the child taxa are already in the model, but the problems arise when most of them aren’t, even when there are many observations identified to that genus (and which can’t or shouldn’t be identified further based on the evidence provided). For some large genera, only the most distinctive (and different-looking) taxa have abundant observations identified to species, so the standard ones may not be accurately identified.

One tweak would be to stop including a genus in the model only when more than some percentage (50%?) of the species in it have been included.
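
As a rough sketch of that tweak (the function name, its arguments, and the placeholder species names are hypothetical, and 0.5 is just the example percentage floated above):

```python
def genus_stays_in_model(species_in_genus, species_in_model, max_fraction=0.5):
    """Return True if the genus should remain a class of its own.

    The genus is dropped only when more than `max_fraction` of its
    species are themselves included in the model.
    """
    if not species_in_genus:
        return True  # nothing can shadow a genus with no listed species
    covered = sum(1 for s in species_in_genus if s in species_in_model)
    return covered / len(species_in_genus) <= max_fraction


# A large genus where only 2 of 10 species are in the model keeps its genus class.
large_genus = [f"species {i}" for i in range(10)]
print(genus_stays_in_model(large_genus, {"species 0", "species 1"}))

# A small genus with 3 of 4 species in the model would be dropped.
small_genus = ["a", "b", "c", "d"]
print(genus_stays_in_model(small_genus, {"a", "b", "c"}))
```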

Just to be clear, I wasn’t disagreeing with you; I was just trying to explain what’s currently happening. I do think there needs to be improvement here.

Of course, no worries! Just trying to think of ways forward.

I think that, at present, no geographic information (Expected Nearby) is used when higher taxa are presented, so that might be another element to work on for higher taxa that aren’t directly in the model.
