Force computer vision to back off on the specificity of suggested IDs in regions with cryptic or hard-to-identify species

Do you use the DQA to make it RG at genus, with your copypasta explanation for both ‘why back to genus’ and ‘why RG’? That may eventually make people both more informed and less unhappy about it. You could include: Nearby is Australia!

I had another thought: this method, if implemented and tweaked, could also be useful for getting the CV to proceed with more caution for taxa where it tends to generate worse misidentifications (i.e. not just overspecific, but flat-out wrong).

Including disagreement metrics in the CV’s confidence score is something I suggested some years ago, and Tony commented on it, so it has been on the staff’s radar at some point at least…

https://forum.inaturalist.org/t/computer-suggestions-use-disagreements-as-a-measure-for-difficult-taxa/18311
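For anyone wondering what that could look like in practice, here is a minimal sketch of a disagreement-discounted score that makes the CV back off to a coarser rank. Everything in it is hypothetical: the `Taxon` record, the per-region `disagreement_rate`, the multiplicative discount, the 0.6 threshold, and the example numbers are illustrative assumptions, not how iNaturalist actually scores suggestions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Taxon:
    name: str
    parent: Optional[str]     # parent taxon name; None at the top of this tree
    cv_score: float           # hypothetical raw CV confidence for this taxon (0..1)
    disagreement_rate: float  # hypothetical share of IDs on it that were disagreements, in this region


def adjusted_score(t: Taxon) -> float:
    """Discount the raw CV score by how often IDs on this taxon get disagreed with."""
    return t.cv_score * (1.0 - t.disagreement_rate)


def backed_off_suggestion(start: str, tree: Dict[str, Taxon], threshold: float = 0.6) -> str:
    """Climb toward coarser ranks until the discounted score clears the threshold
    (or there is nowhere left to climb)."""
    taxon = tree[start]
    while adjusted_score(taxon) < threshold and taxon.parent is not None:
        taxon = tree[taxon.parent]
    return taxon.name


# Made-up numbers for illustration only.
tree = {
    "Macrolepiota procera": Taxon("Macrolepiota procera", "Macrolepiota", 0.80, 0.55),
    "Macrolepiota": Taxon("Macrolepiota", "Agaricaceae", 0.90, 0.10),
    "Agaricaceae": Taxon("Agaricaceae", None, 0.97, 0.02),
}

print(backed_off_suggestion("Macrolepiota procera", tree))  # prints "Macrolepiota"
```

The species gets a discounted score of about 0.36, so the suggestion backs off to the genus (about 0.81). The example data assume a parent's raw score is at least its child's, as happens when a classifier rolls leaf scores up the tree.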

I wholeheartedly agree with this comment! After many years of helping ID invertebrates in the western US, I am almost ready to drop out completely and only use iNat for my own observations. Not only is the CV often wrong, but it sets up an inherently depressing and antagonistic relationship between IDers and observers right from the start - instead of feeling like I’m helping someone by narrowing down their ID, I am instead always correcting over-specific IDs. And it’s impossible to keep up with all of the incorrect IDs - many taxa seem to be hopelessly beyond correction. Staff - please take steps like those suggested on this thread to have the CV suggest less specific IDs for “difficult” taxa and/or warn users that the suggestions may be wrong! :folded_hands:

I share these sentiments so much, Ken. I also am teetering on the verge of quitting.

I would so desperately love a ‘species not known to occur locally’ flag.

It might actually let me finally get Macrolepiota cleaned up in the States. I’m struggling because I’ll clean it up, and then before the CV updates, people start IDing stuff to M. procera or M. clelandii or whatever, so there are still stragglers corrupting the CV the next time it updates.

It feels like a sisyphean task to fix.

Literally fixed this last week ;_;

I use a text expander

You have chosen a sp from
Please check the distribution maps.
?

I had a run of New Zealand endemics.

There could be a new DQA, similar to Location not Accurate: Out of Range! It would have the same push-to-Casual effect until the CID is back IN range.
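To make the proposed effect concrete, here is a rough sketch of the rule. This is not an existing iNaturalist feature or API: `quality_grade`, the `known_ranges` lookup (a stand-in for checklists or range data), and the example ranges are all hypothetical and illustrative, not real distributions.

```python
from typing import Dict, Optional, Set


def quality_grade(
    community_taxon: str,
    region: str,
    known_ranges: Dict[str, Set[str]],
    base_grade: str,
) -> str:
    """Apply a hypothetical 'Out of Range' rule on top of the normal grade.

    known_ranges maps a taxon to the regions it is known from. Taxa with no
    range data are left alone rather than penalized.
    """
    regions: Optional[Set[str]] = known_ranges.get(community_taxon)
    if regions is None or region in regions:
        return base_grade  # in range, or no data to judge by
    return "casual"        # out of range: held at Casual while the CID stays out of range


# Illustrative range data only.
known_ranges = {
    "Macrolepiota clelandii": {"Australia", "New Zealand"},
    "Macrolepiota": {"Australia", "New Zealand", "United States"},
}

print(quality_grade("Macrolepiota clelandii", "United States", known_ranges, "needs_id"))  # casual
print(quality_grade("Macrolepiota", "United States", known_ranges, "needs_id"))            # needs_id
```

Because the grade is recomputed from the current CID, the "until the CID is back in range" part comes for free: once identifiers pull the CID back to a taxon known from the region, the normal grade returns.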

Oh I explain to them but it doesn’t stave off the tide of new IDs.

I’d rather not have to fight against it, in general.

Location-specificity of CV suggestions is a related but different issue. That’s tied to ongoing failures of the geomodel that powers the “expected nearby” part of the CV suggestions. Some recent updates this summer have caused the geomodel’s expected range maps for various taxa to be too broad, too narrow, or both simultaneously, and even to include disjointed and wildly out-of-range locations as expected ranges. This is discussed further in threads like this one.

At least in my opinion, those geomodel issues are due both to inherent issues with the geomodel software and how it has been implemented, and to what appears to be an incredibly ill-advised step-by-step rollout of a change to the way the software uses and encodes elevation information (instead of changing how the algorithm reads and uses elevation info across the entire algorithm at once, they’ve only been implementing those changes in parts of the algorithm per update, unsurprisingly leading to wonky results).

A look at the current geomodel maps for the species you mentioned (to do so, go to the taxon page, scroll down to the map, click the layers button in the upper right corner of the map, select “Expected Nearby Map”) shows that:

  • for M. clelandii (which I assume is supposed to be an Aus/NZ endemic species), the Expected Nearby map includes portions of Mexico, Central America, and South America, as well as a couple of random portions of Florida
  • for M. procera, there are random parts of NY, NJ, and Newfoundland that are marked as being within the expected range of that species

Of course, the CV could also generally stand to have more restraint in suggesting species that aren’t in range per the geomodel anyway, as not all of the (presumably misidentified) observations of those species are in areas covered by or near the geomodel map regions.

That is ultimately still an attempt to have volunteer identifiers apply temporary band-aid fixes to the problem, as opposed to having the people who are paid to run and maintain iNaturalist put in structural fixes.

I still have yet to see any sort of staff input on this feature request.

Ironically, quitting is a self-fulfilling prophecy. If identifiers stop identifying as much, the model is likely to get worse, as more misidentifications go uncorrected and certain taxa that could be very helpful in improving the model are not added. Without people to tend to CV misidentification loops, those loops will just become too large and time-consuming to realistically fix by oneself.

I don’t understand the lack of response from staff on these CV issues. All I can suggest is to keep tagging staff every so often (say, once a week), or to reach out via messages or email. What I do know is that, from what is publicly visible on GitHub, almost none of the suggestions above or in other forum posts have even been tested or mentioned. An exception is intermediate ranks, but it doesn’t seem like that has really led to anything. The experiments largely took place in 2024: https://github.com/orgs/inaturalist/projects/10

Unfortunately, without support from staff, most of these issues are just not fixable by the community alone. Even if they were, in my experience the community isn’t as unified as it could be in dealing with CV issues and helping one another with them. Some specific issues are solvable in certain cases, but they need work.

Right, but what I’m saying is you can’t tell when the geomodel is going to update - at least as far as I can tell, please correct me if I’m wrong here.

So if I clean it up, that doesn’t remove the bad suggestions, so it’s like a hamster wheel of trying to keep it clean long enough for there to be an update. I’m having trouble doing that with an extremely easy-to-ID group that should, theoretically, have other people contributing IDs - imagine how bad it is for less easily identifiable taxa.

EDIT: I’ve, without exaggeration, been going through every single Needs ID fungus observation in my county parks system that isn’t a lichen, rust, or powdery mildew since about 10am. A lot of them are just blurry observations that are difficult to ID, sure, but I’ve run across some extremely easily identifiable things that got stuck at kingdom Fungi and never got further because they fell through the cracks. I don’t have time to clean up all of these AND fight against a CV that continues to insist that species that don’t exist here, exist here.

I do acknowledge this is a slightly different/adjacent issue but I’m pretty sure I’ve already brought it up in that thread.

You are somewhat correct.

The geomodel updates when the CV updates, and CV updates are somewhat predictable: they happen every 1½-2½ months, give or take. That means you can be fairly confident it won’t update within a month of a CV update. There are also sometimes delays, but if it’s been 2½ to 3 months since a CV update, one should be happening soon.
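As a rough rule of thumb, that cadence can be turned into a quick check. This is only a sketch: the 1.5 and 2.5 month cutoffs come from the estimate above rather than any official schedule, the 30.44-day average month is an approximation, and the dates in the example are arbitrary.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # average month length; an approximation


def update_outlook(last_update: date, today: date) -> str:
    """Rough guess at whether a new CV/geomodel release is due, based on the
    forum's estimated 1.5-2.5 month cadence (not an official schedule)."""
    months = (today - last_update).days / DAYS_PER_MONTH
    if months < 1.5:
        return "unlikely yet"
    if months <= 2.5:
        return "within the usual window"
    return "likely soon (or delayed)"


last = date(2024, 1, 1)  # arbitrary example date
print(update_outlook(last, date(2024, 1, 20)))  # unlikely yet
print(update_outlook(last, date(2024, 3, 1)))   # within the usual window
print(update_outlook(last, date(2024, 4, 1)))   # likely soon (or delayed)
```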

If you relay more details on the specific taxon with CV issues, perhaps I could help.

Also, it’s not just about Needs ID observation data. Data from Casual and RG observations are fed into the model too, so your specific case may have observation data influencing the model that you’re not aware of.

Thank you for the extra clarification.

The last point is a big reason why being able to force it not to suggest certain taxa would be a huge boon. Force us to tag it first; hell, force multiple curators of a taxon to vote that it should be excluded from CV in a region. It’s not something I’d do for all taxa, but there are definitely a few it would be useful for.
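A minimal sketch of the curator-vote idea, purely hypothetical (iNaturalist has no such feature or API that I know of): once a quorum of distinct curators votes to exclude a taxon in a region, it is dropped from that region's suggestion list. The class name, the quorum of 3, the curator names, and the region strings are all illustrative assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class RegionalExclusions:
    """Hypothetical curator-voted list of taxa the CV should not suggest in a region."""

    def __init__(self, quorum: int = 3) -> None:
        self.quorum = quorum
        # (taxon, region) -> set of curators who voted to exclude it
        self._votes: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def vote(self, curator: str, taxon: str, region: str) -> None:
        # Using a set means repeat votes from the same curator count only once.
        self._votes[(taxon, region)].add(curator)

    def is_excluded(self, taxon: str, region: str) -> bool:
        # .get avoids creating empty entries for taxa nobody has voted on.
        return len(self._votes.get((taxon, region), set())) >= self.quorum

    def filter_suggestions(self, suggestions: List[str], region: str) -> List[str]:
        # Keep the CV's original ranking order, minus excluded taxa.
        return [t for t in suggestions if not self.is_excluded(t, region)]


exclusions = RegionalExclusions(quorum=3)
for curator in ["curator_a", "curator_b", "curator_b", "curator_c"]:
    exclusions.vote(curator, "Macrolepiota clelandii", "United States")

suggestions = ["Macrolepiota clelandii", "Macrolepiota", "Chlorophyllum molybdites"]
print(exclusions.filter_suggestions(suggestions, "United States"))
# ['Macrolepiota', 'Chlorophyllum molybdites']
print(exclusions.filter_suggestions(suggestions, "Australia"))
# ['Macrolepiota clelandii', 'Macrolepiota', 'Chlorophyllum molybdites']
```

The exclusion is scoped to a (taxon, region) pair, so it only suppresses the suggestion where curators agreed it doesn't belong, and requiring several distinct curators keeps a single person from hiding a taxon.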

Agreed - it’s the interplay between the CV and the geomodel. Both generate poor predictions for many fungi.

We need to encourage more identifiers, and this issue is a significant disincentive. It also leads to a significant number of erroneous observations and the consequent exclusion of iNat datasets from research.

I’ve probably said this before, but I think that invertebrates are considered sort of an edge case from iNat’s perspective - they are impossibly diverse, tons of (near)cryptic species, taxonomy is messy and always changing, many can’t be IDed from images (especially your typical cell phone images), they are often sexually dimorphic and undergo various types of metamorphosis which throws off image recognition - lots of issues. From iNat’s perspective, the Computer Vision works quite well for most things that most people observe. And as a small org. they don’t necessarily have the development resources to cater to the particular needs of entomologists and people who study other groups of diverse+cryptic organisms. Frankly I can appreciate that, even though it has been frustrating to me personally. iNat of course does not want all the bug identifiers to stop contributing their expertise, but they also don’t seem to have the resources to make these improvements. I don’t think any of this has been outright stated by staff and I don’t want to put words into anyone’s mouth but it’s kind of my take - while they’d love to be able to re-code parts of the site to improve functionality for certain groups (and many of the staff personally do love inverts), we are kind of an outlier and they can’t make the site perfect for everyone.

Like @kschnei, I was faced with the decision to give up on trying to help with identifications, lest my frustration with these issues kill any enthusiasm I had for iNaturalist, period. FWIW I have been enjoying the site a lot more as an observer first and an occasional ID resource second. This stuff has been going on for years - when I first joined and started helping with IDs (2018) there were already experts leaving the site due to essentially these same issues. Since then iNat has exploded in popularity and the observations are flooding in, and it’s become increasingly impractical for small groups of people to try and brute force our way around these problems. Nobody should feel responsible for maintaining iNat’s data quality. If you’re not enjoying IDing anymore, just cut back or stop - don’t let these issues kill your enjoyment of the site. This is not really directed at you personally, @zoology123, but at anyone who has these complaints :slight_smile:

I just noticed on Alex’s profile and journal that he’s moving on from iNat as of a few days ago. He was with iNat for 11 years, so that’s a lot of experience leaving. I don’t want to speculate too much but I imagine that will put a bit of a bottleneck on CV development if there wasn’t one already. It seems like he was working on fixing the issues brought up about the geomodel this summer (as well as the hybrid tests) so hopefully we can still see that for the next update.

I agree with some of your points, but I really don’t think it makes sense to treat invertebrates as some fringe thing here. Invertebrates make up roughly a third of all observations on iNaturalist, nearly three times as many as birds! I think even the most casual observers could easily get demotivated from using the app if no one ever even tried identifying any of their bug photos, or if they learned all invert IDs were wildly unreliable because the CV had devolved and all the identifiers had left. I don’t think this is about “the particular needs of entomologists” any more than having functioning plant IDs is just an issue affecting the particular needs of botanists. Here are the observation totals if we arbitrarily split life into four big groups:

  1. Plants (107m)
  2. Invertebrates (91m)
  3. Vertebrates (55m)
  4. Other (18m)
I had no idea. That is pretty big news to me. To be honest I’m not sure really what’s happening anymore. iNaturalist doesn’t really do a good job conveying updates / news. There is supposed to be a monthly update post in the forums, but it has been skipped for many months lately.

Yes, it appears true. I don’t think it’s appropriate to speculate why here. I’m not sure who else to contact now for CV related things, wasn’t Alex the lead developer of it?

If there is currently no main developer working on it, I’m concerned about whether these issues will be addressed anytime soon, or at all.

I suppose I’m just wondering how the project is organized now. Is there a team, or one person? A lead dev? Multiple devs? I admittedly don’t really know how it was structured before. Either way, who do we now relay concerns, bugs, or anything else related to the CV to?

Previous CV update was 16 August. Blog post by @loarie

https://www.inaturalist.org/posts/115962-new-computer-vision-model-with-over-2-500-new-taxa

iNat staff https://www.inaturalist.org/pages/team