Not sure if this has been asked or answered before, but I am curious whether it is possible to ask staff to remove certain taxa from the computer vision (CV) identification recommendations (i.e., whether this has been done in the past and what the correct way to proceed is).
I study a lot of cryptic groups of plants, with a focus on blackberries (Rubus spp.). They are extremely difficult to identify, and most identifications on iNaturalist are incorrect; to attain a species-level identification you often need many angles of many parts.
Where my question comes into play is that Rubus bifrons was said to occur in North America for many years, and in turn 10k+ iNat records in the PNW were under this name. This entire time these were false identifications, and all records from the PNW are actually either R. armeniacus or R. procerus.
Between my work and that of other identifiers, we have overturned nearly all of the identifications.
The problem is that places that no longer have any observations of it (neither RG nor Needs ID) still have the CV suggesting R. bifrons.
What I would like in this instance is to have R. bifrons excluded from the CV, at least in North America (a case could also be made that it shouldn’t be in the CV globally).
There are other instances where I think this could be applied but this is a clear and prominent example I wanted to highlight.
Though I believe the link above is more general, about certain groups as a whole, while my request is purely to remove an individual taxon, which seems easier to change.
I will also say this applies to several “common” fungi and inverts.
Off the top of my head, I know several Russula should be nixed from the CV; many require microscopy, and even with microscopy they are hard to confirm without genetic sequencing.
The next update will fix the geomodel, and bifrons will not be recommended for the PNW.
bifrons is being recommended way less in WA and BC after the last update due to corrections in Dec & Jan, and now that Oregon is being cleared it will only get better.
I have been pressuring staff about this for months now and have not received any satisfying answer. It’s really frustrating that the staff are not engaging with the community when we keep requesting this rather simple modification that will really help with identifier burnout, but I digress; I’ve already made all of these points in the thread linked above, only for them to seemingly fall on deaf ears from the staff.
Devastating :/ Over the last few years this website has felt like a hollow echo chamber: futile, Sisyphean efforts to get staff to work with and listen to their users.
Even from a business perspective it seems counterintuitive to go against the only people you have as a demographic/consumer of the product.
How much does that change the geomodel and the “expected nearby” label? I’ve been somewhat befuddled by what seems to be a recent change where the most common teatree in the Grampians is suddenly no longer “expected nearby” despite around 200 observations, whereas another species I looked at is “expected nearby” in New Zealand despite being only found in Australia - just to give two examples.
I won’t dive into it too deeply because I know it has been discussed elsewhere in greater detail and I do not want to derail my initial question. But I will state that my belief is very much not disingenuous and I know I am not alone in this belief.
It’s pretty obvious that iNaturalist is trying to push for the “next big wave” with direct emphasis on things like Seek or their partnership with Google AI.
Users have been pretty clear that it is not what they want, yet they keep persisting; this was made even more evident by the statement Kueda released not long ago.
I am not trying to say the staff at the iNat HQ are bad people with ill intentions; I just believe their priorities are not where they should be.
I take it you’re talking about Leptospermum scoparium? The current geomodel does exclude it despite it being common there: https://www.inaturalist.org/geo_model/54699/explain. Unless there was a recent taxonomic swap or something, I’m afraid that’s probably the unrelated issue of the current setup of the geomodel really struggling in areas with dramatic changes in elevation. This is described in the second half of this blog post:
By default the iNaturalist computer vision only suggests taxa that it expects to occur at the location of the observation. It does this with an algorithm (the “geomodel”) which tries to infer a taxon’s range from the location of existing observations. So when the geomodel is inaccurate, it will start suggesting taxa which don’t make sense geographically, or refuse to suggest ones that do, which often leads to incorrect initial IDs on obs.
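A rough Python sketch of the filtering step described in that quote. Everything in it is hypothetical: the `Suggestion` class, the `filter_by_geomodel` function, and the scores are made up for illustration and are not iNaturalist’s actual code or data. It only shows why a range inferred from old misidentifications can keep R. bifrons in the suggestions until the geomodel is rebuilt from corrected observations.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Suggestion:
    taxon: str
    vision_score: float  # made-up image-only score, for illustration


def filter_by_geomodel(suggestions: List[Suggestion],
                       expected_nearby: Set[str]) -> List[Suggestion]:
    """Keep only taxa the (hypothetical) geomodel expects at this location,
    ordered by vision score, highest first."""
    kept = [s for s in suggestions if s.taxon in expected_nearby]
    return sorted(kept, key=lambda s: s.vision_score, reverse=True)


raw = [
    Suggestion("Rubus bifrons", 0.62),
    Suggestion("Rubus armeniacus", 0.30),
    Suggestion("Rubus procerus", 0.08),
]

# Range inferred while old, incorrect R. bifrons IDs were still in place.
stale_range = {"Rubus bifrons", "Rubus armeniacus"}
# Range inferred after the IDs were corrected.
corrected_range = {"Rubus armeniacus", "Rubus procerus"}

print([s.taxon for s in filter_by_geomodel(raw, stale_range)])
print([s.taxon for s in filter_by_geomodel(raw, corrected_range)])
```

In this toy version the vision scores never change; only the set of taxa expected nearby does, which is the part that depends on where existing observations sit.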
No, it’s not possible to ask for a taxon to be removed from the model. The model gets trained every month, and it takes about a month to train. So the current model is trained on data from well over a month ago. It can take time for the hard work people put in to show up in the model, unfortunately, but it does happen. I’d recommend reaching out to people for help in identifying these observations.
I’m not sure exactly what you’re referring to, but if it’s about the CV model providing too many suggestions at species level, work on improving that was announced in our Product Goals for January–June 2026 blog post (emphasis is in the original text):
Reduce computer vision errors
Identifiers spend a significant amount of time correcting incorrect species-level IDs driven by overly confident computer vision (CV) suggestions.
We’re going to improve our computer vision-powered suggestion system to reduce how often people select incorrect species-level suggestions. We’ll do that by adjusting the model’s precision–accuracy tradeoffs and how suggestions are presented across the platform. This is a seemingly small change that we expect to have outsize impact. We’re still working out the implementation details, but we expect this to substantially reduce erroneous species-level IDs from our computer vision system.
My understanding (as a non-engineer) is that doing this is not a simple fix, unfortunately.
I understand the time delay. But say you add checkboxes to the taxon settings page that say “omit taxon from CV”, “omit descendants from CV”, and “force hybrid taxa inclusion in CV” (hybrids only). Curators can set these in response to flags. Surely that is within the realm of our technology, as it is only 3 boolean database values plus updating your queries when you export for the CV training.
Then when you get data for training every month, you check whether those values are set, and omit or include the taxon’s images in the model training accordingly. I don’t think anybody is concerned about the timeframe; a one-month delay is far better than reidentifying every earthworm and tricolor hibiscus in perpetuity.
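To make the proposal concrete, here is a minimal Python sketch of the export check. It is not how iNaturalist’s training export actually works: the `Taxon` class, the three flag names, and the example taxa are all hypothetical, and it assumes hybrids are left out unless the force flag is set (which is what the third checkbox implies). It also ignores any other rules the real export applies.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Taxon:
    name: str
    parent: Optional[str] = None
    is_hybrid: bool = False
    # The three hypothetical curator-set booleans from the proposal.
    omit_from_cv: bool = False
    omit_descendants_from_cv: bool = False
    force_hybrid_in_cv: bool = False


def include_in_training(taxon: Taxon, taxa: Dict[str, Taxon]) -> bool:
    """Decide whether a taxon's images go into the training export."""
    if taxon.omit_from_cv:
        return False
    # Walk up the ancestors; any one of them can exclude its descendants.
    ancestor = taxa.get(taxon.parent) if taxon.parent else None
    while ancestor is not None:
        if ancestor.omit_descendants_from_cv:
            return False
        ancestor = taxa.get(ancestor.parent) if ancestor.parent else None
    # Assumption: hybrids are excluded unless a curator forces them in.
    if taxon.is_hybrid:
        return taxon.force_hybrid_in_cv
    return True


def training_taxa(taxa: Dict[str, Taxon]) -> List[str]:
    return [name for name, t in taxa.items() if include_in_training(t, taxa)]


taxa = {t.name: t for t in [
    Taxon("Rubus"),
    Taxon("Rubus bifrons", parent="Rubus", omit_from_cv=True),
    Taxon("Rubus armeniacus", parent="Rubus"),
    Taxon("Russula", omit_descendants_from_cv=True),
    Taxon("Russula emetica", parent="Russula"),
    Taxon("Hybrid A", is_hybrid=True),                           # placeholder name
    Taxon("Hybrid B", is_hybrid=True, force_hybrid_in_cv=True),  # placeholder name
]}

print(training_taxa(taxa))
```

The check itself is small; what the sketch does not capture is who gets to set the flags and how disagreements are resolved.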
Identifiers know exactly which taxa should or shouldn’t be included and are eager to have the ability to improve the computer vision suggestions. Please let us. This feature request is a meaningful way to accomplish that exact goal.
I think we all recognize that the technology underlying the CV is certainly complicated and not trivial to optimize. That’s why others and I have been suggesting the simple alternative of omitting taxa which the CV struggles on, so that they get forced to a higher taxonomic level which is correct.
Yes, that’s the species - and there was no taxon swap or any change that I’m aware of. So I’m kind of puzzled that until very recently (I think the latest release of the CV/geomodel) the geomodel seemed to work fine but suddenly it’s “not there”. I guess I’ll just hope it changes back at some point - but it leaves me uncertain how much “fixing the data” will help, as I’ve always heard/assumed/found it does/did.
It may be simple to exclude certain taxa, but which ones should be omitted is not necessarily a simple question. Even if people can agree on which taxa should be omitted (this may vary quite a bit regionally), there are likely to be other undesired effects.
For example, at present the CV is only trained on leaf taxa; i.e., if a more specific taxon is included in the CV, it will not be trained on images of parent taxa. What if the taxon to be omitted is part of a species complex for which there are also identifiable species in the same genus? By removing the species from the CV, you may also create a situation in which the CV is no longer able to even correctly recognize the genus for species in that complex, because it now only knows a visually very different species in the genus. This is a situation we already have for many genera; removing specific species from the CV will likely not solve it.
Requiring the CV to only suggest higher-level taxa in some cases while still training it on lower taxa might be a more effective approach, but I still think that a mechanism that relies on people manually flagging taxa is likely to create a ton of extra work and be subject to a lot of disagreement, because “too many wrong suggestions” is subjective and people have very different views about what observers can be trusted to select correctly. (For example, I ID a lot of bees, which people tend to find unintuitive. I find that even when the CV provides correct suggestions, sometimes even the top suggestion, people quite frequently still select something different. Does this mean that the CV should stop making suggestions more specific than “Anthophila”? Would this really benefit either observers or IDers?)
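One way to picture the “suggest higher, train lower” idea is the Python sketch below. This is purely illustrative: the `roll_up` function, the `suggest_at_parent` set, and the scores are hypothetical, and nothing here reflects how iNaturalist’s suggestion system is built. The model still scores individual species; only what gets shown is changed, by adding the scores of flagged species into their parent taxon.

```python
from collections import defaultdict
from typing import Dict, Set


def roll_up(scores: Dict[str, float],
            parent_of: Dict[str, str],
            suggest_at_parent: Set[str]) -> Dict[str, float]:
    """Merge scores of flagged taxa into their parent; leave the rest as-is."""
    rolled: Dict[str, float] = defaultdict(float)
    for taxon, score in scores.items():
        target = parent_of[taxon] if taxon in suggest_at_parent else taxon
        rolled[target] += score
    return dict(rolled)


# Made-up species-level scores from a model trained on all three species.
scores = {
    "Rubus bifrons": 0.5,
    "Rubus armeniacus": 0.25,
    "Rubus idaeus": 0.25,
}
parent_of = {name: "Rubus" for name in scores}

# Hypothetical list of species that should only be shown at genus level.
suggest_at_parent = {"Rubus bifrons", "Rubus armeniacus"}

print(roll_up(scores, parent_of, suggest_at_parent))
```

Because the species stay in training, the genus is still recognized from the full range of its members, which avoids the problem described for outright removal. It does not answer the harder question of who decides which taxa go in the roll-up list.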
I will say that the idea of a geomodel sounds not too appealing for certain taxa that occur within certain habitats.
Ex: Huperzia miyoshiana occurs in Alaska/British Columbia and is disjunct to Newfoundland, but does not occur in the space in between. I would be afraid that the geomodel would try to fill the gap between these two disjunct ranges.
Or, for instance, the subalpine/boreal taxa that occur in sw. North Carolina as well as northern New England (ex: Paronychia argyrocoma). There is a gap of at least 7 states between these two ranges.
Overall it’s not bad (especially for animals, which have a tendency to move, something that plants don’t typically do in a timely manner). I actually quite like it in many ways, but I just hold some reservations.
My experience from dealing with a similar problem species is that it’s only the website CV (and maybe Android?) that gets updated with the new model monthly. I just checked the ID feed for that species and CV IDs were still coming from Seek and Next until December 2025 and the species was officially removed from the CV in February, 10 months earlier. At this point barely any new IDs of the species are coming in (although we’ll see what happens when active season hits again) and I’ve finally resolved the flag on it, over 8 years after flagging it.
For taxa where the popularly understood species concept matches onto an abundant and easily observable phenotype, it’s going to take multiple ID blitzes over the course of multiple months to resolve it, along with communicating very clearly to literally everyone who’s identified it in the past. A single naive identifier can easily get the ball rolling to restart the CV feedback cycle unless most of the relevant identifier community knows to correct their IDs.
The onboard models for both Seek and iNaturalist are not updated every month; they’re both currently on 2.20. However, with iNaturalist the app will use the current online model after you take a photo, or when you import photos, if the device is connected to the internet.
I’m not a computer programmer, but I suspect if the fix were really this simple, it would have been done by now. The fact that this hasn’t been done suggests to me that the situation is far more complicated than we understand, rather than that the staff in charge lack the will to make such a change. One of the situations that I know for sure is more complicated is:
I can’t say I agree with this. Identifiers active in a particular geographic region often have very strong opinions about wanting to omit taxa that get commonly misidentified in their region, but often a species that’s difficult to identify in one part of its range is easily identified elsewhere due to a lack of similar species in that region. Or the species is easily identifiable from a photo of a particular structure, which is rarely adequately photographed. Rarely is the situation so simple as “species A is never identifiable from a photo anywhere” yet species A is in the CV system. And if that is the case, it’s only in the CV system because identifiers put it there by IDing a bunch of stuff with that name so that the species got into the CV training. If it shouldn’t be there, re-identify all the observations to a higher level, and it’ll be removed from the CV next time it’s trained due to lack of training data. I’ve seen this done with moths before. If it’s a situation where the species is identifiable easily from a certain type of photo or in part of its range though, then it’s best to leave it in the CV. I wouldn’t want someone from three countries away “unchecking” a species from the CV outputs due to ID challenges in their region, when the ID situation is far more straightforward where I live.
The CV is a tool that takes in the data we give it and draws conclusions based on what it’s fed. If we feed it bad info, it spits out bad info. I personally don’t think that the solution is to alter the CV’s output for every problem area that gets on every identifier’s nerves. That’s just putting a bandage on the wound. The CV will never learn to do better; it’ll just keep being wrong and we’ll add a bunch of post hoc algorithms to make what the end user sees more accurate. To heal the problem, correct the incorrect IDs, and the CV will learn from what you’ve done. In some cases, a taxon swap can help too: if there are 100,000 observations labeled as species A, but modern taxonomy recognizes that this is in fact a species complex, then a taxon swap can update all the species IDs to complex IDs at once.
Simply put, I don’t want frustrated identifiers to have the power to nix things from the CV suggestion list just because they don’t want the CV suggesting it anymore. I know the frustration: I’ve gone through and re-identified 50,000+ moths in a single genus while the CV spat out nonsense IDs, and sometimes wished I could just “turn it off”. But that would go against the collaborative nature of iNat. The CV is meant to be a passive image-matcher, not something that ten thousand identifiers try to micromanage. As soon as I get to unilaterally say “the CV will never suggest Symmerista albifrons again because it’s usually wrong”, the floodgates are open, and the CV’s entire nature has changed. When you use the CV, you’d no longer be getting an actual computer vision ID based on training on other observations; you’d be getting a curated version of that ID with the actual CV results hidden by other users who don’t trust you with them.
I can see the posts that would ensue if we adopted a practice like this already: “Species X is easy to identify where I live but it’s no longer CV-suggested because someone in another country marked it as unidentifiable!” “I keep marking this species as unidentifiable but someone else with a different opinion keeps switching it back, and now there’s no consistency in where the CV has put these observations!” “How bad does the CV need to be at IDing a species before I turn it off?” “Should we be preemptively marking species as unidentifiable before the CV trains on them if we’re pretty sure it will be bad at IDing them?” “I have 20,000 genus-level obs that will now take 2 additional IDs to reach RG because someone turned off the CV on this species when they shouldn’t have!”
If the situation were as simple as “everyone agrees that these 137 species shouldn’t be in the CV training”, then I’d agree with taking steps to nix them. But it’s never that simple, and having a checkbox for every species to allow or disallow it to be CV-suggested based on what a self-declared expert has to say about it this week is not in line with how I thought iNat was supposed to function.
tl;dr There’s nothing simple about 500,000 checkboxes being toggled on and off by 500+ curators based on 100 different visions of how precisely to interpret the phrase “taxa which the CV struggles on”.