Hello, I have been using iNaturalist for several (~4) years, mostly for birds. During these years I saw a steady, gradual improvement in iNaturalist's AI suggestions, with this past year (2025) being amazing: iNaturalist was spot on 99.5% of the time or more!
However, in the last month I have noticed that the iNaturalist AI misses more often, especially on bad or mediocre photos, where it now seems to work poorly. Previously, even with bad photos, iNaturalist managed to keep suggestion accuracy high, but now it seems noticeably degraded.
As an IT guy, I sense something fundamental has changed in the algorithm, and the new algorithm seems to need polishing and tuning, because it performs worse than the previous one.
It seems the new algorithm concentrates more on the color patterns of the bird and less on its shape, which strikes me as the wrong choice, at least to me: color patterns depend heavily on lighting conditions, while shape can give away the bird's ID even in poor photos.
Can you please provide specific examples? Note that the current model also probably hasn’t caught up with the current bird taxonomy, which changed in the last month or two. I suspect that’s the culprit here, but without examples it’s not really possible to determine.
Birds Of the World (BOW) webinar: What’s new in avian taxonomy (2025 Nov 20)
https://youtu.be/D5tku8RXYHA?
https://birdsoftheworld.org/bow/news/whats-new-in-avian-taxonomy-2025
Disappointed to see that the new Clements taxonomy doesn’t match AviList. AviList was supposed to be the One List to unite them all. Instead we now have 5 world bird taxonomies instead of 4: AviList, Clements, IOC, BirdLife International, and Howard & Moore. Fun times!
Somehow, XKCD has a cartoon for every situation
I have seen this in other taxa, like Passiflora vesicaria vs. P. edulis (the ovary is a clear diagnostic character for IDers), which regressed a few weeks back and is still fluctuating.
I don't think there has been a change like this - at least none in the announcements or in the public subset of the code - but there is a change in the geomodel (see point 2 below). Maybe we are only noticing correlations and what the model learned from the data, not an algorithmic change. Another way it could happen suddenly is if the training samplers were tuned more aggressively on observation quality (but I believe iNat does not always exclude subpar-quality images or silhouette observations from training?).
All modern CV models can encode contours, geometry, and edges well as emergent internal representations learned from the data itself. If there is an observed weighting toward color in particular predictions, it is not from the algorithm per se but from the input data and lookalikes.
With the June 2.22 model and the introduction of the new geomodel technique, birds had 77-86% top-1 accuracy (a ~3% drop in geomodel performance from SINR), but the latest November 2.26 model has 78-90% (granted, these numbers are just for iNat samples and your experience may differ by region/particular taxon group/…, but they still don't suggest a regression or a downward global trend).
My thoughts are:
- Taxonomy changes take time to reflect in the model, and intermittent transitional IDs confuse an established, stable training regimen. Say a long-stable species X suddenly starts getting species Y IDs from a split, as in the bird taxonomy change above: that is just as confusing to a model that is purely looking at data with no idea of the semantics, and until the majority of IDs on iNat reflect the change (the training sample being a subset of them), this taxonomy change in your input data will end up conflicting with itself - much like humans without a field guide or knowledge of what changed in the taxonomy, who also keep guessing which character is stable across groups of photos. Can this be the reason? You would just have to randomly sample sets of bird observations whose taxonomy changed and sets whose didn't, and see how well the current CV does across those partitions (see the sketch after this list). Sadly you can't test the previous model's scores to compare, but maybe iNat can, if you can provide such a set. If this is the case, waiting a few more days will eventually solve it, or fixing the colliding IDs faster for those taxa will.
- The geomodel, whose score is now also integrated into the final prediction, does have architectural changes and is still catching up to the previous algorithm's raw performance, and the better way to handle observations' altitude information is still being tweaked, so both of these can have unintended side effects now compared with the better predictions you observed from earlier models on some observations.
- People routinely correct older IDs even without a taxonomy change, and those corrections can have similar effects. Say species X was wrongly misidentified in some area forever until recent corrective community IDs: the changes may take time to propagate - for example, as of Dec 30 the last CV model update uses an Oct 19 data snapshot - and the model can underperform during this window if the corrections end up leaving a subpar training dataset for the true species X/Y (say the corrected observations' photos had more information and angles for the model to learn from earlier, but those observations are now excluded from sampling because they are still in a collision state, and until more IDs come in they don't contribute to X/Y training for now).
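To make that partition test concrete, here is a rough Python sketch - not iNat code, and the CSV columns are my own assumptions; you would first have to collect the current CV's top suggestion for a random sample of observations however you can:

```python
# Rough sketch of the partition test above, not iNat code. Assumes you have
# already collected the current CV's top suggestion for a random sample of
# bird observations into a CSV with these (hypothetical) columns:
# observation_id, community_taxon, cv_top1_taxon, taxon_changed (true/false)
import csv
from collections import defaultdict

def accuracy_by_partition(path: str) -> dict:
    """Top-1 accuracy for observations whose taxon changed vs. stayed stable."""
    hits, totals = defaultdict(int), defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = "changed" if row["taxon_changed"] == "true" else "stable"
            totals[key] += 1
            hits[key] += int(row["cv_top1_taxon"] == row["community_taxon"])
    return {k: hits[k] / totals[k] for k in totals}

# A big accuracy gap between the two partitions would point at the split;
# a uniform drop would point elsewhere (geomodel, sampling, ...).
print(accuracy_by_partition("cv_sample.csv"))
```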
I am not fully sure of the exact training data pipeline or the rules for the data sampler - I assume some of this is already done internally - but just my thoughts: it could also be advantageous to do quality-stratified evals (instead of the single global eval scores published in the blog post): for low-resolution/low-light/… photos, how is the model doing? That would specifically catch your example of bad predictions on silhouettes or low-quality observations. And when there is a considerable sudden change in IDs for some taxon group in a region, maybe the sampler could weight its dataset towards observations with recent IDs and downweight pre-split observations, some of which may not have seen ID changes yet in that region - but it is easy to just let time handle that.
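Something like this is what I mean by quality-stratified evals - a minimal sketch with an assumed predictions CSV, a local image folder, and a very crude Pillow-based quality proxy:

```python
# Bucket eval images by a crude quality proxy (pixel count + mean brightness
# via Pillow) and report top-1 accuracy per bucket instead of one global score.
# File layout and column names are assumptions, not iNat's actual eval setup.
from collections import defaultdict
from pathlib import Path
import csv

from PIL import Image, ImageStat

def quality_bucket(img_path: Path) -> str:
    """Very crude quality proxy: resolution plus mean brightness."""
    with Image.open(img_path) as im:
        pixels = im.width * im.height
        brightness = ImageStat.Stat(im.convert("L")).mean[0]
    if pixels < 500_000 or brightness < 40:
        return "low"      # small and/or very dark (silhouette-like) photos
    return "medium" if pixels < 2_000_000 else "high"

def stratified_accuracy(pred_csv: str, image_dir: str) -> dict:
    """Top-1 accuracy per quality bucket."""
    hits, totals = defaultdict(int), defaultdict(int)
    with open(pred_csv, newline="") as f:
        for row in csv.DictReader(f):
            bucket = quality_bucket(Path(image_dir) / row["filename"])
            totals[bucket] += 1
            hits[bucket] += int(row["predicted_taxon"] == row["true_taxon"])
    return {b: hits[b] / totals[b] for b in totals}

print(stratified_accuracy("predictions.csv", "eval_images"))
```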
I would really love to see the raw heatmap evaluations for images more explicitly - if not on every observation every time, then at least summarised as plots over the evaluation data during CV version updates - as that would show far more clearly than aggregate numbers whether something is informationally advantageous or suddenly biased for some taxon group in the new model. But I guess that is also IP and internal?
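For anyone who wants to eyeball this kind of heatmap themselves, a Grad-CAM-style attribution over a stock classifier is only a few lines. The real iNat model isn't public, so this sketch uses a torchvision ResNet50 as a stand-in; swap in any CNN and the layer you want to inspect:

```python
# Minimal Grad-CAM sketch: where did the network "look" for its top class?
# Uses a stock torchvision ResNet50 as a stand-in for the (non-public) iNat model.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
target_layer = model.layer4[-1]          # last conv block
activations, gradients = {}, {}

target_layer.register_forward_hook(lambda m, i, o: activations.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(g=go[0]))

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def gradcam(image_path: str) -> torch.Tensor:
    """Return a 224x224 heatmap for the model's top predicted class."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    scores = model(x)
    scores[0, scores.argmax()].backward()
    weights = gradients["g"].mean(dim=(2, 3), keepdim=True)   # channel importances
    cam = F.relu((weights * activations["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
    return (cam / cam.max()).squeeze()   # normalised map to overlay on the photo

heatmap = gradcam("bird_photo.jpg")
```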
From the eBird website: “With this [October 2025 update], the eBird taxonomy and the AviList taxonomy are in nearly full alignment; we expect complete alignment by October 2026.”
The old lists will get outdated and my understanding is that several of the people who maintained the competing lists have moved to the AviList team, so after a few years of taxonomy changes it will be difficult to keep using them.
Thanks for the extensive reply. Maybe it really is a geo-related change.
For example, a couple of months ago I saw the correct identification of Circaetus gallicus on a very bad old photo (https://www.inaturalist.org/observations/329356461), but now it suggests the wrong species for the very same photo.
Oh, I think that example is just bad luck in the training samples.
The new vision model also surfaces, as a thumbnail, the main training image that is responsible for the top prediction,
and that is the observation that biased the score order. Well, I guess the training sample didn't have a representative photo of that pose from your actual ID, gallicus.
If it was the top-1 prediction a few months back, a possibly representative image from back then may not be in the current training sample? Or neither model had that pose in its sample and it used another character to predict.
Nope, the thumbnail displayed with the suggestions is taken from the taxon photos. These are not the same as the images used in training.
I wonder if the problem is a perverse result of the greater number of organisms now covered by the CV. (You’ve said the problem is mostly with poor-quality photos.) I thought I saw such an issue with some plants. At first, the CV had only abundant species A from a group. It ID’d that correctly even if photo quality was low; it was the only choice, and since A is abundant it was usually the correct answer. Now the CV has 12 species in the group. The species can be distinguished well in good light, if certain details are shown, but the CV now has too many subtle distinctions to make and it often can’t make them correctly. (Using geography can help, but the geography part has seemed a bit . . . clunky.)
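To illustrate the effect I mean, here is a toy numpy simulation - not the iNat model, just a nearest-prototype classifier over made-up "species" features. With one species it is trivially right; as the number of lookalike species grows and the "photo" gets noisier, top-1 accuracy on the subtle distinctions drops:

```python
# Toy illustration: nearest-prototype classification among k similar "species".
# Accuracy vs. number of species and photo noise. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

def top1_accuracy(n_species: int, photo_noise: float, trials: int = 5000) -> float:
    # Species prototypes packed close together in a small feature space,
    # mimicking a group separable only by subtle details.
    prototypes = rng.normal(0, 0.3, size=(n_species, 8))
    true = rng.integers(n_species, size=trials)
    observed = prototypes[true] + rng.normal(0, photo_noise, size=(trials, 8))
    dists = np.linalg.norm(observed[:, None, :] - prototypes[None, :, :], axis=2)
    return float((dists.argmin(axis=1) == true).mean())

for k in (1, 3, 12):
    print(k, "species:",
          "good photo", round(top1_accuracy(k, 0.1), 2),
          "| poor photo", round(top1_accuracy(k, 0.5), 2))
```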
Oh thanks, now I've checked the blog - that does indeed seem to be a second step for thumbnails, after the prediction is made. But it is still only a "need not be" and varies per taxon - no exclusion rule says training photos can't be taxon photos - though I agree that the exact thumbnail image may not be the reason, as I implied. I still think whether a similar pose dominates the input samples for these taxa is relevant for that prediction.
I guess then only staff can dig into why the model itself is predicting differently for this observation across model versions, if the prediction order changed only recently, as the observer reports.
I’m really glad you posted this! I’ve noticed the same thing with regard to Ramphastidae, and the regression in the accuracy of the IDs the AI is giving has nothing at all to do with the taxonomy changes.
To explain what I’ve been encountering, here’s what I wrote to iNat Help about this a couple of weeks back (and the response I got blamed it on the taxonomy change, which indicated that my explanation may not have been effective, but hopefully this thread will drive our collective message home):
I do a lot of Ramphastidae identifications and I’ve noticed that since the avian taxonomy changes about two months ago, a lot of observers have been making the same erroneous suggestion for a particular species observation. I assume they are doing this because of the iNat AI suggestion they are being given.
The species should be Pteroglossus torquatus, but many observers have recently been suggesting Pteroglossus erythropygius.
The taxonomic changes included the elevation of P. erythropygius to species level from being a subspecies of P. torquatus. But the places where I’m seeing the erroneous IDs (mainly Costa Rica and Panama) are not places where the former P. t. erythropygius subspecies was erroneously offered as an ID in the past (the subspecies’ range is much further south). There’s no question that there has been a big change in the mistaken IDs from observers in the past month or so, so I’m wondering if there’s a geomodel component of the algorithm that is incorrect.
Here’s one example of one of those observations: https://www.inaturalist.org/observations/332319380
When a species is split on iNat, all outputs of the split essentially inherit the pre-split species’s computer vision model and geomodel predictions until the next update to the cv/geomodel (because the contemporary cv model and geomodel were trained on observations of the pre-split species collectively, so they can’t discriminate between the post-split species):
The training dataset for the current cv model and geomodel was exported a bit before the 2025 Clements checklist release and associated changes to bird taxonomy on iNat, including the split of broad Pteroglossus torquatus into a narrower P. torquatus and P. erythropygius, and thus both post-split araçari species are mapped to the same taxon in the cv/geomodel (pre-split P. torquatus). The next model update should take into account this and the other recent bird species splits, giving them all their own separate cv and geomodel predictions (if they pass the threshold for inclusion, which both of these araçaris do).
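As a rough illustration of that inheritance (made-up names and scores, not actual iNat code), both outputs of the split end up sharing whatever score the model assigns to the pre-split taxon, and the geomodel likewise only has a range for the broad taxon:

```python
# Toy illustration of the inheritance described above. The model only has a
# leaf for the broad, pre-split taxon, so both post-split species surface
# with the same score until a model trained on post-split data ships.
MODEL_TAXON_TO_CURRENT = {
    "Pteroglossus torquatus (broad, pre-split)": [
        "Pteroglossus torquatus",
        "Pteroglossus erythropygius",
    ],
}

def expand_suggestions(model_scores: dict) -> list:
    """Map model-era taxa to current taxa; split outputs inherit the old score."""
    suggestions = []
    for model_taxon, score in model_scores.items():
        for current in MODEL_TAXON_TO_CURRENT.get(model_taxon, [model_taxon]):
            suggestions.append((current, score))
    return sorted(suggestions, key=lambda s: s[1], reverse=True)

# The geomodel can't downweight P. erythropygius in Costa Rica or Panama either,
# because it too only has a range prediction for the broad pre-split taxon.
print(expand_suggestions({"Pteroglossus torquatus (broad, pre-split)": 0.93}))
```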
Curious if the new model released today has helped: https://www.inaturalist.org/blog/122720
It’s using data exported on November 16, 2025, but I’m not sure when most of the 2025 Clements updates were made: https://www.birds.cornell.edu/clementschecklist/