I’ve been unclear on this particular detail: Is the new random set of (up to) 1,000 images for each CV training event (a) the only input to CV training? Or (b) is there residual knowledge within the CV from previous training sets? Or (c) are the new images added into some or all previous image training sets? I suspect that scenario (c) is incorrect once the max limit is reached; that would create a burgeoning computational requirement. Scenario (b) could be advantageous because the CV would be building on previous learning, but it could also be problematic if misidentified images from previous training efforts are carried along. That happens.
Basically, for those maxed-out taxa, I’m curious whether the CV has to relearn each taxon anew every time (irrespective of previous sample sets). If so, that morphs into more of a test of the skills of the contributing photographers and the knowledge base of the identifiers, rather than a test of the CV’s learning capabilities, per se.
I don’t know all the details of how it works, but I can tell you that after correcting a large number of old observations of Swamp Rabbit that were RG misIDed as Eastern Cottontail, the CV is now really good at telling them apart.
tl;dr: it’s likely a mix of all the (a)/(b)/(c) strategies you described, plus other tweaks.
The data handling is done by iNat for each update (I’m not sure if the code is on GitHub), and it should update each species’ observation sample to reflect the changing community IDs. If a species’ observation set falls short of a minimum threshold, it may add new observation samples where available, capped at the maximum of 1,000 per species. (I’m not sure whether they keep increasing each species’ sampled photo set toward 1,000 by autosampling at each update.)
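To make the idea concrete, here is a minimal sketch of that resampling step. Everything here is hypothetical: the function name, the `MIN_PER_TAXON` threshold, and the data layout are my own illustration, not iNat’s actual pipeline; only the 1,000-photo cap comes from the discussion above.

```python
import random

MAX_PER_TAXON = 1000   # cap discussed above
MIN_PER_TAXON = 50     # hypothetical minimum threshold

def build_training_sample(observations_by_taxon, rng=random.Random(0)):
    """Resample each taxon's photo set for a training run.

    `observations_by_taxon` maps a taxon name to the list of photo IDs
    whose *current* community ID points to that taxon, so corrections
    made since the last run are automatically reflected.
    """
    sample = {}
    for taxon, photos in observations_by_taxon.items():
        if len(photos) < MIN_PER_TAXON:
            continue                                     # too few photos: skip
        if len(photos) > MAX_PER_TAXON:
            photos = rng.sample(photos, MAX_PER_TAXON)   # cap at 1,000
        sample[taxon] = photos
    return sample

obs = {
    "Stereum ostrea": [f"photo_{i}" for i in range(2500)],
    "Rare fungus":    [f"photo_{i}" for i in range(10)],
}
sample = build_training_sample(obs)
print(len(sample["Stereum ostrea"]))   # 1000
print("Rare fungus" in sample)         # False
```

Because each run resamples from the photos’ current community IDs, a taxon whose old observations were corrected (as with the Swamp Rabbit example) naturally gets a cleaner sample next time.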
Then, starting from the (pretrained) last model version is like using the residual knowledge you were thinking of: training corrects that knowledge to reflect the new dataset sample (with any corrections and new species). The math and computational effort work out such that the new version only has to incrementally adapt to these changes by rewiring certain numbers (the weights of connections). It won’t be a strictly linear saving, but it’s simpler and faster than starting from scratch.
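A toy demonstration of why the warm start is cheaper, using a single weight and plain gradient descent (this is my own illustration of the general principle, not anything from iNat’s code):

```python
def fine_tune(w, target, lr=0.1, tol=1e-3):
    """Gradient descent on the squared error (w - target)**2.

    Returns the final weight and the number of update steps needed
    to get within `tol` of the target.
    """
    steps = 0
    while abs(w - target) > tol:
        w -= lr * 2 * (w - target)   # gradient step on squared error
        steps += 1
    return w, steps

# Cold start: weight initialized far from the answer (training from scratch).
_, cold_steps = fine_tune(w=0.0, target=5.0)

# Warm start: the previous checkpoint already encodes most of the answer,
# so only a small correction is needed.
_, warm_steps = fine_tune(w=4.8, target=5.0)

print(cold_steps > warm_steps)   # True
```

The same logic scales up: when most of the dataset is unchanged between runs, the checkpoint’s weights are already close to where they need to be, and only the parts affected by corrections and new species move much.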
Now the caveat: if there is a fundamental change in the model architecture (like what happened with the geomodel last year across consecutive versions), then the numbers kind of lose their meaning. Even if it’s technically possible to reuse that residual knowledge by working around the edges, it’s sometimes going to be suboptimal, so the only recourse in those cases is retraining on the full dataset. I think that’s rare in current CV updates, but again I don’t know how frequently the underlying architecture changes in iNat’s models (it isn’t public on GitHub).
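One common way to salvage part of a checkpoint across an architecture change is to copy only the layers whose names and shapes still match, and train the rest from scratch. This sketch is purely illustrative (the layer names and shapes are invented, and real frameworks like PyTorch do this with `load_state_dict(strict=False)`):

```python
def transfer_weights(old_ckpt, new_model):
    """Copy weights from an old checkpoint into a new architecture.

    `old_ckpt` maps layer name -> (shape, weights); `new_model` maps
    layer name -> expected shape. Only layers whose name and shape
    still match are carried over; the rest are listed as dropped.
    """
    carried, dropped = {}, []
    for name, shape in new_model.items():
        if name in old_ckpt and old_ckpt[name][0] == shape:
            carried[name] = old_ckpt[name]
        else:
            dropped.append(name)   # must be trained from scratch
    return carried, dropped

# Hypothetical layers: the backbone is unchanged, but the output head
# grew because new taxa were added to the model.
old = {"backbone.conv1": ((64, 3), "weights..."),
       "head.fc":        ((5000, 512), "weights...")}
new = {"backbone.conv1": (64, 3),
       "head.fc":        (6000, 512)}

carried, dropped = transfer_weights(old, new)
print(sorted(carried))   # ['backbone.conv1']
print(dropped)           # ['head.fc']
```

When the mismatched parts are small (say, just the output head), this partial transfer works well; when the change is fundamental, little matches and you are effectively back to training from scratch, which is the caveat above.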
I noticed that for Stereum ostrea and Stereum versicolor in Tasmania. S. versicolor is not found in Australasia, despite the large volume of misidentifications. After fixing roughly two-thirds of them, the CV stopped suggesting it as the default species, and new observations were usually IDed at genus level or as S. ostrea.
Agree. With cryptic species splits (with different geographic ranges), I think it’s more a matter of what shows up as “seen nearby” rather than the CV being able to tell them apart.
One of their posts states: “This training run is starting with the last checkpoint from the previous training run, rather than starting from the standard ImageNet weights like we did for the previous training run. Basically, this training run gets a head start in understanding what kinds of visual features are important for making iNaturalist suggestions.”
My interpretation is that all previous images aren’t included with each training event, but some kind of prior knowledge is carried over. So I hypothesize scenario (b)!