How does the CV adjust when there are changes in taxonomy?

The recent Clements checklist has resulted in some changes in bird taxonomy on iNat, and that made me wonder how the CV responds to such changes. For example, does it automatically adjust by ignoring records of a taxon that was previously included in another but is now separated?

As an example, New Zealand robins were previously all included in one species with three subspecies: Petroica a. australis, P. a. longipes, and P. a. rakiura. (I may be wrong, but I assume records of all three, being of the same species, would have been used to inform the CV when it suggested species P. australis.)

In the latest Clements, P. a. longipes has been raised to species level and is now P. longipes. Will the CV now include or ignore previous/existing records of P. australis when suggesting P. longipes? Will the CV now include or ignore previous records of P. (australis) longipes when suggesting P. australis?

I appreciate that subspecific differences are very often subtle and may normally be beyond the ability (or design) of the CV, but I was curious to know whether the CV is actually designed to adapt to changes like this. It could be a more significant issue when changes are at higher taxonomic levels.


As far as I know, the vision model has never known about subspecies, and the more general automated suggestions also don’t recommend subspecies. The vision model is hard-coded with taxon IDs, so when it suggests the ID of a taxon that is now inactive, we try to map that to an active taxon and replace the inactive taxon with its active equivalent in the results. This is only possible for taxa involved in taxon swaps and taxon merges, so the vision model (and the automated suggestions) will probably get tripped up by splits until we train a new model. Nearby observations may compensate for that a bit.
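For anyone curious what that remapping could look like mechanically, here’s a minimal Python sketch of the idea described above. The data structures (`taxon_changes`, `active_taxa`), the function names, and the choice to simply drop split suggestions are all assumptions for illustration, not iNaturalist’s actual code or behaviour.

```python
# Hypothetical sketch of post-processing CV suggestions: if the model emits
# an inactive taxon ID, follow its taxon change (swap or merge) to the active
# output taxon and substitute it. Splits are ambiguous (one inactive input,
# several possible outputs), so here they are just dropped.

# Assumed data: inactive taxon ID -> (change type, output taxon ID or IDs)
taxon_changes = {
    1001: ("swap", 2001),           # inactive ID swapped to a new active ID
    1002: ("split", [2002, 2003]),  # a split: can't be mapped automatically
}

active_taxa = {2001, 2002, 2003, 3000}

def remap_suggestion(taxon_id):
    """Return an active taxon ID for a CV suggestion, or None if it can't be mapped."""
    if taxon_id in active_taxa:
        return taxon_id
    change = taxon_changes.get(taxon_id)
    if change and change[0] in ("swap", "merge"):
        return change[1]            # unambiguous one-to-one mapping
    return None                     # splits (or unknown IDs) can't be remapped

def remap_results(suggestions):
    """Remap a list of (taxon_id, score) pairs, dropping unmappable suggestions."""
    remapped = []
    for taxon_id, score in suggestions:
        mapped = remap_suggestion(taxon_id)
        if mapped is not None:
            remapped.append((mapped, score))
    return remapped

print(remap_results([(3000, 0.61), (1001, 0.25), (1002, 0.14)]))
# -> [(3000, 0.61), (2001, 0.25)]  (the split suggestion is dropped)
```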

How often is the model retrained? I suspect it would take a really long time, given the number of training images (hours? days?)

We’re currently shooting for at least twice a year, which we hope is reasonable and leaves us enough spare time to experiment with different stuff on the same hardware, but 2019 had a lot of interruptions in the form of experimenting with and training a model that could fit on a phone for Seek. Given the current way we do it, a model can take months from start to finish, depending on how accurate we need / want it to be, how much QA / QC we do, problems we encounter during the process, etc. It’s possible we’ll figure out a way to update the model without fully re-training it, which would allow us to add new taxa at a higher frequency. That’s one of those experiments we haven’t had time to do yet.


Wow, didn’t realise it was that long. I guess that explains why species cleanups don’t flow through to CV predictions very quickly.
