This is a follow-up to some older topics such as https://forum.inaturalist.org/t/include-captive-cultivated-species-in-id-algorithm/711 and https://forum.inaturalist.org/t/casual-observations-in-the-cv-training-set/35558
I’m curious about the current status of which Casual obs do and don’t get used in CV training. I’m mainly asking because I’ve noticed that certain taxa that the CV suggests incorrectly have many Casual obs (missing date, missing location, marked as captive, etc.) which are misidentified and which go “under the radar” when identifiers are working on a taxon. I recently got through all the Needs ID backlog for the genus Acrolophus, for example, and then I checked the Casual obs out of curiosity… there were years’ worth of misidentifications that have been ignored by identifiers because the obs were ineligible for RG.
So my question is, which types of Casual obs are currently used in the CV training: “captive”, “missing date”, “missing location”, “no evidence of organism”, “evidence doesn’t pertain to a single organism”, all, or none of these? Are captive obs with only a single ID treated differently for training than ones with an agreed-upon CID? And if some of these types are currently included, is this more harmful or beneficial to the CV as an overall practice? I can see the advantage of including these images in the training if they have an agreed-upon ID, but since they’re largely ignored by identifiers I’m sure “Casuals” have a higher-than-average percentage of misidentifications.
And from a personal practical standpoint, I’d mainly like to know which observations are worth my time to work through as an identifier if I’m looking to combat misidentifications making it into the CV training. If a certain category of Casual obs isn’t even impacting the CV, my time would be better spent adding IDs somewhere else.
That explains which taxa are included in the CV, but I’m still not finding the details of which obs are used in the CV training. The link states “Observations do not need to be Research Grade in order to be used in training, but observations with a matching Community ID will be prioritized.” I’m looking for elaboration on what that actually means. Specifically, which of the following Casual observation types are/aren’t fed into the training, and which are/aren’t “prioritized”?
- Captive/Cultivated with 2 or more agreements on CID
- Captive/Cultivated with only 1 ID suggestion
- No date given - with or without agreement on CID
- No location given - with or without agreement on CID
- Marked as “location incorrect” - with or without agreement on CID
- Marked as “date incorrect” - with or without agreement on CID
- Marked as “no evidence of organism”
- Marked as “not evidence of a single subject”
- User has opted out of CID and enough users have disagreed with their ID to send the ob to Casual
I don’t want to waste my time IDing casual observations that will never make their way into the CV modeling due to inactive observers, and I can’t find any guidance on which of these Casual types, if any, are still included in the training. There are thousands of observations, if not more, in each of these categories in the taxa that I primarily ID, and if some are more likely to improve the CV by having their IDs corrected, that’s where I’d prefer to spend my time as an identifier.
i could be wrong, but i think you might be overthinking this. especially among moths, i’m thinking that by the time a taxon is eligible for inclusion in the computer vision training, it’s going to have far more verifiable observations than casual. so i don’t see how correcting or not correcting a handful of casual observations is going to make much of a difference either way.
in general, the only time i would expect casual observations to potentially be a significant part of a given training set is if you’re dealing with animals or plants that are primarily encountered in captivity.
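For anyone who wants to check the casual vs. verifiable balance for a particular taxon rather than guess, here is a rough Python sketch. It only builds the query URLs and does the arithmetic; it does not fetch anything. Assumptions to verify before relying on it: the public iNaturalist API v1 observations endpoint accepts `taxon_id`, `quality_grade`, `verifiable`, `captive` and `per_page`, and reports the match count as `total_results` in the JSON response. `count_url`, `casual_share` and the `TAXON_ID` placeholder are names made up for this illustration, and none of this says anything about which observations the CV training actually uses.

```python
from urllib.parse import urlencode

# Public iNaturalist API v1 observations endpoint (assumed; check the API docs).
API_BASE = "https://api.inaturalist.org/v1/observations"


def count_url(taxon_id, **filters):
    """Build a query URL for one taxon plus optional filters.

    per_page=0 asks for no observation records, so the response should
    carry just the summary fields (including total_results).
    """
    params = {"taxon_id": taxon_id, "per_page": 0, **filters}
    return f"{API_BASE}?{urlencode(params)}"


def casual_share(casual, verifiable):
    """Fraction of a taxon's observations that are casual (0.0 if there are none)."""
    total = casual + verifiable
    return casual / total if total else 0.0


TAXON_ID = 12345  # placeholder only; substitute the real taxon id

# Open these in a browser (or fetch them) and read total_results from each.
print(count_url(TAXON_ID, quality_grade="casual"))
print(count_url(TAXON_ID, quality_grade="casual", captive="true"))
print(count_url(TAXON_ID, verifiable="true"))
```

Plugging the casual and verifiable `total_results` values into `casual_share` gives the fraction of that taxon's observations sitting in Casual, which is the number the "handful of casual observations" argument depends on.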