Description of need:
iNaturalist’s computer vision (CV) is getting better and better, but at different rates for different taxa in different areas. It would be excellent if researchers and community naturalists wishing to assess the reliability of CV-generated IDs could distinguish the taxa for which the CV works well from those for which it often arrives at an inaccurate or imprecise ID. Ultimately, if iNaturalist could provide some sort of rating in the CV suggestions along the lines of “we’re 82% sure it’s this species and 99% sure it’s in this genus”, based on a comparison of CV guesses with community-validated IDs, I think everyone would benefit.
Feature request details:
At the most basic level for now, I would like to request that the CV guess of the taxon ID at the moment of uploading be recorded on the back end (and preferably made available as an export field). Simply documenting this would allow one to figure out when the CV guess consistently agrees with the ultimate community-consensus ID, versus cases where the CV guess is consistently wrong. If this simple data logging step could be implemented, it would open the door to all sorts of future applications that could improve the CV’s capabilities and human interaction with the CV. For example, experienced naturalists could preferentially go and get very good images of taxa for which the CV currently fails, so as to bolster the training data and help the CV improve for that particular taxon in that particular location.
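As a rough sketch of the comparison described above: assuming each logged record holds the taxon the CV suggested at upload and the eventual community ID, per-taxon agreement could be tallied along these lines. The record layout and the keys `cv_taxon` and `community_taxon` are made up for illustration; they are not real iNaturalist export fields, and the data is invented.

```python
from collections import defaultdict


def agreement_by_taxon(records):
    """Return {cv_taxon: (agreements, total, rate)} for each taxon the CV suggested.

    `records` is an iterable of dicts with the hypothetical keys
    'cv_taxon' and 'community_taxon'.
    """
    counts = defaultdict(lambda: [0, 0])  # cv_taxon -> [agreements, total]
    for rec in records:
        tally = counts[rec["cv_taxon"]]
        tally[1] += 1
        if rec["cv_taxon"] == rec["community_taxon"]:
            tally[0] += 1
    return {
        taxon: (agree, total, agree / total)
        for taxon, (agree, total) in counts.items()
    }


# Invented example data, for illustration only.
records = [
    {"cv_taxon": "Amanita muscaria", "community_taxon": "Amanita muscaria"},
    {"cv_taxon": "Amanita muscaria", "community_taxon": "Amanita"},
    {"cv_taxon": "Danaus plexippus", "community_taxon": "Danaus plexippus"},
]

for taxon, (agree, total, rate) in sorted(agreement_by_taxon(records).items()):
    print(f"{taxon}: {agree}/{total} = {rate:.0%}")
```

Note that this exact-match comparison counts a coarser community ID (genus only) as a disagreement. A real analysis would need the taxonomy to tell “wrong” apart from “too precise”.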
I’d like to see some kind of analysis and breakdown of the two separate components that lead to inaccuracy. One component is the intrinsic accuracy of the CV model for a particular group, and the other is some assessment of the accuracy of the data being used for training. In my area, mycology, there are vast numbers of imprecise or wrong Research Grade (RG) observations. For many fungal taxa it is difficult, and often impossible, to give species-level identifications from macro-photos; sometimes even the genus cannot be determined. That situation has become much more prevalent since we moved to phylogenetic-based species concepts over 20 years ago. The phylogenetic approach continues to reveal very many cryptic, regional species, most undescribed, but nevertheless obvious in the data. However, many observers stick to dated morphological concepts that can no longer be supported, and they are backed up by a significant community of identifiers who either aren’t aware of the issue or ignore it. People naturally like these ‘pragmatic’ identifications. These records are then used to train the model, whose suggestions in turn lead to further RG observations of the same kind. It is hardly surprising that many iNat suggestions seem poor to those of us who are aware of the issues. Garbage in – garbage out.
These are all good points! I think there’s a lot that can be learned here if iNaturalist logs (1) what the CV suggestion is at the time of upload, (2) the CV score, and (3) the model(s) used to produce the suggestion and score. If these fields are exportable (not reliant on API), it would be even better.
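To make the three fields concrete, here is a minimal sketch of what one exportable log row might look like. Everything here is an assumption for illustration: the class name `CVSuggestionLog`, the field names, and the CSV layout are hypothetical and do not reflect any existing iNaturalist schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CVSuggestionLog:
    observation_id: int    # hypothetical identifier for the observation
    suggested_taxon: str   # (1) top CV suggestion at the time of upload
    score: float           # (2) CV score for that suggestion
    model_version: str     # (3) model that produced the suggestion and score


def to_csv(logs):
    """Serialize log entries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(CVSuggestionLog)]
    )
    writer.writeheader()
    for entry in logs:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Invented example row.
logs = [CVSuggestionLog(1, "Amanita muscaria", 0.82, "model-A")]
print(to_csv(logs))
```

A flat row like this is what would make the data usable from a plain export, without needing the API.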
@pisum do you know if these scores are currently being systematically logged and kept as part of the record’s permanent data?
I’m almost certain they are not logged. The workflow is that the client (e.g. browser) sends the image to iNat’s CV engine and it returns a ranked list of suggestions (optionally within a particular high-level taxon). Each of the suggestions has an associated confidence, which is only visible in the UI to browser users who choose to install that Chrome extension. When you select an ID (or dismiss the suggestions), those confidence estimates are gone.
To be clear, the confidence estimates are a measure of (relatively) how likely iNat’s current CV model thinks the first image in the observation is to match a particular taxon in the CV model. It is not a measure of how well CV compares to community identifications. The CV model is trained on iNat data, which includes IDs with all levels of confidence from people blindly accepting previous CV suggestions to others applying a great deal of knowledge and experience. So the confidence estimate tells you how much the current CV model thinks this image matches a particular taxon based on the model’s knowledge of that mix of earlier identifications.
I think the computer should assign a taxon ID with very low weighting, like 0.01% of the weight of a human ID, or one that can’t count towards research grade, or even put it in a separate field entirely. Then one could peruse the map and sort by computer IDs as another field. It might not be doable, especially if you wanted it to be rerun each time the algorithm was updated, but it would be fascinating.
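For what it’s worth, here is a toy sketch of what such a weighting could mean in practice. This is purely hypothetical: it is not how iNaturalist computes the community ID, and the weights, the `'human'`/`'cv'` source labels, and the function name are all invented for illustration.

```python
from collections import defaultdict

HUMAN_WEIGHT = 1.0
CV_WEIGHT = 0.0001  # 0.01% of a human ID, as suggested above


def weighted_leader(identifications):
    """Return (taxon, total_weight) for the taxon with the highest summed weight.

    `identifications` is a list of (taxon, source) pairs, where source is
    'human' or 'cv'. Hypothetical scheme, not iNaturalist's algorithm.
    """
    totals = defaultdict(float)
    for taxon, source in identifications:
        totals[taxon] += CV_WEIGHT if source == "cv" else HUMAN_WEIGHT
    return max(totals.items(), key=lambda item: item[1])


# A lone CV ID leads only until a single human disagrees.
print(weighted_leader([("Taxon A", "cv")]))
print(weighted_leader([("Taxon A", "cv"), ("Taxon B", "human")]))
```

With weights like these, a computer ID acts as a placeholder that any single human ID overrides.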
i’m fairly certain the suggestions are not being recorded by the system. however, there has been at least one effort by a community member to capture the top suggestion (without scores) for not-yet-identified observations: https://forum.inaturalist.org/t/unknown-family-projects/38693. (it’s not an approach i personally recommend, but some folks appreciate it and i don’t think the iNat staff have officially discouraged it either.) and it is possible to adapt that general approach to capture the full set of scores for each observation in a given set of observations, if you’re really interested (although, again, it’s not something that i would necessarily recommend doing, and it wouldn’t capture the exact situation at the time of a particular CV-assisted ID).
it’s also worth noting that computer vision suggestions will vary not just based on the version of the model, but also on:
the exact image being evaluated (which could vary slightly between an unresized image to be uploaded and a resized image that has been uploaded, or could have changed because the images or the order of the images in the observations has changed)
(if the “nearby” option is invoked) the location of the observation (or lack of a location) at the time the CV is pinged, along with the existing observations in the iNat database at the time
(if the “nearby” option is invoked) the observation datetime (or lack of an observation datetime) at the time the CV is pinged, along with the existing observations in the iNat database at the time
the “iconic taxon” of the observation, based on the identifications that have been made at the time the CV is pinged
so you can see it’s hard, if not impossible, to really track all the variables that would affect the suggestions that come out of the CV.
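If someone did want to record at least the inputs listed above alongside each suggestion, a context snapshot might look something like the sketch below. All names here (`CVContext`, `snapshot`, the field names) are hypothetical, and hashing the image bytes is just one assumed way to notice that the evaluated image differs (e.g. resized vs. unresized).

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CVContext:
    model_version: str
    image_sha256: str             # fingerprint of the exact bytes evaluated
    nearby_used: bool             # whether the "nearby" option was invoked
    latitude: Optional[float]     # None if the observation had no location
    longitude: Optional[float]
    observed_on: Optional[str]    # ISO date string, or None if missing
    iconic_taxon: Optional[str]   # iconic taxon at the time the CV was pinged


def snapshot(image_bytes, model_version, nearby_used,
             latitude=None, longitude=None,
             observed_on=None, iconic_taxon=None):
    """Build a context record for one CV request (hypothetical helper)."""
    return CVContext(
        model_version=model_version,
        image_sha256=hashlib.sha256(image_bytes).hexdigest(),
        nearby_used=nearby_used,
        latitude=latitude,
        longitude=longitude,
        observed_on=observed_on,
        iconic_taxon=iconic_taxon,
    )


# Same settings, different image bytes: the snapshots differ.
a = snapshot(b"original image bytes", "model-A", nearby_used=True,
             latitude=1.0, longitude=2.0)
b = snapshot(b"resized image bytes", "model-A", nearby_used=True,
             latitude=1.0, longitude=2.0)
print(a == b)
```

Even a record like this would not capture the state of the existing observations in the iNat database at the time, which is one reason exact reproduction stays out of reach.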
Hmm, that’s too bad. Although, even despite all the variables, I think it’s STILL helpful to record the CV suggestions, since it would be REALLY nice to know how often, and for what taxa, CV suggestions are leading people astray, so we can address those problem areas more effectively. All the variables you mentioned will definitely contribute noise and make the data harder to parse, but even noisy data are better than no data!
From here, what’s the procedure to have this request looked at by an iNaturalist admin? Or do we just keep the conversation alive until someone pops in and gives us an answer of “yes we will / no we will not implement”?
I’m not an admin, but, in general, some requests are easy, have a clear benefit, and are implemented quickly. Some are large or challenging and may be declined after discussion by staff as impossible or not worth it.
Many are requests that staff think would be interesting or beneficial but that would take a lot of resources to implement. Those requests tend to stay open for discussion, sometimes for years, and some are eventually acted on. But if you take a look through the feature requests section, there are many great ideas that are still open.
So, be hopeful, but don’t hold your breath, is probably good advice.