I went through Chamaesaracha corrections today - Edwards Plateau Five-Eyes misidentified as Hairy Five-Eyes in Central Texas. Where some old observations have RG with multiple agreeing wrong IDs, my “correction” is labeled Maverick. You can see the map has many red dots left in Central Texas, so I’m sure the wrong species will continue being suggested. Of course, observers could select genus, but that’s been discussed at length.
All that to say that this request could be a chance to avoid some future headaches for identifiers.
This got me thinking that this could potentially be automated given a threshold related to the Similar Species tab. It might also have to track when observations initially identified as that species have been pushed back to genus rather than reidentified as another species, though I’m not sure how feasible that is.
Enoplognatha ovata doesn’t have very many observations listed on its Similar Tab (the number of observations on the first row of “similar species” add up to 1% of the number of observations of the species), but Penstemon strictus has quite a few (7%). P. strictus also has a lot more “similar species” than E. ovata so if I added them all up rather than just the first row, the percentage difference would be even larger.
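For what it's worth, the threshold idea could be sketched roughly like this in Python. The counts are invented purely so the ratios come out to the 1% and 7% mentioned above; they are not real iNaturalist figures. The names `similar_ratio` and `needs_warning` and the 5% cutoff are hypothetical too, not anything the site actually has.

```python
def similar_ratio(similar_counts, species_total):
    """Share of a species' observation count made up by 'similar species' counts."""
    if species_total <= 0:
        raise ValueError("species_total must be positive")
    return sum(similar_counts) / species_total


def needs_warning(similar_counts, species_total, threshold=0.05):
    """True when the similar-species share reaches the (assumed) threshold."""
    return similar_ratio(similar_counts, species_total) >= threshold


# Invented counts, chosen only to reproduce the percentages quoted above.
e_ovata = ([40, 35, 25], 10_000)        # first-row counts sum to 100 -> 1%
p_strictus = ([300, 250, 150], 10_000)  # first-row counts sum to 700 -> 7%

for name, (counts, total) in {"E. ovata": e_ovata, "P. strictus": p_strictus}.items():
    print(name, f"{similar_ratio(counts, total):.0%}", needs_warning(counts, total))
```

Where the cutoff should sit is a judgment call, and this ignores the pushed-back-to-genus cases mentioned above.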
I kind of like this idea. Though I’m not sure how effective it would be, it would be cool to see if some data could be gathered on some commonly misidentified species to see if it helps!
One note is that the exclamation point is already used as a symbol on the site to denote invasive/non-native species (though there was a whole other thread about that). So using it in this context might lead to confusion. I’m sure other symbols could be devised though.
if the problem is that observers too often rely on computer vision suggestions without considering potential alternatives, i think the simplest solution is to train users that computer vision suggestions are not meant to be comprehensive. there could be other possible taxa that are not included in the vision’s training set. folks should identify to a rank that they’re sure about, not try to identify to a low rank just because that’s what vision or another identifier suggested.
in order to accomplish what you’re suggesting, you’ll have to create a whole technical framework to input and display the warnings, develop some sort of criteria to define when a taxon should receive a warning and train curators to apply those criteria uniformly, translate the warning reasonings, etc. you’d have to set these all up in one massive effort initially (since such an effort would require a lot of local and taxon-specific expertise), and then you’d have to reassess with any taxon change or vision model release.
since it’s unlikely you could realistically flag all the taxa needed, folks might interpret the lack of a warning flag as an indication that it’s okay to take a vision suggestion as is, which would undermine the point of the flags in the first place.
I think you are making it more complicated than it needs to be.
We already flag the problem taxa. We just hide the flags behind the scenes (or add to CV clean-up wiki). I’m just saying to make these flags more public.
This is not something which has ever crossed my mind on iRecord, where they have a version of this idea built into the system. Even if it did, I don’t see how that would be worse than it is at present, where a large portion of users don’t pay any heed to CV accuracy whatsoever.
If iRecord can do it, I don’t see why it would be technically insurmountable to have something similar on iNaturalist. And from a coding POV I don’t really get why this should be so complex tbh. The closest thing I can think of that I’ve done was a software development class learning about model-view-controller and similar hierarchies. Under those sorts of standard frameworks at least, this would seem fairly straightforward to me. Wouldn’t it just latch on to the existing structure?
I think this would happen organically, as it’s crowd-sourced. But if there were extra care taken beyond that which is necessary for a period, it wouldn’t be the end of the world. Better to encourage the users to err on the side of caution than the opposite, as we have at present on the whole.
I am not sure about other taxa, but when I’ve looked at the problem taxa on the CV clean-up wiki for Diptera, I don’t see much up for debate. These are known issues most identifiers would agree need help to mitigate. If a flag were misapplied, and people were warned unnecessarily until the flag was later revisited, I don’t think that would be the end of the world. Again, better to err on the side of caution.
A generic warning flag just to say to take extra care on this taxon would also just be a nudge; it wouldn’t require such extreme policing imo. Going to the detail on the flag would enable the user to decide if it’s relevant to them.
Visually-speaking, I can’t imagine a much simpler solution to training users that CV suggestions aren’t always to be relied on. What simple solution to training users did you have in mind?
in general, i think the simplest approach to training for any given feature is to have a simple tutorial page with a switch at the bottom that allows the user to turn on the feature. (the tutorial / switch could even be incorporated into the settings page.) this could ensure everyone gets the same training and can be easily translated.
in terms of what to include in a hypothetical vision tutorial, i think a very simple how-to-use guide + usage notes would be enough. by usage notes, i mean something similar to what i wrote above:
flagging taxa that could be mixed up by vision is a complicated topic.
just for example, a lot of mosses are not identifiable without good photos of certain structures. but given the right photos, some species are easily identifiable. so do you flag all moss species because most people don’t take good enough photos for species identification? or do you flag only the ones that can’t be identified even with a good photo of the proper structures? or, unintuitively, do you flag the ones that are easily identifiable because vision will have only these in its training set and will incorrectly offer these up as suggestions to most moss species, since most moss species are not in its training set?
another example: in my area, 99% of wild pine trees are Pinus taeda. so chances are good that a computer vision suggestion for a pine tree in my area with “nearby” functionality enabled will give a good species suggestion. but go 100 miles to the east, and all of a sudden that distant photo of a pine tree is harder to identify reasonably to species. so do you flag Pinus taeda or not?
finally, in my area, Odontotaenius disjunctus is fairly unmistakable as an adult, and other adult beetles are unlikely to be mistaken for it. its larvae are fairly distinctive among red-headed, white-bodied grubs, which as a group are relatively hard to identify to species otherwise. as a result, vision is unlikely to make a bad positive identification of the species, but it might incorrectly suggest the species much of the time when the observation is just a grub. so flag or no?
Then I think we have to agree to disagree
I think the simplest approach to training for any given feature will be something inherent and intuitive within the design itself which doesn’t require reading instructions of some kind. Something which guides and trains the user whilst they use the system every day.
If the user themself is not capable of making this distinction, why would you encourage them to go to species solely based on an autosuggest? Don’t you believe users should be able to independently verify a species level ID to some extent?
In any case, if it’s causing large-scale problems elsewhere I see no harm in tagging it as something to beware of. In the link to the flag the identifier can explain what it relates to, as you just have. The issues around this come from people being overconfident in going to species. I don’t see issues arising from users not being confident enough.
If we had an easy to ID UK species which is flagged due to complexity elsewhere in Europe, I certainly wouldn’t begrudge the other European identifiers flagging it up as problematic. It could (possibly) slow down RG datapoints in UK for this species. But I don’t see how that would be a bad thing(?)
Yes, flag - with a connected detail explaining to users what you’ve just said.
I have a book on mosses with symbols for the different levels of complexity. If a field guide can do it, I see no reason we couldn’t. In theory you could even have different symbols as they do, to indicate level of detail required (e.g. microscopy symbol vs magnifying glass symbol). Clicking the flag could also offer the user details on diagnostic features needed.
Your examples seem to form analogies as if a flag would prevent people entirely from taking an action. I would have no problem with a total prevention (e.g. the autosuggest never goes to species). But this request wouldn’t even do that, it would just nudge some to think twice.
Why not add regional flags? If there’s only one species in a region, there shouldn’t be a flag on it; if another region has many similar species, there should be a (!) sign on it. It’s kind of the same as how the obscuring system should work: on a local level, not only a global one.
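Roughly what I mean, as a tiny Python sketch. The region and species names are placeholders, and `regional_flags` is a made-up helper, not an existing iNaturalist feature; it just assumes you already know which look-alike species occur in each region.

```python
def regional_flags(species_by_region):
    """Map each region to the set of look-alike species that should be flagged there.

    A species is flagged only in regions where more than one look-alike occurs.
    """
    flags = {}
    for region, species in species_by_region.items():
        flags[region] = set(species) if len(species) > 1 else set()
    return flags


# Placeholder data: one look-alike group, two regions.
look_alikes = {
    "region_a": {"species_1"},                            # only one option: no flag
    "region_b": {"species_1", "species_2", "species_3"},  # several options: flag all
}

flags = regional_flags(look_alikes)
print(flags["region_a"])
print(sorted(flags["region_b"]))
```

So the same species carries a (!) in one region and nothing in the other.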
Yes, similar. That’s about improving the CV so it recognizes better when observations may not be identifiable past genus, whereas this is about indicating more clearly to users when observations may not be identifiable past genus, but I imagine the algorithm would be basically the same.
no, the purpose of my examples is to help clarify whether you want to flag species because:
a given species is visually similar to other species in the area
a given species is visually similar to other species in other areas
although photos could allow for visual differentiation of species, observers don’t often capture the features necessary for differentiation
the inclusion of a given species in the training set (due to commonness or distinctive features) makes it likely to be incorrectly suggested for species not included in the training set
the way i read your posts, you would probably be willing to flag for any of the reasons above and potentially other reasons. if that’s the case, then i think there could be some reason to flag most species, and the lack of flags on any given species would mostly just reflect no one bothering or getting around to flagging the species. in my opinion, flags are useful / meaningful only if the set that could be potentially flagged is a small subset / minority of the whole set – because in these cases, the flags would point to exceptions. if just about anything could be flagged (everything is exceptional?), then flags are not a great tool.
fundamentally, i think what you’re proposing here effectively leads down a path that ends up (largely, if not entirely) at something like what has been previously discussed in many other threads:
along these lines, tiwane proposed in another thread a way to achieve something like what’s proposed in those threads, using existing functionality in the system. it seems like a reasonable approach to me, but i don’t think this workflow has ever caught on.
The issue with this is it requires micro level range information to be meaningful. It is all well and good to tell someone (assuming they even look at it) that for example Leucorrhinia dragonflies can be difficult to separate visually. Someone where I live doesn’t need to know that; realistically they are only going to encounter 2 species, which are easy to separate. If they live 300 km north of here with 3 or 4 species, or 800 km north of here with maybe 6 possibilities, it is important information.
Telling someone here where I live that information, just because they are in the same province, is counterproductive (i.e. the provincial checklist is worthless when your province is 5 times the size of the United Kingdom).
The only realistic way to manage this level of micro range information is through the observations themselves. Which means dealing with those. And getting Seek and the iOS app to parity in terms of defaulting to presenting options seen nearby.
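To illustrate what "through the observations themselves" could look like, here is a rough Python sketch that counts which species of a genus have been observed within some radius of a point. The coordinates and the "sp. A" style names are invented for illustration, `species_nearby` is a hypothetical helper, and this is not how iNaturalist's "seen nearby" actually works as far as I know.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def species_nearby(observations, lat, lon, radius_km):
    """Set of species names with at least one observation inside the radius."""
    return {
        name
        for name, obs_lat, obs_lon in observations
        if haversine_km(lat, lon, obs_lat, obs_lon) <= radius_km
    }


# Invented observations: (species, latitude, longitude).
observations = [
    ("sp. A", 45.0, -75.0),
    ("sp. B", 45.1, -75.1),
    ("sp. C", 48.0, -75.0),
    ("sp. D", 52.0, -75.0),
]

local = species_nearby(observations, 45.0, -75.0, 100)
wider = species_nearby(observations, 45.0, -75.0, 400)
print(sorted(local), sorted(wider))
```

A warning would then only make sense where the local set holds several hard-to-separate species, which is exactly the micro-range problem: the answer changes with where you stand.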
I can see this being helpful for monotypic (or regionally-monotypic) genera, especially those whose genus is also the common name, that are often identified only to genus level. I spend a lot of time adding IDs for Sassafras > Sassafras albidum, Galax > Galax urceolata, Oxydendrum > Oxydendrum arboreum, etc.
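As a sketch of how simple the lookup could be (Python; the three pairs are the ones I listed above, everything else is hypothetical, including the `suggest_species` name). The table only holds for the region where the genus really is monotypic, so a real version would need to be keyed by place as well.

```python
# Genus -> the single species expected in the region, taken from the examples above.
# Assumption: this mapping is only valid where the genus is regionally monotypic.
REGIONALLY_MONOTYPIC = {
    "Sassafras": "Sassafras albidum",
    "Galax": "Galax urceolata",
    "Oxydendrum": "Oxydendrum arboreum",
}


def suggest_species(genus_id):
    """Return the species to suggest for a genus-level ID, or None if not listed."""
    return REGIONALLY_MONOTYPIC.get(genus_id)


print(suggest_species("Galax"))
print(suggest_species("Quercus"))
```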