My idea is that if we could manually adjust computer IDs so that they do not identify certain things from certain areas to species level, computer IDs would become much more accurate.
(feel free to move this to feature requests; at the moment I’m trying to discuss it before requesting it as a feature. Often there’s a solution!)
For a lot of organisms, identification based only on image and locality is either impossible or so difficult that it is better not to expect the Computer Suggestion to be correct to species level.
In this case, the computer suggestion often suggests the most commonly observed species in the group. Because it does look similar, a lot of people accept it, and then all of the cryptic species are ignored until a keen identifier shows up.
For example, a spider genus in New Zealand is known to include at least 20 species that are very similar to each other except for the reproductive organs.
However, the computer suggestion keeps identifying those as the single most common species, and often it is hard to correct them due to a lack of identifiers.
It would be most appropriate if the computer IDs were limited to genus level for these.
So, I think problems like this can be fixed if we could adjust the computer IDs not to “identify too far” for certain things… is there any other existing way to do this?
For the sake of accuracy, CV usually lists Genus or higher at the top, except in situations where it’s not sure what to recommend, then you might get a species first (that coding I don’t like). Users can select from additional suggestions below, but it’s not entirely the algorithm’s fault.
or it could suggest genus, and when you select a genus it gives you a checklist of the species found in the region, which you can look through and compare yourself or just leave at genus level to be ID’d by someone else later…
In my opinion, this is more of a user issue than a CV issue.
The CV does the best it can based on its training set, and it’s made clear to users that the CV suggestions are suggestions only.
It is up to the user to decide whether to go with the species-specific ID or to keep it at a genus (or above) level.
Unfortunately, a lot of users kind of gamify iNat and race to species level IDs, even going so far as to remove observations if they don’t reach a species level ID in a relatively short time (I just had yet another discussion with someone today who was asking if they should remove their observation as it had not been resolved to species).
What I might suggest is that a more clear recommendation be made with the CV system suggesting that users who are unfamiliar with the species in the observation keep their ID to genus level, and let the community take it to the species level.
I would recommend against changing how the CV system itself works.
Ohh that’s why this happens, yeah this is the opposite of what should be happening… But at what level is it “not sure what to recommend”? In your example surely it can be confident that it’s a moss? The lowest common taxon of the 3 top species there is Class Bryopsida.
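To illustrate what “lowest common taxon” means here, a minimal sketch follows. The three mosses are illustrative picks of mine, not necessarily the species in the example above, and the function name is made up. It just walks the lineages from the root and stops where they diverge.

```python
def lowest_common_taxon(lineages):
    """Return the deepest taxon shared by all lineages (each listed root first)."""
    common = None
    for level in zip(*lineages):
        if len(set(level)) != 1:  # the lineages diverge at this rank
            break
        common = level[0]
    return common


# Illustrative lineages: phylum, class, order, family, species.
top_three = [
    ["Bryophyta", "Bryopsida", "Bryales", "Bryaceae", "Bryum argenteum"],
    ["Bryophyta", "Bryopsida", "Hypnales", "Hypnaceae", "Hypnum cupressiforme"],
    ["Bryophyta", "Bryopsida", "Funariales", "Funariaceae", "Funaria hygrometrica"],
]

print(lowest_common_taxon(top_three))
```

If the CV fell back to this taxon whenever its top species disagreed, the first suggestion in a case like this would be the class rather than a single species.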
Just quoting my post yesterday from another thread:
I often click thru to What’s This (if there are already 2 IDs - to see the web of life, and choose the taxon where I am confident) I work from Unknowns so have everything on offer. Definitely this for the ID, and could be this in a comment.
If that What’s This list were always a clickable option, those who want to could choose their level. But identifiers have to care about the quality of their IDs first. There are wrong IDs from abandoned accounts which need 3 right against that wrong one.
This is an idea I could get behind. Stop the CV from recommending species ID’s in groups where we know for sure that typical iNaturalist photos can’t be ID’d to species. This doesn’t interfere with ID’s people make, people who may actually know. It doesn’t prevent correct ID of a species that has recently moved into an area where it wasn’t known before. It just stops the CV’s own often circular contribution to the problem.
As someone who is relatively new to this platform, I absolutely agree with these ideas. Some more manual control over the CV model’s suggestions would go a long way.
As an example:
I had particular trouble with an orbweaver genus (Socca), because the majority of them present are of 1 species (S. pustulosa), but because most of the spiders in this genus look so similar, and because S. pustulosa varies in its appearance so much, it’s so easy for a beginner like myself to see it and think that it must be right (when in many cases there isn’t enough evidence).
Concerningly, the CV is even associating spiders in entirely different families as S. pustulosa now (for example, jumping spiders).
Manually adding a flag to this species to either display a message to the user when selecting it or to stop it being recommended entirely would (I think) help a lot with getting on top of this problem.
I think what would be ideal is a taxon-level database flag that says essentially “CV should never suggest IDs below this level” - this could be set at the genus level, or subfamily, or whatever is deemed appropriate by the curators of those taxa. The computer vision could check the database to see whether the taxon it “wants” to suggest (or any of its parents) has this flag set. So instead of choosing Musca domestica for every “house fly”, the suggestion would get kicked up to Muscidae, or another level. For many taxa/identifiers it is far preferable to refine a coarse ID rather than correct an overly-specific (and wrong) one.
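A rough sketch of how such a flag check might work, just to make the idea concrete. Nothing here is iNat’s actual schema or API: the `Taxon` class, the `cv_floor` field, and `cap_suggestion` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Taxon:
    name: str
    rank: str
    parent: Optional["Taxon"] = None
    cv_floor: bool = False  # hypothetical flag: "CV should never suggest IDs below this level"


def lineage(taxon: Optional[Taxon]) -> Iterator[Taxon]:
    """Yield the taxon itself, then each ancestor up to the root."""
    while taxon is not None:
        yield taxon
        taxon = taxon.parent


def cap_suggestion(taxon: Taxon) -> Taxon:
    """Kick a suggestion up to the coarsest flagged taxon in its lineage, if any."""
    capped = taxon
    for candidate in lineage(taxon):
        if candidate.cv_floor:
            capped = candidate  # later matches are coarser, so the last one wins
    return capped


diptera = Taxon("Diptera", "order")
muscidae = Taxon("Muscidae", "family", parent=diptera, cv_floor=True)
musca = Taxon("Musca", "genus", parent=muscidae)
house_fly = Taxon("Musca domestica", "species", parent=musca)

print(cap_suggestion(house_fly).name)
```

With the flag set on Muscidae, a raw suggestion of Musca domestica comes back as Muscidae, while suggestions at or above the flagged level pass through unchanged. IDs entered by people would not go through this check at all.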
Alternately, it could show a pop-up message of some sort like “Note: Curators have suggested that Taxon is unlikely to be identifiable from photos in most cases. Are you sure you want to suggest this ID? [Yes/No]” - a generic framework like this that could be worked into the upcoming rewrite of the iNat app would be great to have. I know this may not work with Seek, but adding it into the app could be very helpful.
The CV works very well for large, charismatic lifeforms - particularly plants. But for small, cryptic/uncommon, difficult-to-photograph things (many arthropods in particular) it causes a lot of chaos for the identifiers… To the point where a lot of those identifiers just give up on the whole thing.
Some people have a hard time exerting caution where due - while others have a hard time being bold where possible. I’m not sure if a ‘computer’ would do better, trying to forcibly replicate caution under the guidance of timid imperfect primates :)
How big of an issue are the current “overdaring Computer Vision IDs”, in comparison to the potential burden of having to manually refine IDs en masse (if the CV was to replicate the extra-cautious approach of a select few human masters)?
Could it defeat the purpose of automation sparing human sweat? How cautiously should the CV identify things to always be right, while sending obs to the appropriate human identifiers? (I mean, ‘insect’ or perhaps ‘Diptera’ is more than enough to attract knowledgeable entomologists; no need for ‘Muscideae’ or ‘Muscineae’, as in any case humans will still have to peep at the pics and type ‘musca dom’ in the box.)
This has been suggested (in various ways) a lot of times before, and I think it has great potential, although it would likely require quite a bit of testing to figure out the best way to manage the process of flagging a particular family, genus, etc. as being the lowest level amenable to CV ID. But we also need to think through how/why we’re so sure that CV will never be able to offer accurate suggestions below particular levels.
I think it’s reasonable for taxon experts to say something like “even we cannot identify spiders in this subfamily any more precisely based on typical photos, because we know the distinguishing characters require microscopic examination of genitalia”. However, that does not automatically mean that CV cannot in principle distinguish some genera or species within the subfamily. Let’s imagine that CV is trained on a database of typical smartphone photos of spiders that have already been accurately IDed via microscopy (a big task). CV might then discover aspects of those photos that reliably predict particular genera and species, even though the photos don’t contain the microscopic detail the experts would need to key out those taxa. That’s because not every subtle character is contained in a species description or a key. Species differ in ways we haven’t yet recognized.
How serious is this issue? My feeling is that there’s a lot more information that may distinguish taxa than we appreciate. But unlocking that would require a large training dataset of accurately identified images that don’t usefully show any of the characters an expert would currently use to distinguish these taxa. And even if this was successful we would have no way of confirming that the species-level CV identification of an unknown spider was actually correct, so the value would be very limited. Plus, CV doesn’t (yet) tell you why it favors a particular ID, so there’s no logic to interrogate.
So… I think the limit of precision for human identifiers and CV in making IDs from photos may be different in principle, but not in practice. Which means (I believe) that it would be reasonable for the community to establish effective limits of visual identification that could be used both to guide CV and to inform less familiar users—“Spiders in this subfamily can’t be distinguished without microscopy” or “Fungi in this complex can only be distinguished by DNA sequencing”.
As a bonus, avoiding spurious accuracy for CV suggestions in some taxon groups might then allow CV to be opened up to offer finer IDs for others. Eventually, CV will suggest the right subfamily for Ageleninae spiders, the right hybrid name for Crocosmia x crocosmiiflora and the right subspecies for common flowering plants.
For what it’s worth, I was able to get the Computer Vision to act that way with a genus here in South Korea.
There are two species of Cophinopoda robber flies here that can only be distinguished by their genitals, meaning most iNaturalist observations don’t have the detail necessary to confirm the species. The CV, however, consistently suggested one of the two species with no mention of the other. After going through old observations and bumping them back to genus – with a note to the observer about the presence of the two species and how general photos weren’t enough for a species-level identification – and marking the community identification ‘as good as it can be’ at genus level, the CV now suggests the genus rather than the one species.
That is because of the way iNat has set up the CID algorithm and then applies Ancestor Disagreement.
Coarse ID + 2 right = RG
Wrong ID + 3 right = RG
Wrong ID + right coarse ID (NOT hard disagreement) + 3 right = RG. Ultimately that soft disagreement serves no purpose, but it helps the ID to come right.
Wrong ID + hard disagreement + 5! right = RG
The difference is 3 more competent and active identifiers!
First prize goes to the wrong ID withdrawn - then we revert to 2 IDs = RG
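To make the arithmetic above concrete, here is a rough sketch. It assumes a simplified version of the community ID rule: a species needs at least two agreeing IDs and a score of right / (right + wrong + hard disagreements at an ancestor) strictly above 2/3. The function names are made up, and the real algorithm handles more cases than this.

```python
from fractions import Fraction

# Assumed threshold: the score must be strictly greater than 2/3.
THRESHOLD = Fraction(2, 3)


def reaches_species(right: int, wrong: int, hard_disagreements: int) -> bool:
    """True if the right species IDs outweigh the opposing IDs in this simplified model."""
    if right < 2:  # assumed: at least two agreeing IDs are needed
        return False
    total = right + wrong + hard_disagreements
    return Fraction(right, total) > THRESHOLD


def right_ids_needed(wrong: int, hard_disagreements: int) -> int:
    """Smallest number of right species IDs that passes the threshold."""
    right = 2
    while not reaches_species(right, wrong, hard_disagreements):
        right += 1
    return right


print(right_ids_needed(wrong=0, hard_disagreements=0))  # coarse ID only (no disagreement)
print(right_ids_needed(wrong=1, hard_disagreements=0))  # one wrong species ID
print(right_ids_needed(wrong=1, hard_disagreements=1))  # wrong ID plus a hard disagreement at genus
```

Under these assumptions the three cases need 2, 3 and 5 right IDs, which matches the list above. A coarse ID without hard disagreement is not counted against the species, so it does not change the totals.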
I think that this is a good point, but the bottleneck/limitation is the training data - if experts agree that we don’t have enough info to even make an accurate training dataset, then the CV model outputs for those taxa will be a bit of a fool’s errand.
On the other hand, if it is possible to make a carefully assembled training dataset based on some specimens with enough detail, the CV may indeed be able to ID accurately where experts fear to tread.
It’s also possible that some kind of unsupervised algorithm (which iNat’s CV is not) could separate the taxa accurately based on photos. That could then allow for researchers to try to figure out what characters can be used to define those groups, but that would require some extra work outside of iNat.