Platform(s), such as mobile, website, API, other: Backend
URLs (aka web addresses) of any pages, if relevant: -
Description of need: This change is needed to reduce the huge workload for identifiers who have to clean up wrong IDs based on computer vision suggestions.
Feature request details: A few years back, all hybrid taxa were excluded from computer vision suggestions. If I remember correctly, the reasoning back then was that some rare kinds of bird hybrids could not be distinguished from their parent species. While that might have made sense for birds, there are other cases where this decision causes lots of problems. Some hybrids are actually far more common than either parent species. Let's look at one example: Kalanchoe × houghtonii (= Kalanchoe delagoensis × daigremontiana). Since it is no longer suggested, new observations now get placed with other species instead, mostly as Kalanchoe daigremontiana. The statistics say that this had to be corrected 2510 times (an incredibly high number for a species that only has 524 observations). It is so much work to clean this up. I see no reason why computer vision wouldn't work on this taxon. Curators should have an option to enable computer vision on certain hybrid taxa to avoid cases like this.
One more thought I'd like to add: this is a self-reinforcing problem. The many wrong IDs accumulate on the rarer parent taxa and are used to train the model again, which leads to an even higher percentage of wrong IDs. There are not nearly enough identifiers to keep up with this.
I'd prefer that it not even be opt-in, unless the opt-in is at a high level; we should just go ahead and re-enable CV for all hybrids in kingdom Plantae/phylum Tracheophyta, at least.
We should accept that it may take a training cycle or two before performance on plant hybrids stabilizes, because there are CV/ID cleanups for hybrids that could be done but haven't been, since the hybrids have been excluded for so long.
After tracking the performance for plant hybrids, eventually we could reconsider whether or not it should be enabled for other kingdoms again. Avian hybrids were the main problem before.
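The proposals in this thread (a curator opt-in for individual hybrids, a plant-wide default for Tracheophyta, and curators opting problematic groups such as Narcissus out) could be combined into one eligibility check. A minimal sketch, assuming a hypothetical Taxon record with an is_hybrid flag and an ancestry tuple; none of these names come from iNaturalist's actual code, and the Narcissus record is a made-up example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    name: str
    is_hybrid: bool
    ancestry: tuple[str, ...]  # higher-level taxa containing this one (hypothetical field)


# Hypothetical default: hybrids anywhere under these lineages are eligible automatically.
DEFAULT_HYBRID_LINEAGES = frozenset({"Tracheophyta"})


def eligible_for_cv(
    taxon: Taxon,
    lineages: frozenset[str] = DEFAULT_HYBRID_LINEAGES,
    allowlist: frozenset[str] = frozenset(),
    denylist: frozenset[str] = frozenset(),
) -> bool:
    """Return True if the taxon may be included in CV training and suggestions."""
    if not taxon.is_hybrid:
        return True  # non-hybrids are unaffected by this filter
    if taxon.name in denylist or any(t in denylist for t in taxon.ancestry):
        return False  # a curator opted this hybrid (or a whole group above it) out
    if taxon.name in allowlist:
        return True  # a curator opted this individual hybrid in
    # Otherwise fall back to the lineage-wide default.
    return any(t in lineages for t in taxon.ancestry)


houghtonii = Taxon("Kalanchoe × houghtonii", True, ("Plantae", "Tracheophyta", "Kalanchoe"))
duck_hybrid = Taxon("Mallard x American Black Duck", True, ("Animalia", "Chordata", "Aves", "Anas"))
daffodil = Taxon("Narcissus hybrid (example)", True, ("Plantae", "Tracheophyta", "Narcissus"))

print(eligible_for_cv(houghtonii))  # True: plant hybrid, covered by the default
print(eligible_for_cv(duck_hybrid))  # False: birds are outside the default lineages
print(eligible_for_cv(daffodil, denylist=frozenset({"Narcissus"})))  # False: genus opted out
```

Checking the denylist against the ancestry lets a curator switch off all hybrids in a genus with one entry, while the allowlist only needs individual names because it is meant for exceptions outside the default lineages.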
My hazy recollection is that the hybrids-enabled CV model was suggesting Mallard x American Black Duck for every Mallard and American Black Duck. I guess that makes sense, since the hybrids, accounting for backcrosses, cover a spectrum of phenotypes between the two species. But it's kind of misleading, because hybrids are much less common than either pure species.
Would there not be similar issues with plants?
In my area there would be multiple observations of all three taxa, so I think the CV would show high confidence and "Visually Similar / Expected Nearby" for both the species and the hybrid (not sure if the new geomodel changes how the nearby part works). I wonder if accounting for the ratios of the numbers of observations, in a way similar to this proposal, would help with that. For my county, the ratio of observations for Mallard:hybrid:American Black Duck is 762:18:102.
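One way to read the ratio idea is as a frequency prior: multiply each visual score by the local observation count and renormalize. A minimal sketch, assuming made-up visual scores for an ambiguous duck photo and using the county counts above; this is not how iNaturalist's CV or geomodel is known to work:

```python
def reweight_by_frequency(visual: dict[str, float], counts: dict[str, int]) -> dict[str, float]:
    """Multiply each visual score by its local observation count, then renormalize to sum to 1."""
    weighted = {taxon: score * counts[taxon] for taxon, score in visual.items()}
    total = sum(weighted.values())
    return {taxon: w / total for taxon, w in weighted.items()}


county_counts = {"Mallard": 762, "Mallard x American Black Duck": 18, "American Black Duck": 102}
# Hypothetical visual scores for one ambiguous photo (not real model output).
visual_scores = {"Mallard": 0.40, "Mallard x American Black Duck": 0.35, "American Black Duck": 0.25}

for taxon, p in reweight_by_frequency(visual_scores, county_counts).items():
    print(f"{taxon}: {p:.3f}")
```

With these numbers the hybrid falls from 0.35 to under 0.02, because it makes up only about 2% (18/882) of local observations. The trade-off is that a raw count prior would also push down genuinely rare species, so it would probably need to be softened in practice.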
In my experience, both of these issues are often quite different for plants than for the problematic ducks. Many hybrid plants may be easy to distinguish from their original parent species and may appear in iNat observations much more commonly than those parent species.
It's pretty common for hybrid plants to have very predictable character traits that are distinct from their original parent species. One example is the widely invasive Crocosmia x crocosmiiflora, which was created in 1879 when Victor Lemoine crossed C. aurea and C. pottsii. These plants have a consistent set of traits that are quite easily distinguished from the parent species.
This scenario is very common with horticultural hybrids: a plant breeder (often in the 19th century) tried a bunch of crosses to get the right combination of traits, and that hybrid plant has been extensively propagated since. In many cases, these plants may be fertile, so the cultivars have the potential to reproduce themselves extensively.
There are 13,135 iNat observations of C. x crocosmiiflora, 835 observations of C. aurea and 55 observations of C. pottsii. Maybe 200 of those C. aurea observations are misidentifications caused by CV being prevented from considering C. x crocosmiiflora. I know that I and several other Iridaceae identifiers spend quite a lot of time correcting C. aurea IDs to C. x crocosmiiflora.
Because animal breeders have unleashed fewer fertile hybrid taxa into the wild than plant breeders, it seems that most hybrid animals seen by iNat users have come about "naturally", infrequently and with quite variable outcomes. Sadly, for plants it's the reverse.
Crocosmia x crocosmiiflora was the taxon I had in mind when I first saw this. In the UK it's fairly invasive (in the west, at least) and far, far more common than its parents or other Crocosmia, with lots of observations stuck at genus because they've been vision-IDed as a full species.
It would also be very useful in cases where hybrids are natively far more common than their parents, like Circaea x intermedia (present in Ireland as a native even though the Circaea alpina parent is totally absent) or Quercus x rosacea, which is sometimes more common than its parents in parts of the UK, or even for hybrids that are simply common, as Populus x canescens (grey poplar) is.
I would also support re-enabling CV for all vascular plants. I don't think the "hybrid Mallard problem" has a parallel for plants. I'm not aware of any plant hybrid with enough observations to qualify for CV that isn't at least as distinguishable from its non-hybrid parents as we would typically expect taxa on iNat to be.
Reynoutria × bohemica is also relatively hard to distinguish in most photographs, despite probably being much more common than iNat would indicate. I think there's a good chance some of my own R. japonica observations might actually be R. × bohemica, which I need to look into and (potentially) fix sometime.
I don't think this is really a major argument for not including the hybrids in the CV, though. There are already tons of cryptic species out there, and we don't exclude them all from the CV. Excluding hybrids feels like a workaround for a very small subset of the actual problem (the CV misidentifying cryptic species) that ends up having a net negative effect, especially when it's excluding very common garden hybrids and invasive hybrids that newer users are probably disproportionately likely to be submitting to the website.
Thanks for these counterexamples: plant hybrids that seem to be tough to distinguish (and which I had little knowledge of). That leads nicely to @cigazze's next point:
Based on you folks' knowledge of these hybrids, do you think it would potentially be worse, better, or little different to have them within the scope of CV? In principle, given a fairly accurate bunch of photos for the two parents and the hybrid, CV might be able to pick up on sufficient differences to make reliable suggestions (especially when combined with the geomodel).
If the current state of identification for these taxa is poor, we should also consider that a concerted effort to improve ID quality might pay off through better future CV recommendations (but only once CV is allowed to include the hybrids).
I just tested some observations and was surprised to find that the CV suggested the genus (Anas) for all the duck observations I tested (it then listed both Mallard and Am. Black Duck after that for every example, switching the order appropriately for whichever species it was). Presumably the hybrid would show up with these every time if it were an option. That would be a bit intrusive, but not an issue unless it were the actual top suggestion (instead of the genus, as here).
I had similar results for the cattails: genus on top, then the two species. Having the hybrid on the list would probably be an improvement. Same with the honeysuckles, I think; too bad the CV doesn't know how to suggest section rank rather than genus. For these plant examples it's definitely appropriate that the top suggestion is the genus.
I think it depends on the species, but for the examples given, probably same-to-better: these hybrids are very common, so the CV would have a good amount to go off of, and if it really is still impossible for the CV to tell them apart, the net result (the hybrids being more likely to be correctly ID'd, but some non-hybrids getting CV-ID'd as the hybrid) is probably neutral.
For rarer cryptic hybrids, I think (assuming it's 100% impossible for the CV to tell them apart) the result could be same-to-worse, but I'd think that the CV would be less inclined to suggest a rare member of a cryptic species-hybrid complex than a common one, which would reduce the effect a fair bit. I could be wrong, though; I don't remember whether they normalize the sample size for every species or not when training the CV model.
I think there are threshold criteria for number of photos from verifiable observations, etc. Once those criteria are met, I believe there is no weighting given to common vs. rare taxa. The raw recommendations are based on visual similarity, as modified by the geomodel weighting (the "expected nearby" logic). I think there is also some logic to boost the ranking of "sibling" taxa in the same genus as the highest-scoring matches.
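The pipeline described here (a training threshold, then visual similarity modified by a geomodel weight, with no extra weight for common versus rare taxa) might be sketched as follows. The threshold value, the choice to multiply the two scores, and all the numbers are assumptions for illustration; Typha domingensis is simply given a low geo weight as if it were not expected nearby, and the sibling-boost logic is left out:

```python
MIN_PHOTOS = 50  # hypothetical threshold; the real criteria are not given in this thread


def combined_scores(
    vision: dict[str, float],
    geo: dict[str, float],
    photo_counts: dict[str, int],
    min_photos: int = MIN_PHOTOS,
) -> list[tuple[str, float]]:
    """Rank trained taxa by vision score times geo weight, normalized to sum to 1."""
    # Only taxa that cleared the photo threshold are in the model at all.
    trained = {t for t, n in photo_counts.items() if n >= min_photos}
    raw = {t: s * geo.get(t, 0.0) for t, s in vision.items() if t in trained}
    total = sum(raw.values())
    if total == 0:
        return []
    ranked = sorted(raw.items(), key=lambda kv: kv[1], reverse=True)
    return [(t, s / total) for t, s in ranked]


vision = {"Typha angustifolia": 0.5, "Typha latifolia": 0.3, "Typha domingensis": 0.2}
geo = {"Typha angustifolia": 1.0, "Typha latifolia": 1.0, "Typha domingensis": 0.1}
photos = {"Typha angustifolia": 1000, "Typha latifolia": 1000, "Typha domingensis": 1000}

for taxon, score in combined_scores(vision, geo, photos):
    print(f"{taxon}: {score:.3f}")
```

Once a taxon clears the threshold, nothing in this sketch depends on how often it is observed, which matches the description above.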
I think in general adding more hard taxa to the CV makes it better, because it forces it to learn more sophisticated rules. It can also help reduce potential cases where the CV is currently far more confident than it should be, because it is unaware of the third possibility.
For example, in this T. x glauca observation, the CV reports 98.5% confidence (combined_score) that the observation is T. angustifolia, but it shouldn't. I surveyed the CV scores for maybe 20 of them, and in many if not most of the T. x glauca observations I checked, the CV incorrectly had >~90% confidence for one of the two species. Reducing such unfounded confidence would inherently improve accuracy.
For a truly cryptic hybrid that is 100% impossible to tell apart from a single photo, I think humans will probably also be unable to identify it in most cases, so performance would be no worse than, and possibly better than, the status quo.
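The overconfidence effect follows from how classifier outputs are normalized: scores are spread only across the labels the model knows, so probability that belongs to a missing label gets pushed onto the labels that are present. A minimal sketch using a softmax over hypothetical logits (not values from the real model, and ignoring the geo part of combined_score):

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert raw model outputs into probabilities that sum to 1 over the known labels."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Hypothetical logits for a photo that is actually T. x glauca.
two_labels = {"Typha angustifolia": 4.2, "Typha latifolia": 0.0}
three_labels = {**two_labels, "Typha x glauca": 4.0}

print(f"{max(softmax(two_labels).values()):.3f}")  # prints 0.985
print(f"{max(softmax(three_labels).values()):.3f}")  # prints 0.545
```

A real retrain would change all the logits rather than simply append one, so this only illustrates the direction of the effect: once the hybrid is a label, near-certain scores for either parent become much less likely on intermediate photos.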
Yes, surely it will greatly improve the accuracy of CV responses. I'd guess that for tracheophytes, the general accuracy of the model could reach 94-95% thanks to one small step: allowing hybrids to be included in the model.
It's a matter of common sense. We have hundreds of easily recognisable taxonomic entities that were excluded from the model some years ago because they are marked as hybrids.
This leads to some odd results, where common garden plants familiar to all CV models (PlantNet, observation.org…) are not shown as possible identifications at all.
No strawberry (Fragaria × ananassa), no peppermint (Mentha × piperita), etc.
So this feature (in fact a tiny modification of the filtering code) would greatly improve the functionality of the CV.
There was this post by staff from March last year; I don't know what has happened since:
Inclusion of selected hybrid taxa on an opt-in basis would be an improvement, but I agree with this:
I suspect the simplest solution would be just including plant hybrids and excluding birds if those are the ones that are causing issues.
I'm going to quote myself about one important issue with not including hybrid plants in the CV:
A really common wild hybrid around here is Medicago x varia; I probably see it more than "pure" Medicago sativa. Here the CV generally does suggest the right genus, but it is frustrating to have to enter and/or correct the taxon name all the time when the CV should be able to suggest it, since certain forms of this hybrid have fairly distinctive multi-colored flowers. (Also, from a data entry perspective, I find that I have to type the name in a very specific way to get it to show up in the drop-down list of possibilities, so I waste time simply trying to get the taxon I want.)
I think this would be very beneficial for most plants, and I support this idea. Giving curators the ability to control which hybrids can be suggested really helps prevent a lot of the issues associated with having all hybrids in the suggested ID section.
I often see hybrid duck and goose observations that are so varied that the AI would easily get muddled trying to learn to distinguish them. It makes sense for the computer vision not to auto-suggest these, as it would create a lot more work for an already very active bird-identifying workforce. Ducks and geese are among the most observed taxa (and may be among the most observed animal hybrids too?). It's easy to imagine how the AI might get swamped with incorrect IDs, as these hybrids are so diverse and can sometimes look really similar to either parent. Not to mention that ducks are selectively bred into various breeds and colour morphs, diversifying them even more! I can really understand why hybrid suggestions were removed, given the problems associated with these birds alone.
I think this suggested feature really makes sense, though, as curators could untick hybrid suggestions for problematic taxa. In my opinion, things like birds should most often require a human to suggest the hybrid, while plants tend (mostly) to be easier to distinguish, so I'd be fine with the AI making suggestions the majority of the time.
My main note about hybrid plants in the computer vision concerns problematic plants within the horticultural trade, such as daffodils (Narcissus), which have been heavily hybridized and selectively bred for the horticultural hobby. These hybrids then naturalize in the wild and are planted everywhere by humans. You can usually tell it's a hybrid because it has a mishmash of traits, but classifying it past genus is another story. It can be very difficult to work out a hybrid's ancestry without genetic analysis, as many cultivars have several species and cultivars in their ancestry. The highly popular cultivar Narcissus 'Jetfire', for example, has 6 different species in its ancestry! In this instance I think a suggested ID for Narcissus hybrids would be quite problematic, so I'd support curators being able to disable hybrid suggested IDs for Narcissus as well. Your feature request would allow curators to enable or disable suggestions for certain hybrids, and that seems like a good solution to this problem. Experts can still ID natural Narcissus hybrids, but it becomes significantly less likely that people who are unknowledgeable about the genus will give incorrect hybrid IDs if the AI isn't suggesting hybrid IDs in the first place.
This might be getting off topic, but what about allowing curators to selectively flag taxa to be removed from being suggested by the CV? E.g. certain fly species that are only possible to ID to species with dissection. Those should never be suggested by the CV, in my opinion.
Here's more detail from staff about why hybrids caused problems:
And a quote from the subsequent blog post discussing why hybrids were then excluded:
The first sentence of the second paragraph would predict issues with cryptic species and species complexes as well, so I'm not sure if this is an issue that can be solved long-term just by excluding a handful of problematic taxa…