Alex did leave iNat earlier this month, and his impact on iNaturalist has been huge. We’re all incredibly grateful for his contributions, both in engineering and just as a human and teammate, and I’m looking forward to hanging out with him in the future and seeing what he’s up to next.
Regarding contacting engineers: engineering is a team endeavor, and the engineers are going to continue the work as a team as they always have. Alex was a huge part of computer vision and the geomodel on iNat, but it’s also been a team effort with contributions from multiple engineers. As the iNaturalist team as a whole has evolved, we’re working on more clearly defining roles, and the engineering team will focus on engineering rather than interfacing directly with the community in threads like this. We have an engagement team at iNat who focus more on outreach and community engagement, and we’ll be the points of contact between engineering and the community. On the forum that’s primarily going to be myself, @carrieseltzer, and @seastarya. We can relay concerns and feedback to product and engineering. That’s a lot of what I’ve been doing for nearly all bug reports and feature requests for the past 7+ years.
We just released the latest Computer Vision/Geomodel update and have decided to revert to the GRID approach used previously, instead of the SINR approach we went with for the last few models, based on feedback and reports from the community (thank you!). You can read more in the latest blog post.
From what I’ve seen in the threads you referenced, there’s no need to bother with the qualifier “in this region.” The complaints about this come from everywhere.
Since you referenced fungi, I’ll revise that: people don’t like their obs being pushed back to Order.
Part of the problem is that many of those species did exist here until the most recent revisions. This is an inherent problem with the thread’s premise of “in regions with…” Given current taxonomic trends, that’s going to be every region, at least for certain taxa.
Macrolepiota procera and Macrolepiota prominens were the commonly applied Macrolepiota species in eastern NA before sequencing showed they weren’t actually present and the proper species were described.
Those I at least understand.
M. zeyheri is a South African species and M. clelandii is Australian, and I’ve never seen any indication that either has ever been commonly applied here. I pulled out Mushrooms Demystified, originally published in 1979, and the only Macrolepiota species even mentioned is procera (well, and rhacodes, but that’s now in Chlorophyllum).
All that is to say, I’m fighting the computer vision on species that no expert ever thought were present in eastern NA.
Also, if something needs to be pushed back to order, it needs to be pushed back to order. I’ve never seen why that’s a problem.
Thanks for the update, Tony. To me, a priority should be to understand why the geomodel for some species covers such a vast area without confirmed observations. For example, the map of Research Grade observations for this wasp looks like this:
It only has Research Grade observations in North America, yet its new geomodel looks like this:
Even leaving aside the large areas of ocean included, there needs to be some mechanism to get the model to back off in areas outside the known distribution, either by masking the geomodel to within some buffer of Research Grade observations, or by being more conservative about offering genus- or family-level suggestions.
For building the geomodels, it might be useful to discard some small percentage even of the Research Grade observations which are outliers in terms of geography or other variables, to produce more conservative models. [In this case, there might have been a very small number, two or three, of non-North-American RG observations when the model training set was pulled]
Such behavior would also be helpful with certain species that have a tendency to show up as package or produce stowaways (or other sorts of human-aided transport), but fail to establish a population because the climate just isn’t right for them.
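To make the outlier-discarding idea concrete, here’s a minimal sketch (all names hypothetical; this is not iNat’s actual pipeline) that keeps only the Research Grade points closest to a species’ median center before the geomodel is trained:

```python
import math

def filter_geographic_outliers(observations, keep_fraction=0.98):
    """Drop the Research Grade points farthest from the species' median
    center, e.g. the two or three stray records mentioned above.

    `observations` is a list of (lat, lon) tuples for one species;
    `keep_fraction` controls how aggressively outliers are discarded
    (here, the farthest 2% are dropped).
    """
    if not observations:
        return observations
    # The median center is more robust to stray points than the mean.
    lats = sorted(lat for lat, _ in observations)
    lons = sorted(lon for _, lon in observations)
    center = (lats[len(lats) // 2], lons[len(lons) // 2])

    def haversine_km(a, b):
        # Great-circle distance between two (lat, lon) points, in km.
        lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
        h = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2)
             * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371 * math.asin(math.sqrt(h))

    ranked = sorted(observations, key=lambda p: haversine_km(center, p))
    return ranked[:max(1, int(len(ranked) * keep_fraction))]
```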
Yes. As a Lycorma delicatula researcher, I feel the CV should back off on specificity in IDs for almost all species. We really need research-grade observations of every single successful and failed jump dispersal to get comprehensive reporting, and in this day and age we have enough(?) people to identify new jump dispersals of all kinds of species, except when they resemble species already present in the same geographic area. I just hope people don’t use a revised CVM to misidentify very rare hitchhikers.
This would be a very bad idea for hardier hitchhiking invasive species that establish easily, because oftentimes iNaturalist is the first step in people reporting an invasive species they accidentally transported. I and others in invasion biology also need RG observations of certain jump dispersals to remain RG, provided they are correctly identified, so we have enough data points to show that the invasion of interest is a problem. We should make special exemptions for invasives that spread very fast and have high tolerances.
This is a very salient point. I think it likely that a majority of known biological species cannot be identified as such from habit/habitus imagery. (Overall visual appearance: in plants and fungi, habit; in animals, habitus.) By definition, biological species are based on reproductive isolation, and there is no rule of nature that reproductive isolation will necessarily lead to a macroscopic change in habit/habitus accessible to human vision and digital imagery. I need to flesh out some thoughts on this and search for publications on the idea. I will work on a separate post or journal entry on this issue and then invite comment.
You can still get Research Grade observations of species outside their normal range, if the computer vision suggestions are to genus, for example, and the species is confirmed by someone who knows what they are doing. But the bar should be a bit higher for such observations. What you don’t want are lots of dubious observations at species level just because someone unthinkingly accepted the automatic suggestions.
I made a feature request that I was told was too similar to a comment in this thread. I’m just going to repost it here. I wish I had time to read through the whole thread but I’m super busy at the moment, so if I repeat someone else’s sentiment, apologies. I think the main issue is the computer vision keeps trying to suggest species in areas where they’re not native and are likely to never be found. I still see it suggesting species in oceans where they’re never found and there are no observations for some reason.
A way to lock out computer vision suggestions for out of place taxa
Platform(s): All
URLs (aka web addresses): N/A
Description of need: Currently, several problem taxa tend to be oversuggested by the computer vision regardless of location, often resulting in even RG data that is incredibly inaccurate, as non-specialists tend to just agree with each other over an incorrect ID. This creates a misinformation feedback loop that becomes a headache for specialists, who have to go through every single observation of each of these species, often numbering in the thousands, and correct them to prevent them from poisoning data on a massive scale. Clathria prolifera and Haliclona cinerea are two examples of sponges that create this nightmare. Clathria prolifera tends to get recommended for just about any beach wrack, and Haliclona cinerea is a common suggestion for just about any pale pink or tan colored sponge from pretty much anywhere on the planet. Because this is just the computer vision acting as it’s programmed, and in most cases this is a good thing, direct intervention is required to make this specific problem stop.
Feature request details: I’m not sure how difficult this would be to implement, but just allow curators to ban the computer vision from recommending a certain species within a certain region. Or restrict its recommendations to a very specific region or regions. Such as only the European region for H. cinerea. Or the US Eastern Seaboard and California (where it has escaped) for Clathria prolifera. Maybe even allow a user to select a box like they would with search results on the map. This wouldn’t prevent people from still selecting the correct species if they know it or even inputting the wrong one. But it does force people to have to type it out, so it makes sure that they have to think about it instead of just automatically clicking on something that’s completely wrong because the computer vision suggested it. This way people who don’t really know anything about the taxa involved can stop getting recommended terrible suggestions, and people who know about the taxa involved can record new instances of invasive species, for example.
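I’m not on the team, so purely as an illustration of the mechanism (every name here is made up, and the bounding boxes are rough), the curator lock could be as simple as an allowlist checked before suggestions are displayed:

```python
# Hypothetical curator-maintained allowlist: taxon -> bounding boxes
# (min_lat, min_lon, max_lat, max_lon) where CV suggestions are allowed.
# Taxa not listed here remain unrestricted.
SUGGESTION_REGIONS = {
    "Haliclona cinerea": [(30.0, -15.0, 72.0, 45.0)],     # roughly Europe
    "Clathria prolifera": [(24.0, -98.0, 46.0, -60.0),    # US Eastern Seaboard
                           (32.0, -125.0, 42.0, -114.0)],  # California
}

def allowed_suggestion(taxon_name, lat, lon):
    """Return False if curators restricted this taxon to regions that
    do not contain the observation's coordinates."""
    boxes = SUGGESTION_REGIONS.get(taxon_name)
    if boxes is None:
        return True  # no curator restriction on this taxon
    return any(min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
               for min_lat, min_lon, max_lat, max_lon in boxes)

def filter_suggestions(suggestions, lat, lon):
    """Drop locked-out taxa from a list of (taxon_name, score) pairs
    before they are shown to the observer."""
    return [(name, score) for name, score in suggestions
            if allowed_suggestion(name, lat, lon)]
```

As noted above, this wouldn’t stop anyone from typing the species manually; it only removes it from the one-click suggestion list.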
What’s odd is that, years after noticing this problem, I’m still seeing the computer vision recommendations refuse to offer me a species that has abundant observations in the local area, while somehow people with absolutely no knowledge of sponges whatsoever are getting recommended species that don’t belong anywhere near the area, sometimes with zero observations of that species on that side of the planet. I have no explanation for this behavior, but it’s clearly flawed.
I think folks need a better understanding of computer vision, personally. That included me at one point. Example: if you’re IDing in a geographic area that lacks a heavy concentration of iNatters or observations, the CV will not give you local species because it doesn’t have photos to compare to. It will sometimes give a good clue at genus, family, or tribe level, but if someone doesn’t know that the CV’s accuracy relies in part on the images it was trained on, they will unintentionally over-rely on it.
I only figured this out by trying to identify/sort “unknowns” in “low traffic” areas of the world, and by beginning to learn about the challenges of identifying Bombus species in the Himalayas.
Also, if someone fails to crop a photo to the organism they’re trying to ID (and put that photo as the first image in their observation), they will also get screwy results. Again, not self-explanatory.
The problem is, the CV is doing this in places where there are well established species with many observations but then anomalously recommending the wrong species like X. muta for X. testudinaria. I don’t know if I’m becoming paranoid or what, but it also seems to be happening more frequently now than it used to.
The CV model produces a numeric measure of the visual similarity of a photo to the set of photos it was trained on. A clade needs a minimum number of observations to be included, so a lot of RG observations are not represented in the model. The only improvement that could be made at the CV level is to include the parent clade.
The rest of the cryptic-taxa issues come from the suggestion logic, which treats the visual similarity score as a measure of identification confidence.
If the location is not specified, the geomodel filter is not applied and suggestions revert to “Visually similar.” When a taxon from afar is selected, this corrupts the distribution maps.
Unlike the CV model, the suggestion logic has access to the taxonomy tree. Sometimes the parent of the clade with the highest score is displayed as the top recommendation. The rest is ignored.
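As a rough sketch of that rollup behavior (my reading of it, with invented names and thresholds, not the actual implementation):

```python
def rollup_suggestion(scores, parent_of, threshold=0.7):
    """Suggest a leaf taxon if one score is confident enough; otherwise
    fall back to the parent clade whose children's combined score
    clears the same bar, else suggest nothing.

    `scores` maps leaf taxon -> visual similarity score;
    `parent_of` maps each taxon to its parent in the taxonomy tree.
    """
    if not scores:
        return None
    best_leaf = max(scores, key=scores.get)
    if scores[best_leaf] >= threshold:
        return best_leaf
    # Aggregate leaf scores up to each parent clade.
    parent_scores = {}
    for leaf, score in scores.items():
        parent = parent_of.get(leaf)
        if parent is not None:
            parent_scores[parent] = parent_scores.get(parent, 0.0) + score
    if parent_scores:
        best_parent = max(parent_scores, key=parent_scores.get)
        if parent_scores[best_parent] >= threshold:
            return best_parent
    return None  # nothing confident enough to suggest
```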
This feature request is about marking taxa that should not be suggested. The options so far include manual curator action and metrics based on historical validity of suggestions. Both rely on people doing a lot of extra work, fixing identifications or updating taxa.
Also, both refer to the previous versions of the CV model, geo model and suggestion logic.
So far the only accuracy test performed is a limited-scope survey.
A large-scale batch comparison of current suggestions vs. RG identifications would give an automated accuracy score per suggested clade, and the taxa with the worst-performing suggestions could be flagged by the validation batch. Develop once, run after each change.
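Concretely, I’m imagining something on the order of this sketch (field names invented for illustration):

```python
from collections import defaultdict

def flag_worst_taxa(records, cutoff=0.5, min_records=50):
    """Compare the CV's top suggestion at upload time to the eventual
    RG community ID, and return taxa whose suggestions were wrong more
    often than `cutoff` allows.

    `records` is an iterable of dicts with 'rg_taxon' (the community
    ID) and 'cv_taxon' (the model's top suggestion for that photo).
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for rec in records:
        taxon = rec["rg_taxon"]
        totals[taxon] += 1
        if rec["cv_taxon"] == taxon:
            hits[taxon] += 1
    return sorted(
        taxon for taxon, n in totals.items()
        if n >= min_records and hits[taxon] / n < cutoff
    )
```

Flagged taxa could then be marked for curator review or suppressed from suggestions until the next model update.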
The basic question is which has priority: suggestions at the finest taxonomic level, or accurate ones?
This. Folks don’t use the “is this the best that can be done?” community taxon checkbox nearly enough, and I think it is misunderstood and worded a bit backwards (as someone who has hit the wrong selection… lol). There are Bombus species that simply cannot be identified past genus from photos, so check that box, and bam, it’s RG.
I know sometimes folks want to know exactly what species it is, and sometimes I feel like there’s a push on iNat to get to species, but species differentiation can be really hard, especially if your photos are blurry or not detailed enough (guilty as charged).
Yep, it’s much worse now. Common species in the Caribbean that the CV has accurately identified for about a decade are now only being recommended to genus level, as if it’s not sure, while in the Indo-Pacific it’s still recommending, down to species level, animals that have never occurred there. I haven’t kept up with any changes that have happened, but the CV must be using AI now to be getting results this bad. These are literally worse results than we already had.
Nothing has changed about the way the CV uses AI. Depending on what you mean by AI, either the CV has been using it the whole time, i.e. computer vision is and always has been a kind of artificial intelligence, or if you specifically mean generative AI, then the CV has never used and still does not use (generative) AI.
Well I don’t know what’s happening then, but it’s definitely changed in the last few months. I meant AI the same way that Google search results don’t actually search for what you’re looking for anymore but are now “interpreted” by AI, and just basically ignore whatever you type.
The main changes recently (other than ~monthly updates of which species are included) have been from the team tinkering with the geomodel. Maybe try the respective species’ pages here and see if their predicted ranges are representative of their actual ranges. If they aren’t, this thread might be relevant.