Is there any documentation regarding the second item, about training only on species that have a certain number of research grade observations? My impression is that that is definitely not the way the algorithm currently behaves. I frequently get computer vision suggestions, on liverworts in particular (but occasionally vascular plants), for species that have no records, research grade or otherwise, on iNaturalist, just photos from NatureServe or a similar source. And to bring things back to the original topic: yes, they are often described as specific to an entirely different continent.
@erikksen do you have a place defined in your settings? I am on the .nz domain and have a default “New Zealand” filter picked up from that, so when I see results I only see what is present in New Zealand. To see worldwide results I have to deselect the place filter.
There are 13,730 species that have at least 20 research grade observations. We chose this as the data threshold necessary to include a species in our model. Technically, this number is closer to 10,000 species since we took steps to ensure that each species had at least 20 distinct observers to control for observer effects. We are moving a new species across this data threshold every 1.7 hours as new observations and identifications are added to iNaturalist. This means every observation you post or identification you make works to improve the model!
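The eligibility rule described above can be sketched in a few lines. This is an illustrative reconstruction, not iNaturalist's actual code: it assumes observation records carry a species, an observer, and a quality grade, and it applies both thresholds mentioned (at least 20 research grade observations, from at least 20 distinct observers).

```python
from collections import defaultdict

# Thresholds as stated in the post above.
MIN_OBSERVATIONS = 20
MIN_OBSERVERS = 20

def eligible_species(observations):
    """Return species that clear the training-data threshold.

    observations: iterable of (species, observer, quality_grade) tuples.
    """
    counts = defaultdict(int)       # research grade observations per species
    observers = defaultdict(set)    # distinct observers per species
    for species, observer, grade in observations:
        if grade == "research":
            counts[species] += 1
            observers[species].add(observer)
    return {
        s for s in counts
        if counts[s] >= MIN_OBSERVATIONS and len(observers[s]) >= MIN_OBSERVERS
    }
```

The second condition is the "observer effects" control mentioned above: a species with 20 research grade observations all posted by one person would still be excluded.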
I don't know if it has changed, but there is this not-easy-to-find page on the site describing the computer vision system: https://www.inaturalist.org/pages/computer_vision_demo
It clearly states that for a species to be included in the system's training, it must have 20 research grade records.
If you are seeing something else, it is either a filter issue like the one described above, or something has changed since the rollout documentation was written.
I have no idea if this is the right place to comment, but if it is not, just remove this reply… but I am always surprised that they (in general) never give a warning that the chosen species is not found in this country, has not been found in the last 10 years in the state, or that it is from another continent.
I know the ObsMap app gives bold red text if the species is rare, and if it is very rare you have to confirm it by pressing an additional YES. The latter option reduced the rare bird reports by 90%, which was important since SMS and email notifications are sent out on (mainly) rare bird alerts.
I was very happy that the iNaturalist app shows a very basic “the species is seen nearby”, which, in my opinion, should be extended to warnings like ‘not from this continent’.
I have no idea how difficult it is, but I used to have a database with plant names (from 30 years ago) and their presence in the different European states. I am only afraid that for several species the data is outdated.
But bold red text (can colour-blind people see red?) would prevent some wrong submissions.
Nope. To add a little more information to this scenario (again, most common with liverworts and mosses in my experience):
In the upload process, the drop-down list of suggestions has maybe four suggestions I’d like to check out, so I right-click and open each in a new tab. One or two might turn out to be as I described before: with photos from a NatureServe article or something to that effect, but no existing iNaturalist records. In line with this topic’s origins, most of the others do show iNaturalist records, but the existing observations may be in Asia only, or similarly not geographically sensible (I understand that if I did have a location filter on, I would not see those observations, right?).
Really my only interest in pointing this out is to suggest that the computer vision is not trained solely on research-grade iNat observations. It may be that it is trained on a limited number of images from existing authoritative sources as well (otherwise how would the suggestions I’m referring to end up being suggested to me?). That may indicate that condition 2 in edanko’s post that I was responding to has changed at some point.
I’d be interested in seeing some examples. Post a new topic in General when you come across this again?
Will do, can probably reproduce it with dummy uploads.
Turns out I can’t reproduce it. Maybe I had location filters coming on selectively for individual tabs? Who knows.
There is a different problem with the suggestions too: sometimes, e.g., many beans are suggested, from different genera, but the software does not suggest ‘bean’ itself as an option. Since ‘bean’ is consistent with all of the suggestions, that is what I want to pick, and I can, as long as I know they are all beans. But if I don’t know the lowest taxon the suggestions have in common, I cannot decide on a safe or probable ID at a low level, except, of course, by clicking on many of the suggestions and checking their taxonomy. I check taxonomy often enough that I would like there to be a summary ID for the case where the probable taxa are similar at some level that is not too high. Even if the taxa don’t meet until, say, dicots, I think offering that is not a problem, because dicots is then a very safe ID, being so high level. The trouble would be if the algorithm often picks multiple taxa from the same low-level taxonomic group while ignoring candidates from elsewhere; but even then the error is no greater than when one of the given IDs is chosen, and at least it will be obvious how far up the disagreement is.
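The “summary ID” idea above amounts to finding the lowest taxon all suggestions share. Here is a minimal sketch under the assumption that each suggestion comes with a root-to-species lineage; the lineages below are invented examples, not real iNaturalist taxonomy data.

```python
def lowest_common_taxon(lineages):
    """Return the deepest taxon shared by every lineage.

    lineages: list of root-to-leaf lists of taxon names.
    """
    # Try the longest possible shared prefix first, then shorten it.
    for prefix_len in range(min(len(l) for l in lineages), 0, -1):
        prefixes = {tuple(l[:prefix_len]) for l in lineages}
        if len(prefixes) == 1:
            return lineages[0][prefix_len - 1]
    return None  # no common taxon at all

# Three bean-family suggestions from different genera (made-up lineages):
suggestions = [
    ["Plantae", "Fabales", "Fabaceae", "Phaseolus", "P. vulgaris"],
    ["Plantae", "Fabales", "Fabaceae", "Vigna", "V. unguiculata"],
    ["Plantae", "Fabales", "Fabaceae", "Glycine", "G. max"],
]
print(lowest_common_taxon(suggestions))  # Fabaceae
```

A summary ID like “Fabaceae” is exactly the coarse-but-safe option the post is asking the interface to surface.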
Thanks for connecting me to this thread, @charlie!
(Can colour-blind people see red?)
Not necessarily, but if the text is also bold then the color is not the only way to recognize that type of message.
I am not sure whether I missed something from this very long thread but I wanted to add an extra example.
This problem occurs widely even with VERY common species of mammals. Not only do you get a list of suggested species that don’t even occur on the whole continent, you also get a “we are pretty sure this is in the family:” suggestion which links you to the wrong family. The family suggestion is obviously derived from the wrongly suggested top species, so even the cautious newbie who wants to be on the safe side fails at assigning a family ID.
Here is a very fresh observation in which the problem occurs:
A very common European rodent (Apodemus, family Muridae) is being wrongly assigned to American species (in the genus Peromyscus, which does not occur in Europe) and, of course, also assigned to the corresponding wrong family (Arvicolidae) by the algorithm.
I correct several of these easily avoidable mistakes and this leads me to think that geography is absolutely not taken into account.
There are quite a lot of observations of Viola sororia in Europe. I guess 99% are not Viola sororia but native European species. Can someone check these observations in order to detect true Viola sororia?
Probably most are not V. sororia, and fortunately most are not Research Grade, but it seems that this species has also been introduced in parts of Europe.
A bit late to the discussion, but I face this issue constantly when working with myriapods (millipedes and centipedes): No matter where on Earth the observation was taken, the algorithm reliably suggests almost any dark millipede with yellow spots as Harpaphe haydeniana (a species in a genus restricted to Western North America), any long, banded millipede as either Narceus (Central-Eastern North America) or Paeromopus (a California endemic), and any pale millipede (or frequently, beetle grubs) as the California endemic Xystocheir dissecta. I spend much time correcting these frequent mis-IDs: like many invertebrate groups, myriapods at the species level are poorly known among the general public (or even biologists), and misidentification is rampant online. I think the AI is clearly biased by the historic and current user base (California, North America). My question is: how do we bop the AI on the nose and say “NO!” (or how do we boost the “I” in AI)? Does setting establishment statuses (e.g. “endemic to California”, “native to the United States”) lower the probability of that taxon (and/or its constituent taxa) being suggested in Asia or anywhere else? Maybe if an out-of-range suggestion is chosen by a user, a checkpoint message could be raised, something to the effect of “this taxon doesn’t appear to be known from this area, do you wish to proceed?”
Update (May 12): After poking around in some other observations I admit that I may be seeing a biased assortment myself, seeing only the obviously wrong IDs, and missing ones that don’t cross my radar because they were correctly suggested (either by AI or the user). I don’t have access to all the data, but I do see that the AI ranks certain suggestions higher or lower based on geography, contrary to my hyperbolic claim of “No matter where on Earth the observation was taken…”. I don’t think the AI is fatally flawed, but still would like to see some more cooperation between people and machines, e.g. maybe in the form of a checkpoint when choosing a taxon not known from the continent, as described above.
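The checkpoint proposed above could be as simple as a range check before an ID is accepted. A minimal sketch, assuming a hypothetical taxon-to-continent range table (iNaturalist does not expose one in this form; the entries below are illustrative):

```python
# Hypothetical range table: taxon name -> continents it is known from.
KNOWN_RANGE = {
    "Harpaphe haydeniana": {"North America"},
    "Apodemus sylvaticus": {"Europe", "Asia"},
}

def needs_confirmation(taxon, continent):
    """True if the user should see a 'do you wish to proceed?' prompt."""
    known = KNOWN_RANGE.get(taxon)
    if known is None:
        return False  # no range data: accept the ID silently
    return continent not in known

# A Western North American millipede chosen for an observation in Asia
# would trigger the checkpoint:
print(needs_confirmation("Harpaphe haydeniana", "Asia"))  # True
```

Note this only warns rather than blocks, matching the post's intent of "more cooperation between people and machines" instead of a hard geographic filter.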
Under the current design, setting the endemic status or a range will have no impact on the suggestions. It may have other or future benefits, but nothing here. Maybe 35 species of millipede have enough records to have gone through the vision training. Maybe 3 of these are black and yellow in colour. If the model gets sent a black and yellow millipede, the suggestions will pretty much always include those three, especially given that the results list proposes 8 (or is it 10? I can’t remember) options.
Well then, there’s a big problem. The trouble with machines (even computers) is that they are dumb: they cannot think and reason like people. It sounds like the only way to train the computer vision AI to be smarter (offer better suggestions) is to feed it more Research Grade observations (positive reinforcement). I wish there were a way to add negative reinforcement; otherwise it will keep making the same bad suggestions. In the meantime, educating people about what does occur in their area might help.
oh I dunno, people can be pretty dumb too, it’s not fair to single out the machines!