I’ve been doing a lot of identifications lately. It strikes me that this is the norm for iNaturalist: someone makes an observation and guesses a species, probably with computer vision, and the guess is wrong a significant percentage of the time. After someone corrects the ID, the observer either agrees with the correction, usually without any personal justification for the agreement, or simply ignores the observation from then on.
Has there been any discussion of trying to push the community of observers toward better identification practices? For example: 1) don’t guess a species unless the computer vision suggestion is very likely to be correct, which I don’t think it can guarantee even now (identify at an appropriate higher level instead), and 2) withdraw your wrong ID rather than simply agreeing with someone else’s ID, and let someone else with knowledge of the organism confirm it.
It is discussed all the time, but you can’t reach every new user, so we just leave comments asking people not to do that, and in most cases it doesn’t lead anywhere. I’d appreciate it if this were clearly stated for each user in the onboarding tutorials, and if the Agree button were renamed to something else (https://forum.inaturalist.org/t/improve-id-function-and-name-of-agree-buttons-a-modest-proposal-not/6589).
CV suggestions can be completely right, by the way, but the model is trained on a fairly small number of taxa (compared to all taxa worldwide) and doesn’t work equally well everywhere. E.g. it has learned European plants to a decent level, sometimes better than some students, sometimes worse than any human.
From my perspective, guessing is strongly encouraged for the initial ID – I have no qualms about parroting the CV or making a w.a.g. on unknowns. (I try to label them as such.) The bar is higher for a confirming ID.
“1. Vision suggestions are 60–80% accurate, depending on how you define ‘accurate’, but more like 95% if you only accept the ‘we’re pretty sure’ suggestions”
The CV has hugely improved since then as well, so these numbers would be even higher now. Incorrect guesses are certainly not ‘the norm for iNaturalist’. As a single example, there are now obscure Australian brown moths that the CV regularly IDs correctly and that I struggle to differentiate myself.
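To illustrate what the ‘we’re pretty sure’ cutoff means in practice, here is a minimal Python sketch. The taxon names, scores, and threshold value are all invented for illustration; this is not real iNaturalist CV output or its API, just the general idea that accepting only high-confidence suggestions trades coverage for accuracy:

```python
# Hypothetical model output: (taxon, confidence score). Values are invented
# for illustration; they are not real iNaturalist CV output.
suggestions = [
    ("Species A", 0.93),
    ("Species B", 0.41),
    ("Species C", 0.18),
]

THRESHOLD = 0.85  # only surface a species-level ID when the model is "pretty sure"

confident = [(taxon, score) for taxon, score in suggestions if score >= THRESHOLD]
if confident:
    print("Pretty sure:", confident[0][0])
else:
    # Below the cutoff: better to suggest a higher-level taxon than to guess a species.
    print("No confident species suggestion; identify at a higher level instead.")
```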
There seems to be this very strange inverse relationship where the better the CV gets, the more people criticise it, for some odd reason. Granted, the CV is not perfect, and there are many taxa it struggles with, but I never see anyone praising its strong points or highlighting the things it gets right, only ever focusing on the negative stuff.
IMO, guessing IDs is half the point of iNaturalist. It’s why I gravitated here over other sites in the first place, and it has helped me learn a lot. Engagement and education are, and should be, the first priorities for iNat.
I’d be open to some mechanism that makes initial IDs from new accounts not count until they get a certain number right, or something along those lines, but reliability metrics have been discussed on iNat a lot and the general consensus seems to be that the current system works pretty well overall.
People are going to ‘put in’ what they ‘put in.’ You can’t change that. Personally I research every ID, make my best guess, and then if someone gives a different answer I research that and usually either remove my ID or change it depending on how convinced I am, but I’m highly motivated.
People name things incorrectly because they don’t know. Most of them are doing their best. This website has a lot of beginners, and it takes a while for them to learn. Using the computer-generated identification is actually a good choice perhaps as much as 85% of the time (in certain geographic areas). I view the misidentifications as an inevitable part of this citizen science project, something we just have to work with. Adding features that make using this site more difficult would, in my opinion, be counterproductive.
(I do know there are a lot of errors! I have a few pages of pre-written paragraphs I can use to explain and correct errors I see often, plus a list of 48 species misidentified as Timothy grass so far.)
On the whole the CV system is surprisingly accurate, but it is also heavily biased toward areas and taxa where there are lots of observations. Once you’re out of those areas or taxa the accuracy can fall quickly.
That said, yeah, there is a weird undercurrent of hostility toward the CV system and iNat itself in some communities.
A while back a friend of mine posted a photo of a cockroach instar to an entomology group on Facebook. The image wasn’t clear, so I cleaned it up, ran it through the CV system, double-checked the potential ID against non-iNat sources for range, similarity at different life stages, etc., and made an ID suggestion that I was pretty confident of, but I also made sure to include the caveats and mention what I’d done to get the potential ID.
One of the group’s mods started slagging me off and going off on iNat as well: that iNat is full of problems, that it’s totally inaccurate, that “if you don’t have the chops to make an ID then don’t make any suggestions”, etc.
After a couple of exchanges I ignored him, and it turned out my ID suggestion was correct, but he never said anything about that.
iNat seems to trigger hostility in some people, and they don’t seem to realize that it’s a tool to be used much like a field ID guide; it’s not some be-all-and-end-all oracle handing out universal truths.
I encountered a similar bizarre situation in a Facebook group for fungi ID. Someone posted a pic saying the iNat CV suggested it was X, but they wanted to double-check. Three people all confirmed that yes, the CV was exactly right, but then went on to say they didn’t like iNat because of its “huge number of errors”…
Some top identifiers may misclick and accidentally choose the wrong option from the dropdown menu; I’ve seen it happen three or four times. People occasionally make mistakes, it’s just what we do :)
Many experts would like to get a second expert opinion.
You can do what you want, but I would agree with a top identifier only if I’m sure it is the right taxon, after checking the range and all of the similar species in a couple of sources (mainly guides).
I don’t think the initial guessing is a problem, but the baseless agreeing is, because it leads to a lot of “research grade” observations that in fact have only one serious identification.
In my opinion, the “agree” button is just too convenient. Do we even need it?
A really cool example of how the CV has improved over time: when I first joined iNat ~3 years ago, there were extremely few (possibly zero) native Australian marine shelled molluscs that the CV suggested, let alone suggested correctly. All the options were always morphologically similar northern-hemisphere stuff. At that point, not many people were uploading pics of seashells (probably an order of magnitude fewer than now), and a lot of what did get uploaded wasn’t being ID’ed (at least not to species).
Over these 3 years, a great group of shell enthusiasts and I have ID’ed tens of thousands of shell pics and driven a lot of users to both upload old records and find new stuff. Now, most of the common natives are suggested and ID’ed all the time, often correctly.
The CV can identify shapes, so a red leaf is often identified as a certain unrelated red-fruited plant. It cannot distinguish between a plant leaf and an insect wing with the same shape. It cannot count legs to distinguish six-legged insects from eight-legged spiders. It also cannot distinguish between opposite and alternate leaves, or count stamens to distinguish a family that always has 5 stamens from those with more. It would be good to be aware of these limitations and work them into the guidelines.
As far as I understand, the CV can definitely differentiate between these things; just as a single leaf is a type of shape, so too is a branch with opposite leaves, or a branch with alternate leaves. If I fed it 100 photos of branchlets with alternate leaves and told it they were species X, and 100 photos of branchlets with opposite leaves and told it they were species Y, I am fairly sure it would then be able to ID, with relatively high accuracy (and increasing accuracy with more training), whether a new photo of one of the two was species X or Y. A rough sketch of that kind of training experiment is below.
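To be concrete about “fed it 100 photos of each”, here is a purely illustrative Python/PyTorch sketch of that kind of two-class fine-tuning experiment. The folder names, class labels, and choice of model are my own assumptions for the example; this is not how iNat’s CV is actually built. The point is only that the model receives whole labelled images, not explicit rules about leaf arrangement:

```python
# Illustrative sketch: fine-tune a generic pretrained classifier on two
# hypothetical folders of photos ("photos/species_x" with alternate leaves,
# "photos/species_y" with opposite leaves).
import torch
from torch import nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Expected (hypothetical) layout: photos/species_x/*.jpg, photos/species_y/*.jpg
data = datasets.ImageFolder("photos", transform=transform)
loader = torch.utils.data.DataLoader(data, batch_size=16, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # two output classes: species X and Y

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # the only supervision is the folder label
        loss.backward()
        optimizer.step()
```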
This is not correct. The 100 photos of branchlets simply will not allow even an experienced person to see whether the leaves are opposite or not, all 100 times. Not all branches are nicely spread out (as in a herbarium specimen). If I am familiar with that species, I may be able to recognise it. The CV is basing the ID on the colour and shape of the leaves, not on whether they are opposite or not.
Often plants with a few red leaves are being compared to, and thus identified as, plants that usually have red fruit. This is also an indication of bad or poorly focused photos. I have thought about this when hundreds of American plants are suddenly identified as African species and vice versa; it is always due to one of these factors.
I do not dispute that the CV can work well in certain groups (e.g. birds), but it is clearly not working well with others.
I don’t quite follow how the shape of an individual leaf in a photo is any different from the shape/outline of an entire branch, where that shape changes based on whether the leaves are opposite or alternate. Why can the CV recognise and pick out the shape of one leaf in a photo (which can also be presented from multiple different angles), but not the overall shape of a branch plus its leaves?
By your logic, the CV would not be able to differentiate plants A and B below, because the leaves are all the same shape, despite one having opposite leaves and the other alternate.
This is correct. If the photograph is taken from the bottom left (away from the letters A and B), then it could be difficult to recognise opposite or alternate leaves. This may be something easy to experiment with, and something for the CV programmers to work toward: identifying simple diagnostic characters such as leaf arrangement, the number of leaflets in a leaf, or counts of stamens and legs.
Your comments suggest a misunderstanding of how CV technologies work. If “CV programmers” were to hand-code decision-making, it would weaken the AI, so this simply isn’t done (or is done only at a very high level, like giving the system an understanding that a photo is a representation of 3D space). For example, if we were to implement “outline recognition”, it might create a bias towards recognising shapes that interferes with the system’s ability to recognise other aspects of the image.
All the CV “recognises” is pixels and their colour. Not shapes, not legs, not eyes. If it has been told that a particular clump of pixels is a bird (usually hundreds of times over), then it can start to understand patterns and apply them to new clumps of pixels. Humans, including the people that build these systems, usually have no idea what decision-making criteria the system is applying. We can only guess, by understanding what it gets right and wrong, and usually only then if we have access to the training images (the “known” data).
This is not an easy concept to understand, and I know data scientists who have a hard time wrapping their heads around the fact that data that used to require architecting and shaping can now be thrown into the system, which works out its own rules (something like an old-school pilot learning to trust the autopilot). However, this is the strength of AI: it can find patterns we can’t dream of, and will eventually be able to do this kind of work better than we will (and it already can in certain areas, e.g. chess).
At the end of the day, the only thing the AI needs in order to improve is more good-quality data. If we submit and identify more observations of certain organisms, the AI can train on them, learn the patterns, and become able to recognise those organisms.
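For anyone curious what the model actually “sees”, here is a tiny illustrative Python/PyTorch sketch: the input is nothing more than a grid of pixel values, and the output is nothing more than a score per class. The file name is a placeholder and the generic pretrained model stands in for iNat’s actual system (whose internals I am not claiming to reproduce); input normalisation is omitted for brevity:

```python
# Illustrative only: a generic pretrained classifier, not iNat's CV.
import torch
from torchvision import io, models, transforms

# The image is read as raw pixel values: a tensor of shape [3, H, W].
image = io.read_image("observation.jpg")  # "observation.jpg" is a placeholder path
batch = transforms.Resize((224, 224))(image).float().unsqueeze(0) / 255.0

# The "rules" are just learned numeric weights applied to those pixels;
# there is no explicit step that counts legs, stamens, or leaves.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()
with torch.no_grad():
    scores = model(batch)  # shape [1, num_classes]

probs = scores.softmax(dim=1)
top = probs.argmax(dim=1).item()
print(f"Top class index: {top}, confidence: {probs[0, top].item():.2f}")
```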