Artificial intelligence and misidentifications

djpmapfer · November 19, 2019, 8:52pm

I am something of a newbie with considerable knowledge of microbe identities. I now routinely review new entries for what iNaturalist calls ‘Protozoans’. There is a recurring problem which I suspect may arise from poorly informed artificial intelligence. Micrographs that do not show diagnostic features are being given identities (one of the most common is ‘Euglenoids’) which are not justifiable. I think it highly improbable that dozens of observers make the same mistake. Is there a way to check whether iNaturalist’s algorithmic identification has contributed to the identity that is given? If it were possible to check a few dozen of the erroneous records, we could establish whether the artificial intelligence has been trained incorrectly. If so, the community can ask for certain identities to be reset and provide appropriate training materials (if that is how it is done). At the moment, iNaturalist is accumulating a lot of misleading records.

marina_gorbunova · November 19, 2019, 9:28pm

That happens with all complicated groups. If the AI suggests the wrong taxon, it usually means there are too few records for the right ID, too many photos of the taxon that is suggested, or a poor-quality photo. The AI is also trained on those misidentifications. With protozoans the website needs tons of high-quality photos to learn them right, and even that won’t solve the problem fully: if there aren’t enough diagnostic features, the AI still gives an ID, and people (especially those new to the website) will agree with it.
As long as those observations don’t have RG and nobody has agreed with the misID provided by the AI, it’s OK to have those records. People learn, and if they stay at iNat they come to understand that the AI can’t answer all questions, so there’s hope they’ll revisit old observations and change the taxon. It’s also a task for identifiers to check observations and give a proper identification. I’m not sure how many experts we have for that group, but definitely fewer than ornithologists, for example, so wrong IDs stay around longer.
P.S. We have fewer than 40k observations for Protozoa, which means the AI isn’t trained at all.

paulexcoff · November 19, 2019, 11:26pm

This little symbol appears on IDs that came from computer vision suggestions.
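
For anyone who wants to check a few dozen records at once rather than looking for the symbol one observation at a time, here is a minimal sketch of the tally. It assumes you already have the identification records as a list of dictionaries; the field names `taxon_name` and `vision` (true when the ID was picked from a computer vision suggestion) are assumptions made for illustration, not a confirmed export format.

```python
from collections import Counter


def cv_share_by_taxon(identifications):
    """Return {taxon_name: (cv_count, total_count)} for a list of ID records.

    Each record is a dict with the hypothetical keys:
      - "taxon_name": the name the identifier chose
      - "vision": True if the ID was selected from a CV suggestion
    """
    totals = Counter()
    from_cv = Counter()
    for ident in identifications:
        name = ident["taxon_name"]
        totals[name] += 1
        # A missing flag is treated as "not from computer vision".
        if ident.get("vision", False):
            from_cv[name] += 1
    return {name: (from_cv[name], totals[name]) for name in totals}


# Made-up sample records, for illustration only.
sample = [
    {"taxon_name": "Euglenoids", "vision": True},
    {"taxon_name": "Euglenoids", "vision": True},
    {"taxon_name": "Euglenoids", "vision": False},
    {"taxon_name": "Diatoms", "vision": False},
]
result = cv_share_by_taxon(sample)
print(result)
```

A high CV count for one taxon would only show that the suggestion was selected, not that the identifier relied on it, since some people pick from the suggestion list simply to save typing.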

mtank · November 20, 2019, 1:07am

Just to add to the previous responses, the training data isn’t something that can be manually corrected. It is generated automatically using data that meets a certain threshold.

The only real way to improve the service is to correct any Research Grade observations that are wrong, and apart from that, contribute observations to make sure there are an appropriate number of RG observations for the service to use. However, as there is a huge amount of data used to build the AI model, it isn’t retrained very often - I believe it’s about twice a year, so it may take some time for corrections to flow through.

I agree with @marina_gorbunova that even with good data, the problem might never be fully resolved, but it’s also worth keeping in mind that AI may actually pick up diagnostic features that even experts can’t - so there’s definitely some value in contributing observations.

zookanthos · November 20, 2019, 5:44am

Before the last update to the computer vision, the AI would often suggest mosses and other completely wrong things for all my microscope photos regardless of what it was. It’s still pretty lousy now, but for some things like diatoms it can actually be useful and it at least tends to suggest other microscopic organisms more often. I’m sure after the next update it will be even better.

janetwright · November 20, 2019, 1:08pm

I would SOO love to see an article on the AI procedure and how it works in iNaturalist, including how observers and identifiers can cooperate to make it better. Every time I submit something I wonder about this. @mtank’s comment seems to imply that the AI only uses RG observations in training, and that training (whatever that consists of) isn’t an ongoing process but happens in pulses. Am I understanding correctly (probably not about the RG, but correct me)? If I submit 5 photos with an observation, does the AI use all of them, or just the top one? Has there been a general explanation somewhere? I love the AI; it is often scary good, and when it’s bad it’s often funny.

exonie · November 20, 2019, 2:40pm

There’s already a lot that has been said about computer vision:
https://forum.inaturalist.org/search?q=computer%20vision
Also in the help pages:
https://www.inaturalist.org/pages/help#computer-vision

To summarize what is relevant to the problem:
A taxon needs to have at least 100 obs in order to be included in the training and training happens only two times a year because it is very demanding on computer resources.
Fixing IDs will improve the training next time it is run. But you have to actually add an ID, not just a comment, as I see you have done in many observations. You can give a more general ID and disagree with the previous narrower ID if you are certain it is not justified. Your help is very much appreciated!
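
To make the 100-observation rule concrete, here is a rough sketch of the cutoff as described above. It is only an illustration of the stated rule, not iNaturalist’s actual training pipeline, and the taxon counts are invented.

```python
MIN_OBSERVATIONS = 100  # threshold as stated in this post


def eligible_taxa(observation_counts, minimum=MIN_OBSERVATIONS):
    """Return the sorted names of taxa with at least `minimum` observations."""
    return sorted(
        name for name, count in observation_counts.items() if count >= minimum
    )


# Hypothetical counts, for illustration only.
counts = {"Taxon A": 250, "Taxon B": 99, "Taxon C": 100}
included = eligible_taxa(counts)
print(included)
```

Under this rule a taxon that falls just short of the threshold is simply absent from the model, so the suggestions can only offer something else until enough correctly identified observations accumulate and the next training run happens.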

janetwright · November 20, 2019, 6:23pm

Thanks, @exonie. I will check those sources. You are right that I sometimes comment on what I think something might be rather than enter an ID, if I have some notion but not a clear opinion. I find that often stimulates a conversation that everybody learns something from, and then I revise the ID later if we make some progress. It’s great that iNaturalist accommodates so many different styles and approaches.

shelley_b · November 28, 2019, 4:34am

Wow, I was unaware of this symbol for IDs that come from the CV. I have to say, I have mixed feelings. I can see it being useful info in some situations. However, I (and I expect many others) regularly select an option from the CV suggestions just because it feels faster than typing it in. For example, I’m subscribed to the taxon Hymeniini, which contains two very common species that the CV almost always gives as its first and second suggestions (as well as incorrectly suggesting them for the occasional other species). I typically know right away which member of the tribe I’m seeing, so I’m not relying on the CV’s expertise at all, despite selecting my ID from its drop-down menu. Knowing about this symbol now, I wonder if some users aren’t giving as much weight to my IDs (or the IDs of other identifiers who use this approach) because they appear to be CV-inspired.

jdmore · November 28, 2019, 7:31am

Same for me with my plant IDs (well, most of them… )

What I have done in these cases (via the web site) is pick the desired name from the CV list, then just backspace once over the last character, and re-pick from the resulting non-CV dropdown list of names (or just re-type that last letter). This turns it into a user suggestion instead of a CV suggestion. (EDIT: typing an extra space, or any other character, then deleting it, should have the same effect.)

That said, after discovering the ability to type short partial names, I rarely do the above any more.

For example (just for everyone’s info), to get:
Chrysothamnus viscidiflorus, I type Chr vis and select
Chrysothamnus viscidiflorus subsp. puberulus, I type Chr pub and select

Same works for common names – one of my favorite things about iNat! Occasionally 4 characters may be needed to narrow down the pick-list sufficiently; sometimes 2 is enough.

Again, this is all web based – not sure if or how all this works in the phone apps.

janetwright · November 28, 2019, 10:51am

I was unaware of the symbol too. I routinely call up the CV even when I know the ID because somewhere I read that helps the CV to train. That is, I thought I was teaching IT something. Oh well.

zabdiel · November 28, 2019, 3:38pm

Possibly some users might not. But I would guess most regular users will have used the computer vision in the same way as you have. At one point I was doing that for almost all ids on my own observations even though I could id most of them anyway (and if I couldn’t ID it I’d be using the CV suggestions as a starting point for checking a field guide).

The help says “Note: It is not possible to tell whether the user selected a computer vision suggestion because they are following the suggestion versus whether they are simply using the tool as an “autofill” to save time and effort typing out species names.”

lotteryd · November 28, 2019, 4:30pm

I do the backspacing thing not with erasure intent, but to pull up the alternate names to doublecheck there’s not some similar-named relative that would be more appropriate. Handily, a variety or subspecies name can populate the list if you add a space after the CV’s suggestion.

jdmore · November 28, 2019, 7:01pm

Just to clarify, we are all teaching the CV by creating correctly identified Research Grade observations, by whatever means. Doesn’t matter if the IDs came from CV or not.

Also, as I understand it, while we are “teaching” CV continuously, it only “learns” every 6 months or so when a new CV model is run to replace the current one.

djpmapfer · December 4, 2019, 9:28pm

I am pleased to be advised of that symbol. Now, what is the mechanism to tell the system and users that the decision is plain wrong?

reosarevok · December 4, 2019, 9:46pm

My understanding, from what I’ve read around recently, is: you tell the user with a comment (and maybe some hint to not use AI much for this kind of organism, if misidentification is common), and the system gets better on its own (eventually) by actually giving enough observations the right ID.

djpmapfer · December 5, 2019, 1:45am

Regarding reosarevok’s suggestion that we fix the problem with automated identification (AI CV) by commenting to the user, this seems to be misdirecting the corrective effort. The problem lies with the AI system. I want a system where I as an informed user am presented with a proposed identification and I can respond to say that the proposal is incorrect (indeed massively incorrect). Indeed it would be nice to get this with all submissions. And while we are on the theme of corrections, how do we make a note that the subject of an image is not a living organism?

zookanthos · December 5, 2019, 2:00am

If you just wish to say that an ID is incorrect but don’t know what it is yourself, you can ID it as Life and will be given a prompt asking you to pick between two options: “I don’t know what this is, but it’s not that.” or “I’m sure this is a living thing, and it could be what it’s currently ID’d as, but I’m not sure.” The first option will make your ID bump the observation up to State of Matter: Life, unless there are enough other IDs to override you. The second option will have absolutely no effect in this case unless it was an observation with no IDs at all, and then it will change it from Unknown to State of Matter: Life. It is only useful for things where someone wasn’t sure if it was living or not.

To indicate something is not a living organism, you need to go to the DQA (Data Quality Assessment) area and check no for “Evidence of organism?”. This will make the observation casual and prevent it from showing up in default searches. You can combine that with a disagreeing ID of Life if it was ID’d as something. Just make sure to add a quick comment that you don’t think it is a living organism along with the ID. It would be nice to be able to ID as State of Matter: Abiotic, but that is not the way iNat is set up.

bouteloua · December 5, 2019, 2:23am

See the discussion on this related feature request:

system · February 3, 2020, 2:23am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
