"Seen Nearby" vision suggestions often lead to incorrect identifications

I’ve realized that millipedes are chronically misidentified on iNaturalist. Some months ago I went through every millipede observation in Florida and agreed with or fixed what I could. I’ve learned more since, so I might be due for another screening later; I fixed a LOT of observations, including many that had been promoted to Research Grade on misidentifications. But right now, I’m just going through specific millipede observations (currently Trigoniulus corallinus) and disagreeing with all the wrong ones. Something unfortunate I keep noticing is little areas that look like they have a decent T. corallinus population, where every ID in that little area is wrong. It looks like once someone IDed their observation as T. corallinus, other clueless people saw it listed as “seen nearby” and chose it, and so on, until there’s a little fake community of T. corallinus consisting of local millipedes, insects, centipedes, worms, what have you. I don’t know if there’s a solution to this, but it’s frustrating.
I’ve got a lot of work on my hands, and from my understanding Anadenobolus monilicornis is way worse, because the taxon image was wrong until I fixed it. So last I checked, the wrong millipede that looks slightly correct, plus the “seen nearby” label, has created an entire false Central American population spanning many countries. That will be an endeavor to rectify.

6 Likes

I’m going to give the same answer I gave when someone commented that they wanted common names removed because they cause misidentifications.

Right now there are 270 species of millipedes with a Research Grade record on the site. But of those, fewer than 30 qualify for inclusion in the computer vision training set. And within those are some highly distinctive ones.

If someone runs the suggestion tool on a millipede, there is a very good chance they will get the same set of choices to pick from whether there is a “seen nearby” indicator or not. And getting the same list returned means they will still pick from that list.

2 Likes

Correcting misidentified Research Grade observations is super helpful, yes. The “Seen Nearby” label only looks at RG observations, so once they are knocked back, that species suggestion in the computer vision results should no longer appear as Seen Nearby.

3 Likes

I do understand that, and I do see that happen often with, say, the odd mis-ID here and there. But what I keep seeing is entire clusters of nearby observations getting the same mis-ID. I hope I cleaned it up, but for example, it looked like there was a strong population in a specific location in California (and in many other areas too; it keeps happening), which makes me feel the “seen nearby” label had to have been involved, unlike a lone observation in Canada or something, where the observer would have picked it regardless. It truly seems like in multiple areas I’ve found, one person’s lone observation click, like you said, spawns a false community.

This “bandwagon” effect seems to be one of iNat’s greatest weaknesses. It could be fixed fairly easily if users were blind to the IDs of other users until an observation had reached e.g. 3 matching IDs.

6 Likes

@amdurso wrote, “…if users were blind to the IDs of other users until an observation had reached e.g. 3 matching IDs.”

I really like this general idea, in some form. The current City Nature Challenge is causing a tidal wave of misidentifications, both because of erroneous, geographically inappropriate computer vision suggestions and, presumably, many “seen nearby” suggestions. Getting to Research Grade with just a single concurrence has always been slightly problematic, but after two or three concurrences, the chance of “piling on” incorrect IDs is typically minimized, which would also solve the “Seen Nearby” dilemma.

2 Likes

I also like the blind ID idea until some consensus is reached. I would recommend displaying a name at some rank above species level so that identifiers can still find observations in their area of interest.

I understand you well ;-)…
Maybe the “seen nearby” feature has much to do with this, but in my opinion it is also because, in many cases, especially for critical taxa, “common” users keep misidentifying observations without considering that things may not be as simple as they seem. And I know that correcting many misidentified observations can be tiring, especially if you decide to explain the correct ID to users.

For example, here in Italy, for many users almost every wild rose is Rosa canina, just as almost every Ornithogalum is Ornithogalum umbellatum, and so on…

Thank you for taking the time to do that! It is tedious work, but it is really a big help when experts like you come through and help get identifications back on the right track.

I run into the same problem with green shield lichen (Flavoparmelia caperata) down here in Florida. Unfortunately, the computer vision suggestions are not great with lichens in general. In many areas, the primary suggestion for any greenish lichen is almost always green shield lichen. That, paired with “seen nearby”, creates a hurricane of wrong IDs centered around Research Grade observations. Those Research Grade observations are often correct, but the observations caught up in the storm often are not.

I admit, I don’t know the best way to tackle this problem. I 100% agree that correcting misidentified Research Grade observations is helpful. How to combat the “bandwagon effect” may be a harder problem to fix, however, and I’m not sure whether blind IDs are the correct fix. I would need to think on that.

1 Like

For what it’s worth, part of the problem is that our vision system currently only makes species-level predictions, and for many taxa in many places, including millipedes, species-level identifications are impossible from photos. Our vision system does its best with the species it knows about, which usually means recommending species from other places that look similar.

One solution we’re working on is a vision model that knows about ranks above species level, so if it doesn’t have much training data for some species but it does for a higher taxonomic level that contains those species, it will be more likely to recommend a genus instead of a (wrong) species. For example in California, we have a genus of millipedes called Tylobolus that generally can’t be identified to species, but which looks a lot like Narceus americanus, a species from the eastern US. With this change to the vision model, hopefully it will stop recommending Narceus americanus so highly in California and instead make the more appropriate recommendation of genus Tylobolus.
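To make the idea concrete, here’s a toy sketch (made-up scores, threshold, and function names, not our actual model code): if no single species is confident enough, roll the species scores up to genus and suggest the genus instead.

```python
# Hypothetical sketch (not iNat's actual code): roll species-level vision
# scores up to genus and suggest the genus when no single species is confident.

SPECIES_CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff for a species-level suggestion

def suggest(species_scores: dict[str, float], genus_of: dict[str, str]) -> str:
    """Return a species if one is confident enough, otherwise the best genus."""
    best_species = max(species_scores, key=species_scores.get)
    if species_scores[best_species] >= SPECIES_CONFIDENCE_THRESHOLD:
        return best_species

    # Aggregate species scores into genus-level scores.
    genus_scores: dict[str, float] = {}
    for species, score in species_scores.items():
        genus = genus_of[species]
        genus_scores[genus] = genus_scores.get(genus, 0.0) + score

    return max(genus_scores, key=genus_scores.get)

# Example with made-up scores: each Tylobolus species gets only a modest
# score, but together the genus outweighs the single eastern lookalike.
scores = {
    "Tylobolus claremontus": 0.30,
    "Tylobolus uncigerus": 0.28,
    "Narceus americanus": 0.35,
}
genera = {
    "Tylobolus claremontus": "Tylobolus",
    "Tylobolus uncigerus": "Tylobolus",
    "Narceus americanus": "Narceus",
}
print(suggest(scores, genera))  # -> "Tylobolus"
```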

Another change that tons of people have requested and that we’re looking into is removing species that have not been seen nearby from vision suggestions. This wouldn’t solve the problem of observations by naive identifiers inserting species into “seen nearby” but it might stop some of these poorly-identified observations from happening to begin with.
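Conceptually that second change is just a filter in front of the suggestions. A toy sketch with hypothetical names (again, not our actual code):

```python
# Hypothetical sketch of filtering vision suggestions by "seen nearby"
# before they are shown to the observer.

def filter_suggestions(suggestions: list[tuple[str, float]],
                       seen_nearby: set[str]) -> list[tuple[str, float]]:
    """Drop suggested taxa that have no nearby Research Grade records."""
    return [(taxon, score) for taxon, score in suggestions if taxon in seen_nearby]

raw = [("Trigoniulus corallinus", 0.62), ("Anadenobolus monilicornis", 0.21)]
nearby = {"Anadenobolus monilicornis"}  # e.g., taxa with RG records within some radius
print(filter_suggestions(raw, nearby))  # only the nearby species remains
```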

Also, I’m going to change the title of this thread to be more descriptive. “Like a virus” made me think there was some kind of malware issue happening. I guess iNat itself is malware from some perspectives…

12 Likes

I wonder if the City Nature Challenge causes mis-IDs due to the competitive desire of non-qualified identifiers to bring an observation to Research Grade. This has probably been suggested before, but I wonder if another grade up, like “Verified”, would help in general. This would be a grade assigned by a master identifier, if you will. Sorry if this has been covered before; I’m new to the forums and really just an amateur on iNat. But I wanted to suggest it, as the latest City Nature Challenge has me wondering how many of my Research Grade obs are truly correct.

iNat is growing exponentially, and that growth mostly consists of people who are more observers than identifiers, at least at first. The frustrations are valid, and you all have seen that I have mine as well. But we should also put it in context: we are part of this thing that is exploding in use, where millions of people are adding biodiversity data. I think our biggest task is to get ahead of the growth wave, stop marketing for new growth (not gonna happen, but I can dream) so it only grows organically, and do various things, both from the development standpoint and the community standpoint, to clean up the data and get it fed into the algorithm and adjusted. Yes, it’s frustrating, especially if you are a newbie here and suddenly see all this wrong stuff. But when I get frustrated I just pull up something like this. Yes, there are bad data points, such as that one in Montpelier I see right now, but just look at that DATA! It’s amazing. It’s important. It’s huge. Someday we will get it to the point where harder taxa are like this too.

4 Likes

I don’t think there is any single behaviour driving the misidentifications from the computer vision tool.

The exact opposite of “seen nearby causes people to pick it” is also true. I just did an admittedly small survey of incorrect IDs for a species pair I correct almost daily: the Great Blue Heron of North America vs. the Old World Grey Heron.

I found the 20 most recent cases where someone identified a bird that is actually a Great Blue as a Grey, and had chosen their ID from the computer vision suggestions. I then ran the CV on each to see what it pumped out.

100% of the time, Great Blue Heron was both the first suggestion and had the “seen nearby” label, while Grey Heron was further down (usually 2nd, since they look so similar) and did not have the “seen nearby” label. Yet all those users chose Grey Heron for their ID.

1 Like

I have seen this with plants too… cases where the correct species was displayed first by the algorithm and someone scrolled down and chose a wrong species instead.

I think the comments touching on the role of the CNC in the current situation are spot-on. It’s a classic case: processes that work for an organization where everyone can sit around the conference table are rarely usable when you can’t even fit everyone into a footie stadium! (I’ve been listening to Aussie detective audiobooks and it’s rubbed off, lol.) I do think getting to RG should require more IDs, though it won’t solve everything and will boost the ID backlog by a lot.

We’re dealing with some of the real-life challenges of generating big data. Folks (in many fields) are all excited about big data, and think all it takes is getting a bunch of datasets and sticking them together, with some vague hand-waving about how AI will take care of the quality issues, and then going to town running analyses and thinking the results actually mean anything! We’re sitting on the bleeding edge of the reality down in the weeds: the creation of the data, and of the rules necessary to ensure that people in the future who want to use the data understand what it is and isn’t good for. The data will never be “perfect” or completely “clean,” but with a consistent set of rules applied with forethought, we can create something that is “good enough” for a heck of a lot. So it will take time, and patience, and imagination, and lots of people willing to go in and fix the errors until we get the tools that can help us with that, and with preventing them.

Sorry - I can go on at some length about this. It’s one of the areas I care deeply about.

3 Likes

Could there be a wiki advising identifiers of things they can do to make their own lives easier by helping train the AI? I did not realize, for instance, that correcting bad RGs is so important. Many of us have some taxon we’re passionate about for one reason or another. If we knew more about how our efforts could improve the Computer Vision for that taxon, many of us would put in that effort.

4 Likes

At risk of suggesting another Feature Request, I wonder if an account-level setting that let a user’s observation/ID filters default to “Needs ID” plus Research Grade observations would be helpful? The current filters default to “Needs ID” only, and I am constantly opening the filter dialog and adding a checkmark to “Research Grade” before proceeding with my IDs.

And then I could go way off topic and suggest similar account-setting functionality to control many of the other filter defaults too…
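In the meantime, a script can at least pull both pools in one query via the public API. Just a rough sketch (the taxon and place IDs below are placeholders, and I believe quality_grade accepts a comma-separated list here):

```python
# Sketch: pulling both Needs ID and Research Grade observations from the
# public iNaturalist API in a single request.
import requests

resp = requests.get(
    "https://api.inaturalist.org/v1/observations",
    params={
        "taxon_id": 47735,        # placeholder taxon ID
        "place_id": 21,           # placeholder place ID
        "quality_grade": "needs_id,research",
        "per_page": 30,
    },
)
resp.raise_for_status()
for obs in resp.json()["results"]:
    taxon = obs.get("taxon") or {}
    print(obs["id"], obs["quality_grade"], taxon.get("name"))
```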

1 Like

Couldn’t agree more - this is a great idea!

I do that sort of ID help sometimes too; that way I’m doing at least some basic QA on “research grade” observations. I wish it were possible to filter for “Needs ID” plus only the Research Grade observations with just 2 IDs, because when I run that filter now I also get observations that something like 6 people have already identified, which are much less likely to be wrong.

1 Like

OK @charlie @janetwright, after seeing a similar thought on another thread, I went ahead and made a Feature Request. I didn’t drill down as far as your idea of limiting to no more than 2 agreeing IDs, as I think that might require a new API parameter to implement. I would sure use it though if it existed!

3 Likes