False "research grade" observations

bernhard_hiller · July 9, 2020, 9:17am

There are many “research grade” observations which are plainly wrong. Sometimes I can detect them by a “geographic” species name, like Neurothemis taiwanensis in Borneo: https://www.inaturalist.org/observations/38287280, or by a species outside its normal geographical range (e.g. European species in Southeast Asia).
Some users (sometimes the observer, sometimes other people) suggest a wrong species. Well, that can happen. Next, other people just agree with it, and voilà, we have a “research grade” observation.
Very often, beginners just go along with any id someone provides and agree with it, without any actual knowledge. And again, that produces a “research grade” observation.
Such false “research grade” observations may confuse other users who look at the images of a species, because they will be shown the falsely identified observations too.
How can we deal with that? I.e., how can we prevent such false ids in the first place, and later on, how can we detect them for clean-up?

robotpie · July 9, 2020, 9:50am

Really the only way I see to do it is to go through the research grade observations of your chosen location and/or taxa and see if there are some suspicious-looking ones. Also, I don’t think there is really any way of preventing false IDs, as it depends on the user’s expertise and their reliance on the ID suggestions feature. I think the safest and most polite way, if you know a research grade obs is wrong, is to provide the ID as best you can and, if possible, explain your reasoning. Whether the identifiers choose to act on that new information is really up to them.

bernhard_hiller · July 9, 2020, 10:05am

That’s exactly the problem. I do not know many plants / animals at species level, but rather at family level. So I fear I’d falsely “destroy” too many ids at species level by doing so.

Just a rather fresh (not research grade) example: https://www.inaturalist.org/observations/52120519
I really doubt that the id given by the observer is OK. I was tempted to say it’s Rubiaceae, but I’m not sure. So I just left a comment and am waiting for the user to react.

lotteryd · July 9, 2020, 10:05am

Welcome! FYI here is a page related to dealing with some of the same issues that you identify:
https://forum.inaturalist.org/t/computer-vision-clean-up-wiki/7281

If you notice a pattern of misid with a particular species, you can add it to the list. Or, if you notice a familiar species on the list, feel free to help with corrections based on your expertise!

sbushes · July 9, 2020, 11:09am

I think, as mentioned recently on another thread, some sort of welcome tutorial to go through, similar to the one in the forums… to teach new users what this means and the importance of not agreeing blindly to observations, could help significantly with false RG obs and overuse of the autosuggest.

I think often though, the autosuggest should just say… “Not sure on this one” … .
It makes no sense to me that on the one hand it can be “pretty sure” of family at times…but then other times, it isn’t pretty sure of anything…yet suggests a species-level ID! At the very least, these should be genus-level suggestions only…

If a new user is using an autosuggest ID… as with the second link you posted @bernhard_hiller, I would use the disagree prompt to push it back to genus or family. I think in this instance, with no ID history and the logo indicating use of autosuggest, you can safely presume they don’t know any better.

cmcheatle · July 9, 2020, 11:55am

You can’t ‘prevent’ it. People make mistakes, people (usually students) are assigned to use the site and don’t care about the quality of what they put in, and some folks intentionally put in fake identifications. If you turn off the computer vision, people will just use Google or something else, search for “blue dragonfly” or whatever, and enter the first thing they find.

As others have said, it is more a matter of finding and trying to clean.

sbushes · July 9, 2020, 12:36pm

Prevent - no…mitigate - yes!

Finding and cleaning might be easy in some countries and some genera, but for others this is arduous work getting bigger by the day. I would say a large percentage - maybe 50-75%? of IDs in UK Diptera right now are simply pulling an unwarranted species level ID back to genus or family. Repetitive actions like this don’t feel like particularly fulfilling work. They just feel symptomatic of larger systemic issues that could be addressed.

If offered a coarser ID, some might just google “blue dragonfly” instead, but I think the majority of new users I see appear to just choose the elephant path. They are presented with a list of options…so they choose the top one, or the closest one visually. If you present them only with coarser, genus- or family-level options, they would opt for that. Realistically, many new users won’t even know the difference between family, genus and species. They are just delighted to put a name to a face, regardless of taxonomic rank.

It also creates a vicious cycle.
The worse the dataset is on GBIF, the less respect it has in the community outside of iNaturalist.
The fewer experts join to fix the set, the worse the set becomes…

That said! I’m optimistic the AI could potentially reflect back on itself in the future to detect, flag or reassign outliers in its own data…

amarzee · July 9, 2020, 1:02pm

One option to limit the damage would be to prevent the “computer vision” from suggesting species that have not been found on the continent. Working on treefrogs, I very often get suggestions of European or North American species that do not occur in Asia. If I weed these out, the computer vision is then almost always correct.

DianaStuder · July 9, 2020, 1:22pm

Where you notice, where you can, tidy up a mistake.

It is unfortunate that new people coming from other social media, looking for a way to say ‘thanks for the ID’, can see only one option: click Agree. With no malice intended.

robotpie · July 9, 2020, 2:05pm

Unfortunately, I can’t see any other way besides being knowledgeable enough about a certain taxon to at the very least know when an ID looks suspicious. Otherwise you also run the risk of contributing to the false research grade observations.

yayemaster · July 9, 2020, 2:25pm

There are some observations that I have pushed to Research Grade for a minute before withdrawing my identification. I have seen people push observations to Research Grade with identifications that are way out there. Sometimes there are four or five false identifications.

krancmm · July 9, 2020, 3:51pm

I agree with the scenarios you’ve presented. However, there remains a big problem, totally unrelated to the myopic computer vision: agree-bots. Last summer a new user agreed to every single species that needed an agree to reach Research Grade: over 100,000 in less than two months. Maybe 1000-2000 were moths. This individual knew nothing about the moths he was agreeing to, including some very tricky ones that most of us “power-users” didn’t have the knowledge (guts) to agree to…most of the agreements were incorrect but are now Research Grade. A professional actually wrote to staff to have those egregious supporting IDs removed (not a single one was leading or improving); my understanding is that he was told by staff, in effect, too bad, so sad. The professional is no longer involved with iNat. So another good IDer is gone. Not one of the regular IDers has the time to repair that many bad Research Grade obs.

There should be a mechanism to reverse egregious agree-bot supporting IDs, or as @sbushes predicts:

pssw · July 9, 2020, 4:40pm

I think the Key Word is…Suggest.

As a new user, as I read Tutorials and Forums and connect with people that are more experienced, I am constantly learning more about how iNat works.

I am presented with “Suggest an I.D.”, so if I know what it is, I “Suggest an I.D.”
If I don’t know, I can put a placeholder there or leave it blank. This seems to affect its presentation to the rest of iNat. I also thoroughly check the ‘identify’ info so as to make my best ‘Suggestion’.
I can also go with an iNat suggestion.
I expect making a suggestion leads to someone agreeing or disagreeing.

I find it odd that it can become Research Grade with only two suggestions. That doesn’t seem very strict.
I HOPE that someone who really knows the specifics of the observation will gladly say “this is definitely a …”
I’m here to share the little knowledge I have and to learn from those who know.

iNat is advertised as a way to find the identity of a species for ‘anybody’ that wants to know.
I suggest advertising it as a way to share and verify the identity of species… for serious minded people.
Otherwise you will get more…people that just ‘Suggest’.

One other thing…cryptic answers in Forums…don’t help.

PS: whoops on the like button.

mertensia · July 9, 2020, 5:42pm

Wow, I hate that story and completely understand why they would leave. Accuracy should be the goal here and not internet points or spam or what have you. Pretty scary.

fogartyf · July 9, 2020, 6:11pm

I see this as one of the biggest challenges to the utility of iNat data. It’s made worse by the fact that there is quite a bit of pushback/disagreement from several active users about even trying to clean/correct these records in the first place.

bouteloua · July 9, 2020, 6:48pm

Hmm, that’s unfortunate.

cmcheatle · July 9, 2020, 7:20pm

I’m not sure there is anyone actively arguing in favour of keeping bad data, at least I would hope not. I do think there are at times some legitimate questions/discussions raised about how it should be done, from a technical and process perspective.

saturnring · July 10, 2020, 12:21am

On the ones that are already Research Grade another option is to check the box in the Data Quality Assessment Section that says the community ID can be improved, and add a comment as to why you think the community ID may not be right. That way, you don’t change the community ID, just the Research Grade status.

Star3 · July 10, 2020, 2:42am

There are a few suggestions you could upvote related to that (forgive me if you are already aware of these):

https://forum.inaturalist.org/t/better-use-of-location-in-computer-vision-suggestions/915/13

https://forum.inaturalist.org/t/provide-relevant-geographic-data-confidence-level-accuracy-scores-with-ai-suggestions/9226

https://forum.inaturalist.org/t/evaluate-geographic-data-for-inats-suggested-id/6999

bernhard_hiller · July 10, 2020, 3:16pm

Thanks all for background information and suggestions.

I did not know about that feature yet, and I am sure I’ll use it quite often.
