I don’t know of any scientists who would blindly use “RG” observations assuming they are correct IDs. RGs are a decent starting point for a dataset to vet for research, that’s all. I don’t think there’s a real danger of incorrect RGs leading researchers astray.
And there is by no means an “immense” number of wrong RG IDs, at least not proportionately speaking. For plants in areas I look at, the error rate isn’t much higher than it is in “real” unvouchered data taken by field techs and stuff. And the iNat data is by no means the most problematic data on GBIF either. All data has issues and you need to consider that when using it. It is what it is.
Charlie is definitely correct: all data has issues, and anyone using outsourced data should be especially cautious (at least if they have any integrity).
It doesn’t make sense for any of us to claim there is or is not a large proportion of incorrect IDs, because the scope is much too broad and no one has done a comprehensive analysis to speak for the entire body of data on iNaturalist. Anything we could say is just conjecture or highly specific examples. For example, I have a lot of ideas about certain things based on what I’ve looked at with dragonfly data, but the way people relate to and perceive plants or fungi is quite different and often culturally influenced. My own musings only make sense in certain contexts.
Also, this is not directed at any one person, but just to remind ourselves that there are higher standards to qualify ourselves by: whether iNaturalist is the most problematic data on GBIF is not a relevant metric to evaluate.
In the end, this thread is about renaming “Research Grade” as it is somewhat of a haughty misnomer, not debating the overall accuracy of data on iNaturalist. It could definitely be improved, as could everything in the world. I think we are all trying to move toward that in our own ways or we wouldn’t be participating.
“from the groups we’ve looked at” is what it says.
I’m not seeing that quote. I’ll just copy Scott’s post over so that it’s here for easier reference.
We’ve been doing a lot of analyses of the proportion of incorrectly ID’d Research Grade obs. From the experiments we’ve done, it’s actually pretty low, like around 2.5% for most groups we’ve looked at.
You could argue that this is too high (i.e. we’re being too liberal with the ‘Research Grade’ threshold) or too low (i.e. we’re being too conservative), and we’ve had different asks to move the threshold one way or another, so I imagine changing it would be kind of a zero-sum game.
One thing we have noticed from our experiments, though, is that our current Research Grade system (which is quite simplistic) could do a better job of sorting high-risk (i.e. potentially incorrectly ID’d) and low-risk (i.e. likely correctly ID’d) obs into the Research Grade and Needs ID categories. As you can see from the figures on the left below, there’s some overlap between high risk and Research Grade, and between low risk and Needs ID. We’ve been exploring more sophisticated systems that do a better job of discriminating these (figures on the right).
We (by which I mean Grant Van Horn, who was also heavily involved in our Computer Vision model) actually just presented one approach, a kind of ‘earned reputation’ approach where we simultaneously estimate the ‘skill’ of identifiers and the risk of observations, at this conference a few weeks ago: http://cvpr2018.thecvf.com/
You can read the paper ‘Lean Multiclass Crowdsourcing’ here:
Still more work to be done, but it’s appealing to us that a more sophisticated approach like this could improve the sorting of high-risk and low-risk obs into Needs ID and Research Grade categories, rather than just moving the threshold in the more or less conservative direction without really improving things.
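For anyone curious what the ‘earned reputation’ idea looks like mechanically, here is a toy sketch of the general alternating scheme such models use, not the actual algorithm from the CVPR paper: estimate each observation’s label from skill-weighted votes, then re-estimate each identifier’s skill from their agreement with the current consensus. The function name, the 0.5 starting skill, and the data shape are all illustrative assumptions.

```python
from collections import defaultdict

def estimate(ids, n_rounds=10):
    """ids: list of (observation, identifier, label) triples."""
    skill = defaultdict(lambda: 0.5)   # start everyone at a neutral 50%
    consensus = {}
    for _ in range(n_rounds):
        # Step 1: pick each observation's label by skill-weighted vote.
        votes = defaultdict(lambda: defaultdict(float))
        for obs, who, label in ids:
            votes[obs][label] += skill[who]
        consensus = {obs: max(v, key=v.get) for obs, v in votes.items()}
        # Step 2: a user's skill = fraction of their IDs matching consensus.
        hits, total = defaultdict(int), defaultdict(int)
        for obs, who, label in ids:
            total[who] += 1
            hits[who] += (label == consensus[obs])
        skill = defaultdict(lambda: 0.5,
                            {w: hits[w] / total[w] for w in total})
    return consensus, dict(skill)
```

The point of iterating is that an identifier who keeps disagreeing with everyone else ends up with low skill, so their votes count for less when deciding which observations are high risk.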
Thank you @bouteloua, I mis-typed it, sorry. Here is what it says: “for most groups we’ve looked at.”
This is becoming tangential to the topic of renaming “Research Grade,” as it is more about actual data accuracy, but my point was for us not to try to make generalizations out of disparate datasets.
Yeah, I was responding to “just conjecture” since they’ve definitely looked into levels of accuracy.
To loop the discussion back around, Scott mentioned potentially weighting IDs differently and/or that the >2/3 agreement threshold may not be the same standard to reach “research grade” (i.e. remove from “needs ID”) in different risk/accuracy scenarios. So community/majority consensus or even “community” at all may be irrelevant.
If, say 5 years down the road, the computer vision IDs a certain species correctly 99.9% of the time, could an observation IDed by CV be “research grade” without a 2nd confirming human ID? Why put those in the default “Needs ID” pool at all? :)
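To make the rule being discussed concrete, here is a minimal sketch of the >2/3 agreement threshold, with a purely hypothetical `cv_confidence` shortcut for the scenario above. The real community-taxon algorithm also walks the taxonomic tree and handles ancestor-level disagreements, which this ignores; the function name and the 0.999 cutoff are assumptions, not anything iNaturalist has implemented.

```python
from collections import Counter

def needs_review(id_labels, cv_confidence=0.0):
    """True while an observation should stay in the Needs ID pool."""
    if cv_confidence > 0.999:   # hypothetical computer-vision fast path
        return False
    if len(id_labels) < 2:      # at least two IDs are required
        return True
    _, top = Counter(id_labels).most_common(1)[0]
    # Leave Needs ID only when strictly more than 2/3 of IDs agree.
    return top / len(id_labels) <= 2 / 3
```

So two matching IDs (2/2 > 2/3) would leave Needs ID, while a 2-vs-1 split (exactly 2/3, not more) would not.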
More like an immense proportion. What prompted me to join here is that I noticed ~80-90% of RG IDs in the taxa I work with on here were wrong.
It might be helpful for you to tell us what this mystery taxon is.
The result of extrapolating the “data quality” from specific taxa to all organisms is conjecture.
Aquatic insects in general, and especially Coleoptera and Hemiptera. Many of these cannot be identified from photos, especially a single photo, and most have many very similar species.
It seems obvious that an organism that can’t be identified from photos won’t be identified via photos. That’s the vast minority of species posted on iNat. It’s been discussed before, and yes, at some point it might make sense to add some mechanism to keep these from becoming Research Grade at too specific a taxonomic unit, but that isn’t set up yet.
I agree with this. I don’t like that I can agree with somebody who gives ID tips when I don’t know what the ID is otherwise, or that somebody can agree with me without knowing what the ID is, even if I [think I] know it, and it becomes research grade. One thing is that it’s used for research, but the more important thing to me is that it no longer comes up in Needs ID, and I’m afraid it won’t be found and rectified if it needs to be. There are things I think I know, and I would like to ID them in case somebody can confirm, but I also expect the ID to be duplicated by certain users, so usually I make a comment, or only do the ID if I feel like it’s okay for my ID to settle it. That’s probably good etiquette anyway, but it’s still a little creepy how easy it is to get RG from a duplication. (Three people would mean at least somebody besides the poster and the first person to ID has to agree, which would feel safer.)
And in reading the forums I realize this is true and is probably very true among people who know their taxa and so I worry less about it now.
It is difficult to separate the discussion of the label from the discussion on how the label is allocated and, therefore, what it actually means. I believe that the word “Community” attached to any label suggests a broad consensus. Assuming a change to the label will be made before a change to the way the status is derived, it would still be assigned when only two people agree, one of whom can be the observer, which does not represent anywhere near a community consensus.
For newbies looking for a Like or <3 or Thank You option - clicking agree tips their obs to Research Grade … which is not a good way for iNat to function.
And some downvote the Not Wild, because they WANT an ID, thank you.
Which makes for a layer of confusion, and skews the distribution maps.
I propose “Community Reviewed”.
Similar to peer reviewed. It does not imply that the ID is correct, only that the members of the community think it is. All of the above options suggest a level of quality that is not necessarily present. “Community Reviewed” is not a quality assessment but a descriptive one. It can imply higher quality, but is flexible enough to acknowledge that there are times when the community fails to provide the correct ID.
I like this. I’ve seen a couple endorsements in other places as well in the past couple days.