“Research grade” is also misleading because some taxa can have very usable observations even when they’re not ID’ed to the species level. For example, a whole lot of marine invertebrate IDs (at least to the full potential of visual observations) stop at the family or genus level, and they are good enough to be used in research or checklists at that level.
However, I do not expect that it would be easy to draw a line at coarse-level ID consensus… this would differ too much from case to case for an algorithm to manage on its own.
But to come back to an alternative label for what is currently “research grade” - maybe just a simple green check mark would suffice (with as long and verbose an explanation as required, for anyone who wants to delve into the full meaning of the check mark).
While I agree that observations can be very useful without ever being eligible for “Research Grade”, in this case they usually can be made “Research Grade” with the “ID is as good as it can be” checkbox in the DQA.
Highlight the words in some appealing way (for example, give them a green light symbol or a green border) if the rest of the DQA is fine.
If not, skip the highlighting for “ID accepted” and consider identifying the problem with the observation in words or with a hyperlink (for example, “Lacks observation date” or “Lacks #s 2, 3”).
BTW, when I first saw “Research Grade” on an observation, I assumed it meant the photo quality met iNat’s documentation standards. And, when I first read “can the Community ID still be confirmed or improved?” I thought a “No” meant the photo was really blurry.
I don’t know of any scientists who would blindly use “RG” observations assuming they are correct IDs. RGs are a decent starting point for a dataset to vet for research, that’s all. I don’t think there’s a real danger of incorrect RGs leading researchers astray.
And there is by no means an “immense” number of wrong RG IDs, at least not proportionately speaking. For plants in areas I look at, the error rate isn’t much higher than it is in “real” unvouchered data taken by field techs and stuff. And the iNat data is by no means the most problematic data on GBIF either. All data has issues and you need to consider that when using it. It is what it is.
Charlie is definitely correct: all data has issues, and anyone using outsourced data should be especially cautious (at least if they have any integrity).
It doesn’t make sense for any of us to say there is or is not a large portion of incorrect IDs, because the scope is much too broad and no one has done a comprehensive analysis that speaks for the entire body of data on iNaturalist. Anything we could say is just conjecture or highly specific examples. For example, I have a lot of ideas about certain things based on what I’ve looked at with dragonfly data, but the way people relate to and perceive plants or fungi is quite different and often culturally influenced. My own musings only make sense in certain contexts.
Also, this is not directed at any one person, but just to remind ourselves that there are higher standards to hold ourselves to: whether or not iNaturalist is the most problematic data on GBIF is not a relevant metric to evaluate.
In the end, this thread is about renaming “Research Grade” as it is somewhat of a haughty misnomer, not debating the overall accuracy of data on iNaturalist. It could definitely be improved, as could everything in the world. I think we are all trying to move toward that in our own ways or we wouldn’t be participating.
We’ve been doing a lot of analyses of the proportion of incorrectly ID’d Research Grade obs. From the experiments we’ve done, it’s actually pretty low, around 2.5% for most groups we’ve looked at.
You could argue that this is too high (ie we’re being too liberal with the ‘Research Grade’ threshold) or too low (ie we’re being too conservative), and we’ve had different asks to move the threshold one way or the other, so I imagine changing it would be kind of a zero-sum game.
One thing we have noticed from our experiments, though, is that our current Research Grade system (which is quite simplistic) could do a better job of sorting high risk (ie potentially incorrectly ID’d) and low risk (ie likely correctly ID’d) obs into the Research Grade and Needs ID categories. As you can see from the figures on the left below, there’s some overlap between high risk and Research Grade, and between low risk and Needs ID. We’ve been exploring more sophisticated systems that do a better job of discriminating these (figures on the right).
We (by which I mean Grant Van Horn, who was also heavily involved in our Computer Vision model) actually just presented one approach, a kind of ‘earned reputation’ approach where we simultaneously estimate the ‘skill’ of identifiers and the risk of observations, at this conference a few weeks ago: http://cvpr2018.thecvf.com/
Still more work to be done, but it’s appealing to us that a more sophisticated approach like this could improve the sorting of high risk and low risk obs into the Needs ID and Research Grade categories, rather than just moving the threshold in a more or less conservative direction without really improving things.
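To make the “earned reputation” idea concrete, here’s a minimal sketch of what a reputation-weighted consensus could look like. The function name, the default weight for unknown users, the skill scores, and the threshold are all invented for illustration; this is not iNaturalist’s actual model, which estimates skill and risk jointly rather than from fixed weights.

```python
# Hypothetical sketch of a reputation-weighted ID consensus.
# All names, weights, and the threshold are illustrative only.
from collections import defaultdict

def weighted_community_id(ids, skill, threshold=0.9):
    """ids: list of (identifier, taxon) pairs.
    skill: dict mapping identifier -> estimated skill weight in (0, 1].
    Returns (leading taxon, its share of total weight, whether it clears threshold)."""
    totals = defaultdict(float)
    for identifier, taxon in ids:
        # Unknown identifiers get a neutral default weight (assumption).
        totals[taxon] += skill.get(identifier, 0.5)
    total_weight = sum(totals.values())
    taxon, weight = max(totals.items(), key=lambda kv: kv[1])
    share = weight / total_weight
    return taxon, share, share >= threshold

# Two skilled identifiers outweigh one novice, but the observation still
# doesn't clear the (invented) 0.9 confidence bar.
ids = [("novice1", "Aeshna canadensis"),
       ("expert1", "Aeshna verticalis"),
       ("expert2", "Aeshna verticalis")]
skill = {"novice1": 0.3, "expert1": 0.95, "expert2": 0.9}
taxon, share, confirmed = weighted_community_id(ids, skill)
```

Under a scheme like this, agreement from high-skill identifiers could promote an observation faster than raw vote counts, while disagreement from them could keep a risky observation in Needs ID.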
Yeah, I was responding to “just conjecture” since they’ve definitely looked into levels of accuracy.
To loop the discussion back around, Scott mentioned potentially weighting IDs differently and/or that the >2/3 agreement threshold may not need to be the same standard for reaching “research grade” (i.e. leaving “needs ID”) in different risk/accuracy scenarios. So community/majority consensus, or even “community” at all, may be irrelevant.
If, say 5 years down the road, the computer vision IDs a certain species correctly 99.9% of the time, could an observation IDed by CV be “research grade” without a 2nd confirming human ID? Why put those in the default “Needs ID” pool at all? :)
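For illustration, the current >2/3 agreement rule plus that hypothetical CV shortcut could be sketched like this. The function, the taxon chosen, and the idea of a per-taxon trusted list are all invented for the example; nothing like this exists on iNaturalist today.

```python
# Illustrative sketch: the ">2/3 of IDs agree" rule, plus a hypothetical
# shortcut for taxa where computer vision is assumed to be ~99.9% accurate.
# The taxon list and the shortcut itself are invented for this example.
CV_TRUSTED_TAXA = {"Danaus plexippus"}  # hypothetical high-accuracy taxa

def research_grade(taxon, agreeing, total, cv_suggested=False):
    """Return True if an observation would qualify as Research Grade.

    Current-style rule: at least two IDs, and strictly more than 2/3 agree.
    Hypothetical extension: a CV suggestion alone suffices for trusted taxa."""
    if cv_suggested and taxon in CV_TRUSTED_TAXA:
        return True
    return total >= 2 and agreeing / total > 2 / 3
```

Note the threshold is strict: with 2 of 3 IDs agreeing, the share is exactly 2/3 and the observation would not qualify, which matches how a “more than two-thirds” rule behaves at the boundary.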