Should an observation require 3 IDs to reach "research grade" when the observer is just agreeing with a suggestion?

I’ve thought it would be an interesting addition to the debates about the relative value of digital data vs. specimens to estimate misidentification rates on iNaturalist and in herbaria. It would not surprise me if misidentification rates were higher in herbaria. If you’re checking the IDs (which should be the default in either case, at least for most uses beyond casually poking around in the data), the difference is that a specimen gives you much more information to use in identification, but at a higher cost in time, effort, and logistics. Whether that trade-off is worthwhile is unlikely to have any generally applicable answer.

3 Likes

You’re misinterpreting my point and perhaps misread my initial comment. I’ll try to clarify.

My issue isn’t that one needs to do the analysis before bringing up the issue. Bringing up the issue and asking others for their perspectives is an excellent way to see whether one’s anecdotal observations are in keeping with those of others. I have absolutely nothing against that. I think it’s great that @paulexcoff brought it up. And discussing his observation here is a reasonable buffer against the frequency illusion problem, although given comments I’ve seen in this forum previously, I suspect that others here suffer from the same bias, so it may not be as helpful as one would hope.

However, Paul’s initial post quickly jumped to the next step: asking what we should do about the (now presumed) problem he has identified. Simply put, my contention is that one should be certain that it is a problem before jumping to solutions. That is why the main point of my initial comment was, “Before proposing a fix, you need to quantify the problem.” It was not, “we shouldn’t be discussing this potential issue”.

&

Given that iNat has previously done this, it doesn’t strike me as “exceedingly difficult”. This is also why I believe that the proposal of @paulexcoff (“to require an extra concurring ID to reach research grade”) is likely more costly than the potential benefit.

In 2017, the iNaturalist Staff ran an Identification Quality Experiment, which did that for a reasonable sample (3000 observations) using expert identifiers.

Based on the data so far, 92% of the sample of observations were correctly ID’d. This is relative to the expert’s IDs and assumes the expert is always right

The base accuracy was 92%, and most identifiers have an accuracy exceeding 95%. Some of iNat’s features have since been changed or upgraded (e.g. computer vision), so I suspect that the number is probably better overall now, although for some taxa it might be worse. There are some taxon-dependent issues, particularly with users who rely a little too heavily on computer vision, but Paul’s query doesn’t touch on those, given the nature of the phenomenon he is alleging (these would have to be IDs that are most likely correcting or challenging a CV-based ID).

iNaturalist already built a tool / interface for this exact kind of exercise, which would take out a considerable amount of labour, although it may be easier to simply queue up a list programmatically and distribute it to independent users for verification.

But I still doubt that the issue described is substantive, given that iNaturalist had (at least in 2017) an average accuracy of at least 92%. Is it really worth increasing the workload by 50%, as Paul suggested, to improve that number? What level of accuracy should be required for iNat’s “Research Grade” observations? I know that people would like 100%, but that isn’t practical, particularly in science and given that even expert IDs come with uncertainty. North of 90% strikes me as a good place to be, particularly given the low cost of the current system.

2 Likes

As I said, we agree that the kind of analysis you describe is desirable. However, I was reacting to the sentence you’ve repeated here: “Before proposing a fix, you need to quantify the problem.” You’re not really in a position to impose demands on other people, and I consider this particular demand to be obviously unreasonable.

You realize that the average commenter doesn’t have the same resources available to them that the iNaturalist staff does, right?

I think I’ll drop out of this part of the discussion after this, but since both you and I apparently enjoy the literature on cognitive biases, I have a suggestion that has been helpful to me and might be helpful to you as well. Put some effort into watching various cognitive biases play out in your own mind. How often you notice these biases in yourself is a kind of scorecard for how well you understand the phenomena.

1 Like

I think you’ve made a very common misidentification. A mountain for a molehill.

How is that even a question? Anyone who has done some identifying can see that most new users jump on each new ID, so there’s a string of their agreements from, e.g., order down to species. It also happens with quite a few big/old users, and many new users leave the site, so they never learn what is what, and some don’t care. It is a frequent problem, and the OP wanted a solution that works only in those cases where the observer agrees with an ID, not for all IDs.

2 Likes

#1 can be done programmatically with 3 caveats (rough sketch after the list):

  • you can tell whether the observer made an identification that agreed with the previous observation taxon, but not whether they clicked the Agree button to make that identification.
  • it’s harder to determine exactly when an observation became research grade. you could infer it based on various assumptions, but i don’t think it’s really necessary to incorporate whether the observation actually became research grade or not. i think for the purposes of this kind of discussion, it’s enough to assume that an agreeing identification by the observer would push the observation closer to RG, if not to RG.
  • this would be done by taking a random sample of the observations
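here’s a minimal sketch of what that sampling could look like, assuming the public v1 API and the response fields as i understand them (each observation carrying an identifications list with user, taxon, and created_at); per the first caveat, it only checks whether the observer’s taxon matched an earlier suggestion by someone else, not whether the Agree button was used:

```python
import requests
from datetime import datetime

API = "https://api.inaturalist.org/v1/observations"

def when(ident):
    # identification timestamps are ISO 8601 strings, e.g. "2021-03-01T12:34:56-08:00"
    return datetime.fromisoformat(ident["created_at"].replace("Z", "+00:00"))

def observer_agreed(obs):
    """flag observations where the observer added an ID matching a taxon
    another user had already suggested -- a proxy for 'agreed', since the
    API can't say whether the Agree button was clicked."""
    observer_id = obs["user"]["id"]
    seen_taxa = set()  # taxa suggested by *other* users so far
    for ident in sorted(obs.get("identifications", []), key=when):
        taxon_id = ident["taxon"]["id"]
        if ident["user"]["id"] == observer_id:
            if taxon_id in seen_taxa:
                return True
        else:
            seen_taxa.add(taxon_id)
    return False

# crude sample: one page of research-grade observations
# (a real analysis would randomize pages or observation ID ranges)
resp = requests.get(API, params={"quality_grade": "research", "per_page": 50},
                    timeout=30)
resp.raise_for_status()
sample = resp.json()["results"]
flagged = [o["id"] for o in sample if observer_agreed(o)]
print(f"{len(flagged)} of {len(sample)} observations flagged")
```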

#2 could be harder to do, depending on how you approach it. you could simply take the numbers from iNat’s latest Computer Vision accuracy study and just say that it’s likely that a significant portion of the time, IDs are correct. or you could try to look for how often a disagreeing identification is made after an observer’s agreeing identification (assuming the full identification history is not destroyed by folks deleting their identifications rather than withdrawing them).
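a rough follow-up for #2, under the same assumptions about the response shape (and with the caveat above that deleted identifications simply won’t appear in the history):

```python
from datetime import datetime

def when(ident):
    return datetime.fromisoformat(ident["created_at"].replace("Z", "+00:00"))

def disagreement_after_agreement(obs):
    """return True if anyone added an ID for a different taxon after the
    observer's own agreeing ID -- a crude signal that the agreement may
    have been premature or wrong."""
    observer_id = obs["user"]["id"]
    idents = sorted(obs.get("identifications", []), key=when)
    seen_taxa = set()              # taxa suggested by other users so far
    agree_at = agreed_taxon = None
    for ident in idents:
        taxon_id = ident["taxon"]["id"]
        if ident["user"]["id"] == observer_id:
            if taxon_id in seen_taxa:
                agree_at, agreed_taxon = when(ident), taxon_id
        else:
            seen_taxa.add(taxon_id)
    if agree_at is None:
        return False               # the observer never echoed someone else's taxon
    return any(when(i) > agree_at and i["taxon"]["id"] != agreed_taxon
               for i in idents)
```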

i’m always surprised that folks assume that staff should be responsible for this kind of data collection, or that they would even bother with this data collection just because folks are talking about it on the forum. i can’t speak for how staff think about these kinds of things, but the way i thought about this thread is:

the solution being debated is a change to the community ID algorithm. that’s a major change to the system, and that’s a dealbreaker right off the bat, unless someone has made a really strong case for change. has anyone made an actual case for why the benefit of this kind of change is big enough to be worth doing (especially considering all the other things that could be done)? no? well, if it’s not enough of a priority for someone to attempt to make the case, then why are we even discussing this?

in my mind, it’s not clear why it matters that observations occasionally reach research grade with the wrong ID. i think the assumption with this kind of community ID approach is that these will be discovered and corrected over time. and as others have noted, if you’re really going to use the data for research, it’s the responsibility of the researcher to either review the underlying data themselves for accuracy or to otherwise correct for / factor in potential errors in the data.

i think you did your best to help folks get to the right approach, assuming the end goal is to spur action / change, but, sometimes, i think threads like this aren’t really intending to reach any specific action in the end. so if folks want to just talk to talk, then so be it.

7 Likes

I agree that sometimes people just want to talk and get feedback, which is fine.
I also agree with @murphyslab in the sense that formally making a decision on this issue would require more analysis. I have no idea what the scale of the possible ‘problem’ is. In my personal experience it seems to be small, but that is only my perception. I have also voiced my opinion on this issue in several posts (see my comment above).

2 Likes

In principle, yes. Without privileged access to the back end, though, I don’t know how you would get all of the identifications associated with a particular observation other than by pulling up the observation on a web browser and manually entering the data you see on the screen. Do you have a way of doing this?

of course. iNaturalist’s API is relatively good at providing a lot of useful information. just make the necessary GET /v1/observations requests using whatever tool or programming language you like. the staff would probably do the exact same thing to get this kind of data.
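for example, a minimal request for a single observation’s identification history might look like this (standard v1 endpoint; the observation ID below is just a placeholder):

```python
import requests

# fetch one observation's full record, identifications included,
# from the public v1 API; 12345678 is a placeholder observation ID
resp = requests.get("https://api.inaturalist.org/v1/observations/12345678",
                    timeout=30)
resp.raise_for_status()
obs = resp.json()["results"][0]

for ident in obs["identifications"]:
    status = "current" if ident.get("current") else "withdrawn"
    print(ident["created_at"], ident["user"]["login"],
          ident["taxon"]["name"], status)
```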

1 Like

Thanks, I guess I hadn’t gotten that far the last time I was trying to figure out anything with the iNaturalist API. I’m certainly glad I’m not trying to actually do this analysis through the API, though. With access to a pile of related tables this would be easy…

Indeed! And if only the Withdraw function even existed in the iOS app.

When I exclusively used the iOS app, I had people commenting to me to use Withdraw. At first, I thought they were asking me to withdraw from using iNat entirely, and I was a bit disconcerted. Someone finally cleared it up, saying I could only withdraw on the website. Huh? There’s a website with more features than the phone app? :flushed:

7 Likes

No horse in this race whatsoever, but as someone who spends time identifying, I do see instances where it appears a user simply agrees with someone else’s ID when they otherwise wouldn’t be able to come to that conclusion independently.

My reason for posting here, though, is more around the discussion of quantifying the occurrence of this issue. I would argue this would be very difficult, if not impossible, to do, given you really can’t be certain when it actually happens. For example, I am relatively new to botany, and I will change my initial ID (say genus or family) and provide a species-level identification that agrees with the previous user’s identification once I am able to independently come to that conclusion. This would be virtually indistinguishable from an instance where a user is just clicking agree without any knowledge of the actual identification. All said and done, the inaccuracy in quantifying the issue may be higher than the rate of occurrence of the issue itself :slightly_smiling_face:

3 Likes

That ‘more than 2/3’ rule racks up quickly with one, or far worse, two disagreements: with one dissenting ID you need three agreeing IDs to clear the 2/3 bar, and with two you need five.

3 Likes

YES! I completely agree with this. I’ve come across some observations where 4-5 people have blindly followed a clearly misidentified post by the OP or by 2nd parties. I think many novice posters are just trying to support their friends, but this presents a real problem for what is definable as “research grade.”

2 Likes

A bit aggro with the tone here, dude. It’s not a cognitive bias; I made no claims about how widespread the issue was, just that it’s something I have observed over a long period of time as an identifier and user of iNat data.

Sure, if I were making a feature request, doing a formal analysis of the scale of the problem might be an appropriate first step, but for starting a general conversation about what other people think about this issue, I don’t think it’s a requisite one. And chiding me for not doing a quantitative analysis first is ridiculous. You can just say you don’t think it’s a big problem and roll on by.

8 Likes

That is frustrating, but this proposal would do nothing to change 3+ wrong IDs. All we can do is disagree and maybe recruit other users, if there are any familiar with the species in question. In your case, you can add that you are a published authority on Castilleja and explain why they are wrong about ‘x’.

Kind of arrogant calling for a massive change in the functionality of a process with zero evidence.

The thing is that you raised two issues, not just one. And the 2nd issue was framed as contingent on the first. And you offered zero evidence for the first. If it were just the 1st issue (“Hey I’m seeing a trend here”), I wouldn’t criticize that. If it were the 2nd issue, separately, I would disagree given the evidence available. But you chose to make that flimsy argument, Paul. And I merely pointed it out.

Yes, I almost always do just that (add a clarifying comment). Because at least 95% of my 40,000+ identifications are of Castilleja species and almost all the rest are of closely related genera, I can provide at least a somewhat numerically-informed comment, if not “rigorously quantified.” I do run across this issue on a regular basis, perhaps 1-2 per 100 observations. I’d also observe that these cases occur with much greater frequency (ca. 9 out of 10) with species that have closely-related species with similar morphologies and occur in the same area. A perfect example is Castilleja exserta and C. densiflora, which are similar in color, growth form, and phenology when not examined closely, and they occur in the same areas and even in the same general location (e.g. Edgewood Park, in the south Bay area). These two species are frequently (ca. 1 out of 4) mixed up, with errors in both directions. Fortunately, unless the photos are really poor quality, a zoom-in allows for definitive ID, when the best characters are looked for. Some posters have responded to me that they just accepted the machine-suggested ID, and then others agreed, for whatever reason. I’d also note that there are other drivers of this problem that pop up only occasionally, such as people who were on the same field trip and were told by a leader or a “checklist” of the site that the plant they found was species x, and then they post their observations and agree with each other.

Anyway, as egordan88 observed, a 3-agree rule would help but likely not solve the problem.

2 Likes

I’m not sure that requiring a minimum of 3 identifications is a good strategy, but I did want to point out that this proposed change would reduce identifier time in some areas.

In my experience, finding and correcting incorrect RG observations is much more time consuming on a per observation basis than confirming/correcting ‘Needs IDs’. So every observation that would be saved from incorrectly reaching RG by upping the bar might potentially save someone time in the long run.

I do think it would ultimately increase the effort required overall; I just wanted to point out a potential plus that wasn’t only related to better data quality.

1 Like