Should an observation require 3 IDs to reach "research grade" when the observer is just agreeing with a suggestion?

I agree.
The fact that an observation has not attained “research grade” does not invalidate its use in research. There will always be some obs at research grade that don’t belong there.
Give the researcher some credit. It is their choice what to include.
I see no need for change as there will always be some who don’t do as they should, whether accidentally or on purpose.

2 Likes

What you are describing is a cognitive bias known as the frequency illusion:

after noticing something for the first time, there is a tendency to notice it more often, leading someone to believe that it has a high frequency of occurrence.

Before proposing a fix, you need to quantify the problem. As I see it, your alleged problem has two factors:

  1. How often, out of all RG observations, has RG been achieved solely through the OP agreeing with the first person to ID to species.
  2. How often, out of all of those where (1) has happened, the consensus ID has been incorrect.

These are things which can be measured.

Again, yes, it is “too dramatic a step”, particularly when you have not quantified the scale of the alleged problem.

Again, you have failed to quantify the problem. You have no basis to say “a lot of bad data” without… you know… actual data.

First empirically quantify the scale to demonstrate that it is a problem worth acting upon.

3 Likes

I disagree. Your way appears designed to lead to inaction on problems.

However, discouraging users from adding extra agreeing IDs will reduce the number of high-confidence IDs. I think we SHOULD add extra confirming IDs if we can be bothered, and this will allow data users to select observations with relatively high confidence in their IDs.

1 Like

It would help if the withdraw button were visible:
https://forum.inaturalist.org/t/make-the-withdraw-function-visible-as-a-button-on-the-observation-block-with-a-connected-tool-tip/14659

3 Likes

Please provide evidence that the “problem” exists; that this is a substantive issue. That is what I am asking. It is not impossible, nor is it unreasonable.

Energy and effort on the part of those who run the site, as well as of volunteers, have a cost and are finite. Blindly asking others to do something by alleging a “problem” without evidence risks diverting those energies from more important or more productive matters.

Each approach demands effort, but measuring the scale of an issue carries far less risk and requires far less effort than changing how the site works without any evidence.

4 Likes

Problem is, we almost never know why they are concurring. They may have hit the agree button as a “like” or “thank you” to the identifier, or may have selected a Computer Vision suggestion because they didn’t know a better option to use. Or, they may have done either of those things because they actually knew what the organism was, and it was a convenient short-cut for adding their own ID.

I think the better solution is to provide a “like” or “thumbs-up” option on IDs and comments, and to rename the Agree button to something that more clearly describes what “Agree” is supposed to mean.

Exactly. There are plenty of “research grade” scientific specimens in herbaria and museums that are not (yet) correctly identified. They still have great value for research - arguably more value for the very reason that they may be difficult to identify because they don’t fit existing taxonomic hypotheses very well.

In that sense, every observation on iNaturalist is “research grade” if it contains any discernible evidence at all of what the organism was, and where and when it was seen. On iNaturalist, the “proxy” for sufficient evidence has been whether more than 2/3 of the identifiers can agree on what it is. But lack of such agreement doesn’t mean that there is a lack of sufficient or valuable evidence.
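(To put a number on that proxy, by my arithmetic: 2 agreeing IDs out of 2, or 3 out of 4, clear the more-than-2/3 bar, while 2 out of 3 is exactly 2/3 and does not.)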

6 Likes

While it is possible, I think it is unreasonable to expect it at this stage of the conversation, and unnecessary.

What is happening here is just spitballing - someone throws an idea in and it gathers feedback from people with the same or different perspectives, supported by opinions and a bit of anecdata. The amount of energy required to do this is minimal. It’s an iterative, incremental way of making sure that not too much effort is put into a change before the change is deemed worthwhile - “good”* ideas gather momentum, “bad”* ideas wither on the vine.

When (well, if) further investigation is deemed worthwhile, then sure, go do a proper problem assessment, sizing, impact analysis, cost benefit assessment etc etc.

*Subjective, ultimately. I suppose “popular” and “unpopular” would be more appropriate.

5 Likes

Careful there! You’re using a classic debating technique to stall a conversation.

You’re under the mistaken impression that one needs to provide double-blind, peer-reviewed references in order to have a conversation.

8 Likes

You can have 2 people blindly agree as easily as one, so imo the change would create more problems than it would solve. Not all species/regions have the same number of available identifiers.

Besides, no one says you need to use research grade; you can use the number of identifications for your work. If you require 4, then use only records with 4. Your choice.

4 Likes

No, this is about someone making a specific claim with a complete absence of evidence:

(Moreover, I have offered a plausible cause for the impression that @paulexcoff has noted.)

Such claims, made without evidence, should be dismissed:

“What can be asserted without evidence can also be dismissed without evidence.” It implies that the burden of proof regarding the truthfulness of a claim lies with the one who makes the claim; if this burden is not met, then the claim is unfounded, and its opponents need not argue further in order to dismiss it.
https://en.wikipedia.org/wiki/Hitchens%27s_razor

Maybe a problem does exist, but if one does, one should not jump to the next step (as @paulexcoff did) and ask, “What must be done about it?” Rather, the next question should be, “Can we quantify the alleged problem? And if so, how?”

I’ve done plenty of IDs (roughly as many as Paul) and I’m not seeing the substantial problem he seems to be claiming. Hence my call to quantify it. Otherwise the proposal (1) wastes time and effort and (2) sets perfection above good enough, at the cost of the 50% increase in effort that his suggestion implies.

1 Like

In the scenario where IDs A and B agree and user C comes along to disagree, it currently takes at least 1 more person agreeing with A, or (by my count) 4 more agreeing with C, for the observation to reach RG again. If the bar were increased to 3, does the math for maverick adjust accordingly? That’s asking a lot when identifiers are outnumbered 10:1.
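(Working the current more-than-2/3 rule through for the simple two-species case, by my arithmetic: 2 IDs for A vs 1 for C is exactly 2/3, so not enough; 3 vs 1 is 75%, which passes; and for C’s taxon to take over, the C side needs more than twice as many IDs as the A side, i.e. at least 5 against 2.)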

2 Likes

I think your point has been made.

The community can still have a conversation around the OP’s hypothesis, even if it hasn’t been tested yet. Anyone who feels that it is a waste of time and effort is not obligated to participate.

8 Likes

Another use case for having complete and accurate identification histories be easily accessible! :-)

I agree that this kind of analysis is desirable, but expecting someone who thinks they have noticed a problem to engage in it before saying anything is absurd. The first part would be relatively easy for someone with some scripting experience, if we had access to complete identification histories, which we don’t. The second part would be exceedingly difficult, though not impossible.

4 Likes

I’ve thought it would be an interesting addition to disagreements about the relative value of digital data vs. specimens to estimate misidentification rates on iNaturalist and in herbaria. It would not surprise me if misidentification rates were higher in herbaria. If you’re checking the IDs (which should be the default in either case, at least for most uses beyond casually poking around in the data), the difference is that with a specimen you can get much more information to use in identification, but at a higher cost in time, effort, and logistics. Whether that trade-off is worthwhile is unlikely to have any generally applicable answer.

3 Likes

You’re misinterpreting my point and perhaps misread my initial comment. I’ll try to clarify.

My issue isn’t that one needs to do the analysis before bringing up the issue. Bringing up the issue and asking others for their perspectives is an excellent means of seeing whether one’s anecdotal observations are in keeping with those of others. I have absolutely nothing against that. I think it’s great that @paulexcoff brought it up. And discussing his observation here is a reasonable buffer against the frequency illusion problem, although given comments I’ve seen in this forum previously, I suspect that others here suffer from the same bias, so it may not be as helpful.

However, Paul’s initial post quickly jumped to the next step: asking what we should do about the (now presumed) problem which he has identified. Simply put, my contention is that one should be certain it is a problem before jumping to solutions. That is why the main point of my initial comment was, “Before proposing a fix, you need to quantify the problem.” It was not to say, “we shouldn’t be discussing this potential issue”.

&

Given that iNat has previously done this, it doesn’t strike me as “exceedingly difficult”. This is also why I believe that the proposal of @paulexcoff (“to require an extra concurring ID to reach research grade”) would likely cost more than the benefit it would deliver.

In 2017, the iNaturalist Staff ran an Identification Quality Experiment, which did that for a reasonable sample (3000 observations) using expert identifiers.

Based on the data so far, 92% of the sample of observations were correctly ID’d. This is relative to the expert’s IDs and assumes the expert is always right

The base accuracy was 92%, and most identifiers have an accuracy exceeding 95%. Some of iNat’s features have since been changed or upgraded (e.g. computer vision), so I suspect that the number is probably better overall now, although for some taxa it might be worse. There are some taxon-dependent issues, particularly with those who rely a little too heavily on computer vision, but Paul’s query doesn’t touch on that, due to the nature of the phenomenon which he is alleging (these would have to be IDs which are most likely correcting or challenging a CV-based ID).

iNaturalist already had a tool/interface for this exact challenge, so that would remove a considerable amount of the labour, although it may be easier to simply queue up a list programmatically and distribute it to independent users for verification.

But I still doubt that the issue described is substantive, given that iNaturalist had (at least in 2017) an average accuracy of at least 92%. Is it really worth increasing the workload by 50% (going from two required IDs to three, as Paul suggested) to improve that number? What level of accuracy should be required for iNat’s “Research Grade” observations? I know that people would like 100%, but that isn’t practical, particularly in science and given that even expert IDs come with uncertainty. North of 90% strikes me as a good place, particularly given the low cost of the current system.

2 Likes

As I said, we agree that the kind of analysis you describe is desirable. However, I was reacting to the sentence you’ve repeated here: “Before proposing a fix, you need to quantify the problem.” You’re not really in a position to impose demands on other people, and I consider this particular demand to be obviously unreasonable.

You realize that the average commenter doesn’t have the same resources available to them that the iNaturalist staff does, right?

I think I’ll drop out of this part of the discussion after this, but since both you and I apparently enjoy the literature on cognitive biases, I have a suggestion that has been helpful to me and might be helpful to you as well. Put some effort into watching various cognitive biases play out in your own mind. How often you notice these biases in yourself is a kind of scorecard for how well you understand the phenomena.

1 Like

I think you’ve made a very common misidentification. A mountain for a molehill.

How is that even a question? Everyone who has done some IDing can see that most new users jump on each new ID, and there’s a string of their agreements from e.g. order to species. It also happens with quite a few big/old users, and many new users leave the site, so they never learn what is what, and some don’t care. It is a frequent problem, and the OP wanted a solution that works only in those cases of the observer agreeing with an ID, not for all IDs.

2 Likes

#1 can be done programmatically with 3 caveats:

  • you can tell if the observer made an identification that agreed with the previous observation taxon, not that they clicked on the Agree button to make that identification.
  • it’s harder to determine exactly when an observation became research grade. you could infer it based on various assumptions, but i don’t think it’s really necessary to incorporate whether the observation actually became research grade or not. i think for the purposes of this kind of discussion, it’s enough to assume that an agreeing identification by the observer would push the observation closer to RG, if not to RG.
  • this would be done by taking a random sample of the observations

#2 could be harder to do, depending on how you approach it. you could simply take the numbers from iNat’s latest Computer Vision accuracy study and just say that it’s likely that a significant portion of the time, IDs are correct. or you could try to look for how often a disagreeing identification is made after an observer’s agreeing identification (assuming the full identification history is not destroyed by folks deleting their identifications rather than withdrawing them).
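for what it’s worth, here’s a rough sketch of how #1 could be measured via the public API. the endpoint and field names used below (/v1/observations, identifications, user, taxon, created_at) are my assumptions based on the v1 API and should be checked against the current docs, and the “sample” here is just a few recent pages rather than a properly random draw:

```python
# rough sketch (not iNaturalist's actual code): estimate how often a research
# grade observation includes an ID by the observer that repeats a taxon
# someone else had already suggested - a proxy for "the observer clicked Agree".
# assumption: /v1/observations returns an "identifications" array per result
# with nested user/taxon/created_at fields; verify against the current API docs.
import requests

API = "https://api.inaturalist.org/v1/observations"

def observer_agreed(obs):
    """true if the observer added an ID whose taxon had already been
    suggested earlier by another user on the same observation."""
    observer_id = obs["user"]["id"]
    taxa_from_others = set()
    # walk the IDs oldest-first so we only count taxa suggested before the observer's ID
    for ident in sorted(obs.get("identifications", []), key=lambda i: i["created_at"]):
        taxon_id = ident.get("taxon_id") or (ident.get("taxon") or {}).get("id")
        if taxon_id is None:
            continue
        if ident["user"]["id"] == observer_id:
            if taxon_id in taxa_from_others:
                return True
        else:
            taxa_from_others.add(taxon_id)
    return False

def sample_rg_observations(pages=5, per_page=200):
    """pull a crude sample of research grade observations (recent pages only)."""
    sample = []
    for page in range(1, pages + 1):
        resp = requests.get(API, params={
            "quality_grade": "research",
            "per_page": per_page,
            "page": page,
        }, timeout=30)
        resp.raise_for_status()
        sample.extend(resp.json()["results"])
    return sample

if __name__ == "__main__":
    obs_sample = sample_rg_observations()
    hits = sum(observer_agreed(o) for o in obs_sample)
    print(f"{hits} of {len(obs_sample)} sampled RG observations have an observer ID "
          f"that repeats a taxon already suggested by someone else")
```

running something like that over a few thousand observations would at least put a number on how often the pattern occurs. #2 (how often those community IDs turn out to be wrong) would still need human review of the sampled observations.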

i’m always surprised that folks assume that staff should be responsible for this kind of data collection, or that they would even bother with this data collection just because folks are talking about it on the forum. i can’t speak for how staff think about these kinds of things, but the way i thought about this thread is:

the solution being debated is a change to the community ID algorithm. that’s a major change to the system, and that’s a dealbreaker right off the bat, unless someone has made a really strong case for change. has anyone made an actual case for why the benefit of this kind of change is big enough to be worth doing (especially considering all the other things that could be done)? no? well, if it’s not enough of a priority for someone to attempt to make the case, then why are we even discussing this?

in my mind, it’s not clear why it matters that observations occasionally reach research grade with the wrong ID. i think the assumption with this kind of community ID approach is that these will be discovered and corrected over time. and as others have noted, if you’re really going to use the data for research, it’s the responsibility of the researcher to either review the underlying data themselves for accuracy or to otherwise correct for / factor in potential errors in the data.

i think you did your best to help folks get to the right approach, assuming the end goal is to spur action / change, but, sometimes, i think threads like this aren’t really intended to reach any specific action in the end. so if folks just want to talk for the sake of talking, then so be it.

7 Likes