Gamify accuracy? Award value to quality, not just quantity

calebcam · July 15, 2020, 7:29pm

I seriously wonder what adult in their right mind would waste their time this way. :-) I guess we’ve found one…

IDers that are way out of their league or adding faulty IDs on purpose. Of course, misbehaving users should be warned too.

marina_gorbunova · July 15, 2020, 7:35pm

Many adults would do that, and waking up to hundreds of agrees to bird observations is something I think everyone had to deal with if the notifications are on. I have no idea why people do that other than leaderboard stuff (but this was already discussed before).

sbushes · July 15, 2020, 7:39pm

Yes! I also wondered about this.
Another criticism I’ve heard in feedback from UK entomologists was the lack of detail in the data.
There’s currently no acknowledgement of the work that goes into annotating data … I agree gamification could help here too.

calebcam · July 15, 2020, 7:44pm

I know a few bird identifiers that went overboard, indeed.

Now … if they are really experts, that’s great if they want to be on the leaderboards. More IDs, more accuracy! I wouldn’t care (and I don’t think anyone really should?) if a kid was using a field guide to ID on iNat, because at least he is using a trusted source. It’s the people who just click ‘Agree’ who are probably dangerous.

Thumbnail IDing is a no-no for anyone who is serious about IDing, IMHO. Admittedly I have thumbnail IDed before but only if it’s a super obvious species.

scubabruin · July 15, 2020, 7:51pm

I’m not in favor of gamifying iNat at all. However, if it comes to that, I certainly would not award points/recognition/etc. to anyone agreeing with an observation that is already RG. At least that may help deter those users so hell-bent on being on the leaderboard and overwhelming our notifications with agreements on observations that had already reached community consensus.

cmcheatle · July 15, 2020, 8:40pm

It’s not 65% accuracy; it is 65% plus an additional 20% that may or may not be accurate. You can’t interpret the too-precise group as inaccurate. Some may be right, some may be wrong; the point is the experts were not able to validate either way.

I’d really be interested in seeing more about what the experts felt were mis-identifications. For example I have a hard time believing almost 10% of the bird records are improperly identified on the site.

For example, the 50 most observed research grade species of birds currently account for almost exactly 30% (30.36% as I write this) of RG bird records. These 50 species are generally highly distinctive, with many eyes looking at them. I won’t say there are no errors in them, but the rate will be very low.

To then get to an overall 10% error rate, just about 1 in 7 research grade bird records outside of the top 50 species would have to be wrong. I can’t believe that is right.
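The arithmetic behind the 1-in-7 figure can be sketched as follows. This is only an illustration: the function name is made up, and it assumes the simplest case, where the top 50 species contribute no errors at all (the post only says their error rate "will be very low").

```python
def required_error_rate_outside_top(overall_error, top_share, top_error=0.0):
    """Error rate the remaining records need in order to reach overall_error.

    overall_error: claimed error rate across all records (0.10 in the post)
    top_share:     fraction of records in the top 50 species (0.3036 in the post)
    top_error:     assumed error rate within the top 50 (assumption: zero)
    """
    rest_share = 1.0 - top_share
    return (overall_error - top_share * top_error) / rest_share


rate = required_error_rate_outside_top(overall_error=0.10, top_share=0.3036)
print(f"{rate:.3f} (about 1 in {1 / rate:.1f})")
```

Allowing a small non-zero `top_error` lowers the required rate slightly, but it stays close to 1 in 7.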

FWIW - the distribution curve for insects is not that different, the 50 most observed insects on the site represent 19% of all insect RG records. That’s 50 species out of over 79,000 species with a RG record generating a fifth of all records. And just like birds, most of those 50 are pretty distinctive, and relatively easy to ID.

Unless the dataset in the experiment is weighted to match the distribution of records on the site, its relevance as an error measurement is a little unclear to me.

sbushes · July 15, 2020, 11:51pm

Interesting. Where are you getting these stats from? (The 30.36% for example)
Is there a page I’ve not seen or…?

I can see top species listings at least …
And if I go to Diptera (where I’m active), I see some of the top ones globally…

Lucilia sericata has 8048 observations.
My guess, purely based on the UK observations I monitor - about 8000 of those should be at genus level or are incorrect.

Calliphora vicina and Clogmia albipunctata both have 3800 observations.
My guess, purely based on the UK observations I monitor - about 3500 of these should be at genus level or are incorrect.

The top ten in Diptera are actually more like the top ten worst offenders, and the least accurate of all the species level identifications as %s go, due to AI oversuggest and blind agreement from those who know no better perpetuating the issue.

As noted on the parallel thread… placing birds alongside insects can be very misleading and is not comparing like with like. In the UK, we have 620 species of bird but 27000 species of insect. We also have many, many more active identifiers in birds than insects.

sbushes · July 15, 2020, 11:53pm

Incorrect, no.
Inaccurate, yes.

cmcheatle · July 15, 2020, 11:55pm

I just went to the respective explore pages
https://www.inaturalist.org/observations?place_id=any&quality_grade=research&subview=grid&taxon_id=3&view=species

https://www.inaturalist.org/observations?place_id=any&quality_grade=research&subview=grid&taxon_id=47158&view=species

and then typed the counts of the top 50 into a spreadsheet to calculate.
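The same spreadsheet calculation could be sketched in a few lines of Python. The counts below are placeholders, not real iNaturalist figures, and `top_n_share` is a hypothetical helper; in practice the per-species counts and the overall total would be copied from the Explore pages linked above.

```python
def top_n_share(species_counts, total_records, n=50):
    """Fraction of all records contributed by the n most-observed species."""
    top = sorted(species_counts, reverse=True)[:n]
    return sum(top) / total_records


# Placeholder data: five species counts and a made-up total for the whole group.
example_counts = [500, 300, 200, 100, 50]
share = top_n_share(example_counts, total_records=2000, n=3)
print(f"Top 3 species account for {share:.1%} of records")
```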

cmcheatle · July 16, 2020, 12:04am

Potentially inaccurate. If a record is RG as Sympetrum sanguineum and an expert suggested it is genus Sympetrum, that does not mean it is not S. sanguineum.

It may or may not be. There may be a good probability it is not, or it may be impossible to tell based on the evidence, but that does not mean it is inaccurate.

Inappropriately precise does not equal inaccurate.

For example this observation, https://www.inaturalist.org/observations/8086092 I would have no issue with an expert or other putting it at Sympetrum based on the evidence provided.

But there is a high probability (I intentionally chose a record from my local area where I am familiar with distribution) that it is correct. This species far outnumbers the other alternative locally. It is arguably too precise, it is however not provably wrong.

sbushes · July 16, 2020, 1:06am

Well…I hate to argue semantics…but given the relevance to the topic…
I’d say inappropriately precise does indeed equal inaccurate.
And that this is one of the critical issues in Diptera at present…

Again, in less complex taxa the issues are less pronounced.
In your link you have 50/50 chance of being correct…you also allow for distribution and local knowledge …(which the bulk of the misidentifications in the ones I listed will not).
I think it might be a fair call. Not such a big deal at least.

But for the Lucilia I mentioned, in UK we have 7 species.
So that’s just a 1 in 7 chance of being correct. A blurry photo with insufficient detail could only ever be accurately recorded at genus. Anything else is just polluting the dataset.

Also worth noting actually, that some of the examples I gave often aren’t even accurate to genus level in iNaturalist observations. The majority of Clogmia albipunctata observations aren’t even Clogmia as far as I know… so they have to be bounced back to family level. That’s more like a 1 in 100 chance of being correct.

sbushes · July 16, 2020, 1:16am

Nice!
Looking at the insect top 50 I can see what you mean though.
Maybe there are just more issues in Diptera than elsewhere due to its complexity.

cmcheatle · July 16, 2020, 1:21am

It’s not a 1 in 7 chance. It would be a 1 in 7 chance if the species were equally and randomly distributed at both the time and place of the observation.
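A small sketch of that point, using invented counts: if the seven species are equally common, a blind species-level guess is right 1 time in 7, but if one species dominates locally, guessing that species is right far more often. The species names and numbers are placeholders, and the sketch ignores whatever evidence the photo itself provides.

```python
def species_priors(local_counts):
    """Turn local record counts into the prior probability of each species."""
    total = sum(local_counts.values())
    return {species: count / total for species, count in local_counts.items()}


# Hypothetical counts for seven species of one genus at one place and season.
uniform = species_priors({f"species_{i}": 100 for i in range(1, 8)})
skewed = species_priors({
    "species_1": 600, "species_2": 250, "species_3": 100,
    "species_4": 30, "species_5": 10, "species_6": 7, "species_7": 3,
})

print(f"equal abundance:  {uniform['species_1']:.3f}")
print(f"skewed abundance: {skewed['species_1']:.3f}")
```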

No one, least of all me, is suggesting there are not groups where there are too many overly precise identifications. I am, however, taking exception to the idea that one small experiment with unclear parameters, outcomes, and even inputs (for example, were the experts only shown the photo, or also given access to any comments, descriptions, observation fields filled in, etc.) demonstrates that 35% of insect records on the site (or 10% of birds, or any of the other ones listed) are inaccurate.

sbushes · July 16, 2020, 1:26am

Agreed. I think it was stated by the OP that it was a fairly ad-hoc experiment…
But it would certainly be great to see more methodical and repeated attempts to measure the accuracy.

marina_gorbunova · July 16, 2020, 9:54am

Calliphora vicina has many correct IDs, though; it’s not the hardest species, but the AI loves to call any big fly this species. Clogmia should go to the family level, and if it’s not in the cleaning AI wiki, then it should be.

bouteloua · July 16, 2020, 1:37pm

Staff thoughts on gamification:

The site is always going to have some aspects of gamification simply because user stats are shown though, and some people will be checking their stats compared to other users.

One idea I had fleshed out, but don’t think I ever posted, was to show only improving, or only improving+leading IDs in the default view of stats. i.e. don’t tally all the supporting IDs. More details:

Refine ID stats to highlight expertise and disincentivize blind/mass agreement

The proposal was, for several areas where identifications are tallied, to:

include IDs on all observations (for=any) instead of just those on other peoples’ observations (for=others), and
limit to “Leading” and “Improving” IDs…or just “Improving IDs”

Definitions of the four types of IDs: Improving, Leading, Supporting, and Maverick:

Leading: Taxon descends from the community taxon. This identification could be leading toward the right answer.
Improving: First suggestion of this taxon that the community subsequently agreed with. This identification helped refine the community taxon.
Supporting: Taxon is the same as the community taxon. This identification supports the community ID.
Maverick: Taxon is not a descendant or ancestor of the community taxon. The community does not agree with this identification.
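As a rough illustration of how these four definitions relate to each other, here is a minimal sketch. It is not iNaturalist's actual implementation: the function, its arguments, and the lineage representation are assumptions made for the example. The quoted definitions do not say how an ID at an ancestor of the community taxon is counted, so the sketch leaves that case unclassified.

```python
def classify_identification(id_lineage, community_lineage, is_first_suggestion):
    """Classify one ID against the community taxon (illustrative only).

    id_lineage / community_lineage: taxon names from a higher rank down to the
    taxon itself, e.g. ["Odonata", "Libellulidae", "Sympetrum"].
    is_first_suggestion: True if this ID was the first to suggest its taxon.
    """
    id_taxon = id_lineage[-1]
    community_taxon = community_lineage[-1]

    if id_taxon == community_taxon:
        # Same taxon as the community: the first suggestion refined it,
        # later ones merely support it.
        return "improving" if is_first_suggestion else "supporting"
    if community_taxon in id_lineage:
        # The ID's taxon descends from the community taxon.
        return "leading"
    if id_taxon in community_lineage:
        # Ancestor of the community taxon: not covered by the quoted definitions.
        return "unclassified"
    # Neither a descendant nor an ancestor of the community taxon.
    return "maverick"


genus = ["Odonata", "Libellulidae", "Sympetrum"]
species = genus + ["Sympetrum sanguineum"]
other_genus = ["Odonata", "Libellulidae", "Libellula"]

print(classify_identification(species, genus, True))        # species ID, community at genus
print(classify_identification(species, species, False))     # agreeing with the community ID
print(classify_identification(other_genus, species, True))  # different branch
```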

The intention of this proposal is to highlight people who can accurately identify taxa and disincentivize mass agreeing to appear at the top of leaderboards and top identifier listings. Whether or not the ID was made on your own observation or someone else’s has no bearing on if you should be highlighted as someone who can identify the taxon. See linked discussions at the bottom of this post for more on mass agreeing and leaderboards (i.e. people who add a lot of “Supporting” IDs, but few “Improving” or “Leading” IDs).

I understand the desire to highlight people who do a lot of identification for others vs. just for themselves, which is why the ID stat could remain as is on user profiles (for=others). If the intention of the leaderboard is to highlight IDs for others, I think it should be limited to Leading/Improving IDs. And, since there will be different intentions when looking at the ID stats on the Explore page and Project pages, it might be better to include several different ways to filter that information. For all stats that vary throughout the website, what is being calculated should be explained clearly in a tooltip.

Identification stats areas preferences:

| | IDs for others and for self (`for=any`) | IDs for others only (`for=others`) |
|---|---|---|
| All ID Types (Leading, Improving, Supporting, Maverick) | Optional, non-default view on Explore and Projects | User Profile; Calendar (add); Site Stats (add) |
| Leading/Improving IDs only | Explore Identifiers tab default view; Project Identifiers tab default view; Taxon page (Top Identifier); Observation detail page (Top Identifiers) | Leaderboard, if an overall leaderboard page has to remain on the site |

Default view for Explore/Project pages:

Previously (“Identifications” and “Top Identifiers” should include identifications on personal observations, not just IDs made for others):

@kueda: I personally like showing stats for IDs made for others. It helps highlight people who don’t add a lot of observations but do help other people out a bunch. I feel like it might make sense to show stats for identifications added on your own observations if we only showed improving identifications. That might separate out the people who are good at adding identifications that the community supports from people who just observe a lot and people who confirm a lot. Those are important too, but we already have stats for observers.

And some other related conversations:

gwark · July 16, 2020, 8:23pm

As someone with a bit of an obsessive completionist tendency, I can understand why someone might want to do it even apart from leaderboard stuff.

It’s not hard for me to imagine (especially when I was younger and had more free time) deciding it was fun to spin through 100s or 1000s of observations adding IDs (even if I wasn’t aware of a leaderboard).

That said, from my perspective now, it doesn’t really seem like a sensible way to spend time, but I can still understand why others might feel differently. (It’s also true there are several other things that seem like a strange way to spend time to me as well - for example, adding ssp. IDs for mammals or birds that are based only on range - I’m still trying to work out the benefits of that, and how it’s not just circular reasoning).

I have enough observations (and observations I’m following because I added a comment or ID) that I did find it necessary to turn off notifications for confirming IDs, lest I be flooded with notifications when someone went through and did a bunch of agreeing. I still occasionally have a minor flood of notifications, but they are mostly associated with someone adding a bunch of ssp. IDs to mammals.

sbushes · July 18, 2020, 11:05pm

Yes…I added Clogmia to wiki!..and flagged…

C. vicina has some right IDs, sure. Not the hardest, no… but not the easiest either!
In frame of discussion - certainly not a distinctive and easy to ID one of the sort cmcheatle mentioned, that the average member of public will guess correctly…

However… I also realised the numbers I gave for species to @cmcheatle are out…
I wasn’t looking at the RG obs only… in fact only 1000 C. vicina are RG.
And looking properly at the RG as @cmcheatle did when stating this, I see it is indeed more of the distinctive members of Diptera which are logged higher. Not comparable to birds still I think, but a little less pronounced than my comments made out.

marina_gorbunova · July 18, 2020, 11:08pm

The fact is we all make mistakes, and when you’re adding a lot of agreeing IDs, it’s much easier to overlook them. Birds get enough attention that there is no need to do that. Plus, as I proposed in another thread, you can always mark an observation as checked without adding another ID.

sbushes · July 18, 2020, 11:17pm

I think this could be a significant improvement (if I understand you correctly).
Especially given the connected debate today and the comments I’m reading following the agree button change.

I think my one concern would be that a shift to focus on improving IDs would then encourage identification obsessives to be taking things to species, which should really be left at genus, and so on.
So the definition of improvement and how that is rated, is important.

As you say, gamification exists regardless. And personally I enjoy this aspect. I think too much can definitely be cheap…it can also be problematic given the context. But a small amount is just good fun. I totally agree with @kueda that it’s good to show thanks. I think that gamification can be tasteful and provide subtle shaping to community objectives, if done carefully.
