iNat's top identifier algorithm is broken

iNat is much more than a ‘place to share pictures’
People can do that on Facebook or Instagram or …

7 Likes

iNaturalist is different things to different users. One of the goals of iNaturalist is to make a huge database of what’s where when and then let scientists use that in their research. It’s pretty effective at that.

22 Likes

The Top Identifier is more of an indication of the amount of time someone has spent identifying a given taxon, not how knowledgeable they are about that taxon, and no matter how it’s calculated that will probably always be true. I’ve become the top identifier in lots of taxa that I’m not particularly knowledgeable about because I’ve identified hundreds or thousands of “easy” observations where I’m sure there are no similar species in range, but wouldn’t be able to ID them out of that context (where a true expert would be able to). Likewise I’ve learned a few local species well enough to correctly ID them when I photograph them, and have uploaded hundreds of observations of those taxa (where my ID is the first correct ID), but I am far from an expert, and mainly ID them based on pattern recognition, not really knowing what the diagnostic field marks are.

16 Likes

As someone who does a lot of identifications within my area of expertise, I still often agree with people. Under this proposal, any expert who puts in a correct ID based on their own knowledge does not count if the observer was also correct.

Basically this does not count any correct ID unless the observer was wrong

10 Likes

I think the question is what do you think the purpose of the “top identifier” lists is? I don’t think anyone interprets them as being the “world experts” in anything, just the people who spend the most time in that taxon on iNaturalist. I don’t think breaking down identifier lists by leading/supporting/improving would make a whole lot of a difference in most of the lists, honestly. And I can confirm as someone who spends a lot of time in a few taxa, I click “agree” loads more than I add a new ID in some taxa, because the CV has gotten pretty good at recognizing some of the obvious species. That’s not because I click “agree” randomly to be bumped up some leaderboard; it’s because I want to confirm the IDs that are correct and get them out of Needs ID. I don’t see why we’d want to incentivize adding leading IDs over adding supporting IDs, as both are important for iNat to function. Sure, there are a few people who click “agree” too readily, but there are also plenty of people who go through and add CV IDs to observations without vetting the IDs themselves, which is just as big of a problem. I don’t see the “people who ID irresponsibly to get on leaderboards” as being necessarily tied to any particular ID type.

13 Likes

I agree that such a system (one emphasizing leading or improving IDs) might work against expert observers who are busy elsewhere and don’t add the initial ID, but I do not think it would encourage others to compete to add the first ID and hence increase the number of incorrect IDs. iNat being a robust community, most initial IDs if incorrect will almost always be corrected over time and eventually an initial incorrect ID won’t figure in the counts.

2 Likes

I think it is incorrect to say that

They will get credit for the ID that they entered - but someone agreeing will also get the same amount of “recognition”.

As others have noted, the ID rankings aren’t perfect, but part of the responsibility for using that information in the rankings also lies with the person who uses the lists. @ItsMeLucy has given an example of a good way to use the rankings, though I am sure that there are others.

Knowing which users are frequent IDers/more active on iNat is also useful - they may be more likely to respond to mentions for many taxa than some specialist users who may be on iNat less frequently, so the fact that the lists essentially incorporate this info has value.

On a side note, the Discourse forum lets you respond to multiple other users in one post via quoting and mentioning. This is often useful in keeping threads easy to read as opposed to having a series of many consecutive shorter posts from the same user. Additionally, if users post too many times in one thread in a short period, I think the system may throttle their posts in some circumstances (not 100% sure on how this works though).

4 Likes

It is EXCEEDINGLY common for an observer who does not know the species of one of their observations, to simply agree with the first suggestion made. And, Bingo, it’s “Research Grade” (I don’t think!!!)

5 Likes

Absolutely. I think that everybody needs to interpret the leaderboards in this context. Any change to the way that the leaderboard is calculated, which only uses information from the leaderboards on iNat, will ultimately come down to the amount of effort spent identifying.

I do think that having the leaderboards weight “leading” identifications over “supporting” and “improving” would be an improvement, but it’s still ultimately a measure of how much time they spend on the site. Also, excluding any ID made with computer vision from the leaderboard calculations would be good.
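To make the weighting idea concrete, here is a rough Python sketch. The weights, the `category` and `used_cv` field names, and the `weighted_score` function are all made up for illustration; nothing here reflects how iNaturalist actually computes its leaderboards.

```python
# Hypothetical weights: "leading" counts for more than the other categories.
# These numbers are assumptions, not values used by iNaturalist.
WEIGHTS = {"leading": 2.0, "improving": 1.0, "supporting": 1.0}


def weighted_score(ids, exclude_cv=True):
    """Sum the category weights of a user's IDs, optionally skipping CV-assisted ones."""
    score = 0.0
    for ident in ids:
        if exclude_cv and ident["used_cv"]:
            continue  # drop IDs that were entered via a computer vision suggestion
        score += WEIGHTS.get(ident["category"], 0.0)
    return score


# Made-up sample data for one identifier.
sample_ids = [
    {"category": "leading", "used_cv": False},
    {"category": "supporting", "used_cv": False},
    {"category": "leading", "used_cv": True},
]

print(weighted_score(sample_ids))                    # CV-assisted ID ignored
print(weighted_score(sample_ids, exclude_cv=False))  # all three IDs counted
```

Even with weights like these, someone who adds more IDs still scores higher, so the result remains mostly a measure of time spent.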

I also spend most of my time identifying in Hawai’i, and there are many invasive species where I am the global leader and get tags on observations in Brazil or Mexico where I have no idea about the lookalikes in that region. So I’m definitely not an expert in those species; I just spend a bunch of time identifying in Hawai’i.

7 Likes

If it’s really a contest to see who identifies it correctly first, why don’t observers who identify their own observations get credit?

2 Likes

That would be great, as it would allow experts and taxon specialists who use CV to auto-fill their IDs routinely, to finally opt-out of leaderboards.
At last, immune to the endless stream of requests “Hi I see you’re the top identifier for Taxon X, please look at this observation”!

2 Likes

I hope it’s not a “contest”, and I personally don’t care about getting “credit”.
Speaking as a non-expert who will never be at the top of any “leader boards”… Just interested in doing my part, however small.

5 Likes

There are positives and negatives to different ways of calculating. The current system leaves the opportunity for someone who knows little to quickly top a leaderboard, even with largely accurate IDs, just by finding an already well-curated taxon. I’ve seen it happen, but I think people putting gaming the leaderboards ahead of data integrity or the community is the exception. When it happens, I think it can usually be corrected, or if they don’t actually care about the community, there’s a good chance that they’ll eventually do something that will get them suspended.

Maybe a leaderboard for each of various categories would be more informative? If there is to be only one, though, I think this or something very close is best. There are a lot of very knowledgeable, diligent people who rightfully should receive credit for carefully agreeing. Many topping leaderboards for 97+% supporting IDs are also people disagreeing with the most incorrect IDs in the taxon.

For the time being, if you’re using leaderboards to see who to trust or tag, you can enter the user ID, taxon, and category into the url for Identifications:
https://www.inaturalist.org/identifications?category=&current=&for=others&page=1&taxon_id=&user_id=joedziewa
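If you would rather generate that link than edit it by hand, here is a small Python sketch. It only reuses the query parameters visible in the URL above; the `identifications_url` helper is hypothetical, the taxon ID `12345` is a placeholder, and the assumption that `category` accepts values such as `leading`, `supporting` or `improving` comes from the category names used in this thread, so check it against the site.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.inaturalist.org/identifications"


def identifications_url(user_id, taxon_id="", category=""):
    """Build an Identifications link using the same parameters as the URL above."""
    params = {
        "category": category,   # assumed values: "leading", "supporting", "improving"
        "current": "",
        "for": "others",        # only IDs made on other people's observations
        "page": 1,
        "taxon_id": taxon_id,   # numeric taxon ID; left empty for all taxa
        "user_id": user_id,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Reproduces the link above.
print(identifications_url("joedziewa"))

# Placeholder taxon ID, narrowed to leading IDs only.
print(identifications_url("joedziewa", taxon_id=12345, category="leading"))
```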

1 Like

I strongly disagree - I find it extremely useful, and I assume most of the people who tag me for IDs because they saw me on the leaderboard also find it useful.

I am slightly confused by the assumption that the leaderboards are some kind of prestige ranking - as far as I’m concerned, it’s just a way to see who spends a lot of time working on a taxon and thus might be able to contribute useful advice.

For what it’s worth, I have only encountered a small handful of people who want to “game” the leaderboards and recklessly submit IDs in an attempt to climb them.

A lot of my IDs show as being CV, but only because I have an old injury that makes typing slightly painful - so I will often use the suggestions to pull up the taxon I want, rather than typing out the name. Combined with the fact that you can remove the CV mark simply by clicking the taxon name a second time before you submit your ID, I’m not sure if this would be all that effective.

How about a system that takes into account the number of times one of your IDs has been disagreed with, and lowers the leaderboard score based on that?
Mind you, it’d probably need to be combined with some changes to the ability to delete IDs - I have already encountered a few users who immediately delete any ID that gets contradicted, so that would probably become more of an issue.
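As a very rough sketch of what I mean, in Python. The formula, the `penalty` value and the `penalized_score` name are all invented for illustration, not a worked-out proposal:

```python
def penalized_score(total_ids, disagreed_ids, penalty=2.0):
    """Hypothetical score: every ID counts 1, and each disagreed-with ID costs `penalty`."""
    if disagreed_ids > total_ids:
        raise ValueError("disagreed_ids cannot exceed total_ids")
    return max(0.0, total_ids - penalty * disagreed_ids)


# 100 IDs, 10 of which were disagreed with.
print(penalized_score(100, 10))

# Same user after deleting one contradicted ID: it drops out of both counts.
print(penalized_score(99, 9))
```

The second call gives the higher score, which is exactly the deletion loophole mentioned above.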

6 Likes

Agreeing with previously suggested IDs is not necessarily inherently a bad thing. The intent behind the agreement makes a huge difference and while it is not possible to know intention for sure, it is possible to infer that in some cases.

Compare the following scenarios:

  1. Identifier #1 goes through all observations of certain taxa in order to check their IDs. In the process, they will add identifications (which could be concurring or dissenting with the community ID) and in many instances, they might agree with previously suggested IDs as a way of indicating that they have checked through them as well.

Identifier #1 usually shows some degree of familiarity with the taxa they identify.

  2. Identifier #2 goes through a select number of observations of certain taxa, choosing to add agreeing IDs to only those observations where another identifier who is higher than them on the leaderboard (“Higher Identifier”) has already given their ID, and another identifier(s) has/have agreed with them (i.e. the observation is at RG already with two or more concurring IDs, one of which was made by Higher Identifier, and Identifier #2 is piling on a 3rd or 4th ID in order to boost their own ID count without boosting Higher Identifier’s ID count).

Of the select number of observations Identifier #2 chooses to identify, they deliberately avoid all observations that have been identified by Higher Identifier that are not yet at RG (i.e. the observation has only one specific ID given by Higher Identifier and Identifier #2 does not wish to bump up Higher Identifier’s RG ID count)

Identifier #2 may or may not be familiar with the taxa they choose to identify. For instance, they might see a recently uploaded observation of a taxon (“Species A”) appearing for the first time (the “New Observation”) in the region where they frequently identify (country X), and then proceed to leave agreeing IDs on most to all other observations of Species A in other regions (country Y) in the manner described in the preceding two paragraphs, while strangely refusing to give an agreeing ID to the New Observation of Species A (possibly to avoid the risk of being wrong).

Both of these identification practices involve a lot of agreeing, but to me, the underlying intentions behind each one differ greatly.

1 Like

The leaderboards seem to be working as intended and that is OK. The problem, I think, is a disagreement about what Top Identifier means. It’s easy to interpret “Top Identifier” as “Most Skilled Identifier.” That’s not what it means. Perhaps there would be less controversy (it’s a recurrent theme) if the leaderboard were renamed, perhaps to “Most Active Identifier” or “Most IDs done for this taxon” or something like that.

13 Likes

True, but that doesn’t translate to the top IDers list. Many of the more common taxa have top IDers with 1000s of IDs (in some cases even >100000). People just blindly agreeing with everything will usually not even make it into the top 100 (again, for the common taxa). I think IDs on your own observations aren’t taken into account for the leaderboard anyway, but I could be wrong.

As for false RG observations: So far identifiers have managed to keep those at a minimum (long term at least). But this is certainly an area where having more identifiers would help a lot. I currently never check RG observations because there are so many in Needs ID that I simply do not get to it.

2 Likes

As others have noted, all the leaderboards indicate is that a person has spent time IDing a particular taxon and therefore they probably have some interest and expertise in that taxon. They aren’t meant to measure skill or authority.

The main problem I see with the leaderboards is that they do not actually count the IDs one has provided for a taxon, but rather the number of observations with a community ID of that taxon for which one has provided an active ID. This means that generalist IDers who provide a lot of broad IDs that are subsequently refined may end up on the leaderboards of taxa that they do not know how to ID except in a general sense. This can lead to other users misinterpreting how much expertise they have, and some of us are fairly uncomfortable being credited with IDs we didn’t make.
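Here is a toy Python comparison of the two ways of counting, to show where the difference comes from. The data, field names and function names are all invented for illustration, and details such as whether the observer's own IDs are excluded are ignored; this is a sketch of the behaviour described above, not iNaturalist's actual code.

```python
from collections import Counter

# Made-up observations: a generalist adds a genus-level ID that others refine.
observations = [
    {"community_taxon": "Species A", "ids": [
        {"user": "generalist", "taxon": "Genus A", "active": True},
        {"user": "specialist", "taxon": "Species A", "active": True},
        {"user": "second_specialist", "taxon": "Species A", "active": True},
    ]},
    {"community_taxon": "Species A", "ids": [
        {"user": "specialist", "taxon": "Species A", "active": True},
        {"user": "second_specialist", "taxon": "Species A", "active": True},
    ]},
]


def count_by_community_taxon(observations, taxon):
    """Per user: observations whose community ID is `taxon` and that carry any active ID by them."""
    counts = Counter()
    for obs in observations:
        if obs["community_taxon"] != taxon:
            continue
        counts.update({i["user"] for i in obs["ids"] if i["active"]})
    return counts


def count_by_own_id(observations, taxon):
    """Per user: observations where their own active ID is exactly `taxon`."""
    counts = Counter()
    for obs in observations:
        counts.update({i["user"] for i in obs["ids"]
                       if i["active"] and i["taxon"] == taxon})
    return counts


print(count_by_community_taxon(observations, "Species A"))
print(count_by_own_id(observations, "Species A"))
```

Under the first count the generalist is credited with one Species A observation despite only ever entering "Genus A"; under the second they are credited with none.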

11 Likes

I agree that this is the problem.

4 Likes

Yes, this is also my one gripe with the current leaderboard system. There are moth species that are unidentifiable from photos for which I sometimes find myself on the leaderboard as an identifier. These are from times when I provided a genus ID and then someone else added an unwarranted species-level ID and I missed the notification. Though I suppose if I’m on the leaderboard for these species, I’m more likely to get tagged by people who think they’ve found one, so I can be notified and “disagree” back to genus on them.

6 Likes