Overzealous Identification

I am curious about error rates, and I know there has been some work attempting to quantify them. I think it’s probably safe to assume that overzealous identifiers, almost by definition, will have higher error rates than other identifiers, but I see that as one end of a spectrum of identifier quality, and maybe it’s worth considering how to encourage a good balance between identifying more things and maintaining a reasonably high level of quality/accuracy in the identifications.

I think @charlie has mentioned that for plants he isn’t convinced the error rate on iNaturalist is much higher than that of herbarium collections (forgive me if I’m misremembering, and please correct me if so), and there’s no way of knowing for data recorded without vouchers (in my personal experience, I’ve seen enough to suggest that the accuracy of plant data is highly dependent on the observer).

Personally, I know I feel kind of bad about making mistakes in identifications, and sometimes I feel like maybe I should be more conservative to avoid them. However, based on this measure at least, my overall error rate is pretty low, and I do use the corrections to help calibrate which things I should feel reasonably confident about (acknowledging that my confidence is based in large part on knowing which species are expected to occur in the area where I focus).

When I do learn about a species I wasn’t aware of previously, I become more conservative unless/until I feel confident I know how to distinguish it from others that occur in the region I look at. As an example, when @markegger identified some observations as C. chrymactis, a species I had not previously been aware of, I began limiting my identifications to genus for observations in the (relatively limited) area where that species occurs, since I didn’t know how to differentiate it from the other species expected in the region.

In contrast to my approach, a friend of mine only wants to identify something based on seeing all the key characters, even if there’s not really anything else expected from the region that is likely to fit what’s shown in the observation.

While I would certainly acknowledge that the quality of identifications is likely to be higher in that case, it’s unclear whether that (possibly small) gain in accuracy is worth the cost of many more observations going unidentified (when most of those identifications would have been correct).

I suppose there will always be a tension between putting names on things (or not), and views will differ depending on the person and how they may want to use the observations/identifications.

5 Likes

That’s a compromise I could get behind.

But this discussion does raise the question: what’s a new user? Is it based on the age of the account, or on activity in each clade? When I started, I was IDing birds, non-rodent mammals, and a few common reptiles. The past couple of years, I’ve been delving into plants and insects. Oftentimes I can only ID to family for insects, and to Dicots, Monocots, or non-angiosperm class for plants (a lot of people in my area upload without an ID, so I give it something so it shows up in the filtered Identify searches people run). I’ve mostly known what I’m doing with the verts, but I’ve had to do corrective sweeps of my own IDs multiple times in insects and plants after someone who’s an actual expert told me about a similar species that wasn’t showing up on any of the area taxon lists. If there had been a weightless period, I think my contributions would have benefited from having a new one in each major clade until I’d had some practice and a chance to encounter those experts. Though the number of contributions required to exit it would probably need to be lowered for clades that don’t get many observations overall.

3 Likes

Short answer… any new user who is unfamiliar with how the site and community operate - i.e., those who think the “Agree” button means to accept the given ID, and anyone who doesn’t grasp how CID works.

In the context of overzealous identification, it is not so clear. That usually involves more active users, and notwithstanding the lack of response to questions, those users are actually a good thing!

The probation periods and weightless IDs etc. are ideas to mitigate the problematic IDing from users who join for school projects or bioblitzes, where they are either duress users or very short-term users, and don’t respond to questions or ID challenges outside the short project/period they joined for. This is not a huge problem, because we can usually tip the CID by tagging in other active users to help confirm IDs. Typically, these problematic IDs come in pairs: one user making the errant ID, and a classmate or other well-meaning iNatter “Agree”-ing to it in the Needs ID pages. The advantage of weightless probationary IDs is that they would still appear, but only two other active IDers would be needed to confirm them to RG, vs the 3 needed to overturn a single errant ID (or 5 if an errant pair). Of course, if either of the errant IDers does respond and/or change their ID, then there is no problem. It is only the absentee IDers that create the problem here.
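For anyone wondering where those “2 vs. 3 vs. 5” numbers come from: the community taxon needs more than 2/3 of identifications to agree. A minimal sketch of that arithmetic (illustrative only - this is not iNat’s actual code):

```python
# Smallest number of agreeing IDs needed so that agreements exceed
# 2/3 of all IDs, given some number of errant (disagreeing) IDs.
def ids_needed_to_overturn(errant_ids: int) -> int:
    agreeing = 1
    while agreeing / (agreeing + errant_ids) <= 2 / 3:
        agreeing += 1
    return agreeing

print(ids_needed_to_overturn(1))  # 3 - one errant ID takes 3 correct IDs
print(ids_needed_to_overturn(2))  # 5 - an errant pair takes 5 correct IDs
```

With weightless probationary IDs, the errant pair simply wouldn’t count, so the usual two agreeing IDs would take the observation to RG.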

For me, the key to whether it would be an effective solution is whether the probationary period can be waived in situations where we “recruit” expert identifiers. Any such system should deter those newly recruited experts as little as possible.

The other matter that is important to me is how it affects new “novice/amateur” iNatters who perhaps join as duress users but then “catch the bug” and go on to become regular iNatters. This is of course what iNat is about, so whatever is implemented must not prove too great a hurdle for those new users.

Some have argued that restrictions such as the probationary period would put people off becoming more active. I think that would be the case if they had the ability and it was taken away, but for a new iNatter starting under the probationary system, it would be a matter of gaining ability rather than losing it. With the possibility of gaining it very quickly, I might add, should they request and/or be given release from probation!

5 Likes

This issue isn’t unique to iNaturalist; it’s quite common in crowdsourced projects.
I know quite a bit about it from my experience with OpenStreetMap. (Spoiler: nobody there has really come up with a solid, simple, universal solution, for many reasons.)

There are several aspects that are important to understand in order to deal with it properly.

  • The goal of the project needs to be defined clearly, and that definition should be accepted by the majority of users. There are always those who have a practical, result-oriented view of it: they think it’s important to create a product - a database of observations (iNaturalist), a freely accessible map (OSM), a collection of freely available images (Wikimedia Commons), an open encyclopedia (Wikipedia). With such a goal, it is always possible to start by defining at least some quality standards, and to define what counts as an unwanted contribution and how harmful it is. Meanwhile, there are always those who believe that these projects play a completely different role - to motivate, encourage, teach, and so on. Their focus is on a mythical “blank slate newbie” who, according to their belief, can easily be shaped into almost anything, has nearly no agency (goals, interests) of their own, and is super easy to spook by telling them they did something wrong. This idea is unrealistic, overgeneralized and idealistic by nature. On this view, no quality standard should be defined, since it supposedly discourages valuable newbies. In reality, however, those who have already put a lot of hard work, knowledge and time into their contributions often get strongly, and reasonably, demotivated by low-quality contributions that damage the project’s reputation.
    The latter case is usually reinforced by a “professional bubble” effect - a situation where a project was started by a relatively small group of professionals sharing similar values, so it gets no exposure to other kinds of contributors for quite a long time.
  • If there’s agreement that contributions can, in fact, be deemed of insufficient quality or harmful, regulation mechanisms need to be established. For example, in OpenStreetMap there are no moderators or contributor levels (while in Wikipedia they have such things). However, any edit can be undone by any user. Sure, there are conflicts sometimes, but there’s a (small) Data Working Group that resolves these issues on an individual basis.
  • The source of contributions of insufficient quality may be studied to benefit a project. But it must not be proclaimed without an actual study, because it is easy to imagine a pattern and suggest a universal measure that, in reality, will be neither universal nor effective. For example, attributing insufficient-quality contributions to “new users” is just as incorrect as thinking that all new users can be turned into valuable, productive contributors: a new user can easily be a professional who finally decided to start contributing online. The only thing that can be done effectively based purely on a user’s experience within a project is to somehow highlight their contributions to draw more attention from experienced users (or moderators, if any). Only once a pattern in the user’s activity has been established is it possible to take corrective action, if that pattern is negative. For example, in the OSM project, kids playing Pokémon GO cheat the game by adding fake features to the map (the map data is used in the game). Their edits get reversed and, if they persist, bans are issued by the Data Working Group.
    So the general approach there is: everyone is equal at the beginning, new users are watched with more attention, any bad contribution can be undone (deleted), and if unwanted actions continue, the user can be banned temporarily or permanently.

Speaking of understanding the reasons for unwanted contributions: it might be important to recognize that a lot of people nowadays are raised on videogames, and their need for gratification trumps almost everything. So the discovery of a “top identifiers” section automatically triggers a pursuit of the highest “score”, which is simple once you don’t care about rules/quality. It’s obviously impossible to detect such intent in advance, but it’s also not really necessary: the effect of such actions can be mitigated if there is a tool to reverse them and a way to prevent the user from repeating them. Sure, an attempt to reason with them can be added somewhere before a ban, to give them the benefit of the doubt.

Even though there is no really simple and universal solution, reinventing the wheel from scratch makes no sense, since this isn’t a unique issue.

14 Likes

I believe I found the user you reference. Unfortunately, I did not notice this over the summer, as I spent very little time on iNaturalist. I sent this person a long message regarding this issue, which is broader than your observations alone, and hope to resolve it productively and amicably.

5 Likes

I do a lot of identifications. I’ll pick a genus I know - or have learned to know, because I’m not an ‘expert’ - and go through several pages of observations. My object is not to advance on the leaderboard, but to get two-year-old observations (etc.) into a database. I’m at the top of the leaderboard for Canadian Noctuids (and some species), but I don’t really care. Currently there are over 300 pages of unconfirmed identifications in this group. They are more beneficial if they are confirmed. When confirming I do two things: I look at 95% of the observations, and usually check them against at least two sources. I also give an explanation if I disagree, something I have noticed many folks do not do.
I don’t know how to deal with observations that are improperly identified and then confirmed, but don’t lump all ‘confirmers’ together.

12 Likes

By the way, this is a slow and rather tedious process, but I like doing it.

10 Likes

Thank you for all your help on iNaturalist!

Same as anything else: leave your own ID explaining why you disagree, and let the process take its course. If all goes well, someone will learn something new!

7 Likes

I have a suggestion that might sound like a “reputation” system, but is not.

tl;dr: Throttle the ID rate of users whose misidentifications exceed a certain threshold. Reach out to experts to create tutorials for frequently misidentified taxa. Let reading or watching the tutorials be part of restoring the ID rate for users who have been throttled.

Explanation:

(1) iNat has an interest in encouraging enthusiastic IDers. Right now, the eButterfly project has 800k observations, with 120k listed as “needs ID.” That’s 15%, only some of which are unidentifiable. In general, observations outnumber IDers.

(2) iNat also has an interest in encouraging or insisting that IDers improve their skills. Not only do we want the error rate to be low, but we also want IDers to be able to expand their repertoire.

To solve (1) and (2), we want to make it as easy as possible for IDers to jump in and help, BUT to flag those who are frequently operating outside the range of their expertise. The easiest way to do that is to look at error rates.

To bring those IDers along who have high error rates, we want to reduce their ability to contaminate the database, but also train them up so that they can better contribute.

So it makes sense that if an IDer shows themselves to be inaccurate by having a high error/maverick rate, we would temporarily reduce their number of IDs per day (an automatic process). That reduction could then be removed by watching or reading training material on the misidentified taxa.
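To make the proposal concrete, here’s a rough sketch of what such a throttle rule might look like. To be clear, nothing like this exists on iNat; every name and threshold below is invented for illustration:

```python
# Hypothetical daily ID quota based on a user's maverick rate.
# All constants are made-up placeholders, not iNat policy.
BASE_DAILY_LIMIT = 500     # normal IDs per day (assumed)
THROTTLED_LIMIT = 25       # reduced rate while flagged (assumed)
MAVERICK_THRESHOLD = 0.10  # flag users above a 10% maverick rate (assumed)
MIN_SAMPLE = 100           # don't judge anyone on a handful of IDs

def daily_id_limit(total_ids: int, maverick_ids: int,
                   misidentified_taxa: set, completed_tutorials: set) -> int:
    if total_ids < MIN_SAMPLE:
        return BASE_DAILY_LIMIT  # too little history to throttle
    if maverick_ids / total_ids <= MAVERICK_THRESHOLD:
        return BASE_DAILY_LIMIT
    # Throttled - but finishing the tutorials for the misidentified
    # taxa restores the normal rate, as proposed above.
    if misidentified_taxa <= completed_tutorials:
        return BASE_DAILY_LIMIT
    return THROTTLED_LIMIT
```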

The downside of this suggestion is that it would require creating lessons on distinguishing various taxa. But that’s a one-time cost, and it would add value to the “about” pages for taxa. The time spent doing that would be saved by having better-trained IDers to help share the load.

4 Likes

The trick there is distinguishing between people who don’t know what they’re doing and people who dissent because they know better than all the people who’ve mis-IDed things before them. I don’t remember which thread it was in, but somewhere someone said many maverick IDs come from the latter group. And when, as others have pointed out, there are large swaths of observations with two matching but incorrect IDs because people are doing bioblitzes or some such, those IDs are going to look “correct” to any algorithm unless or until enough better-informed people come along to change the label.
I’m not sure how to address that.

10 Likes

I’m not sure this is a great idea unless the details can be really finely fleshed out and/or the site adopts stronger practices on the taxonomy management side.

As an example, I am currently the top observer in Canada, and one of the top identifiers, of a species of bird. This bird has undergone taxonomic revision, but for 13 months the site has not implemented this change. Some users are making the changes by hand and adding ‘corrected’ IDs under the new species name. I refuse to spend my time manually correcting my hundreds of IDs when this should be automated and should have been done over a year ago. As a result, my maverick/‘incorrect’ ID rate is growing by the day.

Under this proposal, I’d likely be downweighted or banned from doing IDs.

5 Likes

The problem is that it’s impossible to accurately spot incorrect IDs. If it were possible to do that, incorrect IDs would be eliminated altogether - scratch that, there would be no need for IDs at all, because whatever system was checking for mistakes would be doing all the identifying itself! Until the AI has a 100% success rate, the only way to check IDs is by comparing them to other IDs. And it’s impossible to distinguish between an incorrect ID among many correct ones and a correct ID among many incorrect ones.

If a system was introduced to penalise maverick IDs, this would just further encourage people to overuse the Agree button - anything to increase their ‘correct’ ID count, right?

There’s a reason that maverick IDs currently have more weight in the community ID calculation, and it’s because people who disagree with the rest of a group are more likely to be right.

8 Likes

Instead of limiting people’s ability to ID things, or raising the barriers to RG, how about we encourage more people to go back through RG observations of taxa they are familiar with and correct wrong IDs? There are several species I run a search for every week, IDing all new observations regardless of whether they’re already RG. I catch a fair few errors, and they get corrected quickly that way.
This also trains the AI, so there will be fewer misidentifications in the future, even when people are just selecting the first computer-vision option that pops up.

I see a lot of people complaining about how bad IDs ruin the dataset for their research, but for most species it is not difficult to sort through all the observations and review them. Honestly, I feel anyone who is trying to use these datasets has an obligation to do so in the first place. iNat is a great starting point for research, but if you’re outsourcing half of your project to random people on the internet and aren’t willing to spend a day sorting through the data before you use it, that’s a problem.

Anyway, I feel that adding more barriers to people’s participation is a bad idea and will discourage many who could have eventually become valuable members of our community.

24 Likes

Regarding the gamification and reward incentive: has it been suggested that maybe top-IDer rankings be based in part on improving IDs, so that simple agreement doesn’t rocket you to the top of the list? Sorry if it has; I skimmed the whole thread and didn’t see it, though I could’ve missed it. Since the first ID is usually an improving one, this would reward speed, which was suggested as undesirable; but that would only happen as a side effect.

2 Likes

I think that would just shift the problem - the incentive would still be there, but now it would be incentivizing users to guess at finer-level IDs in order to achieve a high rank.

2 Likes

“I see a lot of people complaining about how bad IDs ruin the dataset for their research, but for most species it is not difficult to sort through all the observations and review them.”

This is the right answer. If for some reason you need the identifications in a taxon to be correct, just go through them and check them yourself. Learn the keyboard commands in the Identify tool; assuming you only want a few species, it won’t actually take that long.

“Regarding the gamification and reward incentive: has it been suggested that maybe top-IDer rankings be based in part on improving IDs, so that simple agreement doesn’t rocket you to the top of the list?”

We should not be discouraging agreeing with Research Grade observations. We need MORE people agreeing with them, not fewer. If only two people review an observation, they must be perfectly accurate. If ten review it, a few incorrect IDs are not a big deal.

“The problem is that it’s impossible to accurately spot incorrect IDs. If it were possible to do that, incorrect IDs would be eliminated altogether”

While it is impossible for the computer to know which IDs are incorrect, I do think there is a place for computers looking for likely-incorrect IDs. With a combination of posting history, previous rate of incorrect identifications, and the photo identification algorithm, it should be possible to sort observations by the probability they are incorrect, so you see the ones most likely to need review first. This will take some time to develop, but by the time we hit the 1 billion observation mark, something like this really will be needed.
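As a sketch of how that ranking might work (the field names and weights are all invented; this is not a description of any existing iNat feature):

```python
# Rank observations by a rough "needs review" score that combines the
# identifier's historical error rate with the vision model's doubt.
from dataclasses import dataclass

@dataclass
class Observation:
    obs_id: int
    identifier_error_rate: float  # fraction of this IDer's past IDs later overturned
    vision_agreement: float       # 0..1: how strongly the CV model backs the current ID

def review_priority(obs: Observation) -> float:
    # Arbitrary weighting: error-prone identifier + skeptical model = review first.
    return 0.6 * obs.identifier_error_rate + 0.4 * (1.0 - obs.vision_agreement)

queue = [
    Observation(1, identifier_error_rate=0.02, vision_agreement=0.95),
    Observation(2, identifier_error_rate=0.30, vision_agreement=0.40),
]
for obs in sorted(queue, key=review_priority, reverse=True):
    print(obs.obs_id, round(review_priority(obs), 2))  # obs 2 surfaces first
```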

13 Likes

First, thanks to @bushman_k for the excellent general description of the kinds of issues crowdsourced projects experience. There is a large variety of users, and the quality issues that will inevitably occur will probably have an equally large variety of causes.
@jmaley was pretty specific about the issue he brought up. The question is whether such a problem can be identified based on rules an algorithm can use. It seems to me that the characteristic he found was the combination of 1) very many IDs of 2) poor-quality photos or photos lacking identifying features. More criteria would be necessary; maybe he knows others that he didn’t mention. If a pattern can be nailed down, the database can be analyzed for this problem. But I think it is unlikely that any pattern would identify just one “type of user” with 100% certainty. It is a research project in itself, although possibly a worthwhile one, and somebody would have to develop the algorithm over time.
For any specific quality issue, I can picture an automatic message informing someone that their actions fit a pattern that could be problematic for iNat. This would probably start a conversation and, hopefully, better understanding - on both sides. To avoid causing unnecessary trouble, any algorithm would best be tested offline, with the results reviewed and obvious flaws corrected. After all, iNat’s goal is to encourage participation. For instance, should known experts be spared such a message about “overzealous IDing”?
The discussion includes quite a few ideas about parameters that measure users’ actions and could be used to steer their behavior. I think that is very difficult to do when motivations can be so different. The disadvantages of the “Top Identifier” lists and of the RG label were mentioned. Although I know of parameters that would encourage me further or help me avoid errors, I feel it’s better to be cautious, since the effect on the whole community is hard to predict.

4 Likes

Speaking of “gamification” specifically.
It is a very fashionable and trendy thing. However, quite often it is used in a counterproductive manner, simply because the reward is not tied to what actually constitutes the desired result.
To apply it properly, the reward needs to be tightly tied to all the important metrics together. On iNaturalist, though, leaderboards in fact reward only quantity, not quantity and quality. Since there is no reliable independent criterion for quality, it is impossible to create a quantity-only reward that won’t incentivize random identifications for the sake of hitting the highest score. This pretty clearly suggests that leaderboards promote the wrong behavior in people with immature motives. However, we have no idea how many users are motivated both by their own sense of quality and by the existing quantity-based reward. So it is impossible to tell for sure whether removing the (currently partially misleading anyway) leaderboards would have any negative effect on the existing effective user base. To decide, poll data is needed, answering a very simple question: “Do you care about being a top identifier?” Without this information, it’s impossible to make an educated decision on whether leaderboards should go, for the sake of not attracting people obsessed with games, or stay, for the sake of not demotivating the fraction of contributors who care about quantity as well as quality.
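As a purely hypothetical illustration of tying the reward to quantity and quality together (the formula is invented; iNat’s leaderboards count raw totals only):

```python
# Scale the raw ID count by accuracy so padding with guesses stops paying off.
def leaderboard_score(total_ids: int, overturned_ids: int) -> float:
    if total_ids == 0:
        return 0.0
    accuracy = 1 - overturned_ids / total_ids
    return total_ids * accuracy ** 2  # squaring makes errors bite harder (arbitrary)

print(leaderboard_score(1000, 10))   # careful IDer: ~980
print(leaderboard_score(2000, 600))  # twice the volume, 30% overturned: ~980 too
```

Under a scoring rule like this, a rapid guesser would have to double their volume just to keep pace with a careful identifier.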

3 Likes

I would also like to address the idea that “we should counterweigh the bad contribution with the good contribution”.
There’s a certain sense in this suggestion: it doesn’t require creating a new mechanism. But that’s pretty much everything good about it.
The downsides include the following:

  • When the volume of bad contributions is fairly large, it requires a proportionally large amount of time to post one’s own (correct!) identifications. Nobody should ever forget that in a crowdsourced project, contributors’ time is the most valuable and limited resource. Whenever there is an option to avoid wasting it, that option must be explored. Otherwise, making a contributor do a “counterweight contribution” directly robs them of time that could have produced an identification leading to a confirmed ID right away.
  • Since confirmation requires a supermajority rather than a simple majority, it is easy to demonstrate that even a single wrong ID potentially “stalls” the process of reaching a correct confirmed ID. In the best case, the observation’s author will agree with the right ID, but if not (and that’s quite possible), two correct IDs will be needed to convince them, or three to outweigh the wrong one and get the right confirmed ID. Now, see the previous point about the precious time resource.
  • In self-regulated crowdsourced projects, there’s always a certain stable level of collectivist behavior, which is perfectly natural. Certain people assume, based on this fact, that it’s always possible to expand the mass of collective action - to make people do more work for a project, or a new kind of work. This is always an incorrect assumption, since it is not a commercial enterprise where workers can be paid more or motivated in some other way. That doesn’t mean this mass is set in stone - it is definitely possible to ask people to do a bit more here and there, especially if you make it easier. Say, OpenStreetMap has so-called “validator services” - third-party services that help people isolate possible issues, review them, and fix them, rather than doing a lot of tedious searching. But those are things that help people, rather than put a new burden on them. Counterweight contribution, by contrast, is an additional burden that shifts the project’s equilibrium.

I do understand that developing a mechanism to revert at least the obviously bad contributions also requires additional work. However, that’s work for a small group of developers that has to be done once, not a burden on everyone forever.

4 Likes

You’re missing the point of iNaturalist, though. It is first and foremost a platform to encourage appreciation of and involvement with nature; second is to build community and connect people. The “correct IDs” part of the equation very much takes a back seat as an “additional outcome”.

An incorrect ID from someone who is attempting to further their position on the leaderboard represents increased involvement with nature, and the act of counterweighting it represents an opportunity to connect and teach. We wind up with a (presumably) correct ID eventually, so it’s a win-win from the iNat perspective no matter how you look at it!

You could argue that existing members of the community might get frustrated at having to make unnecessary IDs, but that is more about their perception of what they are doing. So many people think it is all about the data, which means they get upset at seeing “wrong IDs”. For many of us it is about the community, and I for one get excited at the opportunity to help and guide someone else… even if it is just challenging them about making blind agreements with the (should-be-removed) Agree button. So often I start out “teaching” one thing and end up learning something myself!

6 Likes