Overzealous Identification

“I see a lot of people complaining about how bad IDs ruin the dataset for their research, but for most species it is not difficult to sort through all the observations and review them.”

This is the right answer. If for some reason you need the identifications in a taxon to be correct, just go through them and check them yourself. Learn the keyboard commands on the identification tool; assuming you only want a few species, it won’t actually take that long.

“Regarding the gamification and reward incentive: has it been suggested that maybe top IDers be based in part on their improving IDs, so that simple agreement doesn’t rocket you to the top of the list?”

We should not be discouraging agreeing with Research Grade observations. We need MORE people agreeing with them, not fewer. If only two people review an observation, both of them must be perfectly accurate. If ten review it, a few incorrect IDs are not a big deal.

“The problem is that it’s impossible to accurately spot incorrect IDs. If it was possible to do that, incorrect IDs would be eliminated altogether”

While it is impossible for the computer to know which IDs are incorrect, I do think there is a place for computers looking for incorrect IDs. With a combination of posting history, a user’s previous rate of incorrect identifications, and the photo identification algorithm, it should be possible to rank observations by the probability that they are incorrect. Observations could then be sorted so you see the ones most likely to need review first. This will take some time to develop, but by the time we reach the 1 billion observation mark, something like this really will be needed.
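As a rough illustration of what such a triage could look like, here is a minimal sketch. The inputs (a per-identifier error rate and a computer-vision confidence score) and their weights are purely hypothetical assumptions for the example, not actual iNaturalist data or APIs:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    obs_id: int
    identifier_error_rate: float  # hypothetical: share of this identifier's past IDs later overturned
    vision_confidence: float      # hypothetical: vision-model confidence in the current ID, 0..1
    ids_count: int                # number of identifications already on the observation

def review_priority(obs: Observation) -> float:
    """Rough proxy for 'probability the current ID is wrong': a shaky identifier
    history, low vision-model confidence, and little corroboration all raise it."""
    corroboration_gap = 1.0 / (1.0 + obs.ids_count)  # fewer IDs -> less corroboration
    return (0.4 * obs.identifier_error_rate
            + 0.4 * (1.0 - obs.vision_confidence)
            + 0.2 * corroboration_gap)

observations: List[Observation] = [
    Observation(1, identifier_error_rate=0.30, vision_confidence=0.35, ids_count=1),
    Observation(2, identifier_error_rate=0.02, vision_confidence=0.95, ids_count=4),
]

# Surface the observations most likely to need review first.
for obs in sorted(observations, key=review_priority, reverse=True):
    print(obs.obs_id, round(review_priority(obs), 3))
```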

13 Likes

First, thanks to @bushman_k for the excellent general description of the kinds of issues crowdsourced projects experience. There is a large variety of users, and the quality issues that will undoubtedly occur will probably have a large variety of causes.
@jmaley was pretty specific about the issue he brought up. The question is whether such a problem can be identified based on rules that an algorithm can use. It seems to me that the characteristic he found was the combination of 1) very many IDs on 2) poor-quality photos or photos lacking identifying features. More criteria would be necessary; maybe he knows others that he didn’t mention. If a pattern can be nailed down, the database can be analyzed for this problem. But I think it is unlikely that any pattern would identify just one “type of user” with 100% certainty. I think it is kind of a research project in itself, although possibly a worthwhile one. Somebody would have to develop the algorithm over time.
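To make that concrete, here is a minimal sketch of what such an offline check might look like. It assumes a hypothetical per-user summary table; the field names and thresholds are illustrative only, not anything iNaturalist actually stores:

```python
# Hypothetical per-user summary rows; field names are illustrative, not iNaturalist's schema.
users = [
    {"login": "user_a", "ids_made": 12000, "ids_on_low_quality_photos": 9500},
    {"login": "user_b", "ids_made": 300, "ids_on_low_quality_photos": 20},
]

ID_VOLUME_THRESHOLD = 5000  # "very many IDs" -- tunable
LOW_QUALITY_SHARE = 0.5     # share of IDs on poor-quality/featureless photos -- tunable

flagged = [
    u["login"]
    for u in users
    if u["ids_made"] > ID_VOLUME_THRESHOLD
    and u["ids_on_low_quality_photos"] / u["ids_made"] > LOW_QUALITY_SHARE
]
print(flagged)  # candidates for human review, not for any automatic action
```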
For any specific quality issue, I can picture an automatic message informing someone that their actions fit a pattern that could be problematic for iNat. This would probably start a conversation and hopefully lead to better understanding - on both sides. To avoid causing unnecessary trouble, any algorithm would best be tested offline, with the results reviewed and obvious flaws corrected. After all, iNat’s goal is to encourage participation. For instance, known experts should probably be spared such a message about “overzealous IDing”.
The discussion includes quite a few ideas about parameters measuring users’ actions, which could be used to steer their behavior. I think it is very difficult to do that when the motivations can be so different. The disadvantages of the “Top Identifier” lists and of the RG label were mentioned. Although I know parameters that would encourage me further or help avoid errors, I feel that it’s better to be cautious since the effect on the whole community is hard to predict.

4 Likes

Speaking of “gamification” specifically.
This is a very fashionable and trendy thing. However, it is often used in a counterproductive way, simply because the reward is tied to something other than what actually constitutes the desired result.
To apply it properly, the reward needs to be tightly tied to all of the important metrics together. For iNaturalist, leaderboards in fact reward only quantity, not quantity and quality. Since there is no reliable independent criterion for quality, it is impossible to create a quantity-only reward that does not incentivize random identifications for the sake of hitting the highest score. That pretty clearly suggests that leaderboards promote the wrong behavior in people with immature motives. However, we have no idea how many users are motivated both by their own sense of quality and by the existing quantity-based reward. So it is impossible to tell for sure whether removing the (currently somewhat misleading) leaderboards would have any negative effect on the existing effective user base. To decide, poll data is needed, answering a very simple question: “Do you care about being the top identifier?” Without this information, it’s impossible to make an educated decision on whether leaderboards should go, for the sake of not attracting people obsessed with games, or stay, for the sake of not demotivating the fraction of contributors who care about quantity as well as quality.

3 Likes

I also would like to address the “we should counterweigh the bad contribution with good contribution” suggestion.
There is a certain sense to it: it doesn’t require creating a new mechanism. But that is pretty much the only good thing about it.
The bad sides include the following:

  • When the bad contributions are numerous, it takes a proportionally large amount of time to post one’s own (correct!) identifications. Nobody should ever forget that in a crowdsourced project a contributor’s time is the most valuable and most limited resource. If there is a way to avoid wasting it, that option must be explored. Otherwise, making a contributor post a “counterweight contribution” directly robs them of time that could have gone toward an identification leading to a confirmed ID right away.
  • Since a simple majority rule is applied to confirmation, it is easy to show that even a single wrong ID can stall the process of getting a correct confirmed ID. In the best-case scenario the observation author will agree with the right ID, but if not (and that is quite possible), two correct IDs will be needed to convince them, or three to outweigh the wrong one and reach the right confirmed ID. Now recall the previous point about time being a precious resource.
  • In self-regulated crowdsourced projects there is always a certain stable level of collectivist behavior, which is perfectly natural. Some people conclude from this that it is always possible to expand the mass of collective actions, to make people do more work for a project, or to make them do a new kind of work. That is an incorrect assumption, since this is not a commercial enterprise where workers can be paid more or motivated in some other way. It doesn’t mean that this mass is set in stone; it is definitely possible to ask people to do a bit more here and there, especially if you make it easier for them. OpenStreetMap, for example, has so-called “validator services”: third-party tools that help people isolate possible issues, review them, and fix them, instead of running a lot of tedious searches to find them. But that is something that helps people rather than putting a new burden on them, whereas a counterweight contribution is an additional burden that shifts the equilibrium of the project.

I do understand that developing a mechanism to revert at least the obviously bad contributions also requires additional work. However, that is work for a small group of developers that has to be done once, not a burden for everyone that lasts forever.

4 Likes

You’re missing the point of iNaturalist, though. It is first and foremost a platform to encourage appreciation of and involvement with nature; second is to build community and connect people. The “correct IDs” part of the equation very much takes a back seat, an “additional outcome”.

An incorrect ID from someone who is attempting to further their position on the leaderboard still represents increased involvement with nature, and the act of counterweighting it represents an opportunity to connect and teach. We wind up with a (presumably) correct ID eventually, so it’s a win-win from the iNat perspective no matter how you look at it!

You could argue that existing members of the community might get frustrated at having to make unnecessary IDs, but that is more about their perception of what they are doing. So many people think it is all about the data, which means they get upset at seeing “wrong IDs”. For many of us it is about the community, and I for one get excited at the opportunity to help and guide someone else… even if it is just challenging them about making blind agreements with the (should be removed) Agree button. So often I start out “teaching” one thing, and end up learning something myself!

6 Likes

Please read my first message in this thread, where I clearly address the dichotomy of beliefs about the purpose of crowdsourced projects in general and iNaturalist in particular. Without resolving this dichotomy, it makes zero sense to discuss any solution, or even some of the issues, since the base premise hasn’t been established.

And this

is heavily based on a personal assumption and a biased belief. You simply reject other not-so-nice options by making a statement such as this one.

1 Like

This part? Very much the point I am making!

1 Like

In your opinion it might be “pretty much everything good about it”, but I see the tagging in of other identifiers as reinforcing relationships… building community. For a start, I need to know who to tag, so it forces me to get to know the strengths and specialisations of other iNat identifiers.

Key things to remember here are that the observation belongs to the observer, that identifications belong to the identifier, and that there is no “wrong” or bad contribution in terms of either. That is a completely subjective determination that will mean different things to different people. Some contributions might be “more right” than others, but they are still valid contributions! There are cases where people have switched photos after an ID has been made… it was a good ID at the time it was made! Take the herculean effort on a certain beaver which, even after it got “corrected”, still receives almost daily contributions, testament to the strength of the community that iNat has built around such dissension.

For me, this will always be about the “Agree button”, what I see as the greatest source of problematic IDs in terms of “overzealous identifications”. And even then, it is only problematic in terms of the extra workload it creates, but that is why I show the other “positive” side of that problem by pointing out the reinforcing of the relationships that tagging in to counter-weight brings. It builds community to have to toil together!

3 Likes

Eh. It’s a community, as you state. And as such, the goals of the community define what it is for as much as any mission statement does. It takes on a life of its own in a sense. And while I partially agree with you that fear of overzealous IDs is overblown, throwing away this huge data baby with the “connecting with nature” bathwater makes no sense.

iNat isn’t just about connecting with nature. That’s super broad: hiking, meditating, hunting, watching an Attenborough film, and using iNaturalist are all ways someone might connect with nature, but iNaturalist in particular is for people to connect with nature by sharing what they see, and by understanding and sharing what it is. Otherwise they could just post a pretty photo on Instagram. Many people share not only for fun but also because documenting biodiversity is important for a lot of reasons. And yes, the data matters. iNat is a turbo field notebook. While it was essential that Charles Darwin, Linnaeus, and various others connected with nature, what matters now, hundreds of years later, is the data, results, and discoveries, which iNat makes global while reducing the social and financial barriers to them.

10 Likes

@kiwifergus: “It builds community to have to toil together!”

Doesn’t that mean that you would like to see processes developed where quality is improved through communication with contributors, e.g. those who make poor IDs? A contributor who actually makes bad IDs can learn to do a better job. If the characterization of the contributor is wrong, iNat learns how that happened and can adjust its process.

I understand that putting up demands will discourage participation by some users and I don’t see a need for that (except for fighting purposeful destructive behavior). But I think having an opportunity to learn something is a great motivator. Sure, it should be done in a suitable way that doesn’t frustrate inexperienced contributors.

2 Likes

Again with the “Bad IDs”!

The overzealous identifier is involving themselves with the process. That is to be encouraged! Could they spend more effort in identifying than just clicking an Agree button without actually looking closely enough to have any certainty about the ID? Sure! I could look a bit closer on a lot of mine, having been caught out on what should be “simple IDs” a few times too many. And how much is “more effort”? Maybe I should only make IDs if I have been university educated, or better yet… only if I am the author of the taxon?

It makes no sense to you, because the data is what you value. The same could be argued on behalf of advertisers wanting your shopping habits (should we give them our credit purchase history, just because it is what they would value?) or political parties wanting to sway public opinion in the lead up to an election (should we let facebook data get into their hands?). Just because data exists, doesn’t mean it is to be valued above all else.

For example, there is plenty of “range data” available on the katipo, but that doesn’t stop a proposed walkway from destroying a colony here in Gisborne. What does stop that loss is a passionate iNatter who before joining iNat knew next to nothing about them, but has developed a passion for spiders through being “allowed” to have a go at identifying spiders in iNaturalist. When I heard of the proposed walkway, I stepped in and advocated on behalf of that colony. As a result, a survey was undertaken of the area, and the extent of the colony, as well as the impact of the proposed track, are now being considered. That survey and its data are NOT on iNat, although I have put up a few highlight observations from my part in that survey. In other words, the data is not what saved this colony… it was someone developing an appreciation for them, and valuing them enough to step in.

I look back at some of my overzealous early IDs and I shudder, especially when other iNatters that I am helping to get more involved, send me links asking “why did you call this a …”. It also reminds me of how far I have come on this journey…

Interesting that you bring up Linnaeus. He was chastised by his “employer” for making his field trips too much fun. :)

6 Likes

The data is one thing I value. I value the community a lot, and I think the community is largely built around, um, observing and identifying organisms; it’s the whole freakin’ point. ‘Connecting with nature’ is so broad as to be totally meaningless. People are here because they want to identify and map things. Your other comments about advertising data (why?) and the implication that data is valued ‘above all else’ are odd and divergent.

Not alone. But if you don’t know it is there, no one would know there was even something to lose. If it had been erroneously identified as a black widow or a wolf spider, and no one knew it was something rare and special, no one would have cared. You need both. We can’t ‘save’ nature when we don’t understand it; you don’t get good conservation results that way.

Nothing wrong with fun at all. What’s wrong is this idea that iNat data is a ‘secondary byproduct’, ugh

3 Likes

When I read these I am reminded of the following quote which for me defines the goal of the project, the base premise, and the place data and scientific research had in the founding of the platform:

The above is echoed at:

And I suspect that this ethos is the secret sauce that has made iNaturalist the rather massively successful platform that it is.

9 Likes

There’s been plenty of argument about the scientific importance of accurate IDs, but it’s worth thinking about how a resource like iNaturalist helps people connect to nature.

It usually starts with someone seeing something and thinking “I wonder what that is?”. They post it on the site and get an answer. However, if they get an answer and two months later find out it’s wrong, that might put them off posting in the future.

To that end, it does make sense that identifications are as accurate as possible - to help drive that engagement and give new observers the answers they want.

1 Like

Why is that wrong?

I don’t think it is… the people behind iNat don’t seem to think it is… but you obviously do… THAT is what I meant by “you value the data”. Just the fact that it is referred to as a by-product (and what a wonderful by-product it is!) seems to always rile you up!

I wouldn’t be upset if someone called the saving of that katipo colony a by-product of iNat… I would say “give me more of that by-product, please… what more can we save!”

2 Likes

This kinda comes off as troll-y, or as trying to rile me up, in and of itself. I care about it because I think the data is crucially important. Did anyone expect iNat to grow this big? Maybe not. But it’s now important to conservation in more than one way. That’s a good thing, not a bad one… The bottom line is we are all here, one way or another, because of data and classification, or to be part of the community associated with that. We just are. And yes, I know what Ken-Ichi said, but that doesn’t mean I automatically agree with it. Reposting it to annoy me is, again, kinda troll-y.

1 Like

Just for the record, I didn’t re-post it. I’m not trying to be troll-y, but when you state that calling the data a by-product is wrong, I am going to challenge that, because I think that it is that view that is wrong.

There is a big difference between calling something wrong, and stating that you think it is wrong. Especially as a forum moderator, with the “authority” that might be implied by that, of course I’m going to point that difference out!

I think it is wrong. And I’m not speaking as a moderator now. As you know.

Observations (data) are the unit of iNaturalist. If the community does not value the basic unit, it will become meaningless. That the data can be used by people other than the observer can be considered a byproduct, but it should not be dismissed as “just” a byproduct.

To loop back to the original questions,

Only suspending a user will remove their ability to identify. The site curators and staff make the decision whether a user should be suspended.

No.

  • Add a disagreeing ID, if relevant
  • Mark it as needing further community ID, if relevant
  • Directly, politely address it with the user. Privately (message) is good, though sometimes publicly (comment) works better for certain people
  • Contact help@inaturalist.org if the user is not engaging in a dialogue

Since the questions have been answered and the rest of this has been discussed at length here and in other topics, I’m going to close this topic for now. We welcome folks to continue the related discussions, such as the value and pitfalls of crowdsourced data, gamification, or the mission of iNaturalist, in separate topics, or propose relevant feature requests, but only after reading through the existing material. If you find yourself in a quick, heated back-and-forth, please consider stepping away or moving the discussion to private messages. Thanks!

See related topics:

12 Likes