Create a way to flag duplicate observations and remove RG status from the extras

#1

This has been discussed before, elsewhere, but I’ll take the chance to add it here. Duplicates with RG status are rife. In the past, curators often flagged one of the records as a duplicate in order to strip the RG status and reduce it to “Casual”. Recently we’ve been instructed not to do this, as the flags can’t be resolved, so there needs to be another way of dealing with it. A direct appeal to the observer to combine & delete is, more often than not, ignored. It may be most effective to have a way to downgrade all of the records involved (‘original’ & dup) and have the observer then select which record they want as the primary observation that can reach RG or have it reinstated. This would elicit some action from the observer and hopefully reduce the incidence of dup submission. (I was going through one iNater’s observations a few hours ago and found that at least one third of their observations are dups, with one Madagascan observation having been submitted 7 times in total and all having reached RG.)

I should make it clear that I’m only referring to single observer dups, not “duplicates” (same individual organism, place & time) submitted by different observers.

0 Likes

#2

How about a new Data Quality Assessment item or other ability that is only accessible to curators, which could be a catch-all for marking an observation as casual or hidden?

This way the user who notices the duplicate content should 1) first comment on the observation to ask the observer to fix it, 2) upon lack of action, flag the observation for curation, then 3) a curator can confirm that it is a true duplicate, mark it as a duplicate, which hides and/or casual-grades it, and resolve the flag.

I recommend curator-only access since, as you mention, many users treat “multiple observer dupes” as “bad”, whereas it’s totally okay if multiple people observe the same thing. And, it may not be a true duplicate after all.

The catch-all casual-grade and/or hide feature could also apply to cases where

  • the observer uploaded several photos of completely different organisms and is unresponsive to requests to separate them, and
  • the observation is “inappropriate” and the user has been warned/suspended, but the flag would otherwise remain unresolved in perpetuity. Our only option for hiding grossly inappropriate content at this time is to mark it as “spam”, which isn’t ideal.
4 Likes

#3

3 questions:

-by restricting the power to act on a dupe to curators, folks like myself will be limited to commenting to bring it to the attention of the observer?

-creating a need to revisit upon inaction creates more work it would seem and I’m wondering if there’s a way to avoid so many steps?

-I also come across a fair number of dupes that aren’t from different observers, aren’t with different time stamps, etc. What is the protocol for a non-curator like myself? Do I need to become a curator if I want to maximize my “helping?”

2 Likes

#4

-yep, and flagging
-true, but if we would be hiding or casual-grading someone’s content, I think they should be notified and given an opportunity to fix it first
-:+1: I think you should look into it! https://www.inaturalist.org/pages/curator+guide

1 Like

#5

Before this is done I would say there needs to be official guidance on how to report multiples of the same species. There are still users who believe and tell others, if you see 3 then make 3 records.

If implemented, there should also be an undo option for the user. If someone flags or combines stuff of mine that is not duplicated, then, as with other DQA items, I should be able to offset it.

2 Likes

#6

“An observation records an encounter with an individual organism at a particular time and location.”

Having separate observations of different individual organisms of the same species is fine per that definition.

I’m also not suggesting at this time that curators get the ability to combine multiple photos from other users’ observations into a single observation, or to split them apart.

2 Likes

#7

I think if we implemented notifications when someone checks anything that affects your data quality (as recommended in another thread, maybe by you?) then it would be OK to let anyone do it.

3 Likes

#8

Is none of this possible to automate? Can’t iNaturalist be made to automatically detect observations with the same observation ID by the same observer that have duplicate photos and generate a notice to fix? Together with a box to check when someone notices that sufficient time has passed with no fix, which wouldn’t remove Research Grade or anything, but would be searchable by curators, that would seem more streamlined to me, if it’s possible.
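To make the idea concrete, here is a rough sketch of what such a check might look like. It is not how iNaturalist works internally; the dictionary keys (`id`, `observer`, `taxon`, `photos`) and the function name are made up for illustration. It only groups observations that share the same observer, the same identification, and a byte-identical photo, so a bee and a flower identified separately from one photo would not be flagged.

```python
import hashlib
from collections import defaultdict


def find_duplicate_groups(observations):
    """Return tuples of observation ids that look like single-observer duplicates.

    `observations` is an iterable of dicts with hypothetical keys:
    'id', 'observer', 'taxon', and 'photos' (a list of raw image bytes).
    """
    seen = defaultdict(set)
    for obs in observations:
        for photo in obs["photos"]:
            # A checksum only matches byte-identical files; a resized or
            # re-encoded copy of the same picture would not be caught.
            digest = hashlib.sha256(photo).hexdigest()
            seen[(obs["observer"], obs["taxon"], digest)].add(obs["id"])

    # Keep only keys shared by more than one observation, without repeats.
    groups = {tuple(sorted(ids)) for ids in seen.values() if len(ids) > 1}
    return sorted(groups)


observations = [
    {"id": 1, "observer": "a", "taxon": "Apis mellifera", "photos": [b"img1"]},
    {"id": 2, "observer": "a", "taxon": "Apis mellifera", "photos": [b"img1"]},
    # Same photo, different identification (bee on flower): not a duplicate.
    {"id": 3, "observer": "a", "taxon": "Bellis perennis", "photos": [b"img1"]},
    # Same photo bytes from a different observer: out of scope here.
    {"id": 4, "observer": "b", "taxon": "Apis mellifera", "photos": [b"img1"]},
]
duplicate_groups = find_duplicate_groups(observations)
```

Each group found this way could then trigger the notice to the observer, rather than changing the quality grade automatically.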

5 Likes

#9

Today I’ve encountered close to a dozen duplicates from a single user who has a long history, many thousands of observations, and thousands of identifications. Myself and multiple other IDers have been tagging them about the issue without response, while they continue to upload more duplicates. I’m wondering if I need to now go through and flag them all, since I find it strange that they aren’t dealing with the issue and several of these are becoming research grade accidentally. When a user is showing a pattern of non-responsiveness, should there be a different protocol than for klutzes like myself who accidentally hit all sorts of buttons and mix up photos on rare occasions? Also, I agree with @paloma that automation would be easier (and make me less grumpy).

3 Likes

#10

And may be more effective in getting through to some people. Plus, even the less egregious people can’t all be assumed to like the “human touch” better than automatic notices–introverts especially may not want to talk to strangers about their mistakes or be reminded that people are noticing their mistakes while they are just learning the system. Having said all that, I am in favor of cleaning up all the duplicates, so if I weren’t out of votes I would vote yes on the original feature request here.

2 Likes

#11

I’ve received this email from the iNater mentioned above, who is quite disgruntled at my request to remove the dups:
"Hi Ralph

You have pointed out that a lot of my images are duplicates and that I have submitted images seven times.

I have not submitted these - iNaturalist pulled them off Flickr. Until the other day there was just one of each image. There must be a bug in iNaturalist that has duplicated these. I note that all the comments under each have also duplicated.

There appear to be a number of bugs in iNaturalist at the moment. I attempted to load 5 images yesterday. It said they were loaded, but only two have shown in my observations.

I am travelling over the next couple of months and don’t have time to address this issue that I had no part in causing."

I have no idea if this is indeed the case, but am reporting it here just in case there is a glitch that needs to be considered. I will suggest to him that he flag it on the record(s) if he believes there is a problem.

1 Like

#12

The Flickr importer is super buggy and I had problems using it recently too. So it could definitely be true, albeit that user seems kind of cranky.

1 Like

#13

I brought this up with Ken-ichi and it might be possible to do something like this: when an observation is added, check to see if its image has been uploaded before and generate a message of some kind. Then again, this might disincentivize users from making two observations of the same photo when there are two organisms in it (e.g. bee on flower).

Definitely could be the cause. Charlie, if you can write up what happened to you, would you mind sending it to help@inaturalist.org?

2 Likes

#14

This would only capture a minor proportion of the dups - many are comprised of different images of an encounter, posted as separate observations.

2 Likes

#15

The bee/flower problem was why I suggested checking for duplicate photos with the same observation ID. But, since problems and limits have been noted for automation, how about making the DQA the major part of this entire process of cleaning up? Whatever the common problems are, have a DQA checkbox for it (for example, “same organism is subject of different observations” or “photos in observation do not all share the same organism”). Whoever first notices this can check the appropriate box, which generates a notice to the observer explaining what that means, and how to fix it. Have the DQA then reflect a date for the fix. Have another DQA checkbox that the next person who notices the problem can check, which says the issue has not been fixed within the specified time frame, generates another notice to the observer, and somehow gets the observation in a searchable state for curators. This is still an impersonal process, but from my viewpoint the personal process hasn’t really been working that well overall. I have to admit that I have gotten into the habit of just skipping over these observations because I couldn’t really see that typing out the explanations over and over was really very effective. As usual, I don’t know whether any of this is feasible . . .
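For what it’s worth, the two-step flow could be sketched roughly like this. Everything here is an assumption for illustration: the class and function names are invented, and the 14-day fix window is an arbitrary placeholder rather than anything proposed by staff.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed grace period for the observer to fix the problem (placeholder value).
FIX_WINDOW = timedelta(days=14)


@dataclass
class DqaIssue:
    """One checked DQA box, e.g. 'same organism is subject of different observations'."""
    kind: str
    flagged_on: date
    escalated: bool = False
    notices_sent: int = 1  # the first notice goes out when the box is checked

    @property
    def fix_by(self) -> date:
        # The date the DQA would display as the deadline for the fix.
        return self.flagged_on + FIX_WINDOW

    def report_unfixed(self, today: date) -> bool:
        """Second checkbox: only takes effect once the fix window has passed."""
        if today <= self.fix_by or self.escalated:
            return False
        self.escalated = True
        self.notices_sent += 1  # second notice to the observer
        return True


def curator_queue(issues):
    """Escalated issues are the ones curators would be able to search for."""
    return [issue for issue in issues if issue.escalated]


issue = DqaIssue("same organism is subject of different observations", date(2019, 1, 1))
too_early = issue.report_unfixed(date(2019, 1, 10))
escalated = issue.report_unfixed(date(2019, 1, 20))
```

The point of the sketch is that the first person only triggers a notice, and nothing reaches curators until someone else confirms the deadline has passed.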

3 Likes

#16

IIRC I already made a post about it on the GitHub or something similar, and we looked at it, and admin found that it was a problem with Flickr or the uploader and wasn’t really something that could be fixed. I may be remembering wrong.

1 Like

#17

Maybe it is an idea to add “Number of organisms” as an Annotation? If you see 3, do not make three observations but set the Annotation “Number of organisms” to 3.

“Annotations are similar to Observation Fields but they are maintained by iNaturalist administrators. This is to avoid the duplication and general ‘chaos’ among observation fields resulting from their many different creators and uses. Currently Annotations exist for Life Stage, Plant Phenology, and Sex. We anticipate adding more.”

3 Likes

#18

It’s perfectly fine to make 3 observations out of 3 different individuals - I do it frequently because I’m interested in documenting variation within populations - but not okay to make more than one observation out of an encounter with one individual organism. Generally duplicates are the result of misunderstanding, by new users, on how to submit observations or inadvertent re-submission by regular users.

1 Like

#19

I agree. The most prevalent form of duplication that I see is where several different photos of the same organism are posted as separate observations. The proposed merge tool will help with a lot of them, but the majority of these cases are also by absentee observers, who post a bunch of observations and then are never heard from again. If we have “Number of Organisms” promoted to annotation, then we can at least set it to 0 for all but one of the duplicates…

0 Likes

#20

Devil’s advocate here, what is the major downside of having multiple observations of the same individual by the same user attain RG status? I agree these types of observations are annoying and not ideal, but I don’t see a big issue to their having RG status, unless I’m missing something here. Which is certainly likely.

1 Like