Create a way to flag duplicate observations and remove RG status from the extras

I find occasional duplicates as I add more entries from dates where a Flickr photo might have been uploaded as a standalone entry, even if I observed 100 species the same day. I would love to be able to find and de-dup my records to reflect the true data point. But I would not necessarily want others to do it, for a few reasons. First and foremost, I am editing photos for upload in a larger format, sometimes with additional photos to show others characteristics not necessarily evident in just one photo. So far, by searching date by date, I have been able to catch a few duplicates, but I would love an easier way to find them and remove the less appealing photo. I guess I should read through the entire thread to see if this is possible before asking a redundant question that may already have an answer.

I’ve read most of the comments here (but not all), and just want to add one point - one never knows what the data on iNat is going to be used for. RG is quite a misleading term (as is being discussed on another thread on this forum, and also on this very thread). I get the feeling that most people asking for the cleaning feature here are looking at it more from a biodiversity checklist perspective than any other use - am I mistaken? (Note that ‘other uses’ are not limited to species abundance and related quantitative studies.)

I’m part of a team that facilitates the use of iNat data not so much for research, but for outreach & use in reports that aim to safeguard habitats from misuse/destruction, sometimes via legal proceedings. But for good reason, RG is still something we prefer on the observations/images we use, and they’re sometimes multiple observations that are ‘duplicates’ (same individual, same place & date, almost same time, same user).

I understand why duplicate observations would be a pain for the identifiers who spend hours identifying individual observations on iNat.

Auto-merging such duplicates, as suggested by a few people here, would indeed be a much better feature than downgrading them. But I feel this idea would need a small amendment - the possibility to see all the images in the merged observations on an ‘observations’ page, without having to go into the observation to see the multiple images. I don’t really have a suggestion on how this could be done, and at the moment I guess this thought is a bit of a tangent. :)

To recap, I just wanted to say there are some of us who value ‘research grade duplicates’ on iNat, so I’m certainly in favour of some middle-ground solution (such as merging), but not in favour of downgrading the duplicates.

On the topic of skewed analysis, I would like iNat to have an abundance feature where single individuals, flocks/herds, or fields of the same species could be documented even if only one individual is photographed. This would give a much more accurate record of the true distribution of a species. Rarities are seriously skewed in databases because many people post the same individual to get it on their life-list (which is fine to do). Bar charts can be really wonky because of it. I do upload separate observations of the same species for sexually dimorphic individuals found together, to help the identification algorithm, so those are not actually duplicates even if they were photographed almost simultaneously in the same location.

Search by date, and then go to the taxa tab. Any taxon that has only one obs is not going to have a duplicate, and I think the list is sorted by number of obs, highest first… So just start with the top taxon, right click to open it in a new window, then “view yours”… It will show all of yours for that taxon, but any duplicates should be easy to spot. Resolve, then close, and back at the taxa view for that date you choose the next taxon, repeat …
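
For anyone with too many dates to click through, the same idea (group by date, then by taxon, and only look at groups with more than one obs) can be sketched in a few lines of Python. This is only a rough sketch, not an iNat feature: it assumes you already have your observations as a list of records, and the field names `id`, `observed_on` and `taxon` are placeholders for whatever your own export uses.

```python
from collections import defaultdict


def candidate_duplicates(observations):
    """Group observations by (date, taxon) and keep groups with more than one entry.

    `observations` is a list of dicts with the hypothetical keys
    "id", "observed_on" and "taxon" (assumed names, adjust to your data).
    Returns a dict mapping (date, taxon) to the list of observation ids,
    largest groups first.
    """
    groups = defaultdict(list)
    for obs in observations:
        groups[(obs["observed_on"], obs["taxon"])].append(obs["id"])

    # A taxon seen only once on a given date cannot be a same-day duplicate.
    repeated = {key: ids for key, ids in groups.items() if len(ids) > 1}

    # Largest groups first, mirroring "start with the top taxon".
    return dict(sorted(repeated.items(), key=lambda item: len(item[1]), reverse=True))


if __name__ == "__main__":
    my_obs = [
        {"id": 101, "observed_on": "2019-05-01", "taxon": "Sayornis phoebe"},
        {"id": 102, "observed_on": "2019-05-01", "taxon": "Sayornis phoebe"},
        {"id": 103, "observed_on": "2019-05-01", "taxon": "Turdus migratorius"},
    ]
    for (date, taxon), ids in candidate_duplicates(my_obs).items():
        print(date, taxon, ids)
```

These are only candidates. Two obs of the same taxon on the same day can be perfectly legitimate (different individuals, different places), so each group still needs a look by eye before anything is deleted.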

Thanks, that would work except I have 2500 taxa with ~5000 dates to sort. Since I am adding observations date by date, I am making sure I am not duplicating previously uploaded observations and catching a few that were uploaded as duplicates, but it is very, very slow progress. I expect to eventually catch almost all of the duplicates. Sadly, I built a life list before I decided to build a true database of daily photographed observations. As soon as I realized that Yahoo and Flickr would be going extinct or changing, I just started uploading directly to iNat, which is how the problem began, and since the photos have different IP origins, the metadata is different.
Back to the slog.

Less repetitive typing would be nice. If these things could be flagged before people upload, it would be great. Even better, a message before upload that says “Hey, you don’t seem to have a date/location/picture with this observation. Do you want to add this data now?”

If the user legitimately accepts that Flickr incorrectly imported the same image multiple times as different observations, they can easily delete the duplicates themselves.

If a flag could be attached to at least remove it from the Unknown pile, or to highlight an observation so that it doesn’t continue to get ID’ed, that would be good.

I also don’t see a huge problem with duplication. You can’t really use these data for estimates of abundance or frequency, anyway, and as you mentioned there’s no way to know whether observations from other users are the same individual.

I do agree it’s a bit annoying as an identifier when you hit 13 images in a row of the same bird.

What’s the point of community science if the data that is created is useless?

I really, really suggest giving curators the power to remove duplicates. Every time I see duplicates it pisses me off, because every single one makes iNat data look more and more useless. Who’s gonna use data with so many issues? We’ve got people who just randomly agree, put joke IDs, ID without researching, and species that only live in x country being IDed in y country. There are so many issues and none of them seem to be getting fixed, and it’s just gonna get worse and worse until maybe a site like GBIF says to iNat “we can’t include your data because there are too many issues”. Then, who knows, National Geographic will stop supporting iNat, then iNat loses funding and dies. Sorry about my rant, but these situations are continuously getting worse.

Maybe, but duplicate observations don’t actually cause any of those issues because iNat can’t be used to track abundance anyway. Anyhow, I’m curious where you get the idea that National Geographic cares more about scientific rigor than ‘connecting people with nature’ and associated marketing. In fact, if anything, I’d argue they lean too far the other way.

I think @charlie said it best, above:
https://forum.inaturalist.org/t/create-a-way-to-flag-duplicate-observations-and-remove-rg-status-from-the-extras/201/26

With observations doubling every year, and coming in at an enormous rate during the northern hemisphere summer, you may be seeing more unfixed issues now than at any previous time in iNat’s history. But that’s in absolute numbers, rather than as a percentage of the data. There’s far more good data than bad on iNaturalist, and the longer it has been on the site the more likely it is to have been corrected. After about October, things will slow down and the dedicated users, desperate for something to do, will go through and clean up most of the obvious issues.

There is not even a consensus on the site as to what represents a duplicate. Under site rules, 2 plants 10 centimeters apart and submitted as separate records are not duplicates. In fact, that is what the site guidelines say should be done.

Most curators I have spoken with (and I am one of the more active ones) do not want the unilateral authority to delete content. I would not object to a process that allowed curators to vote to remove something, with site staff reviewing it once it reaches a set number of votes.

I am new here and was curious how to handle observations of the same individuals at different stages.

I had an Eastern Phoebe nest under my porch eave this spring, and I uploaded (as separate observations) the eggs, the babies just after hatching, and so on. I think there are 4 observations of them in total. I was considering combining them anyway.

A similar question would be observations of the same plant when flowering versus a few months later when it has developed berries.

Under site guidelines, anything taken on separate days must be separate records.

Observations of the same individual can be linked together in comments, like this: https://www.inaturalist.org/observations/26942473

Another strategy is to use an Observation Field. I’ve never done that, but it looks like maybe “Linked Observation” would work. People are less likely to see and follow one of those links, but it’s also possible to search for all observations which use a particular observation field.

See https://forum.inaturalist.org/t/using-the-field-similar-observation-set-for-linking-observations-of-lepidoptera-when-raising-on/1018

I’m going to close this request:

  • we plan on adding a merge (and split) tool sooner rather than later, so it will hopefully make merging these easier for the observer.

  • as has been discussed, it’s not a major data issue, although yes, it can certainly be annoying for identifiers.

  • it might be better to tackle this on the upload side, with better onboarding or maybe some sort of automatic tool that checks for photos taken within a few seconds of each other. Our designer is pretty busy with notifications now, but we want to tackle onboarding after that.

  • things could get messy when the community starts flagging duplicates, i.e. which one is the “original”?
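
To make the “photos taken within a few seconds of each other” idea a bit more concrete, here is a minimal sketch of what such an upload-side check could look like. It is purely illustrative and not a description of anything iNaturalist has built: the function name `burst_groups`, the `(filename, timestamp)` input shape, and the 5-second default gap are all assumptions.

```python
from datetime import datetime, timedelta


def burst_groups(photos, max_gap_seconds=5):
    """Find runs of photos taken within `max_gap_seconds` of the previous one.

    `photos` is a list of (filename, datetime) pairs (a hypothetical input
    shape). Returns a list of filename lists, one per run of two or more
    photos; photos with no close neighbour are left out.
    """
    max_gap = timedelta(seconds=max_gap_seconds)
    ordered = sorted(photos, key=lambda photo: photo[1])

    groups = []
    current = []
    for photo in ordered:
        # Close the current run when the gap to the previous photo is too large.
        if current and photo[1] - current[-1][1] > max_gap:
            if len(current) > 1:
                groups.append([name for name, _ in current])
            current = []
        current.append(photo)

    # Do not forget the final run.
    if len(current) > 1:
        groups.append([name for name, _ in current])
    return groups


if __name__ == "__main__":
    batch = [
        ("a.jpg", datetime(2019, 6, 1, 10, 0, 0)),
        ("b.jpg", datetime(2019, 6, 1, 10, 0, 3)),
        ("c.jpg", datetime(2019, 6, 1, 11, 30, 0)),
    ]
    print(burst_groups(batch))
```

A check like this could only prompt the observer (“these look like the same subject, combine them?”). It cannot tell a burst of one bird from two different plants photographed seconds apart, so the decision would have to stay with the uploader.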
