Tiny Little Questions

If the official staff position is that duplicates should be treated just like any other observation (i.e., not ID’d to “life”, not made casual using one DQA option or another), please put this guidance somewhere on the website, e.g. in the FAQ section.

This is not because I think that experienced users are likely to consult the FAQ, but because it is useful to have a common point of reference when disagreements come up about how to best handle such cases. My experience is that all of the ad-hoc approaches mentioned in this thread are widespread, so if IDers are to be encouraged to change their approach, the recommended best practice needs to be documented somewhere findable (i.e., not in the forum, which is only read by a small fraction of users).

As with multiple species observations, I would prefer to see a formal mechanism for marking duplicates as such, but I’m not holding my breath on that happening any time soon.

The biggest issue that I see with duplicate observations is not the duplicate record per se. Duplicate records are unfortunate, but they are something that anyone using a data set has to be aware of and take into account, since there are multiple ways that a record might end up being reported more than once.

Where duplicates often do cause a lot of headaches, at least from an IDer’s standpoint, is in cases where they are not true duplicates (i.e., the same photo posted more than once) but different photos of the same specimen that the observer has uploaded as separate observations because they don’t realize they can include more than one photo in an observation.

For taxa that are difficult and often require a variety of perspectives for ID, this multiplies existing challenges because the relevant features are now scattered across different observations. When confronted with such a set of observations, the IDer must now additionally decide whether to consider each observation independently, or whether the likelihood that the photos show the same specimen is great enough that they are comfortable using features visible in one observation when suggesting an ID for another.

On at least one occasion, a student doing a project on pollinators uploaded each photo they took as a separate observation, resulting in several hundred observations, with an average of 4-5 observations for each insect that visited a particular plant. Most of these observations were mis-ID’d, because the CV is bad at bees and struggles with uncropped insect photos in general. If the observations had only had a general ID, I probably would have ID’d the best photos, left a comment on the others requesting that all photos of the same specimen be put in a single observation, and moved on. But because of the wrong IDs, I and other local IDers were left with a largely unresponsive observer and hundreds of observations that needed fixing. That task would have been much smaller if there were a way to either mark or combine the duplicate observations in such cases.

(Now, obviously there are other underlying concerns in this particular case, e.g., the fact that the student and/or their advisor clearly did not have previous familiarity with iNat, the lack of appropriate guidance for the student in IDing their own observations, etc. But the example illustrates how this kind of duplicate has an outsized impact on IDing workflows.)
