Easy way to mark multiple-species observations

jeanphilippeb · December 3, 2019, 10:52pm

Alright, I will go on IDing these observations with a high level ID, with a neutral explanation.

In some cases it will not be enough: if there are already 3 or more IDs, a single high level ID will not change the community ID and it is unlikely that other people will follow, because they have worked for this ID and are happy with it.

The remaining question is: do we want to start collecting these observations, so that later we can treat them all in one way or another? The extra cost is very low: a click on “Photos are of the same individual” in the DQA panel, in addition to what we do already. This would spare the time to search for these observations later, for applying another treatment, still to be defined, to make these observations more valuable.

Fine! It’s better to enforce a process, whenever possible.

charlie · December 4, 2019, 4:07am

one possibility would be to create a traditional project and add these observations to that project. Another would be adding an annotation. Sometimes users disable the ability to add to projects and such, but probably most of these short term users don’t.

rupertclayton · December 4, 2019, 5:41am

It’s unfortunate that it’s being interpreted that way, but it’s a protocol we came up with as the best way to deal with these to avoid having them all ‘ingested’ into the system as the wrong species.

High-level IDs are certainly about the best workaround available in the current circumstances. My focus was really to try to understand users’ expectations as a guide for deciding on improved site functionality. Making the site handle people’s observations the way they generally would want and expect would seem to be a good principle.

Also, in terms of ‘no researcher’ finding value in something, there are so many different ways people use iNat data, so I’m not sure that’s entirely true, though I agree it is of limited value if not possible to classify beyond high-level taxonomic units.

There’s always some edge case that disproves any absolute statement, but the value of an “Angiospermae” ID for a multi-species observation is so much lower even than separate IDs of daisy and daffodil that it’s hard to argue in favor of this approach unless it’s definitely the best we can achieve.

In terms of what to do with these observations, the iNat team has said on multiple occasions they don’t want to edit other people’s observations…

I think it’s possible to draw a line between editing an observation (modifying a user’s photo, changing their comment text, removing their chosen ID, etc.) and adjusting the presentation of the data they submitted. We’re talking about preserving all the data the user entered, but simply using the DQA plus an inactivity delay to duplicate some elements. Each “child” observations gets one image from the parent plus all the non-image data. Nothing is lost or edited. I appreciate that this might need to be explicitly stated in the iNat user terms, but I think it remains within the spirit of the relationship between users and iNat.
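As a rough illustration of the "one image per child, all non-image data copied" idea above, here is a minimal sketch. The `Observation` class and its field names are hypothetical and are not iNat's actual data model; the point is only that the parent record is never modified.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    # Hypothetical, simplified record; real iNat observations carry far more fields.
    observer: str
    observed_on: str
    location: Optional[Tuple[float, float]]  # (latitude, longitude)
    description: str
    photos: Tuple[str, ...]


def split_by_photo(parent: Observation) -> List[Observation]:
    """Return one child per photo, each copying all non-image data.

    The parent is frozen and is not changed, so nothing is lost or edited.
    """
    return [replace(parent, photos=(photo,)) for photo in parent.photos]
```

A squirrel/mushroom/rose observation would yield three children that share the observer, date, location and description, while the original stays as submitted.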

so a different strategy along with a flag or data quality assessment entry would be to cause the entry to focus only on the first photo - grey out the other photos - then allow that photo to be identified. That way at least some of the data can be used and perhaps the location is more likely to be accurate for the first observation noted?

I see the motivation for this approach, but, to my mind, this would reach the same level of (non-)editing as the auto-split suggestion, but would produce a less useful result.

vynbos · December 4, 2019, 6:11am

I would prefer multi spp obs to be auto split but how would you get around the location issue? Many users are still adding location manually, so there is no accurate metadata for iNat to work on.
Perhaps the way to do it is for iNat to give the user a message such as “hi userXYZ your observation at (link) has been flagged by another user as containing multiple species. These should be split into separate obs as per (link to guideline). You can learn how to split them here (link to tutorial) or we can do it for you, click yes if you want us to split your observation, after which you will be sent a notification of each new obs to review.”
I would say that if a user has dismissed the message we have to accept that they aren’t interested in splitting the obs.

zabdiel · December 4, 2019, 11:39am

I assumed that it does because if I edit an observation there is a sync which I think sets the location & timestamp of the observation from the photo’s metadata.

However, even if iNat does have the metadata, I would caution against creating observations using locations unless they are obscured or the user has some involvement. In 99% of cases it’ll be fine, but what if someone has taken some photos of ducks at their local park, then goes home and takes a photo of a bird in their garden and posts them all to iNaturalist under a single observation? They see the location is the park, which is fine. The observations then get split, leading to an observation showing their home location, which they didn’t expect to happen when they posted the observations. Whilst the worst case scenarios are quite unlikely, the consequences could be very bad.

rupertclayton · December 4, 2019, 4:21pm

I agree with all the suggestions that we should notify and assist users to split multi-species observations themselves. My auto-split suggestion is proposed only as a way to handle observations by users who have not logged in to iNat any time recently (e.g. 90 days or 1 year).

zabdiel · December 4, 2019, 5:45pm

This seems like a good solution to me. Blocking adding ids prevents the situation where the original observation has multiple conflicting ids after the split (or at least prevents it getting worse)

It should be obvious to the observer what has happened
It is obvious what to do if we come across a multiple-species observation
The Multiple species text (and any explanatory text) can be translated. A comment requires google translate and could be misinterpreted (an English comment could be misinterpreted by another native English speaker even)

rupertclayton · December 4, 2019, 5:46pm

It seems that we could figure this out in a progressive manner:

1. First iNat adopts @jeanphilippeb’s suggestion to add a DQA field for “Images are of the same individual(s)”. That allows us to identify problematic observations.
2. Ideally at the same time, iNat adds functionality to notify observers when their observation is assessed as not being a single individual/species, along with some guided approach to resolving the issue. That allows observers to understand and fix the problem.
3. After this has been in place for some time, iNat staff could then do some data analysis to see:
   a. What proportion of “multiple” observations are fixed by the observer? (Probably should focus on stats for observations made after the new functionality is added; for earlier observations there’s a much higher chance the observer left iNat already).
   b. What are the stats for image count, location and date-time for unfixed observations: e.g. how many observations have 2, 3, 4, etc. images; how many have location/date-time metadata in the images; when metadata is present, what’s the spread of location/date-time data among the images?

With that info, we would be better informed to decide what kind of auto-split policy could reliably avoid exposing private data. For example, we might find that among unfixed “multiple” observations where all images have metadata, 91% have timestamps within 30 minutes and locations within 250 m of the data in the iNat record. It might be determined that those criteria are restrictive enough to infer that the location privacy settings from the original observation can be applied while creating new observations.

We might then determine additional rules for auto-splitting observations that don’t reach that threshold. For example, the newly created “child” observations might derive their time and location from the image metadata but be set from the outset to have the location obscured.
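To make the kind of rule described above concrete, here is a minimal sketch using the 30 minute and 250 m figures from the example. Everything else is an assumption for illustration: the `PhotoMeta` class, the function names, the three outcome labels, and in particular the choice to skip auto-splitting when any image lacks metadata (the proposal does not say what should happen in that case).

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

# Thresholds taken from the worked example in the post; real values would
# come out of the data analysis, not from this sketch.
MAX_TIME_GAP = timedelta(minutes=30)
MAX_DISTANCE_M = 250.0

LatLon = Tuple[float, float]


@dataclass(frozen=True)
class PhotoMeta:
    # Hypothetical container for metadata read from one image.
    taken_at: Optional[datetime]
    lat_lon: Optional[LatLon]


def distance_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def split_policy(record_time: datetime, record_lat_lon: LatLon,
                 photos: Sequence[PhotoMeta]) -> str:
    """Return 'inherit_privacy', 'obscure', or 'no_auto_split'."""
    # Assumption: without complete metadata we do not auto-split at all.
    if any(p.taken_at is None or p.lat_lon is None for p in photos):
        return "no_auto_split"
    close = all(
        abs(p.taken_at - record_time) <= MAX_TIME_GAP
        and distance_m(p.lat_lon, record_lat_lon) <= MAX_DISTANCE_M
        for p in photos
    )
    # Every image close to the iNat record: reuse the original privacy setting.
    # Otherwise the children take their own metadata but start out obscured.
    return "inherit_privacy" if close else "obscure"
```

In the park-then-garden scenario raised earlier in the thread, the garden photo would fall outside both thresholds, so a rule like this would obscure the new observations rather than expose a home location.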

I think we can start with tools to help users and the iNat community address the problem and later assess what automated fixes could be safely made.

cmcheatle · December 4, 2019, 5:54pm

This has the potential to impact hundreds of thousands if not millions of observations from people who add multiple photos of the same species into a single record from their outings.

While I understand the well debated rule is ‘one observation = one individual’, do we really want this hard enforced?

rupertclayton · December 4, 2019, 7:10pm

Why do you see this first step as having such a large impact?

The first step just provides one more DQA field. No records are changed. iNat users can then choose to assess an observation as not being of a single individual in the same way they can currently assess an observation as being not wild or having a suspect location.

The impact would merely be that we’d have a way to trigger notifications to the user to alert them to the issue and help them fix it. And that we’d be able to easily filter out content from these observations (e.g. photos) and prevent them from erroneously appearing in search results.

What is the scenario that you feel would be a problem for this first step of the proposed change?

reosarevok · December 4, 2019, 7:16pm

It’d be pretty hard to hard-enforce for many cases (I have had cases where I honestly don’t even know if the several photos of the same species of bird from the same tree are the same individual or another one from the group, etc), although I’d imagine if the observation is also not from the same approximate location, just the same field trip, then we would actually want to start enforcing it or the location becomes kinda meaningless? (plus, it’s IMO very interesting data to have 10 obs of the same species rather than one, it shows how common it is…)

cmcheatle · December 4, 2019, 7:32pm

I am questioning the need/appropriateness of flagging records with multiple photos of individuals of the same species in a single observation as is apparently being suggested. Not cases with clearly different species.

All it will take is a few retentive hyper rules oriented people to start flagging such records to create a lot of disquiet.

graysquirrel · December 4, 2019, 7:37pm

Yeah, I think multiple individuals seen in the same area at the same time are not any kind of issue at all. Personally, I only care when it’s wildly different species all lumped together.

rupertclayton · December 4, 2019, 7:49pm

Personally, I’m not suggesting that we flag observations with multiple photos of individuals of the same species. I’m focused on the very common scenario where a newer iNat user creates an observation with separate photos of a squirrel, a mushroom and a rose. If the community feels the right text for a DQA field is not “Images are of the same individual(s)” then I’m very happy for iNat to use a better description of the issue.

The much narrower issue about whether five photos of various warblers in a tree are all the same species is not the same. That issue is probably handled quite manageably through the current ID and comment tools.

cmcheatle · December 4, 2019, 7:53pm

This should not be an issue at all, nor should it even generate a comment. As long as the species identified by the observer / identifier is present in the photo, it is irrelevant if there are other, even more numerous individuals of other species also shown.

rupertclayton · December 4, 2019, 8:33pm

Great! Are you in support of the proposal to add a DQA field that addresses the squirrel/mushroom/rose issue?

cmcheatle · December 4, 2019, 10:07pm

I have no objection to it so long as it is very clearly for that use case only. Not for different individuals of the same species, not for there is another species on the photo as well and I want the record to be for that instead etc.

jdmore · December 5, 2019, 7:50am

I like your thinking here. We should consider the wording a little more I think, since Multiple species could also be interpreted as applying to a single photograph, which is not the problem being addressed here. So for a DQA, maybe something like

Same species in each photo

When there is a single (or no) photo, this should be greyed out and all votes cleared. This wording also addresses @cmcheatle’s concern about photos containing different (and/or multiple) individuals of the same species/place/date, which is discouraged but not strictly enforced on the site.

I think we also need to think about follow-up. Consider the diligent new iNatter who gets 5 downvotes and quickly fixes the problem, splitting their multiple species into separate observations. They only get one upvote to counter the 5 downvotes on their original observation. What’s the chance that enough of the downvoters will then come back to change their votes?

My suggestion: When number of photos on an observation decreases, the DQA is automatically cleared and reset by the system. If it turns out to have been only a partial fix, people can downvote the DQA again until the fix is complete.
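A minimal sketch of that reset rule, combined with the "greyed out for a single (or no) photo" behaviour suggested above. The class, field and method names are hypothetical and do not reflect how iNat stores DQA votes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Observation:
    # Hypothetical model: photo identifiers plus per-user votes on the
    # proposed "Same species in each photo" DQA item (True = agree).
    photos: List[str]
    same_species_votes: Dict[str, bool] = field(default_factory=dict)

    def vote(self, user: str, agrees: bool) -> None:
        # With one photo or none the item is "greyed out": no voting.
        if len(self.photos) < 2:
            raise ValueError("DQA item is disabled for fewer than two photos")
        self.same_species_votes[user] = agrees

    def remove_photo(self, photo: str) -> None:
        # The photo count decreased, so clear all votes and start over.
        # If the fix was only partial, people can simply vote again.
        self.photos.remove(photo)
        self.same_species_votes.clear()
```

With this rule, the diligent observer who removes the stray photos does not have to wait for five downvoters to come back and change their votes.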

Or as a workaround, I suppose the observer can just delete their original observation and re-post it with a single species.

cmcheatle · December 5, 2019, 3:29pm

Location can be exceptionally important - in the cases of plants as an example, and also can be absolutely acceptable at larger ranges (that’s what the accuracy buffer is for). If I see bird species x in my local park, and then see another one 100 meters down the trail, it serves no purpose to enter multiple records indicating that. Tomorrow the birds could be 5 meters away, or 200 meters away. What is relevant is that they are in that park.

Abundance is a critical datapoint, the refusal of the site to implement a standard way of tracking it is one of my 2 biggest, most consistent points of frustration with the site. But I’m not sure recording multiple records for every individual is the best approach. To begin with it puts the iNat data out of step with most other platforms which track numbers of individuals seen within the context of a single record. Thus iNat data becomes harder to share or integrate. It also leads to needless flooding of the site with records to identify etc.

As an example (we’ll see how long it takes me to get flagged, or to get comments that they are duplicates, etc.): to strictly follow the rules, I just did this:

https://www.inaturalist.org/observations/33202891
https://www.inaturalist.org/observations/36379035
https://www.inaturalist.org/observations/36379046
https://www.inaturalist.org/observations/36379057
https://www.inaturalist.org/observations/36379072
https://www.inaturalist.org/observations/36379104

Surely it is better to put in one record that says I saw 6 of them, or even more accurately that says I actually saw 25 but was able to photograph 6.

reosarevok · December 5, 2019, 3:33pm

Oh, I didn’t mean that, I probably just misunderstood your “people who add multiple photos of the same species into a single record from their outings” comment as “if they take 20 pics on the same walk at different times, they upload them all into one observation”. If you meant “don’t create one observation per organism of the same species on a photo”, then yeah, that’s probably overkill in most cases, unless each one is very significant for some reason (different banded birds maybe?).
