Do private observations impact an observation's utility for research?

Is it documented somewhere that iNaturalist exports data with deliberately altered coordinates? I don’t see anything in the FAQs around research use of our data, ‘research grade’, or obscuring locations; and there doesn’t appear to be anything in the iNaturalist dataset description at GBIF. This is news to me, and very surprising!

Having made extensive use of iNat data in my research, I had assumed that ‘research grade’ would preclude any specimens with deliberately altered location. In research, having no data is almost always better than having wrong data.

From your comment, I take it you consider this ‘researcher error’ (i.e., my fault). I suppose because I could/should have been filtering on coordinateUncertainty? I don’t do that because I find that field is used very inconsistently by the many and varied datasets that are hosted by GBIF. (There are other checks that I do, based on my own understanding of the ranges of the species I’m studying.)

As a test, I just downloaded a sample with an obscured observation in it, and I see there’s also a field for informationWithheld that I could use as a filter. But this is something that really ought to be clearly documented.

In order to be useful for research, an observation needs an identification. Often location is needed in order to limit which similar species must be considered. I’m not unusual in that I’ll ID a “private” observation only if it’s really obvious. Others are not worth my time and frustration. (I might try to ID it if the observer comments about what region or at least continent the plant grew in.)

2 Likes

The public positional accuracy is increased to the diagonal of a 0.2 x 0.2 degree cell (~500 km² at the equator).

https://help.inaturalist.org/en/support/solutions/articles/151000169938-what-is-geoprivacy-what-does-it-mean-for-an-observation-to-be-obscured-
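
For a sense of scale, here is a rough sketch of how long that diagonal is at different latitudes. It assumes a spherical Earth with about 111.32 km per degree, and the function name `cell_diagonal_km` is made up for illustration; it is not necessarily how iNaturalist computes the published uncertainty value.

```python
import math

def cell_diagonal_km(lat_deg, cell_deg=0.2, km_per_deg=111.32):
    """Approximate diagonal (km) of a cell_deg x cell_deg cell at latitude lat_deg.

    Assumption: spherical Earth, so a degree of latitude is ~111.32 km everywhere
    and a degree of longitude shrinks with cos(latitude).
    """
    height = cell_deg * km_per_deg
    width = cell_deg * km_per_deg * math.cos(math.radians(lat_deg))
    return math.hypot(width, height)

for lat in (0, 30, 45, 60):
    print(f"latitude {lat:>2}: diagonal ~{cell_diagonal_km(lat):.1f} km")
```

Under these assumptions the diagonal is roughly 31 km at the equator and gets shorter toward the poles, which would explain why obscured records don't all carry the same uncertainty value.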

1 Like

Yes, I understand how obscuring location data works. I’m surprised that records that have had their locations obscured are included with the exported ‘research grade’ datasets.

1 Like

As @cthawley said, making the true coordinates public on GBIF would defeat the purpose of obscuration.

Each obscured iNat record on GBIF says that its location is altered, eg https://www.gbif.org/occurrence/4937202897

2 Likes

Alas, this observation is not geoprivacy-obscured. The screenshot provides no information as to geoprivacy, either.

This one is the GBIF record for a geoprivacy-obscured iNat obs: https://www.gbif.org/occurrence/4014885899

But again, the only clue hinting at ‘obscuration’ is the very high ‘Coordinate uncertainty in metres’ (28329). Nothing with the decimal latitude and longitude says ‘this has been obscured’ (the mention “coordinates rounded” is irrelevant, it also affects geoprivacy-open obs).

2 Likes

I’m not suggesting obscured locations should be un-obscured on export. The opposite - obscured records shouldn’t be exported.

The record you linked isn’t obscured; the remark indicates that the original coordinates have been rounded from 10 decimal places to six decimal places. That’s a change of less than a meter, I think? Not the same as obscuring, which shifts the location by 20-30 km.

Here’s a record that was obscured:

https://www.gbif.org/occurrence/5077027163

Yes, that’s documented in the “information withheld” section you referenced. Maybe the text there could be updated to something that references iNat obscuration specifically?

As to your confusion about RG or what observations get exported to GBIF, see https://help.inaturalist.org/en/support/solutions/articles/151000169936-what-is-the-data-quality-assessment-and-how-do-observations-qualify-to-become-research-grade- and https://help.inaturalist.org/en/support/solutions/articles/151000170346-which-inaturalist-observations-are-exported-for-gbif-and-how-often-does-this-export-happen-

The data aren’t wrong, as the true point still falls within the accuracy/precision circle. Just as old specimens that only said “Chicago, IL” probably mostly didn’t occur at the downtown pinpoint or centroid (and we adjust the precision to encompass the whole city).

5 Likes

does this mean that the obscuration box on inat is reduced to the accuracy circle when it’s stored in gbif?

I’m not sure which numbers you mean going which ways, but in this example, the circle is vastly increased in GBIF due to obscuration. (iNat screenshot is my own obscured obs with its true location and original precision)

3 Likes

so I guess if I were using gbif data / api’s and wanted to filter out obscured, i’d code for…

if( coordinate_uncertainty_in_meters > x )

// where x is whatever level of uncertainty i’m uncomfortable with
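
A minimal runnable sketch of that idea in Python, assuming each record is a dict keyed by Darwin Core term names (GBIF's term for this field is `coordinateUncertaintyInMeters`). The helper name `within_uncertainty`, the sample `gbifID` values, and the choice to drop records with no uncertainty value are all illustrative assumptions.

```python
def within_uncertainty(records, max_m):
    """Keep records whose coordinate uncertainty is at most max_m metres."""
    kept = []
    for rec in records:
        raw = rec.get("coordinateUncertaintyInMeters")
        if raw in (None, ""):
            # Assumption: a missing uncertainty is treated as unusable.
            # Some studies may prefer to keep these records instead.
            continue
        if float(raw) <= max_m:
            kept.append(rec)
    return kept

# Made-up sample rows; values arrive as strings, as they would from a text export.
sample = [
    {"gbifID": "1", "coordinateUncertaintyInMeters": "25"},
    {"gbifID": "2", "coordinateUncertaintyInMeters": "28329"},
    {"gbifID": "3", "coordinateUncertaintyInMeters": ""},
]
print([r["gbifID"] for r in within_uncertainty(sample, max_m=1000)])
```

The threshold stays a parameter so each project can decide how much uncertainty it can tolerate.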

2 Likes

If you want to exclude obscured and private observations specifically, I’d filter out obs that have something in that information withheld field.

I don’t know if anything is put there other than for geoprivacy/taxon_geoprivacy=obscured or private.

The other information withheld text options are “Coordinate uncertainty increased to #####m at the request of the observer” and “Coordinates hidden at the request of the observer”.

But yeah for most studies you’ll probably want to exclude obs that are like a pinpoint in Missouri with a precision circle that encompasses the entirety of the US lol
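
As a sketch of the information-withheld approach, assuming records are dicts keyed by Darwin Core term names such as `informationWithheld` (the helper name `drop_withheld` and the sample rows are made up for illustration):

```python
def drop_withheld(records):
    """Drop any record with a non-empty informationWithheld value."""
    return [
        rec for rec in records
        if not (rec.get("informationWithheld") or "").strip()
    ]

# Made-up sample rows using the text options quoted above.
sample = [
    {"gbifID": "1", "informationWithheld": ""},
    {"gbifID": "2", "informationWithheld":
        "Coordinate uncertainty increased to 28329m at the request of the observer"},
    {"gbifID": "3", "informationWithheld":
        "Coordinates hidden at the request of the observer"},
    {"gbifID": "4"},  # field absent entirely
]
print([r["gbifID"] for r in drop_withheld(sample)])
```

This treats anything in the field as a reason to exclude the record. Other publishers may use the field for unrelated reasons, so it is worth checking what values actually appear before applying it across mixed datasets.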

1 Like

as long as that field is populated in the same way for data imported from other sources (eBird / bugguide). I kinda like using the uncertainty_in_meters because some researchers might allow a little more uncertainty than others.

but yah, I think I see how it’s supposed to work. thx.

1 Like

The uncertainty radius on that observation is 30.06 km, so not quite the same

2 Likes

Yes, and this is perhaps the only place where this information is explicitly stated. Which is fine if you know to look for it. That field doesn’t show up in the docs here at iNat. It is described, vaguely, deep in the GBIF docs:

Term Name: dwc:informationWithheld
Definition: Additional information that exists, but that has not been shared in the given record.

I read the help pages you link. They do explain what qualifies as research grade. They don’t say anything about obscured locations. Which means researchers using the data need to know what research grade is, and they need to know that some records are obscured, and they need to notice that obscured records aren’t explicitly excluded from the research grade export. It’s this last step that tripped me up, as I would never have considered that altered data would be included in the export.

Now that I know, I’ll add a filter to remove obscured records (or not, for some projects 30km is close enough). But I think this issue could be better documented.

What would be the best place to document it?

Just to be clear, because tone is difficult to interpret on the internet: I’m not challenging the idea that this should/could be better documented. I’m genuinely curious as to where that could best be done so that people will see it.

2 Likes

More like sub centimeter.

3 Likes

Thanks, and I hope I’m not coming across as overly angry or aggressive! The internet makes us all sound crankier than we really are :)

There are seven posts in the science FAQ section. They’re all pretty short. Some of these could maybe be combined into a slightly longer, more complete answer to the question: “how do I use these data as a researcher?” Maybe there’s a better place that could be linked from the FAQ?

I’m not sure exactly, but I’d be happy to help. (but I’m about to leave on two weeks vacation)

1 Like

From a researcher’s perspective, obscured coordinates definitely impact our ability to use those data to evaluate the conservation status of a species. I typically exclude obscured data because we count the number of populations separated by more than 1 km, and a 20-30 km uncertainty on multiple observations from a single population can create the illusion of a larger number of populations. Occasionally, if an obscured observation is the only record of a species in a 30 km circle, I will include it.

Many US state Natural Heritage Programs (or their equivalent) have projects to collect rare species data that users can trust with their hidden coordinates. If you want your observations of rare species to be used for conservation status assessments, I recommend finding the local NHP project and joining it. Here is the umbrella project that includes all of the state projects: https://www.inaturalist.org/projects/natureserve-network-umbrella-project

In Canada, and maybe other places with their own portals, the hosting entity (NatureServe Canada in this case) may have access to coordinates that are obscured at the taxon level, but that is not the case in the US.

2 Likes

Others have already noted the ways that iNat describes how data is obscured and points are replaced. I feel like iNat does a decent job of this, but it could always be better. One recent improvement iNat has made is to include metadata with downloads from iNat itself (in response to this feature request):
https://forum.inaturalist.org/t/package-metadata-with-csv-on-observations-export/31692

The terminology is here and does a good job showing how obscured locations are represented in the data imo.

I do think that any researcher who ignores coordinate uncertainty in their work does so at their own peril, and that choice is their responsibility (as are the others they make when doing their work). It’s really important to consider uncertainty for all the datasets one might work with. It is used inconsistently between datasets to be sure, but I don’t think that is a reason to ignore it. If anything, it means one should pay more attention to making the appropriate choice about how to use it in relation to whatever research they are doing.

I agree that

However, as others have noted, I don’t think it’s accurate to say that the iNat data is wrong. The points are accurate to the accuracy values that are provided for them. This is no different than a geocoded observation from a natural history collection where someone went back after the fact and added a coordinate with a large accuracy radius to a specimen that had no coordinate because the specimen location on its label was “Akron, OH” or similar. No one would (should?) assume that the specimen was found at the exact center of Akron, Ohio just because the coordinates are there. The same goes for the method of “manual obscuration” that @kevinfaccenda and other users described. The true point of the observation is within the circle but not at the coordinates - the location data is still represented accurately.

One other point is that this really isn’t (at least solely) an iNat issue in my mind. iNat shares their data with GBIF, and GBIF chooses how to represent it within their scheme. They chose to use the informationWithheld field to fit into their existing scheme. I think that their description is decent:
“Information withheld Coordinate uncertainty increased to 28329m (or other similar value) at the request of the observer”

Though this text doesn’t distinguish between observer-selected and taxon-based geoprivacy as far as I know (though this doesn’t functionally make a difference).

I disagree that

As a researcher, and as I’ve noted on many other threads, I’d much rather have a greater amount of data available to me, and then filter it for my needs rather than having someone else making a blanket decision about what I (and everyone else) can and can’t use. For some applications/questions/analyses, obscured data will work just fine. For many others it won’t. Let the user decide rather than trying to impose a choice on them.

9 Likes