Location and date accuracy of iNaturalist observations

Oh, my!

Lepidopterists, too. Eurema lisa (now Pyrisitia lisa) has the type locality ‘United States.’ Eurema elathea has the type locality ‘N. and C. America.’ It gets to where you are glad for the precision of Eurema daira, type locality ‘Cuba.’


There’s a related paper that came out a year ago by Allison Binley and Joseph Bennett, called “The data double standard”: https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.14110

I started a forum topic on it, but an early response sort of derailed the discussion. Anyway, they had several good insights, including:

These are valid concerns when working with community science-driven data. What is problematic is the inconsistency with which these criticisms are levied against other sources of data, as analogous and equally valid arguments can often be applied to conventionally collected data, yet these limitations are less frequently acknowledged and accounted for.


This is also a problem with databased museum specimens. When the specimen is labelled with only the state, province, or country, it gets assigned a point located at the centroid of the place’s polygon. Those should be tagged with a very high radius of error (hundreds or thousands of miles), but in practice they’re often simply assigned coordinates with six decimal places and no indication that it’s not accurately located there unless you search out the actual label text.
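
To make the point concrete, here is a minimal sketch of how a centroid record could carry an honest error radius instead of bare six-decimal coordinates. The output field names follow Darwin Core terms (decimalLatitude, decimalLongitude, coordinateUncertaintyInMeters); the function names, the rough corner coordinates for Colorado, and the shortcut of averaging the vertices rather than computing a true area centroid are all my own assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def centroid_record(vertices):
    """vertices: list of (lat, lon) outlining the place named on the label.

    Returns a crude centre point (mean of the vertices, not a true area
    centroid) plus a radius that reaches the farthest vertex.
    """
    lat_c = sum(lat for lat, _ in vertices) / len(vertices)
    lon_c = sum(lon for _, lon in vertices) / len(vertices)
    radius = max(haversine_m(lat_c, lon_c, lat, lon) for lat, lon in vertices)
    return {
        # Two decimals is already more precision than the label supports.
        "decimalLatitude": round(lat_c, 2),
        "decimalLongitude": round(lon_c, 2),
        "coordinateUncertaintyInMeters": math.ceil(radius),
    }


# Approximate corners of Colorado, used only as an example of a state-level label.
colorado = [(37.0, -109.05), (37.0, -102.05), (41.0, -102.05), (41.0, -109.05)]
record = centroid_record(colorado)
print(record)
```

For a state that size the radius comes out at several hundred kilometres, which is the information a bare centroid point hides.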


I don’t dispute that there are many flaws in the location data on herbarium and museum specimens. That is not surprising given that many are from the 19th century when the emphasis was on collecting rather than recording, and specimens were often collected commercially for sale so there was a motive to hide the locations of your best collecting grounds. But the underlying theme of this discussion seems to be that iNaturalist’s geographical data are fine because herbariums are worse. Couldn’t we aim higher than not being the worst? Wouldn’t it be better if iNaturalist observers were encouraged to be both precise and accurate in order that their observations are as useful as possible to the scientific and conservation users?


Maybe an extra prompt from iNat

Not just Missing Location
But, fine, but that covers half of Africa! (And yes I DO see that) Try again?

Also, the map location and the displayed text have no logical connection. You can say - I’m in Chicago, while the map is in Timbuktu. (I also pick up random obs that are clearly not in Africa.)

In my experience, it’s actually iNat that is generating the extremely vague location descriptors (via Google Maps). At least, that’s what I am inferring, based on the format of the descriptors, which don’t look like they were typed in by a human.

That said, I’m not sure I understand what you’re suggesting. It certainly would be nice if iNat could automajically parse a text location, assign lat/long to it, and then warn the user that their text location descriptor doesn’t match the map pin. But I doubt that this is feasible in many cases. There’s just too much variability in what a user could enter. I guess the iNat database contains a vast number of examples of location text descriptors that an AI system could use as templates. If the location descriptor entered by a user resembles other descriptors in the iNat database with a very different lat/long, the AI could warn the user. Something like “are you sure the location pin is in the correct place on the map? your text location looks like it would be ”. But my guess is that the AI guesses would be wrong a lot of the time, or at least often enough that iNat users would get annoyed at how often they are being second guessed.
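
A much simpler version of that check, without any AI, might look like the sketch below. It is not how iNat works; the tiny gazetteer, the function names, and the 100 km threshold are all hypothetical, and a real place-name lookup would have to cope with far messier text than this.

```python
import math

# Hypothetical mini-gazetteer: place name -> approximate (lat, lon) in degrees.
GAZETTEER = {
    "chicago": (41.88, -87.63),
    "timbuktu": (16.77, -3.01),
}


def distance_km(a, b):
    """Great-circle (haversine) distance in km between two (lat, lon) pairs."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pin_mismatch(text, pin, threshold_km=100):
    """Return a warning if a known place name in `text` is far from `pin`.

    Returns None when no known name is found or the pin is close enough.
    """
    lowered = text.lower()
    for name, coords in GAZETTEER.items():
        if name in lowered:
            d = distance_km(coords, pin)
            if d > threshold_km:
                return (f"Text mentions {name.title()}, about {d:.0f} km "
                        f"from the map pin. Is the pin in the right place?")
    return None


# Text says Chicago, pin sits on Timbuktu: a warning string comes back.
print(pin_mismatch("Chicago, IL, USA", (16.77, -3.01)))
# Text and pin agree: nothing to report.
print(pin_mismatch("Chicago, IL, USA", (41.90, -87.60)))
```

Anything not in the lookup table passes silently, so this only catches the gross cases and would not second-guess users very often.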

In my experience, most users don’t even realize that they can edit the location descriptor. Or it may be hard to do if you are using the iNat app. Most users appear to leave the iNat/google default the way it is and if they enter anything at all about the location, they add it to the Notes, or (rarely) they add a separate location field to the observation.


A possible glimmer of hope for herbarium/museum specimens?:


That is where I think a prompt could also be useful - is this where you were?
(And always with the - Don’t show me this again - option) Google’s default is Silver Mine - two words, which is wrong. There never was silver in the old mine. Our nature reserve captures the history and is called Silvermine - one word, which I choose to edit each time.

Or. When we actively write text in the box - iNat could respect the observer’s choice. By not overwriting with the Google default text.

Funny coincidence. A Ndonga family in Namibia told me that the previous Peace Corps volunteer that they had hosted was from “Oshikango.” It took me a while to figure it out, but in their phonetic system, that was how the pronunciation of “Chicago” came out; they had hosted a Volunteer from Chicago. The funny part is that there is, in fact, a Namibian town named Oshikango.


Another scenario I’ve thought of is a user trying to avoid giving the correct location of a sensitive species (e.g. owls) to protect it, without knowing that obscured or private geoprivacy is an option.


Expensive way to georeference a specimen, but probably the only way for many historical specimens!


Yes, but. There’s a trade-off. The more complex the on-boarding process or the more intrusive the messages during the uploading process, the more likely potential users will give up. As they begin, observers may have very little commitment to iNaturalist but many of these people may have interesting observations to post and/or will become more dedicated later. We don’t want to discourage them. (And then there’s the whole issue of iNaturalist encouraging engagement with nature. iNaturalist’s scientifically useful data can be viewed as a side effect of encouraging that engagement.)

Citizen science – always messy. Getting better data is a good goal! How to get there without annoying and discouraging observers and without further burdening identifiers? I don’t know. If somebody can figure out a way, great!! But I think I’ll go do something easy, like ID plant observations.


Here’s a question about location accuracy, pin placement, and error bubble. Suppose I’m out in a bog, like I was last night, recording audio of the first frog call of the year. I hear a frog wayyyy off in the distance and have a general sense of where it is, but not specific information since it’s probably a kilometer away. Is it better to place the location pin where I was standing while recording (which I know very precisely…small error bubble), or where I think the frog was while recording (which I know only with a large error bubble)?

I think this is the crux of the issue. The nature of the uncertainty in location on old pre-GPS specimen labels just doesn’t jibe well with the nature of uncertainty on modern databases. I found this when uploading a bunch of my 20+ year old specimen records to iNat.
For example, if I have a specimen that says it was collected in a particular county at the “route 11 rest area”, there’s really not a good way to express that uncertainty using iNat’s features (and I assume the same holds true for any “geo-locating” based databases). I need to click a point on the map and express uncertainty with a bounding circle going off in all directions the same distance. But I know the specimen was taken somewhere along the long, narrow corridor that was “route 11” at that time, somewhere within that county. So I can click on the highway in the middle of the county and expand to include the whole highway, but this creates a giant bubble that includes hundreds of square miles that I know are not where the specimen was taken. Or I could look for a rest area in that county and assume that’s where I collected it… but who knows if that rest area was moved/re-constructed in the past 20 years, which would make the point I drop inaccurate. Anyone trying to “geolocate” specimen records from 100+ years ago is going to run into this issue magnified. Where do you drop the point on a map for a specimen with nothing but county-level data, especially if the county is odd-shaped and not well-represented by a bounding circle? What do you enter as the date for a specimen with a 10-day date range listed, when a database like iNat requires a specific date to be entered? What assumptions can you make about locations of roads/towns from a century ago? I don’t know how much of the “messiness” with old specimen records is inherent messiness in their labeling, vs. our clumsy modern attempts to import an old type of data into a new type of database that isn’t well-designed for the nature of the uncertainty on old labels.
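
To put rough numbers on the circle-versus-corridor mismatch, here is a back-of-the-envelope sketch. Everything in it is assumed for illustration: a perfectly straight 40 km stretch of road through the county, a search strip reaching 0.5 km to either side of it, and flat-plane geometry with no rounded ends on the strip.

```python
import math


def circle_area_km2(road_length_km):
    """Area of the smallest circle, centred on the road's midpoint,
    that covers a straight road of the given length."""
    radius = road_length_km / 2
    return math.pi * radius ** 2


def corridor_area_km2(road_length_km, half_width_km):
    """Area of a strip along the road reaching half_width_km to each side."""
    return road_length_km * 2 * half_width_km


road_km = 40.0                               # assumed length of road in the county
circle = circle_area_km2(road_km)            # pi * 20^2 km^2
corridor = corridor_area_km2(road_km, 0.5)   # 40 km * 1 km
ratio = circle / corridor

print(f"circle:   {circle:.0f} km^2")
print(f"corridor: {corridor:.0f} km^2")
print(f"the circle covers about {ratio:.0f} times more ground")
```

Under those assumptions the bounding circle takes in roughly thirty times the area the label actually describes, which is why a point-plus-radius model reads as so much vaguer than the original collector's note.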


Some of these old records have good county-level information but unknown or uncertain location data below that level, and they have been assigned locations that are county centroids, which I think is not the best way to handle it. If your county is small, the error is not that great, but in the western US, with its huge counties, the point can be way off, and the use of coordinates suggests greater precision and accuracy than the reality. At least with iNat you can draw a big circle around the whole county and leave it at that…

Accuracy is a problem for locations known to be along a linear feature like a road, trail, or river. For a time the Oregon Flora Project had a different annotation for such locations vs. the usual ones for which circles worked well. I think we just have to read the label or add additional data somewhere to know that the search for the plant may be much easier than the big bounding circle makes it seem.


Your department of transportation knows. You can ask.

When I worked for the Massachusetts Natural Heritage and Endangered Species Program, we ran into the same sorts of problems. Luckily, our database could accommodate different polygon shapes, not just circles, but still, the best way to resolve remaining questions was to go back out and re-find the organism. Of course, the organism (or members of its population) could well have moved over time, but change over time is simply something to accept.


If it was me, I would apply a principle from GAAP. For example, suppose a bookkeeper knows the total business expenses for the year, but not a breakdown by specific purchase dates. In that case, GAAP is to assume that half of the expenses were incurred in the first half of the year, and half were incurred in the last half of the year. Therefore, the total expense is recorded as occurring on July 1, as that is the average of the first and last halves.

In the case of a specimen with a 10-day range of dates, I would therefore average it out to the fifth day of that range.
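
As a small sketch of that midpoint rule, assuming the label gives an inclusive first and last day (the function name and the 1905 dates are made up for the example):

```python
from datetime import date, timedelta


def midpoint_date(start, end):
    """Return (midpoint date, half-range in days) for an inclusive date range.

    When the midpoint falls between two days, the earlier day is used.
    """
    span = (end - start).days
    mid = start + timedelta(days=span // 2)
    return mid, span / 2


# A label reading "1-10 June 1905" (hypothetical).
mid, half = midpoint_date(date(1905, 6, 1), date(1905, 6, 10))
print(mid, "+/-", half, "days")
```

Keeping the half-range alongside the single date, for instance in the observation notes, preserves the fact that the date is an estimate rather than something read off the label.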