Yes, this is the aspect that has led me to cut down on identifications. If people can’t be bothered to enter the correct place, it is a false observation regardless of how good the identification is.
From my current research and work, in some taxa/regions at least, there are far, far more (an order of magnitude more) date and location errors and problems associated with physical specimens/vouchers from herbaria and museums than there are with iNat records. It’s seriously not close. It’s frightening how many herbarium vouchers I’ve been looking at where the coordinates/pin on the map in online databases are a total mismatch for the collection locality notes, not to mention locations in the ocean for terrestrial species, extremely large coordinate uncertainty values, etc.
A huge part of the problem here of course is that when someone collects a voucher, and deposits it at an institution, they aren’t the ones entering the location into the database! In its accessioning journey it might pass through 2, 3, 4 other people before it finally gets entered, plenty of opportunity for human error.
Compare that to iNat where you take a photo and upload the record straight away, and your phone does the automatic geotagging for you. In the vast majority of cases, this works very smoothly. So certainly at least for the data that I regularly look at, it’s not even remotely close: iNat is vastly superior with respect to accurate locations/dates.
I also think this is somewhat of a mischaracterisation of the problem. Are there some cases like this? Sure. But in my experience, most cases I encounter on iNat of bad location data are the user not realising that their GPS stuffed up that day, or they were in a cave and lost signal, or they accidentally entered the wrong name in the search bar, etc. Generally, people are making genuine mistakes and I hardly think they should be ‘punished’ for this.
Very relevant and illuminating here is this brief analysis put together by Bob Mesibov looking at amphibian records in GBIF, comparing physical specimen records vs iNat: https://discourse.gbif.org/t/21st-century-amphibia-collected-vs-observed/3903
His key finding:
After this “first pass” filtering for an imagined SDM/ENM, the observations subset for Amphibia had lost 30% of its records, while the collections subset had lost 88%. As with my review of iNaturalist’s Australian millipede IDs in GBIF, I’m reluctant to generalise to other datasets and other uses. But this result accords with my audits of many more Darwin Core fields in many more datasets: citizen-science observation records are tidier than collections records.
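For anyone who wants to try a similar comparison on their own download, here is a minimal sketch of what a “first pass” location/date screen might look like. The field names are standard Darwin Core terms, but the specific criteria and the 10 km uncertainty threshold are assumptions for illustration, not the filter Bob Mesibov actually used.

```python
def first_pass_ok(rec, max_uncertainty_m=10_000):
    """Return True if a record survives a basic location/date screen.

    `rec` is a dict keyed by Darwin Core terms. The checks and the
    default threshold are illustrative assumptions only.
    """
    lat = rec.get("decimalLatitude")
    lon = rec.get("decimalLongitude")
    if lat is None or lon is None:
        return False  # no coordinates at all
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return False  # impossible coordinates
    if lat == 0 and lon == 0:
        return False  # "null island" placeholder
    unc = rec.get("coordinateUncertaintyInMeters")
    if unc is None or unc <= 0 or unc > max_uncertainty_m:
        return False  # uncertainty missing, invalid, or too large
    if not rec.get("eventDate"):
        return False  # no collection/observation date
    return True


def retained_fraction(records, **kwargs):
    """Fraction of records that pass the screen (0.0 for an empty list)."""
    if not records:
        return 0.0
    kept = sum(1 for r in records if first_pass_ok(r, **kwargs))
    return kept / len(records)
```

Running `retained_fraction` separately on the specimen subset and the observation subset gives the kind of side-by-side retention figures quoted above, though the numbers will depend entirely on which checks you choose.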
This also matches my experience with record accuracy in collections. In my experience, though, it is the older collection records that are very inaccurate (erroneous location recorded, or no GPS and subsequent geolocation, which is an art at best).
I also agree that most iNat users’ mistakes with location are not bad faith. If it’s obvious that the location isn’t in the accuracy circle, the DQA is a very quick/easy fix, though I also generally leave a comment asking the user to fix it (lower success rate on that, maybe 20% response rate).
I ran into an example of this years ago when I went to find a population of lizards that had supposedly been collected along a certain road in the 1930’s. But I couldn’t find any at that site despite extensive searching, which made me think they had gone locally extinct. I went back to the original field notes of the researcher and noticed his sequence of towns visited seemed weird given the configuration of that road today. After more digging, I discovered that a new bridge had been built in the 1960’s upstream from the original and the road had been rerouted several miles from the original route. When those 1930’s field notes were used to place the specimen on a map sometime in the 2000’s, they used the existing road route, not the original, which moved the specimen those few miles upriver, which turned out to be on the other side of several significant barriers to dispersal. Sure enough when I went back to the actual location from the 1930’s, I found the lizards I was looking for right where they were supposed to be.
I can’t quote you statistics on percentages with wrong localities. Once I have seen someone is putting a place name that doesn’t match the co-ordinates, I skip that batch of records. About twice a week I write a comment in an observation pointing out that the place name doesn’t match the co-ordinates and it is very rare for the observer to correct it, so to me that indicates they can’t be bothered.
In many cases this is not the fault of the observer; see e.g. https://forum.inaturalist.org/t/incorrect-localitynote-for-lats-and-longs/46472
Hey all, hope you don’t mind, I split these posts off into a new topic, since it seemed both cohesive and divergent from the previous topic, which has already been marked as solved.
I agree with previous comments here about problems with museum specimen locations. Lots of reasons for erroneous location data: misread field notes, switched specimen tags, duplicate location names, curatorial mistakes, etc. I saw all of these while working in a university collection. At least with GPS and the fact that the iNat observer uploads their own data, many of these error-prone steps in the curation process have been eliminated. Not that errors can no longer happen, but less likely.
Somewhere on the forum is a very old post. A herbarium sheet with scrawled hand-written notes. Where IS this? Anyone?
And the answer rolled in!
(Turn left at the red gate - which is now blue. Or the old oak - which came down in a storm 3 years ago)
I agree that there are lots of problems with Museum specimens. In the database I’m working on, the iNat observations now (vastly) outnumber the old historical museum specimen records, so problems with the iNat observations can be a significant issue, even if the average error rate is lower.
I’ve also seen cases where the Lat/Long don’t align with the location descriptor entered by the observer. I usually don’t have time to carefully check the lat/long against the location descriptors, and in my taxon/region of interest, a lot of observers just use the default iNat location descriptor (which is typically useless).
But I was actually thinking of other scenarios:
- the app syncs up at a different location from where the photo was taken (already mentioned). These are tricky to spot unless you see a normally very localized species reported far from its normal habitat, or you notice a pattern of an observer repeatedly reporting a wide range of species from what is clearly a building.
- the observer is using the phone app to snap a photo of the original photo displayed on their computer screen, and so the location captured by the app is actually the location of the computer screen, not the location where the photo was originally captured (can usually spot these either from the character of the photo, or from the pattern of observation mentioned in the previous bullet)
- the observer is not using an app - they are uploading photos and then assigning locations by clicking on the map. Some observers are meticulous, while others are very “sloppy”. This can be tricky to detect, but because I get observations from multiple sources, I can correlate them and I find many cases where the observer uploads the same photo on another platform and assigns a different location, or a companion has a similar photo of the exact same organism from the same date but with a different location. After seeing enough examples where I can be absolutely sure that observers are entering inaccurate coordinates, it makes me suspicious of many observations that are reported at locations that are “off the beaten path”. Over the years, I have built up a sort of mental profile of many observers. I know the kind of places they tend to go to. There are folks who bushwhack, and folks who rarely venture far from their vehicles. I also have a very good idea of where most of the species in my area are likely to be found. Combining all these pieces of info gives me a pretty good idea of which observations can be taken at face value, and which ones should be viewed with a jaundiced eye. Without that background knowledge, you wouldn’t necessarily suspect that there’s anything unreliable about these observations. You’d believe that locations of iNat observations are being accurately reported.
There are probably some other scenarios that I’m not thinking of off the top of my head. Another clue - it’s clear that locations are sometimes being chosen by simply clicking on the name of the park/landmark on the map. If the park is small, that’s not a big deal, but for larger parks, you end up with odd clusters of observations grouped around where Google Maps happens to display the park name, even if that location is poor habitat or inaccessible.
As someone with a lot of experience with collection records, I have to agree. iNat’s location data is much more accurate on average than pretty much any specimen collection. For most specimen collections, you can probably rely on the country being accurate, but after that, it’s a crap shoot.
Our standards for precision in location data have changed over the years. With modern GPS, we expect a location to be pretty darn close to the actual collection/observation spot. In the years before GPS, location data were typically much less precise … often within a half-mile or a mile was considered adequate. Lots of old specimen records have locations like “10 miles north of Anytown” where the spot from which the mileage was calculated is unclear (middle of town? the town post office? the edge of town at that time?). And don’t get me started about Township-Range-Section coordinates …
Precision is one thing. Accuracy is another. Location data on iNat is generally very precise. It’s the accuracy I’m worried about. I’ve even detected cases where the observer’s camera was getting stale GPS data from the app on the observer’s phone, and so recording very precise, but inaccurate location data in the EXIF. Again, this was detected by correlating the observations with those of other people in a group that was travelling together (as I discovered after spotting very similar photos). After I pointed out the location discrepancy, they confirmed that yeah, the camera app was prone to recording the wrong lat/long in photos.
Bottom line - there are probably lots of iNat observations with very precise, but inaccurate location information. But you wouldn’t know it without doing some serious sleuthing. It’s a very convincing illusion.
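As a rough illustration of the kind of correlation described above, here is a minimal sketch that flags pairs of observations made by different people at nearly the same time but recorded far apart. It assumes you have already established (say, from near-identical photos) that the observers were together. The record keys (`id`, `observer`, `time`, `lat`, `lon`) and the 5 minute / 1 km thresholds are hypothetical choices, not anyone’s actual workflow.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def suspicious_pairs(obs, max_minutes=5, max_km=1.0):
    """Return (id, id) pairs from different observers that are close
    in time but further apart than max_km.

    `obs` is a list of dicts for people known to be travelling together.
    """
    flagged = []
    for i, a in enumerate(obs):
        for b in obs[i + 1:]:
            if a["observer"] == b["observer"]:
                continue  # only compare across observers
            minutes = abs((a["time"] - b["time"]).total_seconds()) / 60
            km = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
            if minutes <= max_minutes and km > max_km:
                flagged.append((a["id"], b["id"]))
    return flagged
```

A flagged pair only says that at least one of the two locations is wrong. Working out which one still takes the kind of sleuthing described above.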
Absolutely this. In areas and with species I am most familiar with, it’s often quite discouraging to see how inaccurate locations can be. In addition to the reasons mentioned, I’ve seen two other cases:
- GPS fix is unavailable, so the camera or phone uses the last known location, which in remote, hilly, forested country with poor GPS reception can be kilometres away. Unfortunately it also records the last known “precision” value, so locations look great! I could post examples of this, where another observer and I photographed the same organism at the same time but my friend’s coordinates were way out, but I won’t, so as not to call anyone out.
- observers who don’t even click on the map, but type the name of the city or county they are in. In theory, this should generate a precision circle encompassing the entire city, but in practice this is often not the case, and it produces weird results, like strictly forest-dependent species in the middle of urban areas.
Agreed. I just wanted to clarify that overall, I think the quality of the data that I get from iNat is very good, but that doesn’t mean we can take all observations at face value. I’m getting good data thanks to the effort I invest into scrutinizing the observations (both for correct IDs, and also for location/date problems). I’ve written a number of programs that correlate observations in different ways. Originally, the idea was to ferret out duplicate observations, but as a side effect, I find a lot of anomalies that cannot be reconciled, which leads me to contact the observers and request correction/clarification. Furthermore, I divided Ontario into biogeographical regions, and then used the historical data to determine known ranges for the species of interest, along with expected flight seasons. This allows me to screen observations for anomalies and focus my attention where it’s likely to do the most good.
I agree, and I agree it’s typically far better than the locality data from traditional collections, which is terrible, but there is definitely still room for improvement.
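To make the range and flight-season screening concrete, here is a minimal sketch of how such a check could work. This is not the poster’s actual program: the lookup table, the species and region names, the record keys, and the 14-day slack are all hypothetical placeholders.

```python
from datetime import date

# Hypothetical lookup built from historical records:
# (species, region) -> (first, last) day of year of the known flight season.
KNOWN_SEASONS = {
    ("Species A", "Region 1"): (152, 227),
}


def screen(obs, known=KNOWN_SEASONS, slack_days=14):
    """Return a reason string if the observation looks anomalous, else None.

    `obs` is a dict with hypothetical keys "species", "region" and "date".
    """
    key = (obs["species"], obs["region"])
    if key not in known:
        return "outside known range"
    first, last = known[key]
    day_of_year = obs["date"].timetuple().tm_yday
    if not (first - slack_days <= day_of_year <= last + slack_days):
        return "outside expected flight season"
    return None
```

An observation that gets a reason string is not necessarily wrong. It is just one worth looking at first, whether that turns out to be a misidentification, a bad date or location, or a genuine range extension.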
Just a funny (but also not funny) story that reminded me of this, I have come across many older insect specimens (pre-GPS) here in Alaska that list “Summit Lake” as the locality. The problem is that Alaska is home to several lakes called “Summit Lake” in various areas around the state, so which one they meant is anyone’s guess!
I had an experience with this that turned out well. Someone listed the location I was trying to rediscover as “10 miles north of Anytown” and there were a lot of places that could be, given the open country I was checking. But I knew something about the collector…they had a childhood injury that left them with one leg longer than the other and so they limped. I figured this person probably just drove up to their site without much walking in the rough country. So, I set my trip odometer and followed the road 10 miles as the road meandered to what looked like any other spot in the area. Sure enough, the population I was looking for was there exactly where it was supposed to be. I then started looking more broadly in the area and couldn’t find any of that species anywhere else, but just right there, 10 miles north of Anytown. It’s almost like one needs to get into the mindset of the original collector in some cases in hopes of figuring out what they meant when they said 10 miles north of Anytown…as the crow flies, as the road meanders, or something else. It can be exceedingly frustrating.
Locations in older herbarium specimens can be a challenge. Did you know that there are (at least) 5 Red Mountains in California? And 47 Dry Creeks in Oregon? At least Crater Lake, Oregon, should be easy to get an accurate (though not precise) location for, right? Not necessarily. There are actually two Crater Lakes in Oregon, one about as far from Crater Lake National Park as it could be and still be in Oregon.
Historic botanists have their quirks regarding locations. Consider these three prolific collectors. I sometimes call J. B. Leiberg the botanist who never knew where he was. Actually, he probably knew where he was but he didn’t write it down in a way that lets us know where he was. It seems he asked some local for the name of the nearby landmark and used that name, which never made it into maps. He collected the type specimen of a sedge at “Grizzly Peak” so for decades botanists attempted to relocate the population near what we now call Grizzly Peak, which is in a county where the sedge has never been found, as far as we can tell.
Wilhelm Suksdorf respected Native American place names and often used them, translated, as his collection localities. Admirable, but also problematic because most of those names never made it into the maps and other records of white settlers. For a long time, we had no idea where those locations were. (We know most of them now because somebody took the time to work out itineraries for him, to figure out where he was when.) Then there’s the added complication that he sold herbarium specimens, often applying a single date and place to a large series of specimens that must have been collected at different places and more than a month apart, sometimes over a year apart. Sometimes his series included plants he grew in his garden – noted on the sheet or not.
And then there’s William Cusick. Especially when he was replacing his herbarium lost by fire, he was in a hurry. His handwriting is a scrawl, often illegible (made worse as he went blind in old age). Often when you do figure it out, it reads “eastern Oregon.” Accurate, I suppose, but not precise. His locations could be eccentric – he made a lot of collections at Frank’s place. You will not find “Frank’s place” on a map. Eventually somebody figured out this was the farm of his brother Frank, which is useful as long as you also know whether it was collected before or after Frank bought a different farm and moved there – and do remember that Frank’s place was Cusick’s base, not necessarily exactly where the plant was growing.
Compared to all this, PLSS locations (Township, Range, and Section) are marvels of precision. I love TRS locations. We know the plant was (probably) found within a 1 mile x 1 mile square, and sometimes can narrow down the location to a fraction of that. Wonderful.
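To put a number on that precision: if a TRS location is mapped as a single point at the centre of its square, the furthest the plant could be from that point is half the square’s diagonal. The sketch below works this out for a nominal 1 mile x 1 mile section and its subdivisions. It assumes perfectly square sections, which real surveyed sections often are not, and the function name is just for illustration.

```python
from math import sqrt

METERS_PER_MILE = 1609.344


def square_uncertainty_m(side_miles):
    """Half the diagonal of a square, in metres: the maximum distance
    from the centre of the square to any point inside it."""
    return side_miles * METERS_PER_MILE * sqrt(2) / 2


for label, side in [("section", 1.0),
                    ("quarter section", 0.5),
                    ("quarter-quarter section", 0.25)]:
    print(f"{label}: about {square_uncertainty_m(side):.0f} m")
```

So a full section pinned at its centre carries an uncertainty radius of a bit over 1.1 km, and each halving of the square’s side halves that radius.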
iNaturalist locations have problems. We should be aware of this and keep an eye out for the problems when we use the data. However, nostalgia for the accuracy of herbarium specimens (and other physical specimens) is unwarranted.
Botanists seem to have a long and famous tradition of using broad and misleading location descriptions. Asa Gray and his botanist friends spent decades searching for Oconee Bells based on Michaux’s herbarium specimen with a label claiming it came from the “high mountains of Carolina.” A teenager gone fishing eventually rediscovered it by chance growing deep in the shady gorges at lower elevations along the Catawba River. (https://huh.harvard.edu/book/shortia-galacifolia)