iNaturalist Data Quality Webinar

Yes, if the coordinates are obscured, the locality note (aka place guess) is coarsened to the encompassing county-level place. So for example, this observation has the quite specific locality note “Windermere Outdoor Adventure Centre, Windermere, Cumbria, UK”. If the coordinates were obscured, it would say only Cumbria. (Edit: slight correction from Tony – if the encompassing county-level place is too small, the locality note will be bumped up. The Cumbria example is still true though.)
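
If anyone wants to check this on their own observations, the relevant fields are visible via the public API. This is only a minimal sketch, using a placeholder observation ID; as far as I know, private_place_guess only comes back when you are authorised to see the true location:

```python
import requests

# Fetch a single observation from the iNaturalist API (v1) and look at its
# locality-related fields. OBS_ID is a placeholder; substitute any real ID.
OBS_ID = 123456789  # hypothetical observation ID

resp = requests.get(f"https://api.inaturalist.org/v1/observations/{OBS_ID}", timeout=30)
resp.raise_for_status()
obs = resp.json()["results"][0]

print("place_guess:        ", obs.get("place_guess"))          # coarsened if obscured
print("obscured:           ", obs.get("obscured"))             # True when coordinates are obscured
print("geoprivacy:         ", obs.get("geoprivacy"))           # the observer's geoprivacy setting, if any
print("private_place_guess:", obs.get("private_place_guess"))  # None unless you are authorised
```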

The other thing that seems kind of strange to me is people using the locality notes as a sanity check on the coordinates. I think in the vast majority of cases, either the person enters coordinates (or the phone grabs them) and the locality notes are just reverse-geocoded from those coordinates (in which case the sanity check is not on whether the location is correct for that observation, but on whether Google’s/Apple’s reverse geocoding system is any good), or the person types in their location (e.g. St James’s Park) and the coordinates are just geocoded from the text entry (which again is a sanity check on the geocoding system, not on the observation’s location).
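
To make those two flows concrete, here is roughly what they look like in code. This uses OpenStreetMap’s Nominatim service purely as an illustration; iNaturalist actually relies on Google’s/Apple’s geocoders, so the exact strings will differ:

```python
import requests

HEADERS = {"User-Agent": "locality-notes-demo"}  # Nominatim's usage policy asks for a UA

# Reverse geocoding: coordinates in, human-readable place string out. The text
# is entirely derived from the coordinates, so it can never be an independent
# check on them.
resp = requests.get(
    "https://nominatim.openstreetmap.org/reverse",
    params={"lat": 54.38, "lon": -2.91, "format": "jsonv2"},  # roughly Windermere, Cumbria
    headers=HEADERS,
    timeout=30,
)
print(resp.json().get("display_name"))

# Forward geocoding: free-text place name in, coordinates out. Same dependency,
# opposite direction.
resp = requests.get(
    "https://nominatim.openstreetmap.org/search",
    params={"q": "St James's Park, London", "format": "jsonv2", "limit": 1},
    headers=HEADERS,
    timeout=30,
)
hit = resp.json()[0]
print(hit["lat"], hit["lon"], hit["display_name"])
```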

As proof that this is happening: if you type in St James’s Park on the web uploader, you always get back the same coordinates (51.502967, -0.1339534) and the same accuracy value.

And here are over 300 observations with those exact coordinates (and most of them with that accuracy value): https://www.inaturalist.org/observations?lat=51.502967&lng=-0.1339534&radius=0.001&verifiable=any

So for none of those 300+ observations can you use the locality note as a sanity check on the coordinates – the observers clearly typed in the location, and then the coordinates were automatically filled by the geocoding system. They are not independent sources of data about the location of the observation.
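
If anyone wants to reproduce that count programmatically rather than via the website URL, here is a minimal sketch against the public API (note that the radius parameter is in kilometres, so 0.001 is roughly one metre):

```python
import requests

# Count observations within ~1 m of the point that the web uploader's geocoder
# returns for "St James's Park" (same query as the URL above).
resp = requests.get(
    "https://api.inaturalist.org/v1/observations",
    params={"lat": 51.502967, "lng": -0.1339534, "radius": 0.001, "per_page": 0},
    timeout=30,
)
resp.raise_for_status()
print("Observations at the geocoded point:", resp.json()["total_results"])
```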

5 Likes

The analysis of location names (aka locality notes) is quite confusing in the video. Initially, it’s included as the ninth item in the list of poor quality data, but the rest of the video only lists eight items (e.g. in the data quality concerns ranking). Towards the end of the video, there is a brief analysis of location names, but it doesn’t really address the essential problem with this item of iNaturalist data. This is rather disappointing, because many verifiers have complained a lot about this issue over the years, and there’s really no question that these complaints are generally well-founded, as can be inferred from this iNaturalist help topic: why is my observation’s location name incorrect? The fundamental problem here is that most locality notes are derived directly from the coordinates via reverse geocoding, so they have no value at all when it comes to verification - i.e. if the coordinates are wrong, then any automatically added locality notes must also be wrong, no matter how superficially accurate they appear to be. The special case of “vague” location names is really a red herring. What really matters is that, by default, most iNaturalist location notes aren’t independent items of data, so they can’t (or shouldn’t) be used for the verification of coordinates.

Now of course, if we could somehow guarantee that the coordinates always came from a reliable source, this wouldn’t be so much of a problem. But there are many different ways in which the primary location data can be entered (manually, via GPS, from geocoding, photo metadata, etc.), and iNaturalist doesn’t currently record its origin. Thus, since this data has no obvious provenance, and is subject to various errors (both human and mechanical), it’s vital that an independent means of checking its validity is provided. This is why it’s so important for recorders to manually enter their own location name, and to choose something which doesn’t just look like a normalised address produced by a machine. Ideally, it should be a brief description based on a few local points of interest which demonstrates the recorder’s familiarity with the observation site. For example, “Field at end of footpath by SomeVillage church” is much better than “SomeVillage, SomeTown, England GB XY12 Z34”, because it uses features that can be easily found on the sort of maps provided by e.g. the Ordnance Survey or OpenStreetMap (as opposed to, say, Google, which is much more business-oriented). Needless to say, adding such information for a large number of records will be time-consuming, as it often requires careful thought. But that’s precisely the point: if there’s no pain, there’s no gain.
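
For anyone trying to do this at scale, one rough way to separate the two kinds of locality note is to treat anything that looks like a normalised address (a postcode, a country code, a long chain of comma-separated administrative units) as probably machine-generated. The sketch below is only a heuristic of my own, not anything iNaturalist itself does:

```python
import re

# UK-style postcode, e.g. "SW1A 2BJ"; the made-up "XY12 Z34" above won't match,
# but the country-code check below catches that example instead.
UK_POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b", re.I)

def looks_machine_generated(place_guess: str) -> bool:
    """Rough guess at whether a locality note is geocoder output rather than a
    descriptive, human-written note."""
    if UK_POSTCODE.search(place_guess):
        return True
    if re.search(r"\b(GB|United Kingdom)\b", place_guess):
        return True
    # Long comma-separated chains ("Town, County, Country, ...") are typical geocoder output.
    return place_guess.count(",") >= 3

print(looks_machine_generated("SomeVillage, SomeTown, England GB XY12 Z34"))      # True
print(looks_machine_generated("Field at end of footpath by SomeVillage church"))  # False
```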

This issue (amongst many others related to iNaturalist) was discussed several years ago on the National Forum for Biological Recording. However, since many people don’t use Facebook, I will reproduce some of the relevant parts here:

Natalie Ann Harmsworth: My bug-bear so far has been with location names being auto generated. So far at least, many of the locality names used bear no close relation to the spatial references given. Is there a way to flag which location names are recorder inputted vs those that are auto generated? At present instead of the location names giving context to sightings they are confusing and make you think that there must be a problem with the record (grid reference and location name do not match). I am guessing this is going to [be] more of a problem in more rural locations…

Roger Morris: I agree wholeheartedly - we use location names as part of the location verification process - if that name is miles away then it creates big problems.

Natalie Ann Harmsworth: Quite! Because there is no direct link between iRecord and iNaturalist, any queries regarding locations would have to be posted at source on iNaturalist. Practical if you only have a few to do, but if you are verifying 100s or 1000s of records I cannot see people taking the extra time to do this. I started off rejecting records if the location and grid were clearly at odds with one another, but that results in a lot of ‘lost’ records, so will investigate if commenting on iNaturalist is worthwhile. In effect it isn’t the recorder’s fault if iNaturalist adds an incorrect location name - it is the system that is at fault and the fact iNaturalist does [not] consider location names so important.

Automatically adding reverse-geocoded location names seems quite appealing at first, but in the vast majority of cases, it either adds no value whatsoever or is actively misleading. If the accuracy circle is, say, 50 metres, what is the point of giving a superficially accurate location name several kilometres away? This sort of cosmetic fluff is just an attractive nuisance.
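
The mismatch the verifiers describe above could even be checked semi-automatically: independently geocode the written locality note, then compare the result with the record’s coordinates and accuracy circle. A sketch only, with the geocoding step left as a hypothetical input:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

def locality_mismatch(obs_lat, obs_lon, accuracy_m, note_lat, note_lon, slack_m=500):
    """Flag a record when the independently geocoded locality note lies well
    outside the accuracy circle around the recorded coordinates."""
    return haversine_m(obs_lat, obs_lon, note_lat, note_lon) > accuracy_m + slack_m

# note_lat/note_lon would come from geocoding the written locality note with a
# geocoder of your choice (that step is not shown here).
print(locality_mismatch(54.38, -2.91, 50, 54.42, -2.96))  # True: several km apart
```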

These issues aren’t restricted to iRecord. When I’m reviewing observations on iNaturalist, I don’t just want to make identifications: I would like to verify the location data as well. However, the site doesn’t really offer any help with this, as there’s no concrete mechanism, nor any agreed conventions, for dealing with inadequate/ambiguous location data. There seems to be a common assumption that the coordinates are somehow self-evidently correct in most cases, and so don’t deserve the same level of scrutiny that identifications attract.

5 Likes

Good to hear more depth about why this is an issue.
I hadn’t considered all of your points before.
Is there a feature request for data to be added on how location names are generated?
If not, would be good to add. As you said, provenance would really help here.

In regard to Natalie’s comment about feeding back information directly to iNaturalist, I do think the onus is on iRecord to some extent there. They could choose to make it more of a two-way street, but they developed it as a one-way bridge… iRecord just feels to me like a black box by and large, so we all pay the consequences of that.
I reached out to see if I could do a test myself and asked if they would give me access to their API but they didn’t respond.
I wonder if we could just scrape comments from iRecord and use the iNat API to at least help get verifier feedback to observers a little more. I wasn’t sure of the ethics of this, but if done with the verifiers’ support, it would be OK, I imagine…

2 Likes

A couple of relevant topics:
https://forum.inaturalist.org/t/make-it-easier-to-distinguish-how-coordinates-were-created/31070
https://forum.inaturalist.org/t/record-positioning-method-for-coordinates-and-make-it-available-in-observations/31165

And a snippet from staff:

And then for all of these cases, if the observation location is obscured, there is a coarsening applied on top of the locality notes, such that the original locality notes are stored privately (for the user and trusted data sharing partners), and the coarse locality notes are shown publicly and provided in exports/the API.



I don’t think there is a request for that, but based on the reception of the one to store how coordinates are generated, and the staff quote above, I personally very much doubt that iNat would be interested in a method to store how the locality notes are generated for specific observations.

1 Like

I disagree with this:

in this context of location/geospatial data accuracy.

There have been few (if any) more beneficial changes to the collection of geospatial data than the development and incorporation of GPS into data-gathering devices (cameras, phones, etc.). The advent of widespread GPS coordinate availability has allowed for quick, easy, and more accurate collection of geospatial data. I’ve worked with older datasets with text-based locality info many times, and using this data is incredibly cumbersome as well as inaccurate. In many cases, accuracy can only be determined at the level of miles, even for localities well written by professional scientists. Written localities are also subject to inaccuracy due to changes in the landscape/landmarks/place names over time, and to challenges with interpreting written language (different phrases are interpreted differently by different people), as opposed to numbers, which have a standardized meaning in this context.

I’m currently a professional biologist who collects and works with geospatial data. I rarely add written locality information to the data that I collect unless I have a specific reason to (private vs. public property, needing to know the optimal route to a locality, or something about an associated hazard). NB: not counting some kind of site-level ID/categorical data for organizational purposes. I would not add a requirement for written locality information of the kind discussed above (i.e., unstructured) to any project I was designing or participating in - it wouldn’t be worth my or my coworkers’ time to write the descriptions or try to make use of the data afterwards. The cost of creating and then using such data in an age of GPS coordinates (even with the known issues that some coordinates will have) doesn’t come close to generating the benefit needed to justify requiring it, in my opinion. Excluding the use of biodiversity data because it doesn’t include written locality notes is akin to throwing out the baby, plumbing, and water treatment system with the bathwater.

5 Likes

I’m confused. I don’t see anything particularly negative in the responses by staff or others, just not a huge number of votes for the feature request. Given the numbers from @josscarr though, it seems this is a significant concern to far more UK data users than might be visible in the 16 existing votes (regardless of whether their concerns are justified or not).

In my experience, it’s somewhat rare for staff to give overtly negative responses on things, and I interpret a merely lukewarm response as essentially negative (i.e. “we won’t prevent this from happening, but the number of things higher on the priority list is so big, it effectively isn’t going to happen”). Maybe you read these differently, but I think these comments are pushing things in the direction of unlikely to implement:

“positioning_method and positioning_device were fields that we added really early on and never ended up using much because they weren’t useful to anyone”

“there’s usually no way for us to know where those coordinates came from”

And this is from GitHub:

In sum, I personally believe that iNat is not interested in exploring how to better capture the method used for setting either the GPS coordinates or the locality notes.
I’m not trying to squash dreams, I just don’t want people to get their hopes up. :sweat_smile:

2 Likes

ah right, fair enough… these were old posts though, or the GitHub one at least is from 2018

maybe in light of Joss’ research and quantification of this concern by UK data users there might be some shift in stance by staff though, even if they were lukewarm towards it previously :slight_smile:

also, as @deboas just posted on the feature request… that exact request (and the response you mention by @tiwane) is in relation to coordinates specifically, not how location names are generated

1 Like

We have 200 million records, and IVON dates from the start of the 20th century (1920?). I do not know the UK situation. https://ndff.nl/ Most of my 200k observations are not in the NDFF, only a few thousand.

1 Like

Thanks for the talk and video.
Do you know of similar studies dealing with other taxa, or with a different country?

I’ve read your post through several times, and I can’t quite see what it is you’re disagreeing with. You seem to have overlooked the crucial point that GPS is not the only source of location data. It’s very likely the most common, but there’s also a significant number of observations that employ far less reliable methods, such as manual positioning and geocoding from place names. If iNaturalist provided a way to distinguish GPS from non-GPS location data, this would make sanity-checking much more feasible. Observations with known GPS primary location data plus reverse-geocoded locality notes could be skipped; only those with non-GPS location data and/or user-supplied locality notes would be worth sanity-checking.
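
Incidentally, if you happen to have a CSV export that includes the (rarely populated) positioning_method column, that triage could be sketched roughly like this; the “gps” value, the column names and the file name are assumptions on my part:

```python
import csv

def needs_sanity_check(row: dict) -> bool:
    """Triage: skip records whose coordinates came from a GPS fix and whose
    locality note looks reverse-geocoded; everything else gets a human look."""
    gps_fix = (row.get("positioning_method") or "").strip().lower() == "gps"
    # Crude stand-in for "looks reverse-geocoded": a long comma-separated address chain.
    machine_note = (row.get("place_guess") or "").count(",") >= 3
    return not (gps_fix and machine_note)

# "observations.csv" is a hypothetical export that includes the optional
# positioning_method and place_guess columns.
with open("observations.csv", newline="", encoding="utf-8") as f:
    to_review = [row["id"] for row in csv.DictReader(f) if needs_sanity_check(row)]

print(f"{len(to_review)} observations flagged for a manual location check")
```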

Locality notes are only useful insofar as they offer an independent means of checking the primary location data. Nobody is suggesting they replace the coordinates. Also, it doesn’t matter overly much if the locality notes aren’t always perfect: they just have to be significantly better than reverse geocoding (which isn’t very hard to achieve). As an additional bonus, when done well, such notes can also provide an accessible, reasonably accurate description of the observation site that doesn’t require the use of a visual mapping tool.

I have no doubt that professional biologists are capable of adding reliable location data that doesn’t require verification by independent means, but they make up only a tiny percentage of the people using iNaturalist. Most users don’t primarily use the platform for rigorous biological recording, let alone understand how and why their data might sometimes need some adjustments before being shared with third parties.

For those users who are concerned about what happens to their shared data, it’s important to understand that every potential downstream data consumer will have differing requirements, so it’s generally best to hedge your bets and try to include everything they might reasonably want. There’s really no point indulging in philosophical arguments about whether these requirements always make complete sense. If you’re keen to have your data accepted, just grit your teeth, tick all the boxes they want ticked, and have done with it. Practicality always beats purity when it comes to optimising your data for consumption by certain data consumers. This is something that applies in spades to the subject of the current topic, namely: iRecord. This platform is dominated by a relatively small pool of hegemonic verifiers, some of whom have quite old-school notions of what constitutes good recording practices. If your data happens to fall into the hands of one of these people, it’s tough luck if you decided not to endure the pain of preparing your data in the approved manner…

2 Likes

I realise this seems confusing and that my presentation gives only a shallow skim of the issue. To explain more fully: I didn’t realise location names were such a big issue for people until AFTER I had sent out the questionnaire, hence I did not collect data on iRecord verifiers’ perceptions of this issue. I did, however, analyse the iRecord data itself. Therefore, I gathered data on verifiers’ perspectives on eight data quality concerns (not including location names), but analysed the iRecord data itself for nine data quality metrics (including location names).

This is a very good point and is something I am aware of but didn’t give enough emphasis to in the recording (there was a lot to cover, to be fair). I will make sure to cover this more fully when we write the paper up.

Totally agree. I have been doing this for the last 6 months or so and highly encourage others to do the same. I am skeptical many will follow suit though, because it is extra work.

Hence my fear that convincing people to take these extra steps is likely to result in pushback. iNaturalist has thrived because of how easy it is to use. People will be opposed to taking extra time to go over their records.

I agree that this would be nice to be implemented. I have been tempted to use the ‘location is inaccurate’ field in the DQA but this isn’t the correct use case, I believe, as that field is for when the location is clearly wrong (as opposed to just questionable).

Definitely in favour of this.

I also agree wholeheartedly with this. However I feel it would be hard, if not impossible, to change the minds of the ‘powers that be’ in UK biological recording on the issue.

I wouldn’t say there’s no point… but I agree that it’s important to be pragmatic.

2 Likes

As far as I can tell, my research is the first to have looked at iNaturalist data quality across multiple metrics, but there are some others that have reviewed things like species coverage and identification accuracy, for example:

Species coverage:

  • Prylutskyi, O. and Kapets, N. (2024). State-of-the-art of iNaturalist as a source of data on Ukrainian fungi. Citizen Science: Theory and Practice, 9(1), 25.
  • Munzi, S., Isocrono, D. and Ravera, S. (2023). Can we trust iNaturalist in lichenology? Evaluating the effectiveness and reliability of artificial intelligence in lichen identification. The Lichenologist, 55(5), pp.193–201.

Identification accuracy:

  • Garretson, A., Cuddy, T., Duffy, A.G. and Forkner, R.E. (2023). Citizen science data reveal regional heterogeneity in phenological response to climate in the large milkweed bug, Oncopeltus fasciatus. Ecology and Evolution, 13(7).
  • White, E., Soltis, P.S., Soltis, D.E. and Guralnick, R. (2023). Quantifying error in occurrence data: Comparing the data quality of iNaturalist and digitized herbarium specimen data in flowering plant families of the southeastern United States. PLoS ONE, 18(12), e0295298.
  • Ackland, S., Richardson, D. and Robinson, T. (2024). A method for conveying confidence in iNaturalist observations: A case study using non‐native marine species. Ecology and Evolution, 14(10).
  • Mesaglio, T., Shepherd, K.A., Wege, J.A., Barrett, R.L., Sauquet, H. and Cornwell, W.K. (2025). Expert identification blitz: A rapid high value approach for assessing and improving iNaturalist identification accuracy and data precision and confidence. Plants, People, Planet. pp.1–16.

3 Likes

This is really interesting, also.

1 Like

Your initial post was advocating for users to take the time and effort to enter unstructured locality information in the locality field regardless of how their coordinate information was collected, see:

You argued that this “pain” (observer time and effort) is required if there is to be a “gain.” This is what I am objecting to. As I detailed in my previous post, unstructured locality data has quite limited usefulness (and in most cases will not lead to higher quality end data, as the large majority of iNat observations have coordinate data collected by a GPS-enabled device). Users likely experience none of the potential marginal gains for their efforts in this scenario.

In this sense, I object to advocating for users to largely waste their time for little to no gain to a third party as summarized in this statement:

There are thousands of potential downstream users of any iNat data, most of whom are unknown to the observer at the time of posting. If observers were to try to add every possible scrap of data they could (e.g., observation fields, annotations, bits of information in the notes) to their observations to maximize their value/usage for hypothetical future users, this would be a huge waste of time and very inefficient for the observers (unless they take their own satisfaction in doing so, which is fine!).

The onus is not on observers to make their observations useful for every possible end user, which is what this position seems to imply, but on data users to use data from iNat observers as they see fit.

I would flip this statement around:

“If free data that they never could have collected otherwise happens to fall into the hands of one of these people, it’s tough luck on them if they can’t endure the “pain” of figuring out how to use an incredible data resource.”

3 Likes

I don’t know what is meant by “unstructured” locality data. Saying where the observation took place isn’t just an additional “scrap of data”. It is one of the four basics that make an observation a record. Are GPS readings always correct these days? They weren’t in the past. I accept that if you are out at sea, a map reference might be the best you can get. But if you are in a country where most valleys, woods and lakes (and of course all the villages) have names, it makes sense to give the locality in words in order to verify the map reference. If you get an error in a map reference, it is still a reference to somewhere, so you have a false record and no means of detecting it.

2 Likes

No, I’m afraid I don’t see, since you’re ignoring all the crucial context that “this” refers to (just as with your other post).

You’re showing your lack of familiarity with iRecord here (and with the long history of biological recording in the UK). The point of this thread is to try to find ways to improve the take-up of data imported into iRecord from iNaturalist, since currently only about 30% are reviewed and accepted, compared to about 80% for all other sources. The position you seem to be advocating (advising users to take the easy way out and do nothing, whilst laying all the blame on the iRecord verifiers) is never going to help improve matters.

3 Likes