Looking for help with determining how location notes are generated by iNat

I manage an observation database that aggregates observations from multiple sources. I’ve been trying to “harmonize” the location descriptors (place names) coming from different sources so that they follow a common format (a tall order, I know).

The bulk of our observations now come from iNat, so the biggest issue is with the locality notes we get from iNat. This isn’t so much a problem with idiosyncratic locality notes entered by observers as with those generated automatically by iNat. The main problem I run into is that these “default” locality notes are often vague, inaccurate, and/or inconsistent. Because they are inconsistent in both format and content, it’s difficult to write code that can parse/reformat them and determine whether they are sufficiently precise/accurate (or need to be replaced with something more precise/accurate).

I understand that the problem likely has a number of causes, one of which is pointed out in this bug report:

https://forum.inaturalist.org/t/use-a-more-accurate-result-for-android-app-locality-notes/61149

(i.e., depending on how an observation is entered into iNat, different algorithms are used to construct the locality notes from what the google reverse-geocoding API returns)

I don’t expect this issue to ever get addressed/fixed in iNat, so I’ve been trying to “fix” the problem from my end. I’m under the impression that iNat generates the default location notes using google maps reverse geocoding, so I’ve been looking at the address information I get from the google maps reverse geocoding API (via Python). I’m trying to see if I can implement my own algorithm to construct a more consistent/precise/accurate locality descriptor in our preferred format. If I can manage to do this, I will consider simply replacing the locality notes I’m getting from iNat with my own location descriptor.

Note that my coding knowledge/skills are rudimentary…

In my preliminary testing, I’m getting reverse geocoding results and I have a pretty good idea of how I could create “replacement” locality descriptors from them. In many (possibly most) cases, they will be “better” than what I’m getting from iNat. But unfortunately, I’m seeing a number of instances where the iNat locality notes contain an “important” place name that doesn’t appear in the response I get from the google maps API.

Looking at a specific example, I can see that these locality notes are consistent across multiple observers, with identical lat/long coordinates in each case. Based on my own testing, my guess is that in each of these cases, the observer probably entered the place name into iNat rather than the coordinates. In my example, the place name is “Holiday Beach Conservation Area”. When I try to create my own observation and enter that place name into iNat, it generates lat/long and locality notes identical to what I’m seeing for the other sample observations.

This explains the inconsistency between the locality notes from iNat and what I’m getting from the reverse geocoding API: in these cases, the locality notes were not generated by iNat using the google maps reverse geocoding API. My guess is that iNat does a place search using the place name entered by the user and gets the locality descriptor and lat/long from that API. In these cases, the locality notes I’m getting from iNat are going to be “better” than what I can generate from the reverse geocoding API, because they contain a pertinent place name that is not typically included in the reverse geocoding results (there may be a few cases where a similar park name is included, but in most of my tests, these place names are missing).

If I could accurately determine which observations had their locality notes created via this alternate mechanism, vs the ones where iNat used the google reverse-geocoding API, I could know where I should use the locality notes from iNat vs where I should construct my own using the results of reverse geocoding.
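
One heuristic suggested by the example above: if several different observers share byte-identical locality notes AND identical (rounded) coordinates, the notes were probably produced by a place search rather than typed or reverse-geocoded independently. A minimal sketch of that clustering idea (the field names mirror the iNat API; the threshold of 2+ distinct observers is an assumption):

```python
from collections import defaultdict

def likely_place_search_clusters(observations, ndigits=4):
    """Return (place_guess, lat, lng) keys shared verbatim by 2+ observers."""
    groups = defaultdict(set)
    for obs in observations:
        lat, lng = obs["location"]
        key = (obs["place_guess"], round(lat, ndigits), round(lng, ndigits))
        groups[key].add(obs["user_login"])
    return {key for key, users in groups.items() if len(users) >= 2}

obs = [
    {"place_guess": "Holiday Beach Conservation Area, 6952 50 Cr, Amherstburg, ON N0R 1G0, Canada",
     "location": (42.0422, -83.035), "user_login": "a"},
    {"place_guess": "Holiday Beach Conservation Area, 6952 50 Cr, Amherstburg, ON N0R 1G0, Canada",
     "location": (42.0422, -83.035), "user_login": "b"},
    {"place_guess": "Holiday Beach, ON, Canada",
     "location": (42.0301, -83.0412), "user_login": "c"},
]
print(likely_place_search_clusters(obs))  # only the two identical records cluster
```

This won’t catch a place-search-derived note used by a single observer, but it flags the repeated ones cheaply.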

I’m getting observation data from the iNat API using “get_observations”, and as far as I can tell, there’s nothing associated with “place guess” to indicate how the place name was generated (if there is, PLEASE correct me). I’m not sure that I can write code that will reliably determine the creation mechanism simply by looking at the format of the locality notes. I don’t think there’s any reliable clue to the origin embedded in the format.
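
For reference, here is roughly how I pull the relevant fields out of a `get_observations` response. A real call would be something like `pyinaturalist.get_observations(user_id=..., year=2024)` (requires the pyinaturalist package and network access); a canned response dict stands in here so the extraction logic is visible. As noted, there is no field in the response saying HOW place_guess was generated.

```python
def extract_localities(response):
    """Yield (id, lat, lng, place_guess) tuples from a get_observations-style response."""
    for obs in response["results"]:
        lng, lat = obs["geojson"]["coordinates"]  # GeoJSON order is lng, lat
        yield obs["id"], lat, lng, obs.get("place_guess") or ""

# Canned stand-in for: response = pyinaturalist.get_observations(...)
response = {
    "results": [
        {"id": 1,
         "place_guess": "Holiday Beach Conservation Area, Amherstburg, ON, CA",
         "geojson": {"coordinates": [-83.035, 42.0422]}},
        {"id": 2, "place_guess": None,
         "geojson": {"coordinates": [-83.1, 42.1]}},
    ]
}

for rec in extract_localities(response):
    print(rec)
```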

I’m hoping that someone with more knowledge/skill in this area might be able to offer a suggestion on how to sort out this problem. I guess it boils down to wanting to replace the locality notes when they’re “poor”, but keep them when they are “good”, but not knowing how my code can determine when they are “good”. Perhaps it isn’t realistic to expect that this can be done.

I agree it would be nice to know this. The geocoded location notes could be tossed for many applications. The human entered ones are those of primary interest.

One potentially useful point is that (I think, but don’t know for sure) some of the apps don’t allow user-specified location notes, only the geocoded ones. If that’s the case, then 99.9% of app-generated locations could be assumed to be geocoded (it’s quite unlikely that users go back to the web after uploading to change location notes, though I’m sure a very small % do). And app-created observations are identifiable.

I don’t think that works. In the past, I’ve tried to discern patterns in the formats of the location descriptors based on how the observations were created (web vs iNat apps vs Seek) and I wasn’t able to spot any consistent pattern.

Looking at my specific example from my previous posting: when I type “Holiday Beach Conservation Area” into the web interface for creating an observation, it gets “auto-completed” as:

“Holiday Beach Conservation Area, 6952 50 Cr, Amherstburg, ON N0R 1G0, Canada”

with lat/long set to 42.0422, -83.035 (after rounding to 4 decimal places), and accuracy set to 186 m

Looking at the observations I downloaded for 2024, there are 4 observations that have exactly the same locality notes, latitude, and longitude. 3 of them have the same accuracy figure; one has a larger value (presumably edited by the observer).

Each one was entered by a different observer. 3 were entered via the Android app, and 1 via the web interface (that’s the one with the different accuracy figure). So I don’t think we can use the entry mechanism (ie web vs app) to assess the quality of the locality notes value.

In addition, there are 20 observations which have a slightly different (but still “good”) locality notes value:

“Holiday Beach Conservation Area, Amherstburg, ON, CA”

I checked a few, and they were entered using the iPhone app.

There are also several observations where the locality notes do not specify the Conservation Area (though the lat/long falls within the boundaries). For example, we have:

Holiday Beach, Amherstburg, ON N0R 1G0, Canada

Holiday Beach, ON, Canada

These appear to have been entered via the web interface, and they look like addresses I can get from the reverse geocoding API. Perhaps the smartphone apps use an “enhanced” form of reverse geocoding that includes place names not returned by the normal reverse geocoding API. There’s supposed to be a newer feature in google’s reverse geocoding API called “Address Descriptors” which I can well imagine might specify a place name like a Conservation Area. Unfortunately, I can’t seem to get it to work in my code: every way of specifying that it should be included in the response is rejected. It may be that it is still experimental and not available to everyone, or perhaps it isn’t available to someone like me who is on the google API “free trial” tier.
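
For what it’s worth, this is the shape of request I’ve been attempting. Treat the `enable_address_descriptor` parameter as an assumption based on my reading of the Address Descriptors documentation: both its name and its availability for a given key/region are uncertain, which may be exactly why the requests are rejected. `YOUR_KEY` is a placeholder.

```python
from urllib.parse import urlencode

def reverse_geocode_url(lat, lng, api_key="YOUR_KEY", address_descriptors=True):
    """Build a Google Geocoding API reverse-geocoding request URL."""
    params = {"latlng": f"{lat},{lng}", "key": api_key}
    if address_descriptors:
        # Assumption: parameter name per the Address Descriptors docs;
        # the feature is region-limited and may be unsupported for your key.
        params["enable_address_descriptor"] = "true"
    return "https://maps.googleapis.com/maps/api/geocode/json?" + urlencode(params)

print(reverse_geocode_url(42.0422, -83.035))
```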

I’m not saying that entry via the app can tell you whether the locality notes are “good” or not (whatever that might mean in practice), just whether they were almost certainly geocoded (ie, had some location value determined by an automated process vs. entered manually by a human). In the classic iNat app for iOS, for instance, I don’t think it’s possible to manually enter location fields. Users can search for existing locations, but not enter a text value outside of those (again, not 100% sure of this, but I think it’s correct).

In my mind, there’s no clear reason to try to use geocoded localities for research. If one has coordinates (which is the case on iNat), they provide more standard info than an unknown geocoding process that has likely changed over time. I highly doubt that Google’s geocoding process has remained consistent over the 15+ years iNat observations have been created (whether due to changes in the base maps, process, or other reasons), and there’s no guarantee that iNat’s method of adding geocoded information has remained consistent over that time either. Observations created with the exact same coords and precision/accuracy values at different times are likely to have returned different geocoded locations in at least some places.

I think there are really at least four possibilities you would need to check (as opposed to “geocoded” vs. not):

  1. All manual data entry
  2. Manual entry of location (text string) with automated return of coords/precision
  3. Manual entry of coords/precision (or from photo file) with automated return of location text string
  4. Either 2 or 3 above manually edited by user

The classic iNat app really only allows 2, 3, and 4 (Again, I think, with location text limited to places already available/recognized).

If someone wants geocoded localities for their own use in research for whatever reason, it would be much preferable to take the coords and accuracy from iNat and run all the geocoding via a known, consistent process to allow for comparability.
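
A sketch of that “run all geocoding yourself, consistently” idea: key every lookup on rounded coordinates and cache the result, so identical points always yield the identical descriptor regardless of when they were uploaded. The `geocode` function passed in is a stand-in for any real API call.

```python
def make_consistent_geocoder(geocode, ndigits=4):
    """Wrap a geocoding function so identical (rounded) points always match."""
    cache = {}
    calls = {"n": 0}  # count of actual API calls made
    def lookup(lat, lng):
        key = (round(lat, ndigits), round(lng, ndigits))
        if key not in cache:
            calls["n"] += 1
            cache[key] = geocode(*key)
        return cache[key]
    lookup.calls = calls
    return lookup

# Stand-in for a real reverse-geocoding call:
fake = lambda lat, lng: f"Place near ({lat}, {lng})"
geocode = make_consistent_geocoder(fake)
print(geocode(42.04221, -83.03501))  # rounds to the same key...
print(geocode(42.04219, -83.03499))  # ...so same descriptor, no second API call
```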


Posting this quote here as it addresses some relevant issues in a response from staff

Note that staff explicitly say the process for creating the values in this field isn’t intended to be consistent.


Ok, gotcha. Sorry for misunderstanding. Yes, I agree - source may be a good clue as to whether or not the place name generation was automated. I might be able to factor that into my algorithm.

As to the usefulness of place names for “research”, I think much depends on what kind of research you’re doing. Our database is used by a variety of different users, both scientific researchers and interested amateurs. It includes a lot of historical data (specimen records, reports from literature, etc.), so we have a lot of data where the lat/long was actually determined from the place name (with all the imprecision/ambiguity that implies). We have many localities which go by a variety of names.

For a variety of reasons, we don’t make the lat/long coordinates available to just anybody, so for some users, all they ever see is the locality descriptor (in the case of iNat observations, they can go to iNat to see the lat/long). So having a reasonably precise/accurate place name to go with the lat/long is helpful in some scenarios. Having a certain amount of consistency in these place names helps a lot when you’re looking at a long list of observations spanning decades. I don’t have lat/long for various locations memorized, but I know the names of many of the localities frequented by observers (both current and historical). I’ve never been to Holiday Beach CA, but I’ve seen the name numerous times and I know roughly where it is, so if I’m looking at a list of observations for a particular species and I see that name, I can instantly place the observation in my mind, whereas an alternate locality descriptor might not “ring a bell”.

To a degree, we’ve implemented some standardized reverse georeferencing of our own, using official boundaries of various defined Provincial/National Parks, conservation areas, etc. We’ve also implemented something similar for informal localities that are frequently visited (ie. observation “hotspots”). But there are too many small parks, conservation areas, hotspots, etc. for us to keep all their boundaries on file.

Like I said, I’m trying to have my cake and eat it too. I’d like to keep the “good” place names and replace the less good ones.

Thanks for the input.

Yeah, I get that. Just trying to make the best of a bad situation. At minimum, I would say it would be nice if iNat kept track of whether or not the place_guess has been manually edited by the observer. That should be possible.


Some related previous discussions (click through for additional context):




The location note, if generated from the coordinates, can be in different languages. My home observations are in “Rutherford County” at the old house and “Condado de Rutherford” at the new house, because I switched the interface language between when I joined and when I moved. I’ve seen an observation in Urubamba, Peru, with the location in Chinese; I looked up the characters and they are a close approximation to “pe ru u ru bam ba”.


Yes, locality names in different languages are another issue, but only a minor one in my context. There aren’t many, and I can detect most of them in my code by checking the character set; I can just replace them with something I generate using reverse geocoding. Typically, I only run into problems with French location names that don’t happen to use any accented characters.

Very much agree!

I recently noticed that on this observation:

https://www.inaturalist.org/observations/333086857

The location is written as “23rd St, Санта-Моника, CA, US” which has the Roman alphabet on both ends, and Cyrillic in the middle. I asked why that would be (the same user’s other observations taken in Santa Monica are entirely in the Roman alphabet), and @tiwane who had previously left an ID on the observation replied that because it was an iPhone observation, any questions about how the place name was generated or formatted would need to be put to Apple support. So you might sometimes, in unpredictable ways, have more than one character set in the same locality.


Because the iNat place names are not consistently geocoded, it seems the easiest way to set up your database would be “lat/long first”, where everything starts with a geopoint in the database and you could geocode as needed to create place names. That way you would be in control of the geocoding and could do it consistently.

I just feed the entire location string into “isEnglish”. If even a single character is from a non-English character set, it returns False, and I discard the location and create a replacement. Until now, this was done “by hand”, but I’m working on an automated way of doing the reverse geocoding. Unfortunately, none of the options generate consistently good results for Ontario. I’m actually quite surprised by how bad the google maps API is; I figured that once I’d jumped through all the hoops to get access to it, my problems would be largely solved, but the results are largely disappointing.
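
A minimal version of that “isEnglish” check is just an ASCII test over the whole string. As discussed elsewhere in this thread, it has two known blind spots: it rejects legitimate accented Latin names (San José) and accepts non-English names spelled in pure ASCII (Condado de Rutherford). This sketch is my guess at the approach, not the actual code:

```python
def is_english(text: str) -> bool:
    """Crude check: True only if every character is plain ASCII."""
    return text.isascii()

print(is_english("Holiday Beach, Amherstburg, ON N0R 1G0, Canada"))  # True
print(is_english("Condado de Rutherford"))  # True (Spanish, but pure ASCII: the known blind spot)
print(is_english("23rd St, Санта-Моника, CA, US"))  # False (mixed Cyrillic)
print(is_english("San José, CA, US"))  # False (accented Latin rejected too)
```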


See my previous post. It turns out that the google maps reverse geocoding responses aren’t that great overall. In addition to all the difficulties of choosing the “best” address of the 5+ that the API returns for a given set of coordinates (contrary to what most online sources suggest, it isn’t necessarily the first one), you can’t even rely on the address type to determine which is likely to be the best. Furthermore, it turns out that there are elements in their formatted addresses that don’t exist in the “address components”, which shouldn’t be possible.

I ran into similar problems with Nominatim, which is based on OpenStreetMap. There’s a lot of inconsistency in what gets put into the various address fields returned by the API, making it hard to write code that can take those fields and consistently assemble a usable location descriptor. It works “ok” most of the time, but it generates suboptimal results often enough that you want to check the output. And when you’re talking about hundreds of thousands of records, that ends up being a lot of work (I know, because I’ve checked/corrected tens of thousands over the years).
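
One way to cope with that field inconsistency is a fallback chain over the keys Nominatim might populate. This is a sketch under assumptions: the key lists and their order are my illustration (Nominatim may use any of these depending on the feature), and the “settlement - feature” output style is just one possible target format.

```python
SETTLEMENT_KEYS = ("city", "town", "village", "hamlet", "municipality")
FEATURE_KEYS = ("leisure", "amenity", "road", "suburb", "neighbourhood")

def descriptor_from_nominatim(address: dict) -> str:
    """Assemble 'Settlement - Feature' from whichever fields are present."""
    settlement = next((address[k] for k in SETTLEMENT_KEYS if k in address), None)
    feature = next((address[k] for k in FEATURE_KEYS if k in address), None)
    parts = [p for p in (settlement, feature) if p]
    return " - ".join(parts) if parts else address.get("county", "(unknown)")

# Hypothetical reverse-geocoding 'address' dict:
sample = {"road": "County Road 50", "town": "Amherstburg",
          "state": "Ontario", "country": "Canada"}
print(descriptor_from_nominatim(sample))  # Amherstburg - County Road 50
```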

I’m starting to think that even if I were able to start from scratch and generate all new location descriptors, there’s no good way to do so. The cure would be worse than the disease.

There are places whose English name comes from another language and has an accent, such as San José, California. There are other places whose name in another language is spelled with only the Latin letters without accent, such as the aforementioned Condado de Rutherford.


Sure. In such cases, the newly generated replacement location descriptor will likely be similar, and may have the accent as well (but not always). Up until now, I’ve done all the non-English place name replacement “by hand”, using my judgement as to whether to include the accents or not (many place names in Ontario which nominally contain accented letters have alternate spellings that do not).

I’ve been dealing with the suboptimal location descriptors that we get from iNat for years, and I recently spent months working on (manually) reformatting location descriptors for historical records. I’m well aware of the various pitfalls. The trick is getting code to simulate what I’ve been doing “by hand”. I’ve got code that can generate “auxiliary” location descriptors using various mechanisms, but I was hoping to find one that would reliably generate descriptors of the form:

“City/Town - road name” or “City/Town - Park Name” (and other similar variations)

The automatic replacement I was planning would have included a certain amount of manual intervention/checking. IMO, this is unavoidable (I would never trust code to do this kind of thing without “supervision”). In cases where “non-English” place names are replaced, I would pay particular attention to whatever place names are substituted. This is pretty easy to do - I simply flag those observations with the replacement reason so that I can easily find them and review them.

I now have the google reverse-geocoding API working, but I’m disappointed in the results I’m getting for many test locations (not much better than what I was getting from Nominatim). So much manual checking would be necessary that it probably isn’t worth the bother of trying to wholesale replace all the iNat location descriptors. Instead, I’ll probably do more targeted replacement. My existing code that morphs the iNat location notes into our preferred location format has the ability to detect certain types of suboptimal descriptors (eg. non-English, county/district level, etc.). I can have that code generate candidate replacements for the really bad ones using the google API and then I’ll manually “approve” the replacements. For 2025, I’m getting ~52K observations from iNat. Hopefully, I won’t have to review too many replacements. Based on how horrendous that work turns out to be, I’ll consider updating the location descriptors on the hundreds of thousands of iNat observations we have from previous years.
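
The triage step described above could look something like this: tag each iNat locality note with a reason code so candidate replacements are generated only for the flagged ones and then reviewed by hand. The patterns here are illustrative, not the actual detection code:

```python
import re

def replacement_reason(place_guess):
    """Return a reason code if the note should be replaced, else None."""
    if not place_guess:
        return "empty"
    if not place_guess.isascii():
        return "non_english"
    parts = [p.strip() for p in place_guess.split(",")]
    # County/district-level or very short notes are too coarse:
    if len(parts) <= 2 or re.search(r"\b(County|District)\b", parts[0]):
        return "admin_only"
    return None  # keep as-is

print(replacement_reason("Санта-Моника, CA, US"))     # non_english
print(replacement_reason("Essex County, ON, Canada"))  # admin_only
print(replacement_reason("Holiday Beach Conservation Area, Amherstburg, ON, CA"))  # None
```

Storing the reason code alongside the observation makes the later manual review easy to scope, as described.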


I have faced the same problem and come to the same conclusion. Only the hierarchical place data (returned as place_ids in API results) are going to be useful programmatically. The place_guess is just too unpredictable to use for anything other than supplementary verbatim data.
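
Relying on place_ids might look like the sketch below. The id-to-name table here is hypothetical; in practice it would be filled by looking each id up via the iNat places endpoint (one request per id, cached).

```python
# Hypothetical id -> name table (real ids would come from the iNat places API):
PLACE_NAMES = {6712: "Canada", 6883: "Ontario"}

def hierarchy(observation):
    """Resolve an observation's place_ids to names, keeping unknowns visible."""
    return [PLACE_NAMES.get(pid, f"place:{pid}")
            for pid in observation.get("place_ids", [])]

obs = {"id": 42, "place_ids": [6712, 6883]}
print(" > ".join(hierarchy(obs)))  # Canada > Ontario
```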

Most of my observations are from a DSLR and I geolocate most of the photos using a track file. Typically I’ll drag the images into the web uploader and batch-add an accuracy circle but not update the locations otherwise. When I do this, I think it’s always formatted as [County, State/Province, Country]. Recently I’ve been uploading photos from a trip to Arizona and noticed an inconsistency with one county: usually it’s “Pima, Arizona, United States” but sometimes it’s “Pima County, AZ, USA”.
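
Those two county formats are mechanical enough that they can be folded into one. A sketch, with deliberately tiny, illustrative abbreviation tables (a real normalizer would need full state/country lists):

```python
import re

STATES = {"AZ": "Arizona"}
COUNTRIES = {"USA": "United States", "US": "United States"}

def normalize_county_format(place_guess: str) -> str:
    """Fold 'Pima County, AZ, USA' into 'Pima, Arizona, United States'."""
    parts = [p.strip() for p in place_guess.split(",")]
    parts[0] = re.sub(r"\s+County$", "", parts[0])  # "Pima County" -> "Pima"
    parts = [STATES.get(p, COUNTRIES.get(p, p)) for p in parts]
    return ", ".join(parts)

print(normalize_county_format("Pima County, AZ, USA"))
# Pima, Arizona, United States
print(normalize_county_format("Pima, Arizona, United States"))
# Pima, Arizona, United States
```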

I had one observation where the photo didn’t get georeferenced, so I had to choose a location manually in the web browser. This was in an urban area, and the format then was [Neighbourhood, City, state code, Country], without reference to the county the city is in. Another observation at exactly the same location that was georeferenced had the usual [County, State/Province, Country] format.

When I upload photos through the Android app, usually I take the photos in the field and don’t upload them to the app until afterwards. Usually they take the format [City, CA-ON, CA], but if I’m at a bus stop it takes the format [STREET at STREET, City, ON Postal code, Canada]. I don’t get why it doesn’t know anything more precise than the city unless I’m standing at a bus stop…

I guess the location notes here may even be more reliable than the coords, especially with irregularly shaped locations: if I search the location “Ontario” in the web uploader, the circle generated doesn’t include most of the population centres in the province where most observations are likely to be coming from.