I manage an observation database that aggregates observations from multiple sources. I’ve been trying to “harmonize” the location descriptors (place names) coming from different sources so that they follow a common format. (I know, it’s a tall order.)
The bulk of our observations now come from iNat, so the biggest issue is with the locality notes we get from there. The problem isn’t so much the idiosyncratic locality notes entered by observers as the ones iNat generates automatically. These “default” locality notes are often vague, inaccurate, and/or inconsistent. Because they vary in both format and content, it’s difficult to write code that can parse and reformat them, or assess whether they are sufficiently precise and accurate (or need to be replaced with something more precise and accurate).
I understand that the problem likely has a number of causes, one of which is pointed out in this bug report:
https://forum.inaturalist.org/t/use-a-more-accurate-result-for-android-app-locality-notes/61149
(i.e., depending on how an observation is entered into iNat, different algorithms are used to construct the locality notes from the results returned by the Google reverse geocoding API)
I don’t expect this issue ever to be fixed in iNat, so I’ve been trying to work around it from my end. I’m under the impression that iNat generates the default locality notes using Google Maps reverse geocoding, so I’ve been looking at the address information returned by the Google Maps reverse geocoding API (called from Python). I’m trying to see whether I can implement my own algorithm to construct a more consistent, precise, and accurate locality descriptor in our preferred format. If I manage to do this, I will consider simply replacing the locality notes I get from iNat with my own descriptors.
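A minimal sketch of that idea, assuming the Google Geocoding API’s JSON endpoint in reverse mode (the `latlng` and `key` parameters) and a hypothetical preferred format of “feature, locality, county, province/state, country”. The names `reverse_geocode`, `build_descriptor` and `PREFERRED_TYPES` are illustrative only; the type list and its order would have to be adjusted to whatever format the database actually uses.

```python
import json
import urllib.parse
import urllib.request

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

# Hypothetical preferred order: most specific feature first, country last.
PREFERRED_TYPES = [
    "park",
    "natural_feature",
    "locality",
    "administrative_area_level_2",
    "administrative_area_level_1",
    "country",
]


def reverse_geocode(lat, lng, api_key):
    """Call the Geocoding API in reverse mode and return the decoded JSON."""
    query = urllib.parse.urlencode({"latlng": f"{lat},{lng}", "key": api_key})
    with urllib.request.urlopen(f"{GEOCODE_URL}?{query}") as response:
        return json.load(response)


def build_descriptor(geocode_response):
    """Build a comma-separated descriptor from a decoded reverse geocoding
    response, or return None if there is nothing usable in it.

    Keeps the first long_name seen for each type in PREFERRED_TYPES,
    scanning all results in the order the API returned them.
    """
    if geocode_response.get("status") != "OK":
        return None
    found = {}
    for result in geocode_response.get("results", []):
        for component in result.get("address_components", []):
            for type_name in component.get("types", []):
                if type_name in PREFERRED_TYPES and type_name not in found:
                    found[type_name] = component["long_name"]
    # Assemble in the preferred order, dropping repeated names
    # (e.g. a place whose locality and county share a name).
    parts = []
    for type_name in PREFERRED_TYPES:
        name = found.get(type_name)
        if name and name not in parts:
            parts.append(name)
    return ", ".join(parts) or None
```

Usage would look like `build_descriptor(reverse_geocode(lat, lng, "YOUR_API_KEY"))`. The sketch scans every result rather than only the first one, on the assumption that a park or natural feature name, when present at all, may show up in a later, less specific result.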
Note that my coding knowledge/skills are rudimentary…
In my preliminary testing, I’m getting reverse geocoding results, and I have a pretty good idea of how I could build “replacement” locality descriptors from them. In many (possibly most) cases, they will be “better” than what I’m getting from iNat. Unfortunately, I’m also seeing a number of cases where the iNat locality notes contain an “important” place name that doesn’t appear in the response I get from the Google Maps API. Looking at one specific example, these locality notes are consistent across multiple observers, with identical lat/long coordinates in each case.
Based on my own testing, my guess is that in each of these cases, the observer entered a place name into iNat rather than coordinates. In my example, the place name is “Holiday Beach Conservation Area”. When I create my own observation and enter that place name into iNat, it generates the same lat/long and locality notes that I’m seeing in the other sample observations. That explains the mismatch between the iNat locality notes and the reverse geocoding results: in these cases, the locality notes were not generated by iNat using the Google Maps reverse geocoding API at all. My guess is that iNat runs a place search on the name entered by the user and takes both the locality descriptor and the lat/long from that search. In these cases, the iNat locality notes are going to be “better” than anything I can generate from reverse geocoding, because they contain a pertinent place name that is usually missing from the reverse geocoding results (occasionally a similar park name is included, but in most of my tests these place names are absent).
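One heuristic that follows from this: if several different observers share exactly the same coordinates and exactly the same locality notes, the point probably came from a place search rather than from a GPS fix or a hand-placed map pin. A minimal sketch, assuming each observation has already been reduced to a dict with hypothetical keys `user`, `lat`, `lng` and `place_guess`; `find_shared_points` and the `min_users` threshold are illustrative, and a place search used by only one observer would not be caught.

```python
from collections import defaultdict


def find_shared_points(observations, min_users=2, digits=6):
    """Return the (lat, lng, place_guess) keys shared by at least
    min_users distinct observers.

    Coordinates are rounded to `digits` decimal places so that trivial
    float formatting differences don't split a group.
    """
    users_by_key = defaultdict(set)
    for obs in observations:
        if obs.get("lat") is None or obs.get("lng") is None:
            continue  # no coordinates, nothing to compare
        key = (
            round(obs["lat"], digits),
            round(obs["lng"], digits),
            (obs.get("place_guess") or "").strip(),
        )
        users_by_key[key].add(obs["user"])
    return {key for key, users in users_by_key.items() if len(users) >= min_users}
```

Counting distinct observers rather than observations keeps one person who repeatedly reuses a saved location from being flagged on their own.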
If I could reliably determine which observations had their locality notes created via this alternate mechanism, versus those where iNat used the Google reverse geocoding API, I would know when to keep the locality notes from iNat and when to construct my own from the reverse geocoding results.
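A rough version of that decision, as a sketch only: keep the iNat text when it contains a comma-separated part that matches no name anywhere in the reverse geocoding response (the likely sign of a place-search name such as “Holiday Beach Conservation Area”); otherwise use the constructed descriptor. `geocoded_names`, `has_extra_place_name`, `choose_descriptor` and the rule of ignoring parts that contain digits are hypothetical, and exact string matching will misfire on misspellings and abbreviations the geocoder doesn’t use.

```python
def geocoded_names(geocode_response):
    """Collect every long_name, short_name and formatted_address part,
    case-folded, from a decoded reverse geocoding response."""
    names = set()
    for result in geocode_response.get("results", []):
        for component in result.get("address_components", []):
            names.add(component["long_name"].casefold())
            names.add(component["short_name"].casefold())
        for part in result.get("formatted_address", "").split(","):
            if part.strip():
                names.add(part.strip().casefold())
    return names


def has_extra_place_name(place_guess, names):
    """True if any comma-separated part of place_guess is absent from names.

    Parts containing digits are skipped on the (hypothetical) assumption
    that they are street numbers or postal codes, not place names.
    """
    for part in (place_guess or "").split(","):
        part = part.strip()
        if not part or any(ch.isdigit() for ch in part):
            continue
        if part.casefold() not in names:
            return True
    return False


def choose_descriptor(place_guess, geocode_response, constructed):
    """Keep iNat's place_guess if it names something the geocoder didn't;
    otherwise prefer the descriptor built from reverse geocoding."""
    if has_extra_place_name(place_guess, geocoded_names(geocode_response)):
        return place_guess
    return constructed or place_guess
```

Including the parts of each `formatted_address` lets abbreviations the geocoder itself uses (such as “USA”, which differs from the country `short_name` “US”) count as known names.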
I’m getting observation data from the iNat API using “get_observations”, and as far as I can tell, nothing associated with “place_guess” indicates how the place name was generated (if there is, PLEASE correct me). I also don’t think I can write code that reliably determines the creation mechanism from the format of the locality notes alone; I don’t see any reliable clue to their origin in the format.
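For reference, a minimal sketch of reducing the records returned by `get_observations` to just the fields discussed here. It assumes the pyinaturalist client and the iNat API v1 observation shape (`id`, `place_guess`, `location`, `user.login`), and handles `location` either as a “lat,lng” string or as an already-parsed `[lat, lng]` pair, since whether it arrives converted depends on the client. `parse_location` and `simplify_observations` are hypothetical helpers; none of the fields they keep records how `place_guess` was produced.

```python
def parse_location(location):
    """Return (lat, lng) floats from either "lat,lng" or [lat, lng];
    (None, None) if the observation has no location."""
    if not location:
        return None, None
    if isinstance(location, str):
        lat_text, lng_text = location.split(",")
        return float(lat_text), float(lng_text)
    lat, lng = location
    return float(lat), float(lng)


def simplify_observations(results):
    """Keep only the id, observer login, coordinates and place_guess of
    each observation record in the API's "results" list."""
    simplified = []
    for obs in results:
        lat, lng = parse_location(obs.get("location"))
        simplified.append(
            {
                "id": obs.get("id"),
                "user": (obs.get("user") or {}).get("login"),
                "lat": lat,
                "lng": lng,
                "place_guess": obs.get("place_guess"),
            }
        )
    return simplified
```

A call might look like `simplify_observations(get_observations(place_id=PLACE_ID, per_page=200)["results"])`, where `PLACE_ID` is a placeholder for whatever region is being pulled.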
I’m hoping that someone with more knowledge or skill in this area might be able to suggest how to sort out this problem. I guess it boils down to this: I want to replace the locality notes when they’re “poor” and keep them when they’re “good”, but I don’t know how my code can tell which is which. Perhaps it isn’t realistic to expect that this can be done.