Thanks for the suggestions, and in particular for the links to GeoHub - I’ll have to look that over.
The problem with townships is that to most people (even locals), they would be largely meaningless. They’d have to look them up. I say this as someone who has traveled a fair bit around the province, been to many of the observation hotspots, and looked at thousands of records. They’re even more obscure than the counties (which are obscure for most people).
To my way of thinking, the whole point of a location descriptor is that when somebody looks at it, it gives them at least a rough idea of where the observation took place (without the need to consult a map). Sure, that’s not going to be possible in every case, especially for a province the size of Ontario. People from one side of the province aren’t likely to recognize obscure geographic references from the opposite side of the province. But regardless of how correct a location descriptor might be, what benefit does it bring if almost nobody recognizes it?
I’ve been working with these records for years. I’ve georeferenced thousands of historical records (in many cases having to look up the townships, many of which are old and no longer exist). I can tell you that I would only recognize a handful of township names, and then only because they are a variation on the name of a city/town that is associated with them. And there are likely a whole bunch that are completely misleading - where the township name is the same as (or resembles) the name of a town/city somewhere else in the province.
The ecodistricts are interesting. I’ve seen variations on that theme, and I used a set of them to construct the (larger) reporting Zones we use in our Atlas now. But those aren’t really intended as location descriptors - more of a logical way of grouping/filtering observations for analysis/discussion. Unfortunately, if used as location descriptors, some of those ecodistrict names would cause confusion because they are the same as counties, towns or other administrative levels. You’d have to include “Eco District” in the name to prevent confusion, and even that probably wouldn’t work. I briefly thought that perhaps I could use these for the extreme north of the province, where the counties (aka districts) are particularly vast, and named settlements are extremely sparse (making them of limited use as reference points). But then I looked at some of the names of these ecodistricts, and they don’t ring bells for me. If I saw that an observation was reported for “Wood Creek”, I would never guess that it was along the coast of Hudson Bay. Many of the more active TEA members are stuck on counties. I had to fight to get the reporting Zones implemented (and they did prove very useful in the end). So the ecodistricts are an interesting combination of overly familiar and overly obscure names.
This discussion has strayed far beyond my original question, which was simply “how does iNat construct the location notes (aka place_guess)”. Some of the suggestions that have been put forward (for location descriptors), though algorithmically/conceptually rigorous, would be of little benefit. If I wanted to assign a correct location descriptor that is meaningless to the average user, I could just use the breeding bird Atlas square ID that we already assign to all observations. There are a handful of people who might recognize the ID of a square or two that they frequent, so I would say that in terms of usefulness, that would be about on a par with townships. In fact, I considered simply going through the squares one by one and giving each one a descriptive name. I did it for a dozen or so before I noticed how hard it was to come up with a unique and meaningful name for some of the remote ones. That’s something I’m still considering as an option for some very remote areas where no other method for assigning a location descriptor will work.
This database has been around for over 20 years, and the website has been around for something like 15. I’m not starting from scratch so I don’t have a free hand to do as I please. There are expectations and precedents. I already have a lot of work invested in constructing/harmonizing place names that are variations on:
“TOWN - LANDMARK” or if I’m lucky “TOWN (distance/direction) - LANDMARK”
(where LANDMARK might be a street, road, park etc.)
I didn’t invent this format. Lots of serious observers have used variations on it going back as far as we have observations - which is something like 130 years. The problem is that it hasn’t been used consistently, and it isn’t the format that I normally get from iNat (except in rare cases where the observer entered it manually). I’ve been coaxing observers to use this format, and have succeeded in getting some traction. For the non-iNat observations, I can get the location descriptors into this format fairly easily. It’s the iNat observations that cause the problems, and these days, they represent the bulk of the observations we add to the database every year.
My code now does a reasonable job (most of the time) of generating a location descriptor in that format from what is returned by the Google Maps API. I will likely be able to improve on that somewhat based on some of the suggestions that have been offered here.
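For anyone curious what the final formatting step might look like, here is a minimal sketch, not the actual code. It assumes the town name, its coordinates, and a landmark name have already been extracted from a geocoding response (that part is not shown). The function names, the "N km DIR" style for the distance/direction part, and the 1 km threshold are all assumptions made for illustration.

```python
import math

# Eight-point compass, indexed by 45-degree sectors starting at north.
COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def compass_direction(lat1, lon1, lat2, lon2):
    """Compass point (N, NE, ...) of point 2 as seen from point 1."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return COMPASS[int((bearing + 22.5) // 45) % 8]


def location_descriptor(town, landmark=None, town_latlon=None, obs_latlon=None, min_km=1.0):
    """Build 'TOWN - LANDMARK' or 'TOWN (distance/direction) - LANDMARK'.

    The offset is added only when both coordinate pairs are given and the
    observation lies at least min_km from the town's reference point
    (min_km is a hypothetical threshold, not something from the database).
    """
    label = town
    if town_latlon and obs_latlon:
        d = distance_km(*town_latlon, *obs_latlon)
        if d >= min_km:
            direction = compass_direction(*town_latlon, *obs_latlon)
            label = f"{town} ({d:.0f} km {direction})"
    return f"{label} - {landmark}" if landmark else label
```

With made-up inputs, `location_descriptor("Exampletown", "Mill Road")` gives `Exampletown - Mill Road`, and supplying both coordinate pairs adds the offset in brackets after the town name.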
Thanks.