Looking for help with determining how location notes are generated by iNat

It wouldn’t be my first choice to do this, but you wanted a way to show street/road. So if you want to go down that path, that seems like the best way to do it.

The online atlas has views for county, TEA zone, 10k squares, parks, and nature club circles. The record listing page has columns for the first three, but not the last two. In my mind, you should add the last two as columns too, and you should drop the TEA_location altogether. Between the original location description provided by the observer and the five categories you already use, it seems like you’ve already got enough information there.

To me, there’s no sense in spending a lot of time trying to create your own custom set(s) of zones/locations. If you need other zones (besides those in your existing five sets), you should get them from well-established sets – census divisions, administrative boundaries, parcel maps, etc. Trying to do reverse geocoding using OSM or Google Maps seems like it’s just going to be frustrating, because those databases are constantly changing and include lots of overlapping, random, community-created places, similar to the random community-created places in iNaturalist.

This is definitely not the case for a large number of observations.

Three of those categories (County, Zone and Forest region) are very large, and not particularly useful for narrowing down locality. IMHO, the counties are largely useless because they are arbitrary administrative boundaries that have no relationship to biology. If it were up to me, I’d probably drop them, but it isn’t my decision. The Zones were intended as an improvement over the Forest Regions, and both are bio-geographical in nature. I argued for dropping the Forest Regions once we had the Zones, but again, I was overruled (it was argued that it costs us nothing to keep them).

Please note that for iNat observations, the “original location description provided by the observer” is currently the place_guess I get from the iNat API. As we’ve discussed, this is frequently inadequate. A significant percentage only have the county in the place_guess. For remote northern locations, this makes sense, as there wouldn’t be any towns to reference, but for some as-yet-unexplained reason, this also happens for a fair number of locations in the south. Many observations have a city/town name in the place_guess, but it isn’t always the closest one. Our shape file for parks is far from complete. Last time I checked, I think less than 30% of the observations fall within named parks. The percentages are even lower for Nature Club circles (we’ve only ever implemented a handful of those - I think they’re a waste of time).

So if we were to follow your suggestion, I’d have a large percentage of observations where the only location information shown would be the county name, or if I’m lucky, a town/city name (because that’s all I get from iNat in place_guess). There wouldn’t be a park or a nature club circle, because the observations didn’t fall within any of those.

I concede that there is overlap between the regular location and TEA Location - that’s because half the time, the regular location is unhelpful. The TEA Location was intended as an alternative, providing a higher-level, standardized location descriptor while retaining the regular location (in case there was useful detail there). If I succeed in “fixing” the regular location so that it consistently contains a useful location descriptor, I might be able to ditch the TEA Location.

Back when eBird was hinting at making polygonal hotspots, years ago, I realized someone would have had to create polygons for every single hotspot, and I played around with making some in my area to see what it would look like. It’s complicated, because eBird hotspots are often somewhat arbitrarily assigned to some trails and not others within larger parks, and different people would make different decisions about their extent.

However, if someone did decide to go and make polygons for lots of hotspots (which roughly equates to every popular natural area) in the province and uploaded them to iNat, then you could keep a list of those place IDs, check whether observations include them in their list of community-curated encompassing places, and have a standardized list of place names to assign to observations. That would be a huge project and probably unrealistic, but theoretically it could be done collaboratively in Google My Maps or something like that.

Many of them already do have iNat places, so you could just make a list of the places that already exist for which you trust the boundaries, and do that with those.
E.g. Fletcher Creek/Crieff Bog in Puslinch has two iNat places; one of them doesn’t include the fen, which I think is the most biodiverse part of the preserve. If you don’t like either of those, you could flag one of them to propose changes, or add your own place for it.

I found it very simple, actually, using my “hack” method - it was just a matter of being willing to spend time fiddling with the polygon (visually, on the map) until I’d added enough points to follow whatever contours I needed to follow. I used My Maps in Google Maps. My trick was to start by importing a polygon (in fact, a set of squares from the breeding bird atlas, which we also use for our butterfly atlas). You then move your starting polygon over the spot where you want it and start moving the corners into position. Each time you click on the midpoint of one of the sides, it creates a new “corner” that you can drag into position. I never encountered a limit to the number of points I could add to a polygon - just to my patience. I just kept adding new starter polygons using the import function.

I then exported the results as a KML file, which, by luck, turned out to be compatible with our georeferencing code. I created over 300 polygons, and we’ve been using them for a couple of years now. The nice thing about it is that I can refine those polygons at any time: I just open the map in My Maps, play around with the boundaries (or add additional polygons), and then export the KML file again. I believe there’s some work at the other end to convert the KML into a shape file (that’s done by my associate), so I only update the polygons when I need to.
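For anyone curious what consuming such a KML export might look like, here’s a minimal sketch: load the Placemark polygons and test which one contains an observation point, using a plain ray-casting test. This is not the actual georeferencing code described above, just an illustration; the tag paths follow the standard KML 2.2 schema:

```python
import xml.etree.ElementTree as ET

# Standard KML 2.2 namespace, used by My Maps exports.
KML_NS = "{http://www.opengis.net/kml/2.2}"

def load_polygons(kml_text: str) -> dict[str, list[tuple[float, float]]]:
    """Map each Placemark name to its outer-ring (lon, lat) vertices."""
    root = ET.fromstring(kml_text)
    polygons = {}
    for pm in root.iter(f"{KML_NS}Placemark"):
        name = pm.findtext(f"{KML_NS}name", default="unnamed")
        coords = pm.find(f".//{KML_NS}coordinates")
        if coords is None:
            continue
        ring = []
        # KML coordinates are whitespace-separated "lon,lat[,alt]" triples.
        for triple in coords.text.split():
            lon, lat, *_ = map(float, triple.split(","))
            ring.append((lon, lat))
        polygons[name] = ring
    return polygons

def contains(ring, lon, lat):
    """Ray-casting point-in-polygon test on a (lon, lat) ring."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside
```

A real pipeline would likely convert to a shape file and use a GIS library instead, but for a few hundred polygons this brute-force approach is entirely workable.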

Well, technically, we already have this functionality, but it’s after the fact. We’d have to do this anyway, because not all of our data comes to us via iNat. Even if we put all these polygons into iNat, we’d still have to run our georeferencing code on all the non-iNat observations.

Yup, that’s my plan. As I said earlier, I took the master list of iNat places that someone linked to, and we’ve already winnowed it down to just the Ontario places. Now I just have to go through that list and pick out the ones that look useful. But that’s going to take some time, since I’ll have to verify the boundaries of each candidate. Right now, I’m working on processing the 2025 iNat observations. My code will include three different location options in the output: the original place_guess provided by iNat, a re-formatted version of the place_guess, and an alternative location generated from the response I get from the Google Maps API. I’ll then review the results and see how well each strategy performed, and refine the code as needed. I’m hoping I won’t have to do much in the way of “manual georeferencing”.
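A hedged sketch of how that three-option output might be assembled. The field names and the tail-stripping rule here are assumptions, not the actual code, and the Google-derived descriptor is stubbed in from a separate geocoding step:

```python
# Strip the redundant province/country tail that iNat's place_guess
# usually carries, keeping the more specific leading parts.
def reformat_place_guess(place_guess: str) -> str:
    parts = [p.strip() for p in place_guess.split(",")]
    keep = [p for p in parts if p not in {"ON", "Ontario", "Canada", "CA"}]
    return ", ".join(keep)

def location_options(obs: dict, google_location: str) -> dict:
    """Assemble the three candidate location descriptors for review.

    `google_location` stands in for the descriptor derived from a
    separate Google Maps geocoding step (not shown here).
    """
    raw = obs.get("place_guess") or ""
    return {
        "inat_raw": raw,
        "inat_reformatted": reformat_place_guess(raw),
        "google_derived": google_location,
    }

opts = location_options(
    {"place_guess": "Tobermory, Bruce, ON, Canada"},
    "Tobermory - Cyprus Lake Rd",
)
print(opts["inat_reformatted"])  # Tobermory, Bruce
```

Emitting all three columns side by side makes the review step easy: sort by which strategies disagree, and eyeball only those rows.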


Ontario GeoHub has lots of useful geospatial data that’s relevant to your use-case, such as Eco Districts and Geographic Names.

Have you considered using township boundaries? These have much finer granularity, and their distribution matches up quite well with the clustering of TEA records (perhaps unsurprisingly, since both are strongly correlated with population density). Obviously this won’t get anywhere near full coverage for the entire province, but there are relatively few records outside the township boundaries, so you can just use custom hotspots for the outliers. These could be based on a combination of eco-district and, say, nearby lakes (since there’s no shortage of those throughout the outlying areas).

Using georeferencing services like Google will mostly get you blandly ambiguous results that often add very little informational value. If you’re going to put a lot of effort into a project like this, it seems perverse to settle for a third-rate solution just to make things a little bit easier.


Thanks for the suggestions, and in particular for the link to GeoHub - I’ll have to look that over.

The problem with townships is that to most people (even locals), they would be largely meaningless. They’d have to look them up. I say this as someone who has traveled a fair bit around the province, been to many of the observation hotspots, and looked at thousands of records. They’re even more obscure than the counties (which are obscure for most people).

To my way of thinking, the whole point of a location descriptor is that when somebody looks at it, it gives them at least a rough idea of where the observation took place (without the need to consult a map). Sure, that’s not going to be possible in every case, especially for a province the size of Ontario. People from one side of the province aren’t likely to recognize obscure geographic references from the opposite side of the province. But regardless of how correct a location descriptor might be, what benefit does it bring if almost nobody recognizes it?
I’ve been working with these records for years. I’ve georeferenced thousands of historical records (in many cases having to look up the townships; many of the old ones no longer exist). I can tell you that I would only recognize a handful of township names, and then only because they are a variation on the name of a city/town that is associated with them. And there are likely a whole bunch that are completely misleading - where the township name is the same as (or resembles) the name of a town/city somewhere else in the province.

The ecodistricts are interesting. I’ve seen variations on that theme, and I used a set of them to construct the (larger) reporting Zones we use in our Atlas now. But those aren’t really intended as location descriptors - they’re more a logical way of grouping/filtering observations for analysis/discussion. Unfortunately, if used as location descriptors, some of the ecodistrict names would cause confusion, because they are the same as counties, towns or other administrative levels. You’d have to include “Eco District” in the name to prevent confusion, and even that probably wouldn’t work. I briefly thought that perhaps I could use these for the extreme north of the province, where the counties (aka districts) are particularly vast and named settlements are extremely sparse (making them of limited use as reference points). But then I looked at some of the names of these ecodistricts, and they don’t ring bells for me. If I saw that an observation was reported for “Wood Creek”, I would never guess that it was along the coast of Hudson Bay. Many of the more active TEA members are stuck on counties; I had to fight to get the reporting Zones implemented (though they did prove very useful in the end). So the ecodistricts are an interesting combination of overly familiar and overly obscure names.

This discussion has strayed far beyond my original question, which was simply “how does iNat construct the location notes (aka place_guess)”. Some of the suggestions that have been put forward (for location descriptors), though algorithmically/conceptually rigorous, would be of little benefit. If I wanted to assign a correct location descriptor that is meaningless to the average user, I could just use the breeding bird Atlas square ID that we already assign to all observations. There are a handful of people who might recognize the ID of a square or two that they frequent, so I would say that in terms of usefulness, that would be about on a par with townships. In fact, I considered simply going through the squares one by one and giving each one a descriptive name. I did it for a dozen or so before I noticed how hard it was to come up with a unique and meaningful name for some of the remote ones. That’s something I’m still considering as an option for some very remote areas where no other method for assigning a location descriptor will work.

This database has been around for over 20 years, and the website has been around for something like 15. I’m not starting from scratch, so I don’t have a free hand to do as I please. There are expectations and precedents. I already have a lot of work invested in constructing/harmonizing place names that are variations on:

“TOWN - LANDMARK” or if I’m lucky “TOWN (distance/direction) - LANDMARK”

(where LANDMARK might be a street, road, park etc.)

I didn’t invent this format. Lots of serious observers have used variations on it going back as far as we have observations - which is something like 130 years. The problem is that it hasn’t been used consistently, and it isn’t the format that I normally get from iNat (except in rare cases where the observer entered it manually). I’ve been coaxing observers to use this format, and have succeeded in getting some traction. For the non-iNat observations, I can get the location descriptors into this format fairly easily. It’s the iNat observations that cause the problems, and these days, they represent the bulk of the observations we add to the database every year.
My code now does a reasonable job (most of the time) of generating a location descriptor in that format from what is returned by the Google Maps API. I will likely be able to improve on that somewhat, based on some of the suggestions that have been offered here.
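For concreteness, here is a minimal sketch of how the “TOWN (distance/direction) - LANDMARK” format could be assembled once you have town coordinates from a gazetteer. This is not the actual code; the 2 km threshold, the 8-point compass, and the town coordinates are all assumptions for illustration:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def compass_from(lat_town, lon_town, lat_obs, lon_obs):
    """8-point direction of the observation as seen from the town."""
    dy = lat_obs - lat_town
    # Scale longitude by cos(lat) so east-west distances aren't inflated.
    dx = (lon_obs - lon_town) * math.cos(math.radians(lat_town))
    bearing = (math.degrees(math.atan2(dx, dy)) + 360) % 360
    points = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return points[round(bearing / 45) % 8]

def descriptor(town, lat_town, lon_town, lat_obs, lon_obs, landmark):
    """Format "TOWN (distance/direction) - LANDMARK" for one record."""
    km = haversine_km(lat_town, lon_town, lat_obs, lon_obs)
    if km < 2:  # assumed threshold: close enough to skip distance/direction
        return f"{town} - {landmark}"
    d = compass_from(lat_town, lon_town, lat_obs, lon_obs)
    return f"{town} ({km:.0f} km {d}) - {landmark}"

print(descriptor("Tobermory", 45.255, -81.665, 45.20, -81.60, "Cyprus Lake Rd"))
```

The hard part, as noted above, is choosing which town to anchor on: the nearest settlement isn’t always the one observers would actually use as a reference point.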

Thanks.
