iNatGuessr - Can you guess the location from the observations?

annkatrinrose · August 24, 2023, 11:59pm

Very fun and kind of addictive! I noticed some of the same things that others have already mentioned: lots of similar or even identical taxa, sometimes from a single user and similar enough photos that I’m suspicious they were multiple pictures from the same observation. I think a little more diversity in what is shown would be good (both in terms of taxa as well as observers). I also had one case where I got very close (hundred-something kilometers) but the line on the map was drawn almost all the way around the globe suggestive of a huge distance.

pisum · August 25, 2023, 12:03am

i think the algorithm that attempts to randomize the location on the earth currently leads to quite a bit of clustering and occasional repeats. if i play ~20 rounds, i often come across effectively a repeated set of observations. i suspect that the time of day when you’re playing also affects which locations you get. for example, i’m here in the US, and when i play during the day i’m getting a lot of Americas and Western Europe, but when i played last night, i was getting more Eastern Africa, Asia, and Australia.

you might be able to get more consistent global variation if you switch to using UTFGrids to select your location to be guessed. if you get the level 0 (z=0) UTFGrid, that will give you the entire world effectively divided into a 32x32 grid, with counts of observations in each cell and the latest observation in each cell (technically, it’s a 64x64 grid, where the values from the SE cell in each 2x2 supercell will give you the observation count and latest observation for that 2x2 supercell): https://api.inaturalist.org/v1/grid/0/0/0.grid.json?geo=true&photos=true&geoprivacy=open&taxon_geoprivacy=open&quality_grade=research&spam=false.
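A minimal sketch of building that request URL and decoding one grid character, in Python. The decoding follows the generic UTFGrid spec; the function names are made up for illustration, and the fields inside each cell's data should be checked against a real response.

```python
from urllib.parse import urlencode

GRID_BASE = "https://api.inaturalist.org/v1/grid"

# Filters copied from the example URL above.
DEFAULT_FILTERS = {
    "geo": "true",
    "photos": "true",
    "geoprivacy": "open",
    "taxon_geoprivacy": "open",
    "quality_grade": "research",
    "spam": "false",
}


def utfgrid_url(z: int, x: int, y: int, filters: dict = DEFAULT_FILTERS) -> str:
    """Build the UTFGrid URL for tile z/x/y with the given filters."""
    return f"{GRID_BASE}/{z}/{x}/{y}.grid.json?{urlencode(filters)}"


def decode_grid_char(ch: str) -> int:
    """Map one character of a UTFGrid row to an index into the "keys" list.

    Per the UTFGrid spec, the double quote (34) and backslash (92) code
    points are skipped by the encoder, and indexes are offset by 32.
    """
    code = ord(ch)
    if code >= 93:
        code -= 1
    if code >= 35:
        code -= 1
    return code - 32
```

The decoded index looks up an entry in the response's `keys` list, and that key in turn looks up the cell's entry in `data`.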

each one of these cells in the 32x32 level 0 UTFGrid should correspond to another UTFGrid at level 5, which also will be similarly divided in an effective 32x32 grid. so if you pick one of the cells at level 0 (using some sort of algorithm that weights based on observation count and/or excludes low-count cells), and then pick another cell from the corresponding level 5 UTFGrid, then that cell from the level 5 UTFGrid should correspond to a level 10 UTFGrid, which should cover roughly the area of a city (maybe equivalent to a square 30km-40km on each side). in turn, each cell within the level 10 UTFGrid should roughly correspond to a level 15 UTFGrid, which roughly corresponds to a square with sides 1km to 1.5km, or the size of a small neighborhood or maybe a large city park.
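The level-to-level relationship is just standard map tile arithmetic (32 = 2^5, so each cell of a 32x32 grid is one tile five zoom levels down). A small sketch, with illustrative function names:

```python
import math

EQUATOR_KM = 40075.017  # Earth's equatorial circumference
CELLS_PER_SIDE = 32     # effective grid resolution described above


def child_tile(z: int, x: int, y: int, col: int, row: int) -> tuple[int, int, int]:
    """Tile five zoom levels down that covers cell (col, row) of tile z/x/y."""
    if not (0 <= col < CELLS_PER_SIDE and 0 <= row < CELLS_PER_SIDE):
        raise ValueError("col and row must be in 0..31")
    return z + 5, x * CELLS_PER_SIDE + col, y * CELLS_PER_SIDE + row


def tile_width_km(z: int, lat_deg: float = 0.0) -> float:
    """Approximate east-west width of one tile at zoom z and a given latitude."""
    return EQUATOR_KM * math.cos(math.radians(lat_deg)) / 2 ** z
```

At the equator this gives roughly 39 km for a level 10 tile and 1.2 km for a level 15 tile, in line with the rough sizes above. Tiles get narrower towards the poles.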

so if you were to take the level 10 UTFGrid and look for single cells or maybe small clusters with >n observations, then you can pick a random point within that cell or cluster (or get the coordinates from the latest observation in the cell) and use that as your center point to be guessed, that might give you a good idea of a minimum number of observations that occur within a sub-5km circle.

or if you were to pick the center point of a cell in the level 5 UTFGrid (or the NW corner of the SE sub-cell in the underlying 64x64 grid, which is the same point), that would probably give you a good idea of the minimum number of observations that occur within a 50km circle.
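Turning a tile or cell into coordinates uses the usual Web Mercator tile formulas. A sketch, assuming the effective 32x32 cell layout described above (function names are illustrative):

```python
import math


def tile_to_latlng(z: int, x: float, y: float) -> tuple[float, float]:
    """Latitude/longitude of tile coordinate (x, y) at zoom z (Web Mercator).

    Integer x and y give the tile's NW corner; fractional values address
    points inside the tile.
    """
    n = 2 ** z
    lng = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return lat, lng


def cell_center(z: int, x: int, y: int, col: int, row: int,
                cells: int = 32) -> tuple[float, float]:
    """Centre of cell (col, row) inside tile z/x/y, treated as cells x cells."""
    return tile_to_latlng(z, x + (col + 0.5) / cells, y + (row + 0.5) / cells)
```

For example, `cell_center(0, 0, 0, 15, 15)` lands just north-west of latitude 0, longitude 0.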

you would only have to retrieve the level 0 UTFGrid once per session. after that, each round would require getting either a level 5 UTFGrid, or a level 5 UTFGrid followed by a level 10 UTFGrid.

simonrolph · August 25, 2023, 6:54am

Thank you all for your amazing feedback. Won’t try and respond directly but here are the main themes. Not sure what the timeline is for working on these but I’ll keep you updated!

Selection of photos

This, I think, is a key issue and will require some thinking. I always want to limit how many calls I’m making to the iNaturalist API, and at the moment the app makes only 2: one to get a random observation, then a second to get a selection of other observations nearby. The API results are sorted by id or date of upload, which means that if someone has recently uploaded a batch of photos then it will show just those photos, which I think leads to the similar photos/taxa issue. One option would be to make a series of API calls, one for each of the iNat iconic taxa, but then I might have to resolve situations such as no plants recorded in that area. Essentially this is doable but it’s a trade-off with making more API calls.
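One way to get more variety without extra API calls might be to fetch a larger page once and then diversify on the client. A minimal sketch, assuming each observation has been flattened to a dict with hypothetical "user" and "taxon" keys (real API responses are nested differently):

```python
import random
from typing import Optional


def pick_diverse(observations: list[dict], k: int = 8,
                 rng: Optional[random.Random] = None) -> list[dict]:
    """Pick up to k observations, preferring new observers and new taxa."""
    rng = rng or random.Random()
    pool = list(observations)
    rng.shuffle(pool)

    chosen: list[int] = []
    seen_users, seen_taxa = set(), set()
    # Pass 1: new observer and new taxon. Pass 2: new taxon only.
    # Pass 3: fill any remaining slots with whatever is left.
    for need_new_user, need_new_taxon in ((True, True), (False, True), (False, False)):
        for i, obs in enumerate(pool):
            if len(chosen) == k:
                break
            if i in chosen:
                continue
            if need_new_user and obs["user"] in seen_users:
                continue
            if need_new_taxon and obs["taxon"] in seen_taxa:
                continue
            chosen.append(i)
            seen_users.add(obs["user"])
            seen_taxa.add(obs["taxon"])
    return [pool[i] for i in chosen]
```

With a batch of ten ant photos from one user plus a couple of records from other people, this would surface the other observers first instead of showing eight ants.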

There’s also more of a conceptual trade-off here between a more complex ‘algorithm’ which makes sure that it includes species that are endemic, not huge geographic range etc. versus getting a more pure random selection.

Place selection

Currently places are selected by getting a random observation (and not even very random, as people have observed repeats). This means that we will have a tendency towards the places that have more observations, e.g. North America. However, I did this as it meant that the coordinates I generate will always have an observation. I did start with random lat/long but it very often went for the middle of the ocean with no observations. @pisum I’ll have a look at your grid selection idea, it’s not part of the API I’ve used before. One way of improving it a bit would be to get the place IDs for all the continents and have the query add one of those at random to the search.

Excluding invasives

Seems that invasives / non-natives can be confusing but there are some options here. We could include them but make it clear that they are non-natives. Obviously later down the line this could be a settings option, but for the time being do people have a preferred option?

Include non-natives but don’t flag (current)
Include non-natives but flag them as non-native
Don’t include non-natives

Show username credits after guess

I did also wonder about this, sometimes the username can be quite revealing for guessing. I’ll make it appear afterwards. My only concern was around usage and licensing, don’t want people to feel I’m using their observations without proper attribution.

Link to iNat observations after guess

Again, very sensible idea. Means you can investigate any interesting observations you see.

Average/total scoring

Yes will implement this. At some point it would be nice to move towards ‘games’ consisting of a set of ‘rounds’ rather than endless mode but I think it’s important to get the guessing part enjoyable and satisfying first then can think about game structure.

Additional scoring

As suggested by @jmillsand, scoring beyond just distance would be a nice addition. Getting the right latitude is a nice idea. Could add bonus points for correct country, or it would be great if you could get points for the correct biome (https://en.wikipedia.org/wiki/Biome) but not sure how to do that…

Bugs

The rogue undefined identified by @pisum , thanks for spotting

The hidden location being detected by the mouse… whoops! Thanks for spotting @bri-k

Update:
I have pushed a little update (v0.0.2) to the app:

Fixed the 2 bugs above (hopefully)
Moved the next round button to above the scoring as suggested by @ekmes
Reveal the observer names after guess and add a link to explore the observations on iNaturalist website.

carnifex · August 25, 2023, 10:19am

I mainly voted for exclusion of non-natives not because I didn’t want to see them at all but rather because I think this will help to exclude more observations of cultivated plants that haven’t been marked as such

Ajott · August 25, 2023, 12:10pm

I find it kind of fun, but I have now several times run into the issue that only pictures from a single observer were picked, leading to 8 photos of ants, 8 photos of fish and so on… well, that is rather hard and annoying if it is about organisms you don’t know a lot about.

I guess it’s a matter of how large the focus area is… if it is a single meadow or backyard it will of course only feature few observers? Maybe setting a large enough minimum size for the focus area will deal with this issue most of the time?

alloyant · August 25, 2023, 12:36pm

This is definitely a fun idea, thanks for making this!

Regarding the biome data, I know that there are free datasets out there for this such as the World Terrestrial Ecosystems one, but I am not sure if there’s an easy way to integrate that.

I would also like to suggest that the original aspect ratio is maintained, since pictures posted as vertical currently get cut off.

My best guess is 29 mi for Phoenix, AZ. Had some good guesses with spotted lanternflies telling me it’s Jersey area too lol. (More often I guessed pretty much exactly across the globe from the right answer but let’s not focus on that.) And it’s definitely interesting to look through the obs afterwards, I learned there’s pricklypears in Vancouver??

austinrkelly · August 27, 2023, 4:44pm

Feel like this would make it more difficult in some ways, as many endemics would only be known by the people from that area.

silversea_starsong · August 27, 2023, 5:37pm

If there was some way to ensure some observations selected represented species actually restricted in some way to that part of the world or country, you’d have something really really awesome, I feel.

So far each time I’ve played it has selected all species that occur widely across entire countries or otherwise offer no means of narrowing down a location. GeoGuessr has the benefit of landmarks and other features that are locally “customized”.

insectobserver123 · August 27, 2023, 7:09pm

Just started playing it and I like it, it’s very challenging, but interesting too.

pisum · August 27, 2023, 8:22pm

i made a page that might help you visualize what’s going on in the UTFGrids.

page: https://jumear.github.io/stirfry/iNat_UTFGrid_data_interpreter
code: https://github.com/jumear/stirfry/blob/master/iNat_UTFGrid_data_interpreter.html
example usage: https://jumear.github.io/stirfry/iNat_UTFGrid_data_interpreter?z=0&x=0&y=0&geo=true&photos=true&geoprivacy=open&taxon_geoprivacy=open&quality_grade=research&spam=false (it may take a moment or two for the data to be retrieved.)

if i had made my own version of your page, i probably would have worked with standard places as the basis for place selection / guessing because they have the advantage of being able to capture obscured observations. however, the disadvantage is that standard places are not uniform in size, and they don’t cover the ocean very well. also, not all the boundaries in the system are properly defined, nor are they always undisputed.

if you’re trying to just select a particular point with observations in a somewhat randomized way, i think i would just use the &order_by=random parameter. this won’t give you a way to eliminate the bias towards places with a lot of observations, but it could give you a better way to work around the problems of repeated location selection and bias towards places where observers are awake and actively submitting, without doing too much extra work.

…

the UTFGrid selection method would be a more complicated method – especially just to grasp, if you’ve never come across UTFGrids before – but it’s the only method that i can think of for selecting a point which provides a potential means to eliminate bias towards places with lots of observations. (you would weight your selection by observations per cell, but you would keep the basis for each cell between a high and low range that will allow you to exclude places with very few observations and also mitigate the bias for high-observation places.)
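To make the weighting idea concrete, here is a rough sketch in Python. It assumes the per-cell observation counts have already been read out of a UTFGrid into a plain dict; the cell names and the low/high thresholds are placeholders.

```python
import random
from typing import Optional


def clamp_weights(counts: dict[str, int], low: int, high: int) -> dict[str, int]:
    """Drop cells with fewer than `low` observations and cap the rest at `high`."""
    return {cell: min(n, high) for cell, n in counts.items() if n >= low}


def pick_cell(counts: dict[str, int], low: int, high: int,
              rng: Optional[random.Random] = None) -> str:
    """Pick one cell at random, weighted by its clamped observation count."""
    weights = clamp_weights(counts, low, high)
    if not weights:
        raise ValueError("no cell meets the minimum observation count")
    rng = rng or random.Random()
    cells = list(weights)
    return rng.choices(cells, weights=[weights[c] for c in cells], k=1)[0]
```

With `low=50` and `high=1000`, a cell holding 100,000 observations is only ten times as likely to be picked as one holding 100, and a cell with 5 observations is never picked.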

jhousephotos · August 27, 2023, 8:34pm

Didn’t do too bad this time.

Round 1: 553km

Round 2: 1605km

Round 3: 906km

Round 4: 3770km

Round 5: 692km

Round 6: 1313km

Round 7: 715km

Round 8: 5470km

Round 9: 6930km

Round 10: 362km

Round 11: 1132km

Round 12: 958km

simonrolph · August 27, 2023, 8:54pm

There’s an order by random parameter??? I will try it. I have been thinking about all sorts of workarounds to randomise because these are the only ones listed in the api documentation:

I have just pushed a little update I did today with a randomness workaround to get not just the most recent records in a location. There is also an image modal so you can look at the images better, and the original aspect ratio is now retained.

pisum · August 27, 2023, 9:26pm

sorry. it didn’t cross my mind that you weren’t aware of that parameter. otherwise, i would have mentioned it earlier. (i thought you were purposely doing it your way to try to work around the high-observation place bias, since it wouldn’t be much fun if 10% of rounds led to California and 25% led to the eastern US and Canada.)

… i was thinking more about this, and you might be able to do something where you could use just the level 0 UTFGrid to help you apply a high level weighting on top of your random selection – sort of a hybrid approach. for example, we know from the UTFGrid that 10% of observations come from the area from San Francisco to Los Angeles. so just very roughly, you could apply some sort of inverse factor so that you would throw out, say, 9 out of 10 of the selections that you get from that area. this way, instead of 10% of rounds going to California, you could have closer to 1% or 2% leading there.
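A rough sketch of that "throw some out" step, assuming you already know the level 0 cell count for wherever the random observation fell. The cap value is a tuning knob picked for illustration, not something from the API.

```python
import random
from typing import Optional


def keep_probability(cell_count: int, cap: int) -> float:
    """Chance of keeping a random observation from a cell with cell_count observations."""
    if cell_count <= cap:
        return 1.0
    return cap / cell_count


def accept(cell_count: int, cap: int, rng: Optional[random.Random] = None) -> bool:
    """Randomly keep or reject a selection so that busy cells are thinned out."""
    rng = rng or random.Random()
    return rng.random() < keep_probability(cell_count, cap)
```

A cell with ten times the cap keeps about 1 selection in 10, while any cell at or under the cap is always kept. Rejected selections just mean drawing another random observation.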

simonrolph · August 28, 2023, 7:23am

No worries! You’re right that order_by=random doesn’t really fix the underlying issue. You’ve totally nailed the issue here, we don’t want someone who has no idea where the images are from to just click in the middle of North America by default.

A hybrid approach sounds like the best compromise. I don’t think we want every km square on Earth to have an equal chance of appearing because otherwise we’d get a lot of ocean. But we do want the locations to be spread across different lat/long.

This is amazing! Super helpful understanding the grids.

Yes, this is now what I’m thinking. Use the UTFGrid, choose a zoom level, then select a random x and a random y, then get a random observation from that grid. Or alternatively select random areas using the nelat, nelng parameters etc.

EDIT: Finding some issues with order_by=random. For example if I look at this API endpoint https://api.inaturalist.org/v1/observations?order=desc&order_by=random&per_page=1 then refresh it still has the same result. So it doesn’t randomise for each request if the same filters were applied.

ahospers · August 28, 2023, 1:20pm

Round 1: 2041km

Round 2: 1479km

Round 3: 1634km

Round 4: 761km

Round 6: 14030km

Round 7: 1918km

Round 9: 9605km

Round 10: 11149km

Round 11: 15653km

Round 12: 10018km

A prototype by Simon Rolph v0.0.

Difficult

pisum · August 28, 2023, 1:53pm

this behavior is due to the way the system caches stuff. effectively, you will get a set of 10000 observations in random order to work from each ~15 minutes (or something like that). so you can either page within that set of data, or else in some cases – such as when filtering with a user_id – you can force a new set of observations on a given page by using &ttl=-1 to bypass the caching (although i wouldn’t recommend the latter method, unless you really need to do it this way).
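Paging within the cached random set could look something like this. It is only a sketch: the 10000 figure is taken from the description above, and the helper name and example filters are made up.

```python
import random
from typing import Optional
from urllib.parse import urlencode

OBS_URL = "https://api.inaturalist.org/v1/observations"
RANDOM_SET_SIZE = 10000  # size of the cached random set mentioned above


def random_page_url(filters: dict, per_page: int = 1,
                    rng: Optional[random.Random] = None) -> str:
    """Build an observations URL pointing at a random page of the random-ordered set."""
    rng = rng or random.Random()
    max_page = RANDOM_SET_SIZE // per_page
    params = dict(filters, order_by="random", per_page=per_page,
                  page=rng.randint(1, max_page))
    return f"{OBS_URL}?{urlencode(params)}"
```

If the filters match fewer observations than that, high page numbers will come back empty, so the upper bound should probably be capped using the total result count from a first response.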

another way to trick the system would be to use &id_above some random small number or &id_below some random large number (but i also wouldn’t recommend this, unless this is the only way).

simonrolph · August 28, 2023, 5:17pm

I have pushed a little update from my tinkering this weekend (which is an extra day because of the public holiday on monday).

The global challenge has a much better global distribution by sampling a random observation from a bounding box that is randomly placed somewhere in the world. This has upsides in that it does a much better job of exploring the world (much less USA) but some quirks: Antarctica seems to crop up more often than you’d expect, along with islands in the middle of the ocean.
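For illustration, a sketch of placing a random bounding box using the nelat/nelng/swlat/swlng parameter names. The box size and the `equal_area` option are assumptions, not how the app actually does it. If boxes are placed uniformly in latitude, polar regions get picked more often than their surface area would suggest, which might be part of the Antarctica quirk; taking the arcsine of a uniform value is one common way to even that out.

```python
import math
import random
from typing import Optional


def random_bbox(size_deg: float = 5.0, equal_area: bool = True,
                rng: Optional[random.Random] = None) -> dict[str, float]:
    """Return a randomly placed size_deg x size_deg box as API bounding-box params."""
    rng = rng or random.Random()
    if equal_area:
        # asin of a uniform value gives latitudes uniform over the sphere's surface
        lat_c = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
    else:
        lat_c = rng.uniform(-90.0, 90.0)
    # Keep the whole box inside valid latitude/longitude ranges.
    swlat = min(max(lat_c - size_deg / 2, -90.0), 90.0 - size_deg)
    swlng = rng.uniform(-180.0, 180.0 - size_deg)
    return {"swlat": swlat, "swlng": swlng,
            "nelat": swlat + size_deg, "nelng": swlng + size_deg}
```

This does nothing about boxes landing on open ocean; that still needs a check on how many observations the box contains.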

If it ever selects a location without enough observations (less than 8) it will try again although this sometimes is a bit glitchy.
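A bounded retry loop is one way to keep this from getting stuck. A sketch with the location picker and the observation counter passed in as functions (both hypothetical stand-ins for whatever the app does):

```python
from typing import Callable, Optional


def find_location(propose: Callable[[], dict],
                  count_observations: Callable[[dict], int],
                  min_obs: int = 8, max_tries: int = 20) -> Optional[dict]:
    """Propose locations until one has at least min_obs observations.

    Returns None if max_tries proposals all fall short, so the caller can
    fall back to another strategy instead of looping forever.
    """
    for _ in range(max_tries):
        candidate = propose()
        if count_observations(candidate) >= min_obs:
            return candidate
    return None
```

Each failed try costs an API call, so `max_tries` is worth keeping small.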

If you set a taxon_id in the url eg. https://simonrolph.github.io/iNatGuessr/?taxon_id=3 (birds) then it goes back to sampling with spatial biases with the order_by=random and paging approach mentioned by @pisum . Same applies if you set place_id in the url eg. https://simonrolph.github.io/iNatGuessr/?place_id=6857 (the UK). Would need to think about how to do better sampling with a non-global extent.

Finally, I have provided a score for each round with a max of 5000 with exponential decay and organised the rounds into games. The maximum available points in a game is 25000. So you get scores like this:

Round 1: 3 Points (18356km)
Round 2: 336 Points (6749km)
Round 3: 408 Points (6265km)
Round 4: 1354 Points (3264km)
Round 5: 585 Points (5363km)
Game 1: 2686 / 25000 Points
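For reference, a score of the form 5000 * exp(-distance / k) with k around 2500 km comes within a point or two of the example rounds above. That constant is inferred from those five numbers only, so treat it as a guess rather than the app's actual formula.

```python
import math

MAX_POINTS = 5000
DECAY_KM = 2500.0  # assumption: fitted to the example scores, not taken from the app


def round_score(distance_km: float) -> int:
    """Points for one round: full marks at 0 km, decaying exponentially with distance."""
    return round(MAX_POINTS * math.exp(-distance_km / DECAY_KM))


def game_score(distances_km: list[float]) -> int:
    """Total points for a game made up of several rounds."""
    return sum(round_score(d) for d in distances_km)
```

With this shape a guess about 1700 km off still earns roughly half the points, while anything on the wrong side of the planet scores close to zero.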

Good luck!

jamfries · August 28, 2023, 6:59pm

This made me laugh, I thought I picked the opposite side of the world again but nope

simonrolph · August 28, 2023, 9:05pm

Hah yes, I have also spotted this. It does seem to calculate the distance correctly (otherwise the value would be like 40,000km) but the visualisation is a bit misleading…
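One possible explanation, assuming the map simply joins the two raw longitudes (not confirmed): the distance is computed along the great circle, but the line gets drawn the long way round when the two points sit on either side of the antimeridian. A sketch of both pieces in Python, with illustrative function names:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def unwrap_lng(from_lng: float, to_lng: float) -> float:
    """Shift to_lng by a multiple of 360 so it is within 180 degrees of from_lng."""
    delta = (to_lng - from_lng + 180.0) % 360.0 - 180.0
    return from_lng + delta
```

Drawing the line from longitude 170 to `unwrap_lng(170, -170)`, which is 190, crosses the antimeridian in 20 degrees instead of going 340 degrees the other way. This relies on the map library accepting longitudes outside -180..180, which is worth checking.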

rinaturalist · August 28, 2023, 11:46pm

I’m not sure if this has been mentioned already, but I think it’d be cool to also be able to filter by certain taxa. For example, instead of eight photos of various birds, plants, fish, etc., you’d be able to guess based on just reptiles or just fish.