Evaluate geographic data for iNat's Suggested ID

I’ve no idea what data is analyzed by iNat when it comes up with its suggested ID, but geography seems low on the list. This is arguably the single greatest cause of misidentifications on this site. It’s common to see endemic species from Region A suggested for an observation in Region B. Sometimes this is quite egregious, like when a European beetle is suggested for an observation in Kansas or Botswana. These seem like easily avoided mistakes if the algorithm would simply take into account the existing data on iNat.

I’ll provide a particularly common example of this phenomenon. Amphiprion frenatus, the Tomato Clownfish, is found north of Wallace’s Line (i.e., the Philippines, the Gulf of Thailand, and north into the Ryukyu Islands). It is replaced anywhere south of this by A. melanopus, a species that looks more or less the same, save for a darker pelvic fin. That species is in turn replaced by A. barberi in Fiji. Now, if you were to observe either A. barberi in Fiji or A. frenatus in, say, the Philippines, iNat is going to suggest that your fish is A. melanopus, despite there being no data points of this species in either Fiji or the Philippines. I’ve corrected this error dozens of times.

Now, I’m not sure how one would code this into the algorithm. I imagine it would be impossible to entirely eliminate this problem, particularly in areas near the limits of the biogeographic ranges, but surely it could be greatly reduced by incorporating some amount of geographic data into the suggested IDs.
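For what it's worth, here is one rough way the idea could be sketched in Python. This is not how iNat's suggestions actually work (I don't know that); it only illustrates re-ranking visual scores by how often each candidate has been recorded near the observation. The function name `rerank_by_nearby_counts`, the `smoothing` parameter, and all the scores and counts below are made up for illustration.

```python
def rerank_by_nearby_counts(vision_scores, nearby_counts, smoothing=1.0):
    """Re-rank visual-match scores using counts of nearby records.

    vision_scores: dict of taxon name -> visual similarity score (hypothetical).
    nearby_counts: dict of taxon name -> number of records near the observation.
    smoothing: pseudo-count so taxa with zero nearby records are down-weighted
               rather than eliminated (useful near range limits).
    """
    n = len(vision_scores)
    total = sum(nearby_counts.get(taxon, 0) for taxon in vision_scores)
    weighted = {}
    for taxon, score in vision_scores.items():
        # Smoothed share of nearby records belonging to this taxon.
        prior = (nearby_counts.get(taxon, 0) + smoothing) / (total + smoothing * n)
        weighted[taxon] = score * prior
    norm = sum(weighted.values())
    if norm == 0:
        return dict(vision_scores)  # nothing to re-rank with
    ranked = sorted(weighted.items(), key=lambda kv: kv[1], reverse=True)
    return {taxon: w / norm for taxon, w in ranked}


# Invented numbers: the visual model slightly prefers A. melanopus,
# but only A. frenatus has records near this (Philippine) observation.
vision_scores = {
    "Amphiprion melanopus": 0.5,
    "Amphiprion frenatus": 0.4,
    "Amphiprion barberi": 0.1,
}
nearby_counts = {"Amphiprion frenatus": 40}

suggestions = rerank_by_nearby_counts(vision_scores, nearby_counts)
for taxon, score in suggestions.items():
    print(f"{taxon}: {score:.3f}")
```

With these invented inputs, A. frenatus moves to the top while A. melanopus stays on the list with a small score, which is the behavior you would want near the edge of a range.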

Note: there was a previous post about this which I closed many moons ago. Approving this one to keep a conversation going, and probably because we’ll be doing away with the voting system soon.

Uh, what? Was this previously announced and I missed it?

I brought it up here: https://forum.inaturalist.org/t/feature-requests-how-to-best-handle-the-voting-system/1952/33?u=tiwane Still not sure about the poll part.

On the forum, not the iNat observation ID :)

You wrote back in March that iNat had a new “spec” to address this. Is there any update on the progress of that and how it is being addressed?

I totally agree that range should be considered for computer vision and the resulting name suggestions. I have no idea how to do that, but geography is one of the most important characters I take into consideration for identifying things. It really matters.

But range is partially taken into account - see here:

image

(This is an interesting one as the Orchid is mimicking the Adenandra and the AI has picked up the model, but not the mimic https://www.inaturalist.org/observations/33861667)
((but note: all the ones not marked “Seen nearby” are utterly irrelevant and do not occur in Africa))

I don’t know why your example does not have the “Seen nearby” option.
Are you using the app (versus the website)?
Are you from an area without any “nearby” observations?
Are you from an area with so much data that the “nearby” option is overwhelmed?
Is your taxon poorly atlassed and so was not used for the training of the AI?

Would be nice to know more.

I agree that the suggestion of species from the other side of the globe is frustrating, but I think in (rare) cases it can be ‘useful’, e.g. tracking the spread of invasives whether due to human-mediated transport or expansion via climate change.

Thanks, I clearly did not interpret that the way it was meant to be.

I agree wholeheartedly that geography should have greater weight for the suggested identifications. I routinely see suggestions (especially for insects, but also for more commonly identified groups such as birds) where the suggested ID is correct to family or genus, but the suggested species-level identifications are all over the place, frequently of species from different continents. I strongly suspect that many naive observers pick the first species-level ID that they think looks right, with no knowledge of where that species is found, resulting in very out-of-range records. And yes, I’m aware of the “Seen nearby” flag, but that doesn’t always seem to work well in my experience. At least, it seems like the “nearby” records need to be quite close, geographically, for the flag to show up. Some sort of way to down-weight geographically distant species from the suggested IDs would be awesome, and would, I think, greatly improve initial identifications.
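To make "down-weight geographically distant species" a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not iNat's method: each suggestion's score is multiplied by a factor that decays with the great-circle distance to the nearest known record of that species. The names `distance_weight` and `downweight_distant`, the 500 km `scale_km` default, and the coordinates are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def distance_weight(obs, records, scale_km=500.0):
    """Weight in [0, 1] that decays exponentially with distance to the nearest record.

    obs: (lat, lon) of the new observation.
    records: list of (lat, lon) for existing records of one species.
    scale_km: assumed decay scale; larger values are more forgiving.
    """
    if not records:
        return 0.0
    nearest = min(haversine_km(obs[0], obs[1], lat, lon) for lat, lon in records)
    return math.exp(-nearest / scale_km)


def downweight_distant(scores, records_by_taxon, obs, scale_km=500.0):
    """Multiply each suggestion's score by its distance weight."""
    return {
        taxon: score * distance_weight(obs, records_by_taxon.get(taxon, []), scale_km)
        for taxon, score in scores.items()
    }


# Invented example: an observation in Kansas, one candidate known only from
# Europe and one with a record in a neighboring state.
kansas = (38.5, -98.0)
records_by_taxon = {
    "European beetle": [(48.8, 2.35)],
    "Local beetle": [(39.0, -95.0)],
}
scores = {"European beetle": 0.6, "Local beetle": 0.4}
print(downweight_distant(scores, records_by_taxon, kansas))
```

Unlike a hard "Seen nearby" cutoff, a smooth decay like this keeps a species in play when its nearest record is a few hundred kilometres away, while a species known only from another continent drops to nearly zero.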
