Why does the AI struggle so much with geography?

Ranges based on observations would be wrong or incomplete for thousands of species. As I remember, the distance used for “Seen Nearby” is not very large, so I don’t know if that would help. We need real ranges, not circles.

3 Likes

We have atlases that are curated manually and ranges created from observations. It’s obvious that not many species have real, full range maps; if the AI used them now, it wouldn’t help at all in places without a dense population of iNatters.

4 Likes

Some species get misidentified by the CV on a daily basis. For example the CV identifies Spilosoma lubricipeda in North America on a daily basis and it gets corrected to Estigmene acrea or similar species regularly. I wonder if the CV could learn from that.

1 Like

That photo shows 3 of the (at least) 4 different ways distribution data is stored on the site:

  • pink which is a formal range map (an actual KML or equivalent) detailing range info
  • green which are locations on which that species is found on the checklist
  • dots which are iNat observations
  • it does not show if there is an atlas defined for the species

A major problem is the data is scattered across all 4 of these tools. There is no standardization of where the info is entered at all.

It is easy to say “well, just use the observations themselves”, but then range becomes a circular data point: all it takes is one wrong record and the range data is messed up. That says nothing of how to deal with legitimately correct outliers, or of the bias in favour of areas with larger numbers of observers.

Take the 2 datapoints on that map in Oregon, which I will assume are properly identified. They are outside the pink range map, and relatively isolated from the bulk of records. What should happen when they are loaded and the AI run against it?

  • because the visual match is high, suggest the species, in which case you cycle right back to the original question in this thread: why are out-of-range suggestions made?
  • ignore the visual match and don’t suggest it, just suggest the closest lookalikes in range, in which case the questions/complaints will be “why, when this is clearly x, is that not suggested?”, and thus users are picking wrong things from a faulty suggestion list.

It is both a technical and a design question: because it is technically difficult to do properly, it is not in the design. If it were easy, it would be. It’s not like considering geography is something that slipped the minds of the development team.

6 Likes

Add it here please https://forum.inaturalist.org/t/computer-vision-clean-up-wiki/7281
Though maybe it’s revenge for the North American species we get in suggestions. :)

2 Likes

Something I suggested in the feature request thread is that there should be a popup warning if a user selects a taxon suggestion that has no observations from within 500 km of their location. Something like “This species has not been observed within 500 km of your location. Are you sure?”
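For what it’s worth, the check behind such a popup could be sketched roughly like this. This is only an illustration, not how iNaturalist works: the function names, the plain list of (lat, lon) tuples, and the linear scan over every observation are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def out_of_range_warning(user_location, taxon_observations, threshold_km=500.0):
    """Return a warning string if no observation is within threshold_km, else None.

    user_location: (lat, lon) of the new observation (hypothetical input).
    taxon_observations: iterable of (lat, lon) for existing observations
    of the suggested taxon (hypothetical input).
    """
    lat, lon = user_location
    for obs_lat, obs_lon in taxon_observations:
        if haversine_km(lat, lon, obs_lat, obs_lon) <= threshold_km:
            return None  # at least one nearby record, no warning needed
    return (f"This species has not been observed within {threshold_km:g} km "
            "of your location. Are you sure?")
```

A real implementation would presumably need a spatial index rather than a scan, and it inherits the circularity problem raised above: a single misidentified record nearby silences the warning.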

Geography should definitely be weighted higher in auto-ID suggestions. But a big part of the problem is user error. Users with very little experience might be picking the auto-ID suggestion which looks the most similar to them, even if it’s not the best computer vision match.

9 Likes

I think one way to address this, at least in the interim (probably being technically simpler to implement), would be to offer different scores for CV and geography.

That way, if more than one species exists in close geographic proximity, then the geographic probability scores for each species would be similarly high. I do like showing a distance to the nearest confirmed observation of a species as part of that. Showing a CV score, as well, can help in cases where you get observations that are geographic outliers and therefore rank low on the geography score. Even seasonal likelihood would be a relevant metric to see for species that exhibit seasonality (either through migration, growth patterns, or activity cycles).

As far as the order in which the options appear in the list, ideally the top item would have both the highest CV score and the highest geography score. But farther down the list, yeah, choosing how much to weight CV vs. geography, especially in the case of lower quality images, is difficult. At which point does a higher geography score begin to outweigh the CV score? Especially when we’re talking about poor quality images?
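To make the weighting question concrete, here is a minimal sketch that assumes both scores are already on a 0 to 1 scale and are blended with a simple weighted average. The `Suggestion` class, the `geo_weight` parameter, and the linear blend are all hypothetical; nothing here reflects how the site actually ranks suggestions.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    taxon: str
    cv_score: float   # visual match, assumed to be in 0..1
    geo_score: float  # geographic plausibility, assumed to be in 0..1


def rank_suggestions(suggestions, geo_weight=0.5):
    """Sort suggestions by a weighted average of CV and geography scores.

    geo_weight=0 ranks purely on the visual match; geo_weight=1 ranks
    purely on geography.
    """
    if not 0.0 <= geo_weight <= 1.0:
        raise ValueError("geo_weight must be between 0 and 1")

    def combined(s):
        return (1.0 - geo_weight) * s.cv_score + geo_weight * s.geo_score

    return sorted(suggestions, key=combined, reverse=True)
```

With made-up numbers, say a visual lookalike at CV 0.9 / geography 0.05 against a local species at CV 0.7 / geography 0.95, the lookalike wins at `geo_weight=0` and the local species wins at `geo_weight=0.5`. Where that crossover should sit, particularly for poor-quality photos, is exactly the open question.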

I do favor a better indication of certainty than the AI currently shows. Green for high degree of certainty, red for low. Also, I would like to see the taxonomic level with the highest degree of certainty, regardless of what it is. Not limited to genus.

Isn’t that always going to be plants or birds or spiders, etc.? I assume you have to cut it off somewhere in the hierarchy.

Also, won’t the geography score pretty much always be a binary result? The species either is or is not recorded (or observed, or on a checklist, or whatever measure is used) within the range you define. Unless the thinking is that seen 1 km away is more meaningful, and thus scored higher, than seen 2 km away, and that in turn higher than seen 5 km away, etc.

3 Likes

That red and green exists if you want it
https://forum.inaturalist.org/t/computer-vision-should-tell-us-how-sure-it-is-of-its-suggestions/1230/44?u=dianastuder

1 Like

I think it’s dishonest to mark somebody’s observation as ‘no evidence of organism’ when you know the opposite is true. Just give your vote and explanation and move on. iNat is supposed to be a democratic system. Bullying the observer into an ID seems petty and spiteful.

2 Likes

Inspiring people to think before reacting is certainly a worthy goal but we have to be aware of the limitations of AI. It is better than humans at some things but far worse at others. This means that information like scores for degree of certainty can be very misleading and those who trust it will still click on the one with the highest score.

1 Like

I don’t see that @joe_fish ever suggested doing so in this thread. Did you mean to reply to a different post?

2 Likes

He voted that way in the sample observation he linked to.

A vote of “yes, the ID can still be improved” would have been a more appropriate choice given the site design.

3 Likes

Ah, I see.

In that case, yes, I agree with you & @exekutive that marking the DQA for “can’t be improved” is the way to go (it keeps observations like the example from erroneously going to RG, even if the user never opts into community ID).

Just to respond to that, there are exceptionally few users who ID this group, and this “opted-out of community ID” identification from the user is showing up on the range map for Oulactis, which is how I noticed it in the first place. Which no doubt means that future AI identifications from this region would mistakenly add the “Seen Nearby” tag (or does that only apply for research grade?). You’re all welcome to add identifications to correct the situation. How many of you clicked on it without adding an ID? That’s the trouble. Nobody IDs this group, so mistakes like this get stuck. So now I’m stuck resorting to workarounds to make these Casual to remove them from the maps. This is a terrible system that the administrators need to figure out a better solution for, other than “tag more people”.

Personally I feel the suggestion list should be topped with just “Plant” or “Spider” if the other suggestions have low enough confidence.

4 Likes

True, you’d have to do something about the cutoff. What’s the certainty cutoff for recommending a genus under the current system? Use that same cutoff for all taxonomic levels. The question could then be, “what is the highest taxonomic level that reaches the specified level of certainty?”
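A rough sketch of that idea, reading “highest level” as “most specific level”: sum the species-level scores up each lineage, then pick the deepest taxon whose total reaches the cutoff. The function name, the input shapes, and the assumption that species scores behave like probabilities that can be added are all mine, not anything known about the actual CV model.

```python
from collections import defaultdict


def most_specific_confident_taxon(species_scores, lineages, cutoff):
    """Return (taxon, depth, score) for the deepest taxon reaching cutoff.

    species_scores: {species: score}, assumed to sum to at most 1.
    lineages: {species: [coarsest_rank, ..., genus, species]}.
    Returns None if no taxon at any level reaches the cutoff.
    """
    rolled = defaultdict(float)
    for species, score in species_scores.items():
        # Every ancestor (and the species itself) accumulates the score.
        for depth, taxon in enumerate(lineages[species]):
            rolled[(depth, taxon)] += score

    confident = [(depth, score, taxon)
                 for (depth, taxon), score in rolled.items()
                 if score >= cutoff]
    if not confident:
        return None
    depth, score, taxon = max(confident)  # deepest first, then highest score
    return taxon, depth, score
```

For example, if two species in one genus score 0.35 and 0.30 (invented numbers), neither passes a 0.6 cutoff alone, but the genus does at about 0.65, so the genus would top the list. Raise the cutoff and the answer climbs to family or higher, which is the “just Plant or Spider” case mentioned earlier.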

As for geography score, yes, I’m saying that distance would be the reported metric, not a binary “presence/absence within whatever defined area”. For some species, elevation would matter more than distance. But at least getting started with some basic biogeographic criteria could hopefully lead to enough improvements that other relevant ones could be added over time.
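As an illustration of a non-binary geography score, one option is to let the score decay smoothly with distance to the nearest confirmed observation. The exponential form and the `scale_km` parameter below are assumptions chosen for simplicity; any decreasing function would make the same point.

```python
import math


def geography_score(distance_km, scale_km=100.0):
    """Map distance to the nearest confirmed observation onto a 0..1 score.

    A distance of 0 km scores 1.0, and the score falls by a factor of e
    for every scale_km of additional distance (hypothetical choice).
    """
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    if scale_km <= 0:
        raise ValueError("scale_km must be positive")
    return math.exp(-distance_km / scale_km)
```

Under this sketch, seen 1 km away scores slightly higher than seen 2 km away, which scores higher than seen 5 km away, and `scale_km` could in principle vary by taxon. Elevation or seasonality would need their own terms.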

1 Like

From what I understand, the CV is trained on verifiable observations, not on RG observations only or even all observations. You could just mark the DQA “as good as can be”, but if the observer put an ID at genus or lower, and 2/3 of the community agree with that, then it will be RG (and if it is RG, by definition, it will also be verifiable).

There’s not a DQA attribute that corresponds to opting out of community ID, but unless this genus is full of Geralds, you can turn the community ID to the correct taxon by leaving comments to explain why you don’t believe the opted-out observer’s ID is correct.

Thank you for applying your expertise to make sure the information on iNat is accurate.

2 Likes

I just want to be clear here, please don’t intentionally add an incorrect vote to the DQA.

4 Likes
