Don't automatically update atlases based on observations

Some computer vision suggestions (and good old human error) result in many erroneous research grade observations from far outside the known range of the species. These observations then seem to automatically update the associated atlas as an “expansion” of the known range.

Can we make it so that atlases are not automatically updated by observations? There could instead be a section that allows manual approval of each new potential range expansion area. (cc: @loarie)

I could be mistaken, but from what I was told, Computer Vision suggests species based solely on the submitted observation’s proximity to other observations, not on atlases. (E.g., species shown as “Seen Nearby” are not suggested because of their inclusion in a place’s checklist.)

I guess 2 questions come up on this:

  • Why limit this to atlases, and not include checklists in the issue?
  • Which is less work: finding out-of-range records and correcting them and/or correcting the checklist and atlas, or manually trying to maintain all the additions that take place as new records come in?

I’m inclined to think that correcting is actually less work than doing the manual additions.

I’d also really like a good validation of exactly what triggers both the addition and, perhaps even more importantly, the removal of a taxon from an atlas/checklist.

Atlases are not currently used in the computer vision suggestions. You can verify this by finding a record for a species not covered by an atlas (it should not be hard; there are only about 25,000 atlases created versus hundreds of thousands of species): the computer vision suggestion will still work.

Nor does it use checklists, as can be validated by the temporal component of the computer vision (the +/- 45 day piece), which is not really capturable via checklists.

The bigger question is: if or when the site implements the discussed geographic intelligence integration into the computer vision, what source will it use? Whatever that source is should ideally get the primary focus for updating, validation, etc.

There’s currently no way to actually find all the mistakes. On the atlas page, only a handful of the recent checklist additions are displayed.

As far as atlases vs. checklists, it’s because once an atlas is active, that is an indication that the range has been curated. Checklists are more of a wild west. If an atlas is active, it should be an indication that the range is somewhat trustworthy. Right now it’s not at all.

Well then it is a circular issue: it either becomes inaccurate because of adding something that is misidentified and not actually found there, or it becomes inaccurate because of not automatically adding something that is legitimately there and correctly identified.

Which is the bigger problem?

I would say the former is a far bigger problem.

I guess there is not any ‘right’ answer to that; to my mind it is the opposite. Having to manually update every atlas and checklist to find new appropriate additions seems a much larger task than correcting the false positives, especially if I am confident that the process works properly: I fix the identification, and the removal from the atlas/checklist, if appropriate, is automated.

I make and clean a lot of atlases. I’m pretty sure that:

  1. For a taxon with an extant atlas, the addition of a 2-ID RG observation well outside this range will result in an automatic expansion of that atlas (https://www.inaturalist.org/atlases/19726 Magnolia grandiflora in New Mexico, Ukraine, Greece).
  2. Further, that auto-expansion does not contract or vanish when the observation is moved to a higher taxon (coarsened) or marked planted. Atlas expansions appear to be sticky until I remove those places.

Item 1 is a terrible bug, I think. Most things don’t even have atlases, so why have code to expand them? Perhaps these are cases of people adding taxa to checklists; if so, that should be blocked once an atlas springs into existence.

Most of my atlases carry a dual citation to the POWO range and the BONAP range, so I really don’t think new AI observations or people adding to checklists should grow them. Atlases should be grown by an atlas cleaner who sees an out-of-range observation, notes that the RG rating is well deserved (wild, correct ID, location accurate, …), and then adds the location to the atlas so the observation is no longer marked out of range.

P.S. Why aren’t created atlases “counted” on the /users page like flags cleared or taxa maintained? It’s a lot more work!

Looking at what is actually occurring on iNaturalist in the vast majority of cases, the two options are closer to:

  1. patently false information about “range expansions” vs.
  2. slightly outdated ranges, based on the best available information at the time last curated

Respectfully, you are looking at this from the lens of a very significant minority of observations, the ones that are legitimately incorrectly identified, achieve research grade and result in a ‘range’ expansion.

You don’t solve this problem by turning off functionality that works for the far larger set of observations that are correctly identified. You solve the ‘problem’ by ensuring that when the incorrectly identified records are dealt with it cleans up after itself.

Manually maintaining tens or hundreds of thousands of atlases or checklists is an unattainable process.

I spend a lot of time looking at out-of-range observations, and for plants at least, the vast majority are wrong, including research grade ones. Part of the point of the atlas is to highlight these out-of-range occurrences to be checked and either fixed or added to the atlas. So I definitely agree that an RG observation should not expand the atlas, especially if a corrected ID doesn’t shrink it back down.

I should clarify that I’m referring primarily to “range expansions” to additional Level 0 (countries) and Level 1 (states, provinces) Standard Places.

I’m ambivalent about what should be done about automatic additions to new Level 2 Standard Places (counties and similar districts) that fall within Level 0/1 places that were already manually added/confirmed as appropriately included in the atlas.

I’ve noticed Pinus ponderosa (Ponderosa Pine) showing as research grade in places outside of its range, in observations that are obviously cultivated or not even ponderosa to begin with. My last count was 200-plus observations. I’ve tried to go in and correct the status on these, with others coming in and outvoting me on them. The range map for this species is definitely out of sync with reality. Is there any way to fix these observations? Explaining to people on here that it is not a ponderosa or that it is captive seems to set them off, and I’ve given up.

I am puzzled by this opinion. Generally, when I’m setting up atlases (for plants), I’m going to a major flora (Flora of North America, Flora of China, etc.) and using that to set the distribution of the species for geography Level 0 and Level 1; I don’t need individual observations to determine that distribution. The relevant comparison to misidentified observations outside the published range is not “the larger set of observations that are correctly identified”; it is correctly identified observations falling outside the published range, which for Level 0 and 1 are much, much less common than those that are misidentified, or even misidentified and research-grade.

Level 2 is another story. I’ve seen people create county-level maps for plant species from BONAP and CalFlora and so on, but range extensions at this level are much more common, and letting them auto-add seems less problematic.

Maybe this is very different for animals and a lot of range maps are being assembled de novo from observations, but for plants, it is absolutely less work to “maintain all the additions…as new records come in” because very, very few of those are legitimate range extensions.

My reply was written before Cassi clarified that she was specifically talking about the large geographic Level 0 and 1 areas. Yes, there are certainly fewer cases of those. I’m commenting more on smaller geographic areas, in particular with regard to what is going to be used when/if the promised geographic intelligence for the computer vision is implemented.

For those, national or provincial/state checklists are effectively useless in many cases. It’s 2,400 km from my home to the western border of the province I live in; saying something is ‘found’ in my home province when it is all the way out there is useless.

Lower-level data management is just as important. That’s why atlases have the explosion ability down to such fine-scale geographies: the expectation is that it gets used and managed.

Ah! I’m sorry I misunderstood, and I apologize for being cross. I find the manually set atlases helpful in pushing back on misidentified observations. You’re quite right about the finer scales: BONAP is an incredible labor of love, but crowd-sourcing is really necessary for Level 2 and finer mapping.

I just came across this discussion, so my apologies for being late to the party.

Firstly, I was a little surprised to learn that atlases have this auto-expansion capability, because my understanding from reading the write-up was that they are supposed to provide a control for out-of-range observations. The bar for RG is not especially high (2 IDs) and that has a lot of benefits, but it seems troubling that spurious observations that get a confirming ID would automatically expand the atlas for that taxon. It seems a significant bug for that expansion to “stick” even if the ID is later corrected.

I see that several people have suggested adding an auto-contract capability to address the bug I just mentioned. But this can’t be as simple as just removing locations that don’t have RG observations, because that negates one benefit of manually creating atlases which is that we can define a set of documented locations even if there aren’t yet RG iNat observations for some of them. It seems that iNat would need to track which locations were auto-added and only auto-contract those ones.
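
To make the idea concrete, here is a minimal sketch of provenance-aware contraction. This is not iNaturalist's code: the `Atlas` and `AtlasPlace` classes, the `added_by` field, and the method names are all hypothetical, and the set of places that still have an RG observation is assumed to be computed elsewhere.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class AtlasPlace:
    place_id: int
    # Hypothetical provenance field: curator login, or None if auto-added.
    added_by: Optional[str]


@dataclass
class Atlas:
    taxon: str
    places: Dict[int, AtlasPlace] = field(default_factory=dict)

    def add_manual(self, place_id: int, login: str) -> None:
        # A manual addition always records who curated it.
        self.places[place_id] = AtlasPlace(place_id, login)

    def add_from_observation(self, place_id: int) -> None:
        # Never overwrite an entry a curator already added.
        self.places.setdefault(place_id, AtlasPlace(place_id, None))

    def contract(self, places_with_rg: Set[int]) -> List[int]:
        """Drop auto-added places that no longer have a supporting RG observation.

        Manually added places are kept even if they have no observations.
        """
        removed = [
            pid
            for pid, place in self.places.items()
            if place.added_by is None and pid not in places_with_rg
        ]
        for pid in removed:
            del self.places[pid]
        return sorted(removed)
```

With this shape, a place a curator documented from a flora survives contraction even with zero observations, while a place that was only ever supported by a since-corrected observation is removed.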

On the topic of what level of detail is appropriate for auto-expansion, I feel this is a continuum and this may differ greatly among different taxa. I have been adding atlases for some monocot plants in California. Many mistaken RG IDs are for similar species in California counties beyond the documented range of the species being selected. I definitely want to know when people are finding these species in new Level 2 locations, but I’m not sure that always auto-expanding the atlas boundary is the right response. In contrast, if these species show up in Maryland or the Netherlands, it’s more likely someone will flag them as cultivated before they reach RG.

I wonder if there could be a per-atlas setting that governed auto-expansion behavior, so that curators could tweak how community IDs interact with curated content. For a sparsely distributed taxon, it might be fine for iNat to auto-expand at Level 1 or even Level 0. For an island endemic, even Level 2 auto-expansion seems like a net negative.
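
As a rough sketch of what such a setting might look like (purely hypothetical; `AtlasPolicy`, `coarsest_auto_level`, and `decide` are names I made up, not anything in iNat), the atlas could store the coarsest admin level at which auto-expansion is allowed and send everything else to manual review:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AtlasPolicy:
    # Coarsest admin level that may be auto-added:
    # 0 = country, 1 = state/province, 2 = county or similar.
    # None disables auto-expansion entirely.
    coarsest_auto_level: Optional[int]


def decide(policy: AtlasPolicy, coarsest_new_level: int) -> str:
    """Decide what to do with a new RG observation outside the atlas.

    coarsest_new_level is the coarsest level at which the observation's
    place is not yet in the atlas (0 if even the country is new).
    """
    if policy.coarsest_auto_level is None:
        return "needs_review"
    if coarsest_new_level >= policy.coarsest_auto_level:
        return "auto_add"
    return "needs_review"


# Example policies matching the cases described above.
sparse_taxon = AtlasPolicy(coarsest_auto_level=0)
county_only = AtlasPolicy(coarsest_auto_level=2)
island_endemic = AtlasPolicy(coarsest_auto_level=None)
```

Under these assumptions, `county_only` would auto-add a new county inside an already-listed state but hold a new state or country for a curator to approve.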

I’m pretty sure atlases differentiate between additions made as a result of an added observation and those added manually. The list of additions to an atlas has a login field, which is presumably populated if the place was added manually and blank if it was added by an observation.

See https://www.inaturalist.org/atlases/1786 as an example. The auto-cleanup would in theory only apply to additions which have no login associated to their data.

Ah, thanks for that info. Yes, it does seem that the atlas data makes this distinction, so auto-collapse should be possible.