Don't automatically update atlases based on observations

Ah, thanks for that info. Yes, it does seem that the atlas data makes this distinction, so auto-collapse should be possible.

To me this topic is still being addressed backwards. The goal should not be to prevent atlases and checklists from being populated as a result of observations; it should be to ensure that they get cleaned up as data is changed or removed.

Someone still needs to convince me, with data rather than an ‘I kind of feel like’ assessment, that there are more false positives than there are correct additions resulting from new accurate records.

Especially once you get down to level 2 geographies (or even level 1 geographies in places like Europe, which tend to be much smaller geographically than we are accustomed to in North America), a surprising number of additions are still being generated, even for common species, by the inflow of records.

Take this species, for example: it is a pretty darn common insect, yet a couple of dozen new counties etc. have been added in just the past few weeks of the 2019 dragonfly season.

As stated above, manually managing tens or even hundreds of thousands of checklists and atlases is not realistic. We need to rely on some level of automation for these to ever have any kind of site-wide use beyond niche groups and the specific species that active curators want to monitor.

I’ve personally given up on attempting to curate atlases because it’s impossible to keep up with the current flow of inaccurate information being added to place checklists, and too frustrating to keep fixing the same mistakes (removing the same listed taxa) over and over. (cc: @loarie)

Unless someone starts paying experts to manage the misidentifications/atlases, there will never be enough people to keep up with cleaning up all the wrong data that are being generated.

Doesn’t that kind of reinforce what I am saying? If you turn off the auto-updating, you are consigning the management of the atlases and checklists to be forever manual. If you ensure they have proper cleanup behind them, then they benefit from the data corrections, which ideally need to be made anyway (and that pushes the work out to the full body of identifiers, not just the small number of folks trying to manage the atlases and checklists).

No, for level 0 and level 1 standard places, I would prefer they not be automatically added to atlases.

Corrected misidentifications should be removed from that atlas “review queue”, but it should require a person to confirm the addition of level 0 and 1 standard places to atlases.

Does iNat currently have use cases for Atlas data that we would not want manually verified first?

Certainly for taxon splits, they need to be manually verified first.

Taxon map displays? Does map symbology come from Atlases or Checklists? Or both? If from Atlases, do we want new RG observations to immediately be telling the world (or Computer Vision) that a species definitely occurs in a particular place, with the consequent risk of circular confirmations?

What else am I missing?

My point being, I have always thought of Atlases as having specific uses for which manual curation/verification is necessary. Versus checklists, which as @bouteloua says are more of a “wild west” reflection of the raw data. (And for which, I agree with @cmcheatle, there should be built-in reversal mechanisms when data-driven (versus user-driven) checklist additions are no longer supported by the data.)

As for Atlases, if manual verification continues to be necessary for their use cases, then I am with cassi on this one, and would prefer to see Atlas curation tools that make it quick and easy to see potential data-driven atlas changes, with quick options to “Accept All,” “Reject All,” or selectively accept and reject for individual Atlas places.

Such tools could also be valuable for initially populating new atlases, especially if they could also pull from external data sources such as GBIF, BONAP, SEINet, etc., with links to visit and check the source data, etc. I would even vote for auto-creating missing atlases that way, but leaving them inactive until visited and verified by a curator.

The reverse is also true: if atlases are not updated, do you want the computer vision tool to fail to suggest appropriate choices, which virtually guarantees incorrect selections?

When you talk to the staff, it seems they want us to focus on atlases, but there is no clear indication why. And we still have no indication (maybe the site devs themselves still don’t know) of which source (atlases, checklists, range maps, entered observations, etc.) will be used to drive the geographic intelligence that is to be put into the computer vision tool.

Since the only geographic intelligence currently available to Computer Vision is nearby RG observations of the same taxon, I think adding incomplete but verified atlases to that intelligence would be far better than having none at all, and also better than over-complete atlases with false positives.

If atlases do become a source of Computer Vision intelligence, I will be far more motivated to create and maintain them, and far more vocal in requesting efficient curatorial tools to facilitate that.

Once we get a bunch of circularly generated false positives auto-adding a wrong Atlas place, however, that can become exponentially more difficult to deal with and undo.

It’s not an ‘I kind of feel like’ assessment. For plants in North America that already have a substantial number of observations on iNat, I’d estimate at least 50-100 wrong IDs for each correctly identified out-of-range outlier (unless we are talking about ruderal weeds or things like that), especially at the state/province level with non-adjacent states. It would indeed be nice to see the data, but… I don’t see it as necessary because it’s not even close.

The relevant issue/measurement is not the number of incorrect IDs versus the number of correctly identified out-of-range observations.

The relevant measure is incorrectly IDed observations versus the number of observations that would accurately populate an atlas or checklist listing, especially at levels lower than geographic level 1.

In the example I pointed to (https://www.inaturalist.org/atlases/16916), every one of those additions is ‘in range’ if all you do is define range at the state or province level. And every one of them is an accurate reflection of the range if you look at finer levels.

It seems a conservative atlas is way more useful than one that colors vast areas where the species doesn’t occur. At least for plants. A false negative is way less trouble than a false positive.

I fundamentally disagree on this, but I suppose we will have to leave it at that. To me, a false negative is a perpetuating problem: if the species legitimately is there, failing to account for it will just keep coming up as more people observe it.

I live in a country of almost 10 million square kilometers, and a province which is over 1 million. Attempting to use a national or provincial atlas to validate or suggest anything is useless.

There are 900,000 taxon concepts in the database, and the number grows every day. I don’t know what percentage of those are species or lower, but manually populating atlases and checklists for that number of concepts is utterly unrealistic.

Since you can never really prove absence… a false negative is just no data (until someone else finds it), and a false negative on an atlas with RG observations is a call for someone to verify those observations and maybe edit the atlas. Unlike a false negative on the atlas, a false positive is bad data.

Hopefully manually added locations that are not driven by observations are being done from valid sources.

Both are bad data. Missing data that should be present is just as much a data issue as data that should not be there.

That’s… not how it works, unless you are doing plant plots or transects where you record everything you see. Even then, it only applies to the area within the plot, and you still miss things that are out of season or don’t come up every year.

As long as the “Seen Nearby” piece of Computer Vision remains a part of its algorithms, this should help counterbalance temporary incompleteness of atlases once they are also in the mix. (Though they are and would continue to be a source of false positive CV suggestions also.)

Far fewer of these have enough RG observations to be included in Computer Vision, so if that is the use case in question, the problem may be considerably more manageable. If it has to do with place symbology on maps, again some well-designed curatorial tools could go a long way.

One handy curatorial tool I could envision would be, on an observation detail page, in the dropdown next to an ID for which there is community agreement, a choice for “Add to Atlas” available only if one has curator status. Or on map displays on Observation or Taxon detail pages, right-click on a place to add it to the atlas.

I know this doesn’t fully address your concerns. I just have this vision of a false-positive slipping under the radar during a City Nature Challenge event, generating 100 more false positives of the same thing, and then more and more until finally discovered by a curator to be responsible for a false atlas place. We would then either have to contact 100 mostly new and non-returning users, and/or lots of other identifiers, to try to get IDs changed, or have some manual atlas tool allowing a curator to override or “turn off” automatic atlas population for a particular taxon and place.

Either way we come at it, manual intervention will be involved. I would rather see that effort go toward efficient addition of new good data (iNaturalist always being a work in progress, after all), instead of cleaning up after bad data.

You keep misunderstanding what I am saying, or I am not doing a good job of making myself clear. Likely the second.

It is ‘missing data’ because there are correctly identified observations whose locations are not accounted for in the atlas or checklist, because the contents of the atlas or checklist are not getting updated. It is not that there is no report of it.

I get that, but I am saying that since more of those are wrong than right, it’s better to be conservative. I guess beyond that there’s not much else to say.

I just don’t see this as the case. Go find me an atlas that has incorrect assignments, and I will find you many more where an atlas has been accurately updated. And as I have repeatedly said, I’m not talking about just geography level 0 or 1; I am talking about the entire geographic scope that atlases and checklists are supposed to account for.