Why there is No Comprehensive Database Cleaning

This is another problem: there are a lot of duplicate observations, and there is no option to flag them as duplicates.

If I recall correctly, for the duplicate observations the staff decided that the cost of just continuing to host them was less than the cost of doing anything about them. They used to let them be flagged, but they stopped having us flag them; there are still over 5,000 unresolved duplicate observation flags. But having all those in the queue of flags just obscures more important flags from curator attention.

https://forum.inaturalist.org/t/duplicate-prevention-notify-observers-if-their-image-checksums-match-others-on-the-site/258/53
95 votes

That was March 2019 …
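
For anyone wondering what the checksum idea in that feature request amounts to, here is a minimal sketch in Python. It assumes the image bytes are already available locally; the function name `find_duplicate_groups` and the sample data are made up for illustration, and nothing here reflects how iNaturalist actually stores or compares photos. It also only catches byte-identical files: a resized or re-encoded copy of the same photo would get a different checksum.

```python
import hashlib
from collections import defaultdict


def find_duplicate_groups(images):
    """Group image names by the SHA-256 checksum of their bytes.

    `images` maps a (hypothetical) image name to its raw bytes.
    Returns a sorted list of sorted name lists, one list per checksum
    that is shared by more than one image.
    """
    by_checksum = defaultdict(list)
    for name, data in images.items():
        checksum = hashlib.sha256(data).hexdigest()
        by_checksum[checksum].append(name)
    # Keep only checksums that occur more than once.
    groups = [sorted(names) for names in by_checksum.values() if len(names) > 1]
    return sorted(groups)


if __name__ == "__main__":
    # Made-up sample data: two byte-identical "photos" and one different one.
    sample = {
        "obs_1.jpg": b"same bytes",
        "obs_2.jpg": b"same bytes",
        "obs_3.jpg": b"different bytes",
    }
    print(find_duplicate_groups(sample))
```

Hashing the raw bytes keeps the check cheap, which is presumably why the request talks about checksums rather than comparing image content.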

Now that must be some thorough review.

I don’t know if there is an iNaturalist observation, but having grown up in Rhode Island, I have to love the Big Blue Bug.

Oooo, yes! That’s great!

i’m guessing this is probably intentional.

some portion of the observations with images hidden / removed for copyright reasons includes cases where people make valid observations of organisms at a particular location and date but then make the mistake of adding a copied photo as evidence of the organism. these really should have been valid casual observations with no photo, and theoretically the observer could just go back and remove the photo(s), but there’s no standard process for asking observers to do this.

since there’s no easy way to figure out which such observations have valid taxa + location + date, i think it’s fine that all of these observations get included in search results.

the default search results exclude observations marked as spam. compare:

others can correct me if i’m wrong, but i think observations that get flagged for more malicious reasons may eventually get removed from the system, but there’s a lag between the time a curator hides an observation and when a staff member gets around to deleting the observation… or something like that.

right. managing performance, storage, etc. by trying to cull through some tiny percent of potentially bad observations is a super inefficient way to do this. there are so many better ways to address concerns related to performance, storage, etc.

Can non-curators even see the content of observations marked as spam at all (even if that is a JavaScript-imposed limitation)? To curators, spam observations have a banner that says:

This has been flagged as spam and is no longer publicly visible. You can see it because you created it, or you are a site curator. If you think this is a mistake, please contact us.

At least when logged out, spam observations give a 403 forbidden.

So the staff certainly could choose to delete all of that to save money with no site impact to regular users, but they choose not to. So it really must not cost much, or they don’t want to for whatever reason.

Usually, staff deletes any content by confirmed sockpuppet accounts, and sometimes other things too. There is definitely a lag time when they do that, and it is more about mitigating community impacts than disk space cost or anything like that.

i think it’s probably more that it would cost more money to do a proper review before deleting than to just leave it all there.

(i think the reason you don’t delete observations is because curators can make mistakes, and not deleting provides staff an easy option to undo curatorial mistakes.)

it’s there for anyone, but most people won’t be able to get to these through the normal interfaces.

This is how I would expect such observations to be treated - as casual without evidence. If that’s not the case it would be a good feature request.

It might be worthwhile excluding casuals from this project - that would help part of this issue.

this is another case where it would be good to be able to include cultivated individuals but exclude other casuals. Which speaks again to the way that captive/cultivated observations are treated, especially cultivated plants. While it’s true they shouldn’t be intermingled with wild observations, they shouldn’t be mixed in with copyright infringement and bad data either. I would want them in some projects, such as projects that track invasive plants that are also in landscaping, pollinator habitat projects, habitat restoration projects where the initial plants are planted, etc.
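
As a rough illustration of that distinction (this is not an existing iNat project setting), here is a sketch in Python that filters records that have already been downloaded. The field `casual_reasons`, its values, and the function name `keep_for_project` are all hypothetical, chosen only to show the logic: keep anything that is not casual, plus anything that is casual solely because it is captive/cultivated.

```python
def keep_for_project(obs):
    """Return True if the observation should be kept.

    `obs["casual_reasons"]` is a hypothetical list of reasons why the
    observation is casual; an empty list means it is not casual at all.
    Kept: non-casual observations, and casual ones whose only reason
    is "captive". Anything with another reason is dropped.
    """
    reasons = set(obs.get("casual_reasons", ()))
    return reasons <= {"captive"}


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        {"id": 1, "casual_reasons": []},
        {"id": 2, "casual_reasons": ["captive"]},
        {"id": 3, "casual_reasons": ["missing_date"]},
        {"id": 4, "casual_reasons": ["captive", "copyright_flag"]},
    ]
    print([r["id"] for r in records if keep_for_project(r)])
```

The subset test means a cultivated plant whose photo was also flagged for copyright is still excluded, which matches the point about not mixing cultivated records with bad data.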

I feel like ‘human’ observations can also be useful when it comes to things like showcasing polluted environments because of human activity (oil spills, litter, sewage, etc), especially if it has a large impact on the organisms dealing with it

A small percentage of the human observations are of iNatters and other naturalists who are no longer with us (deceased) and serve as kind of a memorial to them.

Yeah, as @wildskyflower said we delete content created by sockpuppets or large trolling efforts, as well as stuff like pornography.

Sorry, that bug got lost in the shuffle. FWIW I forced a reindex on some of the example observations you listed there and that removed them from the search results, so they just don’t seem to have been reindexed properly when they were flagged. Looking at our test server now, I think it’s working as intended.

Probably a bit of both. I’ll also note that it’s not just curators who can make mistakes or who have the power to make something casual or flag media as copyright infringement. Anyone could flag things for copyright infringement and if that’s not caught, the observations would then be deleted if iNat had some sort of auto cleanup.

Did you see this project? https://www.inaturalist.org/observations?place_id=97394&project_id=bitter-about-litter

No idea it even existed! Thank you!

How about hiding observations with copyrighted media, hidden media, etc.? At the very least, we need a filter for hiding copyrighted observations.

We already hide the copyrighted photo. I don’t think the observation per se can be copyrighted except by the observer. Sometimes these are observations where the observer wanted to record the presence of an organism at a given place and time and didn’t have a good photo, so used one he shouldn’t have used. The problem isn’t with the observation as a whole.

You’ve used the term “hiding” several times but it isn’t clear what that word means in terms of the functionality you are asking for. There is a hiding functionality on iNat already, but it doesn’t seem to be the same thing that you are proposing. It would help other users to understand your position if you could provide more details about the specifics of what you think would be helpful.