Create filter for certain offensive content

I rescind this request. As I think about it and respond to questions, it is becoming clear that it would be nearly impossible to implement this in a way that is neither ineffective nor overbroad, and I would prefer to stick to user flagging, or a system that only notifies curators of potentially offensive content without taking action against it. I am also concerned that an automated system flagging some words and not others (because it is impossible to list every slur in existence) could be misperceived as treating discrimination against some groups more seriously than others.

Platform(s), such as mobile, website, API, other: All

URLs (aka web addresses) of any pages, if relevant:

Description of need: Some sockpuppets will target specific users they have a grudge against by mass-posting offensive comments on the target user’s observations. When suspended, the harassers just make new accounts using VPNs. I know of ongoing harassment cases with dozens of sequentially created socks: when one is banned, another pops up, posts as much offensive material as possible (perhaps 100 comments on one observation) before being suspended, and then the cycle repeats. Some words that are used by these socks have no place on iNat in any context.

Feature request details: Create a filter that flags and automatically hides comments or IDs containing certain words or phrases that have no inoffensive use if coming from new users.

It is important that this only flags content from new users, and does not flag users who have been marked as non-spammers, both to avoid automatically hiding moderation discussions on flags (in which curators may have reason to quote offensive comments when discussing moderation decisions with each other) and to avoid hiding complaints from victims of harassment along the lines of “they called me [offensive thing]”.

Some slurs are so offensive and widely known that they are always referred to by an abbreviation and not typed out even in a moderation discussion, but this is not true for all offensive words.

It is also important to be aware of words with multiple meanings. For example, there is an ableist slur (a noun) whose spelling is also a verb meaning “to slow down”, and part of the name of some fire suppression agents. I’m not sure we want to auto-flag those uses.

I am against the use of AI moderation and am not calling for that. What I am proposing is a filter that removes certain pre-programmed strings that have only a bigoted, insulting, or sexually offensive meaning.
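To make concrete what a string-based (non-AI) filter could look like, here is a minimal sketch. The blocklist entries, function names, and the `author_is_new` check are all placeholders, not anything iNaturalist actually implements; note the word-boundary matching, which avoids flagging innocuous words that merely contain a blocked string (the fire-suppressant example above).

```python
import re

# Hypothetical blocklist; in practice this would be the curated list of
# strings with no inoffensive use described in the proposal.
BLOCKLIST = ["slur1", "slur2"]

# One case-insensitive pattern with \b word boundaries, so a blocked word
# only matches as a whole word, not as a substring of a longer innocuous
# word (e.g. a slur that is also part of "retardant" would not match).
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in BLOCKLIST) + r")\b",
    re.IGNORECASE,
)

def should_flag(comment_text: str, author_is_new: bool) -> bool:
    """Flag only comments from new users that contain a blocked word."""
    if not author_is_new:
        return False  # established users are exempt, per the proposal
    return PATTERN.search(comment_text) is not None
```

The same check could then raise an ordinary "inappropriate" flag for curator review rather than hiding anything automatically.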

I also do not want this to flag the comments as spam; it should flag them as inappropriate so that they show up in the regular flag log that curators monitor, reducing the risk that erroneous flags go uncorrected.

Finally, procedures for this should recommend that an automated flag on offensive content be resolved by curators and the content manually hidden, so as to avoid a large number of unresolved flags that need no action making it harder to find the flags that do.

As I type this it sounds like it could be difficult to implement, and I am open to being convinced that this is not a good idea, but @wildskyflower and I have been wondering why there is not a faster way to stop some of the recent mass posting of the n-word by sequential socks.

For what it’s worth Discourse has a “Watched Words” functionality that’s similar in some respects. I haven’t used it here because we’re a chill forum.

I’m shocked and saddened to hear this has been a problem. FWIW I have no concern about using AI/ML for prioritizing content for moderation, so long as there’s a human in the appeals loop – to my mind this isn’t really different from how we use computer vision to leverage expertise.


For a specific example of a context where an otherwise inappropriate word could be appropriate coming from a seasoned user: if a comment is posted in a language that not many curators speak, it may be helpful for a user who actually speaks that language to provide an idiomatic translation in a flag (Google Translate often doesn’t convey the level of offensiveness effectively, in my experience).

Overly simple string-search-based comment filters have long been known to have problems, especially on a platform where people may post in many languages (a problem IKEA is sometimes known for having). iNaturalist has been fortunate to mostly not have had terribly serious problems like this (as Tony says, we’re mostly “chill”), and it is important not to throw the baby out with the bathwater. I think overall we have the human moderation capacity to handle most issues like this manually if they can just be flagged automatically (and maybe to give temporary suspensions to stem the tide for extremely new accounts that might be sockpuppets).


I’m not aware of this issue on the forum; it’s just iNat itself where sequential socks are causing a problem. It’s not widespread, but the observations it does happen on can end up with something like 100 racist or sexual comments before someone intervenes.

As a curator I often see comments wrongly removed, and users wrongly suspended, by the spam filter, so I would strongly oppose any expansion of AI moderation. AI alerting curators to the possibility that a comment breaks the rules would be OK (though that doesn’t really address the issue I raised here), but I do not want to see AI hiding comments or suspending users.

What I am suggesting is a string-based filter for new users using pre-specified phrases, not AI.


Edited my post to say concretely that I haven’t used it here on the forum.

I’ll also say that on iNat itself, the kind of language that this feature would potentially be used for is quite rare, in my experience.


I agree, but this might cause some confusion.
https://forum.inaturalist.org/t/gonosimulism-an-alternative-to-hermaphrodite-in-biology/56886

How would this cause confusion?

If someone is talking about hermaphrodites, and it’s hidden because the word is sometimes used as a slur.

This is why I am proposing filtering only words with no inoffensive use, such as the n-word, and not every word that has an insulting use.


Ah, I see now.

I’m not opposed, but I have some concerns with the proposal as it is in the OP:

  • determining a list of which words to include would be really difficult on the staff end, especially once we get outside the relatively narrow confines of English in the US, which is what we’re mostly familiar with. And even then it’s difficult. For other languages, or English in other locations, who makes those decisions?

  • I think I’d be OK with flagging or perhaps putting “watched words” on some sort of watch list, but not automatically hiding them. Currently only staff can unhide something, and I think it’s also good for hiding to remain a manual function.

Even if it were only done in English (a majority of the content on the site is in English), there could still be issues with other languages where a different word has the same spelling.

I was thinking of something like the spam filter, where content is hidden but can be unhidden by curators; I certainly don’t want unhiding content caught by the filter to be limited to staff.

I’m honestly unsure how much good this does; trolls are often good at evading filters, and changing even one letter would evade the filter.
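To illustrate the evasion point: an exact-string filter misses trivial respellings, and a common partial mitigation is to fold look-alike characters before matching. This is only a sketch (the substitution table here is illustrative, not exhaustive), and it still misses inserted punctuation, spacing tricks, and novel spellings:

```python
# Hypothetical look-alike table: map common substitute characters to the
# letters they imitate before checking the blocklist.
FOLD = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})

def normalize(text: str) -> str:
    """Lowercase and fold look-alike characters. A partial mitigation
    only: it does not catch inserted punctuation, extra spacing, or
    creative respellings."""
    return text.lower().translate(FOLD)
```

Running the blocklist against `normalize(comment)` instead of the raw text catches the one-letter-swap case, but determined trolls will still find variants no table anticipates.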

I rescind this request (see edits to OP)

There are some attempts to make different lists for a variety of languages (this is probably not the only one but it came up quickly): https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words

The problem is that even among the few hundred words in the English list, many obviously do need to be allowed; a not-insignificant number appear in common names (for better or for worse), and some of the more anatomical ones are simply routine in scientific discourse. Honestly, for the English list there are probably only a handful that have no reasonable non-offensive use and are common enough to actually care about.

Also, at least in the case that was the inciting incident for this post, the blatantly offensive content often only starts at the 15th or 16th comment (perhaps a deliberate strategy to evade new-user restrictions experienced on other sites).

Perhaps a better technical mitigation would be to simply flag brand-new accounts that are commenting very fast, or that have a very high ratio of comments to IDs or observations.

I do know of at least one or two legitimate users who use the site mainly to post comments, but I doubt they do so particularly fast, and if it is just an automated flag for heightened attention, with no automated hiding or suspension, such cases could simply be resolved if they come up.
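The rate-and-ratio heuristic above could be sketched roughly as follows. The thresholds, field names, and the `Account` structure are all made up for illustration; real values would need tuning by staff, and the result is only an “attention” flag, never an automatic hide or suspension:

```python
from dataclasses import dataclass

# Illustrative thresholds only; real values would need staff tuning.
MAX_COMMENTS_PER_MINUTE = 5
MAX_COMMENT_TO_CONTRIBUTION_RATIO = 20.0
NEW_ACCOUNT_AGE_DAYS = 7

@dataclass
class Account:
    age_days: float
    comments: int
    ids: int
    observations: int
    comments_in_last_minute: int

def needs_review(a: Account) -> bool:
    """Flag brand-new accounts that comment very fast, or whose comment
    count dwarfs their IDs and observations combined."""
    if a.age_days > NEW_ACCOUNT_AGE_DAYS:
        return False  # established accounts are never flagged by this check
    if a.comments_in_last_minute > MAX_COMMENTS_PER_MINUTE:
        return True
    contributions = a.ids + a.observations
    ratio = a.comments / max(contributions, 1)  # avoid division by zero
    return ratio > MAX_COMMENT_TO_CONTRIBUTION_RATIO
```

A legitimate comment-heavy user would trip neither check unless they were both brand new and posting in rapid bursts, and even then the only consequence is a flag for a curator to resolve.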


@tiwane Doesn’t Discourse also give a warning to new accounts that post fast or multiple times in a row with no intervening comments from other users? I think I remember seeing such a warning when I started on the forum, so I think the threshold is pretty low for new accounts.


rescinded by OP