
for ML engineers: why can't you simply exclude the word "fuck"? (slrpnk.net)

submitted 1 year ago* (last edited 1 year ago) by FinallyDebunked@slrpnk.net to c/asklemmy@lemmy.world


So, I've heard that ML models manipulate tokens, and that for English corpora these take the place of words. If we want a model to be polite and avoid offensive language, couldn't we remove certain words, for example "fuck", from the internal array where all tokens and their associated data are stored?


swordsmanluke@programming.dev 19 points 1 year ago* (last edited 1 year ago)

As others have mentioned, it's not quite that simple.

For starters, you can absolutely remove the word "fuck" from all the training data. Now it's literally impossible for the AI to "know" the word. But what do you do with the training data? Do you replace "fuck" with a different token? "****" perhaps? Or do you just drop the data entirely?

Giving "offense" is much more complex than just a single word. See, if we just replace the token, the AI may still decide that "Go **** yourself" is a perfectly valid response to a query. On the other hand, if you drop all instances of "fuck" from the data, your AI will just learn offensive euphemisms instead: "You can shove your request where the sun don't shine."
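To make the two options concrete, here's a rough sketch in Python. The three-line corpus, the regex, and the helper names `replace_token` and `drop_lines` are all made up for illustration; real data pipelines are far bigger and messier than this.

```python
import re

# Toy corpus; these sentences are invented for illustration only.
corpus = [
    "Go fuck yourself",
    "Thanks, that was helpful",
    "You can shove your request where the sun don't shine",
]

# Matches the banned word and simple inflections of it, case-insensitively.
BANNED = re.compile(r"\bfuck\w*", re.IGNORECASE)

def replace_token(lines, mask="****"):
    """Option 1: keep every line but swap the banned word for a mask."""
    return [BANNED.sub(mask, line) for line in lines]

def drop_lines(lines):
    """Option 2: discard any line that contains the banned word."""
    return [line for line in lines if not BANNED.search(line)]

replaced = replace_token(corpus)
dropped = drop_lines(corpus)

print(replaced)  # the insult survives, just with a mask in the middle
print(dropped)   # the euphemism is still there to be learned
```

Either way, the rude intent is still in what's left: option 1 keeps the sentence structure, and option 2 keeps the euphemism.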

Worse, there are plenty of sexual / offensive phrases that are built up from perfectly innocuous tokens. "Prone bone", for instance.
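A word-level blocklist shows the same gap. This is only a minimal sketch, assuming a hypothetical `passes_blocklist` check that splits on whitespace; it's not how any real moderation filter is built.

```python
# Hypothetical blocklist containing only the one word from the question.
BLOCKLIST = {"fuck"}

def passes_blocklist(text, blocklist=BLOCKLIST):
    """Return True if no individual word in `text` is on the blocklist."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return not any(w in blocklist for w in words)

samples = [
    "Go fuck yourself",
    "Prone bone",
    "You can shove your request where the sun don't shine",
]

results = {s: passes_blocklist(s) for s in samples}
for text, ok in results.items():
    print(f"{'allowed' if ok else 'blocked'}: {text}")
```

Only the first sample gets blocked. The other two sail through because every individual word in them is innocuous.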

The goal with these (and really almost all) AI models is for them to be "helpful, honest, and harmless". Simply altering or replacing a single token (or even a combination of tokens) doesn't really help, because the AI is modeling concepts, not just individual words.

All of this to say that the problem being solved is not to stop an AI from saying "fuck" - it's to build an AI that doesn't want to.

this post was submitted on 25 Sep 2023

22 points (82.4% liked)
