Neuronpedia - AI Safety Game — LessWrong (www.lesswrong.com)

submitted 1 year ago by seasonone@opidea.xyz to c/technology@lemmy.world

[-] TechLich@lemmy.world 2 points 1 year ago

I think the idea is that LLMs have potential alignment issues because it's not clear which concepts map to which activations. That makes it difficult to see what they're really "thinking" about when they generate text, e.g. whether they're being misleading or incorrectly associating concepts that shouldn't be connected.

The idea here is to use mechanistic interpretability techniques to see which text activates which neurons in an LLM, crowdsource the meanings behind those neurons, and then see whether the result could be used to look up some context for what an AI is doing. Sort of trying to make a "Wikipedia of AI mind reading".
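To make that pipeline a bit more concrete, here's a toy sketch in Python. Everything in it is made up for illustration: the `ACTIVATIONS` numbers, the `LabelBoard` class, and the `top_texts` and `explain` helpers are hypothetical and are not Neuronpedia's actual data or API. It only shows the shape of the idea: find the text that most activates a neuron, let people vote on what the neuron means, then use the winning labels to annotate new activations.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (text snippet, {neuron_id: activation}).
# A real setup would record these from an actual model's forward pass.
ACTIVATIONS = [
    ("the cat sat on the mat", {0: 0.9, 1: 0.1}),
    ("dogs chase cats", {0: 0.8, 1: 0.2}),
    ("stock prices fell sharply", {0: 0.05, 1: 0.95}),
    ("the market rallied today", {0: 0.1, 1: 0.7}),
]


def top_texts(neuron_id, k=2):
    """Return the k snippets that activate a neuron most strongly."""
    scored = [(acts.get(neuron_id, 0.0), text) for text, acts in ACTIVATIONS]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


class LabelBoard:
    """Collects crowdsourced guesses about what each neuron represents."""

    def __init__(self):
        self.votes = defaultdict(Counter)

    def vote(self, neuron_id, label):
        # Normalise labels so "Animals " and "animals" count as one guess.
        self.votes[neuron_id][label.strip().lower()] += 1

    def consensus(self, neuron_id):
        """Return the most-voted label, or None if nobody has voted."""
        counts = self.votes.get(neuron_id)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


def explain(text_acts, board, threshold=0.5):
    """Look up consensus labels for neurons firing at or above threshold."""
    return {
        neuron_id: board.consensus(neuron_id)
        for neuron_id, act in sorted(text_acts.items())
        if act >= threshold
    }


board = LabelBoard()
board.vote(0, "animals")
board.vote(0, "Animals ")
board.vote(0, "pets")
board.vote(1, "finance")

print(top_texts(0))
print(explain({0: 0.85, 1: 0.1}, board))
```

The hard part in practice is that real neurons often don't map cleanly onto one human concept, so a simple majority vote like this is only a stand-in for whatever scoring a real project would use.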

Dunno how practical or effective that approach is, but it's an interesting idea.

this post was submitted on 27 Jul 2023

16 points (76.7% liked)
