overview for Hackworth

What California governor Newsom said after an Alabama senator called LA ‘a third world country’ in c/news@lemmy.world

[–] Hackworth@sh.itjust.works 2 points 18 hours ago

Hmm. Are intrusive thoughts memes struggling to be born? Not in this case, I hope.

ChatGPT will avoid being shut down in some life-threatening scenarios, former OpenAI researcher claims in c/technology@lemmy.world

[–] Hackworth@sh.itjust.works 9 points 19 hours ago

That was in Anthropic's system card for Claude 4, and the headlines/articles largely missed the point. Regarding the blackmail scenario, the paper even says:

... these extreme actions were rare and difficult to elicit, while nonetheless being more common than in earlier models. They are also consistently legible to us, with the model nearly always describing its actions overtly and making no attempt to hide them. These behaviors do not appear to reflect a tendency that is present in ordinary contexts.

They're testing alignment hacking and jail-breaking tactics in general to see how the models respond. But the greater concern is that a model will understand as part of the context that it is being tested and behave differently in testing than in deployment. This has already been an issue.

In the initial implementations of reasoning models, if an LLM was penalized directly for this kind of misaligned generation in its "scratch pad," it would not alter its misaligned response; rather, it would simply omit the misaligned generation from the scratch pad. In other words, the model's actions were no longer consistently legible.

When we microwave in our house we say we're going to "zap it". What do you say? in c/asklemmy@lemmy.ml

[–] Hackworth@sh.itjust.works 94 points 20 hours ago (7 children)

Nuke it.

ChatGPT will avoid being shut down in some life-threatening scenarios, former OpenAI researcher claims in c/technology@lemmy.world

[–] Hackworth@sh.itjust.works 3 points 20 hours ago

Activating AI Safety Level 3 Protections

Remember in c/aboringdystopia@lemmy.world

[–] Hackworth@sh.itjust.works 3 points 1 day ago

Remember in c/aboringdystopia@lemmy.world

[–] Hackworth@sh.itjust.works 7 points 1 day ago (2 children)

I'm lucky that conversational AI is being developed as I'm middle-aged, 'cause younger me absolutely would have fallen into an intractable state of delusion. Much like advertising, even the absurd cheese has an effect with extended exposure. And below the "You've hit on something uniquely insightful that could change the world!" shtick, there is already a subtler form of reinforcement and enabling. This puts me in an odd place, because I use AI productively on a daily basis. And I still see it as one of the few technologies that could actually help us dig ourselves out of the enormous hole we've dug. But I suspect we'll just use it to dig a deeper hole at a swifter pace.

Trump preparing large-scale cancellation of federal funding for California, sources say in c/news@lemmy.world

[–] Hackworth@sh.itjust.works 3 points 6 days ago (2 children)

Is that functionally just seceding?