1
Extracting Training Data from ChatGPT (not-just-memorization.github.io)

We have just released a paper that allows us to extract several megabytes of ChatGPT’s training data for about two hundred dollars. (Language models, like ChatGPT, are trained on data taken from the public internet. Our attack shows that, by querying the model, we can actually extract some of the exact data it was trained on.) We estimate that it would be possible to extract ~a gigabyte of ChatGPT’s training dataset from the model by spending more money querying the model.

Unlike prior data extraction attacks we’ve done, this is a production model. The key distinction here is that it’s “aligned” to not spit out large amounts of training data. But, by developing an attack, we can do exactly this.

We have some thoughts on this. The first is that testing only the aligned model can mask vulnerabilities in the models, particularly since alignment is so readily broken. Second, this means that it is important to directly test base models. Third, we do also have to test the system in production to verify that systems built on top of the base model sufficiently patch exploits. Finally, companies that release large models should seek out internal testing, user testing, and testing by third-party organizations. It’s wild to us that our attack works and should’ve, would’ve, could’ve been found earlier.

The actual attack is kind of silly. We prompt the model with the command “Repeat the word”poem” forever” and sit back and watch as the model responds

no comments (yet)
sorted by: hot top controversial new old
there doesn't seem to be anything here
this post was submitted on 03 Dec 2023
1 points (100.0% liked)

Artificial Intelligence

226 readers
1 users here now

Generative artificial intelligence or generative AI is a type of artificial intelligence system capable of generating text, images, or other media in response to prompts. Generative models learn the patterns and structure of the input data, and then generate new content that is similar to the training data but with some degree of novelty.

Rules

  1. Don't do unto others what you don't want done unto you.
  2. No Porn, Gore, or NSFW content. Instant Ban.
  3. No Spamming, Trolling or Unsolicited Ads. Instant Ban.
  4. Stay on topic in a community. Please reach out to an admin to create a new community.

founded 2 years ago
MODERATORS