429
Google Researchers’ Attack Prompts ChatGPT to Reveal Its Training Data
(www.404media.co)
This is a most excellent place for technology news and articles.
how do we know the ChatGPT models haven’t crawled the publicly accessible breach forums where private data is known to leak? I imagine the crawler models would have some ‘follow webpage-attachments and then crawl’ function. surely they have crawled all sorts of leaked data online but also genuine question bc i haven’t done any previous research.
We don't, but from what I've seen in the past, those sort of forums either require registration or payment to access the data, and/or some special means to download it (eg: bittorrent link, often hidden behind a URL forwarders + captchas so that the uploader can earn some bucks). A simple web crawler wouldn't be able to access such data.