this post was submitted on 31 Aug 2023

563 points (98.3% liked)

A.I.’s un-learning problem: Researchers say it’s virtually impossible to make an A.I. model ‘forget’ the things it learns from private user data (finance.yahoo.com)

submitted 2 years ago by assassin_aragorn@lemmy.world to c/technology@lemmy.world

201 comments

I'm rather curious to see how the EU's privacy laws are going to handle this.

(Original article is from Fortune, but Yahoo Finance doesn't have a paywall)

(page 2) 50 comments

[–] norawibb@sh.itjust.works 6 points 2 years ago

"virtually" impossible. hehehe

[–] Viking_Hippie@lemmy.world 6 points 2 years ago

The Danish government, which has historically been very good about both privacy rights and workers' rights, has recently suggested that it is looking into fixing the nurse shortage "via AI".

Our current government is probably the stupidest, most irresponsible and least humanitarian one we've had in my 40 year lifetime if not longer 🤬

[–] Pichu0102@kbin.social 6 points 2 years ago (3 children)

I feel like one way to do this would be to break up models and their training data into mini-models and mini-batches of training data instead of one big model, and also to restrict training data to material used with permission as well as public domain sources. In all other cases, where a company is required to take down information from a model because its permission to use that information was revoked or expired, it can identify the relevant training data in the mini-batches, remove it, then retrain the corresponding mini-model more quickly and efficiently than it could retrain the entire massive model.

A major problem with this though would be figuring out how to efficiently query multiple mini models and come up with a single response. I'm not sure how you could do that, at least very well...
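For what it's worth, the idea above resembles what the machine-unlearning literature calls sharded training (often referred to as SISA), and for classifiers the usual answer to the "single response" question is a vote across the mini-models. Below is a minimal sketch under loud assumptions: the mini-model is a toy nearest-centroid classifier on one number, and `ShardedEnsemble`, `train_shard`, and `forget` are hypothetical names made up for illustration. It says nothing about how to combine generative language models, which is a much harder problem.

```python
from collections import Counter


def train_shard(examples):
    """Toy 'mini-model': the mean feature value (centroid) for each label."""
    sums, counts = {}, Counter()
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_shard(model, x):
    # Pick the label whose centroid is closest to x.
    return min(model, key=lambda label: abs(model[label] - x))


class ShardedEnsemble:
    """Hypothetical wrapper: one independent mini-model per data shard."""

    def __init__(self, examples, num_shards):
        # Each example lands in exactly one shard (round-robin assignment).
        self.shards = [list(examples[i::num_shards]) for i in range(num_shards)]
        self.models = [train_shard(shard) for shard in self.shards]
        self.retrain_count = 0

    def forget(self, example):
        # Only the shard that held the example is retrained; the rest are untouched.
        for i, shard in enumerate(self.shards):
            if example in shard:
                shard.remove(example)
                self.models[i] = train_shard(shard)
                self.retrain_count += 1

    def predict(self, x):
        # Majority vote across the non-empty mini-models gives a single response.
        votes = Counter(predict_shard(m, x) for m in self.models if m)
        return votes.most_common(1)[0][0]


# Made-up data: (feature, label) pairs.
data = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.0, "b"), (0.15, "a"), (0.95, "b")]
ensemble = ShardedEnsemble(data, num_shards=2)
ensemble.forget((0.1, "a"))  # retrains one mini-model, not the whole ensemble
```

The trade-off is that each mini-model sees only a fraction of the data, so accuracy can suffer as the number of shards grows.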

[–] Strawberry@lemmy.blahaj.zone 3 points 2 years ago

You could certainly break up the training data, but breaking up the model into mini-models based on which training data is used wouldn't work with neural networks trained using gradient descent. Basically, whatever the state of the model is, it depends on the totality of the training data it has been trained on (and the order). It isn't possible to go back and remove the effect of a specific training data point without retraining on all of the data that followed that data point (and even that assumes you were storing a snapshot of the model before every single training data point, which I doubt anyone does).

However, that's no excuse: it is of course possible to entirely retrain a network using a clean dataset, and that is what these companies should do.
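The order dependence and the snapshot-and-replay idea can be sketched with a toy example. Everything here is an assumption made for illustration: a one-parameter model `y ≈ w * x`, plain gradient descent with one data point per step, no shuffling, and made-up data. Real training also carries optimizer state and mini-batches, which would need to be snapshotted too.

```python
def sgd_step(w, x, y, lr=0.1):
    # One gradient-descent step on squared error for the model y ≈ w * x.
    grad = 2 * (w * x - y) * x
    return w - lr * grad


def train(data, w=0.0):
    """Run one pass over data; also return the weight seen *before* each point."""
    snapshots = []
    for x, y in data:
        snapshots.append(w)
        w = sgd_step(w, x, y)
    return w, snapshots


# Made-up training data: (x, y) pairs.
data = [(1.0, 2.0), (2.0, 3.0), (1.5, 3.5), (0.5, 1.0)]
w_full, snaps = train(data)

# Same data, different order: the final weight differs.
w_reversed, _ = train(list(reversed(data)))

# "Forget" data[1]: rewind to the snapshot taken before it was seen,
# then replay every point that came after it.
forget_idx = 1
w_replayed, _ = train(data[forget_idx + 1:], w=snaps[forget_idx])

# Reference: retrain from scratch on the dataset without that point.
w_clean, _ = train(data[:forget_idx] + data[forget_idx + 1:])

print("full:", w_full, "reversed:", w_reversed)
print("replayed:", w_replayed, "clean retrain:", w_clean)
```

In this deterministic toy setup the replayed weight matches the clean retrain, but only because a snapshot existed for every step and every later point was replayed in the same order.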

[–] cloudless@feddit.uk 5 points 2 years ago (10 children)

It is not impossible, it is just expensive.

[–] SomethingBurger@jlai.lu 5 points 2 years ago (19 children)

Can't they remove the data from the training set and start over?

[–] mo_ztt@lemmy.world 4 points 2 years ago (1 children)

Yes, but that's not easy... I can't remember exactly, but I think I saw an estimate that the compute time to train just one of the GPT models cost around $66 million. IDK whether that's total cost from scratch, or incremental cost to arrive at that model starting from an earlier model that was already built, but I do know that GPT is still to this day using that September 2021 cutoff which to me kind of implies that they're building progressively on top of already-assembled models and datasets (which makes sense, because to start from scratch without needing to would be insane).

You could, technically, start from scratch and spend 2 more years and however many million dollars retraining a new model that doesn't have the private data you're trying to excise, but I think the point the article is making is that that's a pretty difficult approach and it seems right now like that's the only way.

[–] skulblaka@kbin.social 5 points 2 years ago

Un-robbing a bank also isn't easy, but that doesn't mean I'm able to just say "it too hard :c" and then walk off into the sunset with my looted gains.

[–] Zeth0s@lemmy.world 2 points 2 years ago* (last edited 2 years ago)

Information leaking is a thing. Some information is spread across multiple sources without actually being in any of those. If you remove something, the model can still infer the information.

If Macron asks for his name to be deleted, you can still retrieve his political opinions simply by knowing the history of other people's interactions with the French government. I just need to tell the model that the person it has no direct information about is named Macron, and it can profile him.

Same with a search engine. The only difference is that, there, the inference of missing information is done by human brains. The model can substitute for them.

[–] asunaspersonalasst@lemmy.world 4 points 2 years ago

Then why did they put it in there in the first place, no? 👁👄👁

[–] over_clox@lemmy.world 3 points 2 years ago

Have you tried..

format Earth
