submitted 9 months ago by rinze@infosec.pub to c/privacy@lemmy.ml

Reddit said in a filing to the Securities and Exchange Commission that its users’ posts are “a valuable source of conversation data and knowledge” that has been and will continue to be an important mechanism for training AI and large language models. The filing also states that the company believes “we are in the early stages of monetizing our user base,” and proceeds to say that it will continue to sell users’ content to companies that want to train LLMs and that it will also begin “increased use of artificial intelligence in our advertising solutions.”

The long-awaited S-1 filing reveals much of what Reddit users knew and feared: That many of the changes the company has made over the last year in the leadup to an IPO are focused on exerting control over the site, sanitizing parts of the platform, and monetizing user data.

Posting here because of the privacy implications of all this, but I wonder if at some point there should be an "Enshittification" community :-)

[-] Fubarberry@sopuli.xyz 143 points 9 months ago

Reddit has long had an issue with confidently providing false statements as fact. Sometimes I would come across a question that I was well educated on, and the top-voted responses were all very clearly wrong, but sounded correct to someone who didn't know better. This made me question all the other posts that I had believed without knowing enough to tell otherwise.

LLMs also have the same issue of confidently telling lies that sound true. Training on Reddit will only make this worse.

[-] itsprobablyfine@sh.itjust.works 65 points 9 months ago

Yeah, all of my most downvoted reddit comments were the ones where I replied about something I'm an actual expert in. Scary stuff.

[-] const_void@lemmy.ml 57 points 9 months ago

The voting system lets people push comments to the top that they want to be true, not necessarily things that are true.

[-] dessalines@lemmy.ml 46 points 9 months ago* (last edited 9 months ago)

There's also the issue of reddit comment sorting being entirely dominated by time. In something like 90% of posts, the top comment is one of the first five. Literally all you have to do is just comment first, and it'll likely be the top.

[-] Rentlar@lemmy.ca 23 points 9 months ago

I noticed from the beginning that Lemmy's default comment sorting improves visibility of a variety of comments including newer ones. Gee, I wonder who could have helped make it that way ;)

Over the years I ended up getting a Reddit habit of replying to one of the top comments so that it could attain some visibility. I still do sometimes but less often on Lemmy.

[-] Chriswild@lemmy.world 16 points 9 months ago

Because it's like old forums where the first person to comment gets engagement

[-] blindsight@beehaw.org 26 points 9 months ago

The problem is that SEO has made it impossible to find accurate information easily, since even "old, trustworthy brands" can't be trusted online. [This is an excellent article that explains the problem thoroughly, and brings receipts](https://housefresh.com/david-vs-digital-goliaths/).

[-] Swallowtail@beehaw.org 11 points 9 months ago

This is a great example of why it's so important to emphasize teaching critical thinking in school right now. Misinformation and disinformation are just going to continue to grow.

[-] livus@kbin.social 16 points 9 months ago* (last edited 9 months ago)

@Fubarberry yes I saw this a lot too. Highly upvoted confidently incorrect comments, with the real answer or an answer debunking them with links to factual sources less upvoted.

Happened to me as well.

[-] collapse_already@lemmy.ml 14 points 9 months ago

I am a lawyer, and I would get downvoted for posts explaining the law that contained citations to the actual applicable statute if people didn't like the statute. Using reddit upvotes as a measure of correctness is fundamentally a dumb idea.

[-] coyootje@lemmy.world 95 points 9 months ago

I'm still happy that I went through the effort to delete all my old posts when I left Reddit a while back. I periodically check if they've restored them and luckily it hasn't happened so far. I do miss some of the bigger communities but overall I'm having a good time on Lemmy.

[-] cookiecutter@lemmy.dbzer0.com 46 points 9 months ago

I'm sure they have a backup somewhere that they will use to train the AI, but agreed, it is time to leave reddit for good.

[-] hglman@lemmy.world 38 points 9 months ago

Unless you are in the EU, Reddit absolutely did not delete your data.

[-] Kbin_space_program@kbin.social 17 points 9 months ago

Reddit is dumb enough that they probably have a backup they kept of EU users.

[-] TheOneCurly@lemm.ee 9 points 9 months ago

I wonder what the risks are to including deleted and pre-edited content in training data. Most of the edits are going to be typos and formatting, do you want 2-3 copies of the same message with typos in them for training data? Similarly, deleted comments are mostly nonsense, unhelpful, duplicate, or highly controversial things.

If someone wants to dig through and find individual users to restore that's one thing, but I don't think I'd immediately choose to train off of that other data unless I had to.

[-] Fapper_McFapper@lemmy.world 14 points 9 months ago

After deleting all of my posts and comments Reddit decided to undelete them three days later and then proceeded to lock me out of my own account. Fucking bastards.

[-] LillyPip@lemmy.ca 66 points 9 months ago* (last edited 9 months ago)

Aye, and that’s why I left. As an author, fuck you trying to monetise my writing when I can’t even do that myself.

[-] whoisearth@lemmy.ca 17 points 9 months ago

Hey another author?! How you doin? Lol

Same as you fuck them.

[-] LillyPip@lemmy.ca 9 points 9 months ago

Yeah, hi!

Can I have a link to your work?

[-] Steve@communick.news 56 points 9 months ago* (last edited 9 months ago)

You know the phrase "If you aren't paying, you're the product".
It doesn't hit as hard as a CEO using the phrase "Monetizing Our User Base".

[-] ShadowRam@kbin.social 55 points 9 months ago

You know what the world doesn't need?

an AI model trained on the old Reddit Hive Mind.

[-] HobbitFoot@thelemmy.club 21 points 9 months ago

Some AI models already argue when people point out inaccuracies, just like on Reddit.

[-] LWD@lemm.ee 15 points 9 months ago

Makes me wonder how that technology is going to track. Reddit isn't bad for finding niche answers to niche questions, but if you import the data wholesale then you'll have a hard time separating the signal from the noise, even if you sort by using vote counts as relevance.

Reddit is valuable because people can do a search for a niche topic and find the answer on that forum. And the answer was written by a human. It's not valuable because it can amalgamate an approximation of those answers that might be 90% true and 10% dead wrong.

[-] Atemu@lemmy.ml 10 points 9 months ago

Guess what data they're trained on...

[-] possiblylinux127@lemmy.zip 14 points 9 months ago* (last edited 9 months ago)

I disagree. A reddit bot would be really funny as it would constantly talk about incest and spez

[-] noodlejetski@lemm.ee 53 points 9 months ago

there should be an "Enshittification" community

basically every technology one

[-] 4grams@awful.systems 31 points 9 months ago

This has got to be the start of another bubble popping. It just has to, right? With essentially all online services doing everything they can to wring out every last penny of value without any eye towards the future (other than ai all the things)… something’s gotta give.

But then again, maybe it’s just my eyes being open after living in those spaces for so long. Granted I’ve been out of Facebook for years, been de-amazonning for a couple (it’s really f’ing hard) and I’ve been trying to de-google as well but it’s even harder (stuck with Apple though). But, now that I’m in the fediverse, where we’re talking about all this, maybe that’s why I’m noticing?

Nah, brace yourselves.

[-] HobbitFoot@thelemmy.club 12 points 9 months ago

The start of the bubble popping was the increases in interest rates. We've seen several online companies shut down already because the free money isn't there any more and there is no path to monetization.

The problem with the Fediverse right now is that it is all run on volunteer labor and donations, similar to an early Reddit. It will be interesting to see how a distributed system solves this problem.

[-] bdonvr@thelemmy.club 11 points 9 months ago

I think the volunteer labor and donations strategy works much, much, better on a distributed platform like the fediverse.

[-] NutWrench@lemmy.ml 41 points 9 months ago

"Early stages?" You've got AI mining your data. The lions have already come and gone. The hyenas and other scavengers are picking over the scraps now.

[-] delirious_owl@discuss.online 15 points 9 months ago

They mean that they haven't made money on it (yet).

[-] Steve@startrek.website 40 points 9 months ago

Fuck u/spez

[-] thebardingreen@lemmy.starlightkel.xyz 36 points 9 months ago* (last edited 9 months ago)

When I go to some reddit posts on Mobile now (like from a Google search, that's the only way I end up at reddit anymore), it tells me "this content is unmoderated" and gives me a choice to either navigate away or install the Reddit app. Fuck that noise.

[-] alekradic@lemmy.world 8 points 9 months ago* (last edited 9 months ago)

Try this, in either Bing/Copilot AI or Google Gemini: Start your prompt with "According to Reddit", then do your search like you would by using search alone.

The AI of your choice will scrape the posts and give you a nice summary of whatever you were searching for - no need to ever touch Reddit directly.

For me, this works better with Copilot, YMMV.

Example: "According to Reddit, what is the best mechanical keyboard brand to use for touch typing?"

[-] init@lemmy.ml 32 points 9 months ago

I know it's only token resistance at this point because others have found their comments from Google searches even after their accounts have been deleted, but Power Delete Suite is busy churning away on mine right now.

[-] rinze@infosec.pub 22 points 9 months ago

I wish I had known about Power Delete Suite. I nuked my posts / comments by hand :-(

In case it's useful to more people: https://github.com/j0be/PowerDeleteSuite

[-] Aolley@lemmy.world 30 points 9 months ago

They've finally gone full /HailCorporate, become the thing some of the original people of the site would probably not have agreed with in many ways

[-] SinningStromgald@lemmy.world 27 points 9 months ago

“we are in the early stages of monetizing our user base,”

If anyone on Reddit reads that and stays there willingly, they are an idiot. Not that they weren't idiots for staying after the API changes, but now they are even bigger idiots.

[-] collapse_already@lemmy.ml 24 points 9 months ago

They permabanned my 14yo account because my anti-nazi rhetoric was "encouraging violence." I guess Nazis are a class of humans dumb enough to give them money so they don't want to scare them off. The post that got me banned had more than 60 up votes when it was deleted and I was permabanned. A reply post in the same vein was not deleted.

[-] EdibleFriend@lemmy.world 12 points 9 months ago* (last edited 9 months ago)

Remember that video where Ron Perlman talked about there's a lot of ways to lose a house?

I lost my 11-year account because I said something to the effect of 'If Ron Perlman pulled up and said get in the fucking car we're going to go burn down Bob Iger's house I wouldn't hesitate.'

They had been getting very weird near the end there anyways? I kept getting these stupid warnings over the most petty shit. At one point somebody said respond to this comment and I'll gild you. I simply responded fuck you because I thought it would be funny to see that have gold, which it got. Got an official warning for harassment.

I had said a lot worse over the years.

[-] Outtatime@sh.itjust.works 24 points 9 months ago

Aaron Swartz is rolling in his grave.

[-] Uglyhead@lemmy.world 17 points 9 months ago

“Pay-Per-Click”, is all this is when you break it down to its basest.

Narwhal developers have come out and said that they have to pay beforehand for clicks to the API. What absolute bullshit Reddit and Spez are bringing to the trough. Spez killed reddit. Calling it now: a slow painful lingering shitty death.

People will not put up with it once they know what is really going on.

Let em know. “Pay-Per-Click” will not stand.

[-] SirSmokeAlot@lemmy.ml 10 points 9 months ago

People will not know what is really going on as they do not care. Reddit will continue to exist.

[-] kirbowo808@kbin.social 13 points 9 months ago

With all the changes that Reddit has made recently, especially the API changes, it definitely did leave salt in my mouth, alongside how increasingly toxic the Reddit community had become compared to when I joined. But the small niche communities that existed on Reddit honestly made it harder to quit due to the lack of communities outside, which is another big problem with centralisation, especially in the modern internet, as it makes you rely on platforms you may not necessarily like due to big issues like social isolation etc.

When I found out about this, it simply wasn't excusable anymore, and I would rather delete my account than have my personal data sold for profit (which goes completely against the early ethos of Reddit as a whole, but with it being semi-owned by Conde Nast, this would have been inevitable), even though I had been thinking about deleting my Reddit profile way before this issue.

Surprisingly, I honestly have had no regrets deleting Reddit out of my life and honestly I do wish I would have done it sooner, I’m far less frustrated, I’m starting to think more constructively again and I feel way way less dependent on it.

Can say, I made a good choice there tbh.

[-] daniyyel@lemm.ee 11 points 9 months ago

Is this a long-term source of revenue for Reddit? Or will it lose value at some point, simply because LLMs are all trained sufficiently on user-generated content? Is there more to learn at some point?

Also, it seems that a lot of content on Reddit is already AI generated, so it would train on data from other LLMs, which I'm sure doesn't improve quality.

[-] LWD@lemm.ee 11 points 9 months ago

LLMs are a parasitic entity. They can only operate as long as they have a living host (us) on which to draw data. Without their host, they rapidly start hallucinating. Hell, the other day ChatGPT (and every business that relied on it) started hallucinating for no apparent reason.

The thing about the parasite is, though, that it endangers its host. At some point, the fact that anything you say can be plugged into a machine with no credit given back to you, will encourage creative people to stop bothering being creative, depriving them of income or even exposure.

It's a funny thing, a few years ago I would say that the "anything you post here can be sold by us" clause on social media was very unlikely to get exploited, as nobody knew how to sell data en masse to make money off of it. I guess now we know that's not true at all. If something bad can happen with your data... It will.

[-] henfredemars@infosec.pub 10 points 9 months ago* (last edited 9 months ago)

I'm in the early stages of becoming a billionaire. Now I just need approximately a smidge less than a billion dollars.

[-] cluelessafterall@lemmy.world 9 points 9 months ago

If you are planning to kill your reddit account, there is an app, Redact, which is available on the Apple and Play stores, that will allow you to nuke all your posts before you close it completely. Deny them your data.

[-] BangCrash@lemmy.world 9 points 9 months ago

Surely that just removes the public data.

They will have backups that will retain it all

this post was submitted on 23 Feb 2024
599 points (98.4% liked)