Maven Imported 1.12 Million Fediverse Posts (wedistribute.org)

submitted 4 months ago by deadsuperhero@lemmy.world to c/fediverse@lemmy.world

Maven, a new social network backed by OpenAI's Sam Altman, found itself in a controversy today when it imported a huge number of posts and profiles from the Fediverse, and then ran AI analysis to alter the content.

[-] snugglebutt@lemmy.blahaj.zone 126 points 4 months ago

Pretty wild

[-] doctortofu@reddthat.com 145 points 4 months ago

The wildest part is that he's surprised that Mastodon peeps would react negatively to their posts being scraped without consent or even notification and fed into an AI model. Like, are you for real dude? Have you spent more than 4 seconds on Mastodon and noticed their (our?) general attitude towards AI? Come the hell on...

[-] danc4498@lemmy.world 32 points 4 months ago

People can complain, but the Fediverse is built to make consuming users’ data easy. If you don’t want AI using your data, don’t put it on such an easily “scrapable” network.

[-] el_bhm@lemm.ee 47 points 4 months ago

Yeah, and girls dress for rape. They are just aaasking for it!

I will go off on a tangent.

Just because something is online it does not mean I give a full green light on anything.

Fuck this noise of parasitic social networks hammering "free service, therefore you pay with data" into everyone's skull. And everyone posts crap.

It is a billion dollar business. LLMs are extracting millions and will generate more.

You know why? Because worthless shit you post online is not worthless after all.

Yes, you are reading it right. Pay me. Pay us.

Before anyone ridicules this. Yall be defending billion dollar corporations, staffed with millionaires below C-levels.

People should start demanding money from these greedy assholes.

[-] circuscritic@lemmy.ca 23 points 4 months ago* (last edited 4 months ago)

I don't think they're making a moral argument, but pointing out the reality of the situation as it stands.

This is a problem that can only be fixed through legislation and aggressive enforcement backed by large punitive actions.

Until that happens, it's better to acknowledge and understand the reality of the situation, than to believe that a morally righteous condemnation will somehow unmake that reality.

It sucks. I agree with your philosophical stance, except for the payment for personal data, as I'd prefer a complete opt-out. However, none of that changes where we're at right now.

[-] bbuez@lemmy.world 15 points 4 months ago

Alternatively, use a closed ecosystem susceptible to data rot and loss.

Want to contribute to our open source project? Join our discord

Would you want art to be unfindable because scraping for AI image generation happens? It's a solution looking for problems.

[-] scrubbles@poptalk.scrubbles.tech 8 points 4 months ago

This is what I've been saying the entire time. It sucks, and it's wrong, but the fediverse is built from the ground up as an open sharing platform, where our data is shared with anyone. It shouldn't be, and it's wrong, but there is nothing to stop anyone from doing it. To change that would alter federation at a core level.

[-] danc4498@lemmy.world 13 points 4 months ago

I would rather my content be open to the world for however it wants to use it than owned by a single company that gets to profit off aggregating and selling it.

[-] scrubbles@poptalk.scrubbles.tech 4 points 4 months ago

Fully agree. The annoyances of free and open are vastly outweighed by the negatives of a closed, company-owned platform.

[-] tooLikeTheNope@lemmy.ml 2 points 4 months ago

Yeah, but doesn't Hubzilla (https://hubzilla.org/page/info/discover) apply a privacy layer to how its content is distributed? The issue then also lies in how the social network is implemented for its purpose; in the Hubzilla vs. Lemmy case, for instance, it is a public board vs. a social network.

[-] FaceDeer@fedia.io 10 points 4 months ago

It sounds like they weren't "being fed into an AI model" as in being used as training material, they were just being evaluated by an AI model. However...

Have you spent more than 4 seconds on Mastodon and noticed their (our?) general attitude towards AI?

Yeah, the general attitude of wild witch-hunts and instant zero-to-11 rage at the slightest mention of it. Doesn't matter what you're actually doing with AI, the moment the mob thinks they scent blood the avalanche is rolling.

It sounds like Maven wants to play nice, but if the "general attitude" means that playing nice is impossible why should they even bother to try?

[-] doctortofu@reddthat.com 6 points 4 months ago

The anti-AI knee-jerk reactions can be extreme, I agree, but at the same time one of the important features of Mastodon is that your feed is not controlled by an algorithm in any way.

So when a company comes, takes those posts and screws with them to create an algorithm to show them, I understand people getting angry - at least some of them joined to be free of that exact thing...

[-] FaceDeer@fedia.io 8 points 4 months ago

One of the important features of Mastodon is that you can choose what your feed is. Everyone's feed has an algorithm determining what's in it even if it's just a simple "list the posts of everyone I've subscribed to in chronological order."

If someone else wants to see a feed of content that is curated and sorted in a different way, why get angry at them? They're not forcing you to see that feed.
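To make the point concrete, here is a minimal Python sketch of that simplest feed "algorithm". The `Post` type and `chronological_feed` function are hypothetical names made up for illustration, not Mastodon's actual implementation, and "newest first" is an assumed reading of "chronological order":

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    # Hypothetical minimal post record, for illustration only.
    author: str
    created_at: datetime
    text: str


def chronological_feed(posts, subscriptions):
    """Keep only posts by subscribed authors, sorted newest first."""
    subscribed = set(subscriptions)
    mine = [p for p in posts if p.author in subscribed]
    return sorted(mine, key=lambda p: p.created_at, reverse=True)
```

Even this version makes two choices for the reader (a filter and a sort order); a curated feed just swaps in different ones.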

[-] xavier666@lemm.ee 2 points 4 months ago

Yeah, the general attitude of wild witch-hunts and instant zero-to-11 rage at the slightest mention of it. Doesn’t matter what you’re actually doing with AI, the moment the mob thinks they scent blood the avalanche is rolling.

This wasn't always the case. A lot of NLP research in the 2010s used scraped social media posts. People never had a problem with that (or at least the outrage wasn't visible back then). The problem now is that our content is being used to create an AI product with zero consent from the end user.

Source: My research colleagues used to work on NLP

[-] schizo@forum.uncomfortable.business 4 points 4 months ago

For me, more specifically, the problem is they took my data and made a tool to sell it back to me without paying me for it.

I have no real issue with current ai stuff, other than you're effectively taking our stuff and want us to pay you for doing so.

If they weren't freeloading on everyone, I suspect you'd have a lot less angry people.

[-] Etterra@lemmy.world 6 points 4 months ago

He's not surprised. He's acting surprised because he got caught. It's pretty standard for these jerkass tech bros. "Move fast and break things" is code for "break laws, be unethical." As I think we've all seen, if you do it often and fast enough you can keep way ahead of any kind of accountability, because everybody else is trying to play catch-up while the last thing has already filtered out of the news cycle.

[-] Blaze@reddthat.com 17 points 4 months ago

His Mastodon: https://mastodon.social/@jsecretan/with_replies

[-] technomad@slrpnk.net 6 points 4 months ago

Look at that shit-eating grin, he knows. There's no way someone can be that out of touch, right? Right?!?

[-] lunarul@lemmy.world 88 points 4 months ago

I was confused why a package manager would need to import posts from a social network.

Why name a new product the same as a very popular existing product?

[-] MindTraveller@lemmy.ca 17 points 4 months ago

Obviously it's named after Maven Black-Briar

[-] threelonmusketeers@sh.itjust.works 40 points 4 months ago

I was confused on what they were trying to accomplish, and even after reading the article I am still somewhat confused.

Instead, when a user posts something, the algorithm automatically reads the content and tags it with relevant interests so it shows up on those pages. Users can turn up the serendipity slider to branch out beyond their stated interests, and the algorithm running the platform connects users with related interests.

Perhaps I'm a minority, but I don't see myself getting much utility out of this. I already know what my interests are, and don't have much interest in growing them algorithmically. If a topic is really interesting, I'll eventually find out about it via an actual human.

[-] technomad@slrpnk.net 28 points 4 months ago

Yeah, we're trying to get the fuck away from algorithms. That's what makes the fediverse such a big draw currently, for me.

[-] scrubbles@poptalk.scrubbles.tech 13 points 4 months ago

Only algorithm I need is posts I subscribe to, in descending order. That's about it

[-] FaceDeer@fedia.io 3 points 4 months ago

You're on slrpnk.net, I assume it's not implementing any of this stuff. As long as you don't sign up for Maven I don't see how this is going to affect you.

[-] technomad@slrpnk.net 7 points 4 months ago* (last edited 4 months ago)

I mean yeah, maybe it won't affect me directly, I like the instance I'm on and it's a pretty respectable one. However, indirectly, this is very relevant to any Fediverse user, regardless of the instance or platform they're using. Allowing abuses like this to happen without any pushback is a surefire way of turning this place into a shithole just like the rest of the internet. I appreciate the fact that, at least for now, it's different here.

Also, maybe this isn't my only homebase? Just saying.

[-] Zak@lemmy.world 11 points 4 months ago

TikTok is really popular operating on essentially the same principle. I, for one, want nothing to do with that.

[-] misk@sopuli.xyz 29 points 4 months ago* (last edited 4 months ago)

That's why I keep saying it's pointless to defederate corpos. They'll just scrape everything before you notice.

[-] snugglebutt@lemmy.blahaj.zone 28 points 4 months ago

The fact they even got DMs from at least one instance is crazy.

[-] mke@lemmy.world 27 points 4 months ago* (last edited 4 months ago)

And it's also damning for private messaging on Mastodon.

I once read vague complaints about it being a rushed implementation. While I won't trust those without evidence, I for sure wouldn't trust mastodon with my PMs. At least, not until how this was allowed to happen is figured out and fixed if necessary.

P.S. I'm still not sure I believe in PMs in the fediverse. If I need to share something and care about keeping it private, I'd rather move the conversation elsewhere.

[-] technomad@slrpnk.net 17 points 4 months ago

I was under the impression that DM's on Mastodon (and Lemmy too) weren't ever stated as being secure and I think that they were both pretty transparent about this particular aspect.

[-] mke@lemmy.world 10 points 4 months ago

You're right, regarding Mastodon. I won't edit my other comment, though, both to preserve the original chain of thought and because that brings up another discussion.

To quote the EFF:

We feel that the intended usage of the feature will not determine people’s expectation of privacy while using it.

Offering people a feature with preexisting expectations, similar to other things that fulfill those expectations, then telling people "We know it looks like a duck but don't expect it to quack!"

...It begs the question: was the feature really a good idea?

[-] Blaze@reddthat.com 9 points 4 months ago

Defederation is more about not being flooded with 1000x more users than the Fediverse currently has

[-] pennomi@lemmy.world 3 points 4 months ago

Plus even if you defederate them, oops, it’s all public anyway!

[-] FartsWithAnAccent@fedia.io 16 points 4 months ago

Classic Scam Altman!

[-] Blaze@reddthat.com 16 points 4 months ago

I was confused at first, I thought it was the Apache project

[-] verstra@programming.dev 13 points 4 months ago

Oh shit, the persona guy was right! We should all be adding a license to our comments, so that companies could not legally train models on them that are then used for commercial purposes.

[-] pennomi@lemmy.world 18 points 4 months ago

The easiest way is a sitewide NoAI meta tag, since it’s the current standard. Researchers are much more likely to respect a common standard and extremely unlikely to respect a single user’s personal solution adding a link to their comments.
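For reference, the tag usually meant here is, as far as I know, an informal convention of the form `<meta name="robots" content="noai, noimageai">` placed in the page's `<head>`; treat that exact form as an assumption rather than a formal specification, and note that honoring it is voluntary. A minimal Python sketch of how a well-behaved crawler might check for it (the class and function names are hypothetical):

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # Directives are comma-separated, e.g. "noai, noimageai".
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def opts_out_of_ai(html):
    """Return True if the page declares the (assumed) `noai` directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noai" in parser.directives
```

This only detects the opt-out; nothing in it stops a scraper that chooses to ignore the tag.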

[-] scrubbles@poptalk.scrubbles.tech 6 points 4 months ago

This is the only way I see it being acceptable. How do we add this to instances?

[-] iAvicenna@lemmy.world 4 points 4 months ago

I feel like the bad thing about this is, whereas the researchers will mostly respect this, companies who want to make money out of data will still secretly keep using the data anyways. I am more ok with the data being used for non-profit research and not for making money but this would likely have the opposite effect.

[-] Blaze@reddthat.com 7 points 4 months ago

@onlinepersona@programming.dev

[-] onlinepersona@programming.dev 7 points 4 months ago* (last edited 4 months ago)

Thanks for linking me 🙏 The makers of Maven probably set off a bomb now and people might ask for anti-AI features on the clients and servers.

Anti Commercial-AI license

[-] Danterious@lemmy.dbzer0.com 4 points 4 months ago

Yeah, they were. I hope more people start doing it; even if it doesn't legally hold water, it's still a good way to show that fediverse users won't stand for that.

Anti Commercial-AI license (CC BY-NC-SA 4.0)

[-] Larry@lemmy.world 10 points 4 months ago

Am I misunderstanding this, or did they just fuck up the integration so it's one way with a plan to make it two ways after, and the AI alteration is just sentiment analysis on whatever they took?

[-] FaceDeer@fedia.io 13 points 4 months ago

Looks like it.

In addition to pulling in posts, the import process seems to be running AI sentiment analysis to add tags and relational data after content reaches Maven’s servers. This is a core part of Maven’s product: instead of follows or likes, a model trains itself on its own data in an attempt to surface unique content algorithmically.

But of course, that news doesn't give the reader those lovely rage endorphins or draw clicks.

This is the Fediverse, having the content we post get spread around to other servers is the whole point of all this. Is this a face-eating leopard situation? People are genuinely surprised and upset that the stuff we post here is ending up being shown in other places?

There is one thing I see here that raises my eyebrows:

Even more shocking is the revelation that somehow, even private DMs from Mastodon were mirrored on their public site and searchable. How this is even possible is beyond me, as DM’s are ostensibly only between two parties, and the message itself was sent from two hackers.town users.

But that sounds to me like a hackers.town problem, it shouldn't be sending out private DMs to begin with.

[-] deadsuperhero@lemmy.world 9 points 4 months ago

They kind of fucked up everything in approaching this by not talking to the community and collecting feedback, making dumb assumptions in how the integration was supposed to work, leaking private posts, running everything through their AI system, and neglecting to represent the remote content as having come from anywhere else.

The other thing is that Maven's whole concept is training an AI over and over again on the platform's posts. Ostensibly, this could mean that a lot of Fediverse content ended up in the training data.

[-] Grandwolf319@sh.itjust.works 5 points 4 months ago

Genuine question, do instances not have a GPL license on their content? With that license, anyone can use all the data but only for open source software.

[-] GamingChairModel@lemmy.world 3 points 4 months ago

Instances don't actually own the copyright to comments. The poster owns the copyright and licenses it to the instance. Which lets the instance use it, but not sublicense to others.
