this post was submitted on 10 Jan 2024

925 points (96.5% liked)


"Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?" (fosstodon.org)

submitted 2 years ago by Star@sopuli.xyz to c/technology@lemmy.world


[–] Star@sopuli.xyz 309 points 2 years ago* (last edited 2 years ago) (11 children)

It's so ridiculous: when corporations steal everyone's work for their own profit, no one bats an eye, but when a group of individuals does the same to make education and knowledge free for everyone, it's somehow illegal, unethical, immoral and whatnot.

[–] Grimy@lemmy.world 63 points 2 years ago (15 children)

Using publicly available data to train isn't stealing.

Daily reminder that the ones pushing this narrative are literally corporations like OpenAI. If you can't use copyrighted materials freely to train on, it drives up the cost in such a way that only a handful of companies can afford the data.

They want to kill the open-source scene and are manipulating you to do so. Don't build their moat for them.

[–] givesomefucks@lemmy.world 42 points 2 years ago* (last edited 2 years ago) (10 children)

And using publicly available data to train gets you a shitty chatbot...

Hell, even using copyrighted data to train isn't that great.

Like, what do you even think they're doing here for your conspiracy?

You think OpenAI is saying they should pay for the data? They're trying to use it for free.

Was this a meta joke and you had a chatbot write your comment?

[–] tourist@lemmy.world 24 points 2 years ago

Was this a meta joke and you had a chatbot write your comment?

if someone said this to me I'd cry

[–] webghost0101@sopuli.xyz 15 points 2 years ago* (last edited 2 years ago) (12 children)

The point being made was that publicly available data includes a whole lot of copyrighted data to begin with, and it's pretty much impossible to filter it out. Grand example: the Eiffel Tower in Paris is not copyright protected, but the lights on it are, so you can only use pictures of the Eiffel Tower taken during the day, and only if the picture itself isn't copyright protected by the original photographer. Copyright law has all these complex caveats and exceptions that make it impossible to tell at a glance whether or not something is protected.

This in turn means that if AI cannot legally train on copyrighted materials it finds online without paying huge sums of money, then effectively only mega corporations that can pay copyright fines as a cost of business will be able to afford training decent AI.

The only other option for producing any AI of this type is a very narrow, curated set of known materials with a public-use license, but that is not going to get you anything competent on its own.

EDIT: In case it isn't clear, I am clarifying what I understood from Grimy@lemmy.world's comment, not adding to it.

[–] be_excellent_to_each_other@kbin.social 18 points 2 years ago (1 children)

So then we as a society aren't ready to untangle the mess of our infancy in the digital age. ChatGPT isn't something we must have at all costs, it's something we should have when we can deploy it while still respecting the rights of people who have made the content being used to train it.


[–] RainfallSonata@lemmy.world 17 points 2 years ago (5 children)

I didn't want any of this shit. IDGAF if we don't have AI. I'm still not sure the internet actually improved anything, let alone what the benefits of AI are supposed to be.


[–] TwilightVulpine@lemmy.world 12 points 2 years ago

It's not like all this data was randomly dumped at the AIs. For data sets to serve as good training materials they need contextual information so that the AI can discern patterns and replicate them when prompted.

We see this when you can literally prompt AIs with whose style you want it to emulate. Meaning that the data it was fed had such information.

Midjourney is facing extra backlash from artists after a spreadsheet was leaked containing a list of artist styles their AI was trained on. Meaning they can keep track of it and they trained the AI with those artists' works deliberately. They simply pretend this is impossible to figure out so that they might not be liable to seek permission and compensate the artists whose works were used.


[–] TwilightVulpine@lemmy.world 34 points 2 years ago (3 children)

OpenAI is definitely not the one arguing that they have stolen data to train their AIs, and Disney will be fine whether AI requires owning the rights to training materials or not. Small artists, the ones protesting the most against it, will not. They are already seeing jobs and commission opportunities declining due to it.

Being publicly available in some form is not permission to use and reproduce those works however you feel like. Only the real owner has the right to decide. We on the internet have always been a bit blasé about it, sometimes deservedly, but as we get to a point where we are driving away the very same artists that we enjoy and get inspired by, maybe we should be a bit more understanding about their position.


[–] winterayars@sh.itjust.works 23 points 2 years ago (1 children)

That depends on what your definition of "publicly available" is. If you're scraping New York Times articles and pulling art off Tumblr then yeah, it's exactly stealing in the same way scihub is. Only difference is, scihub isn't boiling the oceans in an attempt to make rich people even richer.


[–] kibiz0r@lemmy.world 14 points 2 years ago (2 children)

We have a mechanism for people to make their work publicly visible while reserving certain rights for themselves.

Are you saying that creators cannot (or ought not be able to) reserve the right to ML training for themselves? What if they want to selectively permit that right to FOSS or non-profits?


[–] grue@lemmy.world 10 points 2 years ago* (last edited 2 years ago) (1 children)

They want to kill the open-source scene

Yeah, by using the argument you just gave as an excuse to "launder" copyleft works in the training data into permissively-licensed output.

Including even a single copyleft work in the training data ought to force every output of the system to be copyleft. Or if it doesn't, then the alternative is that the output shouldn't be legal to use at all.


[–] richieadler@lemmy.myserv.one 21 points 2 years ago* (last edited 2 years ago) (2 children)

Cue the Max Headroom episode where the blanks (disconnected people) are chased by the censors because the blanks steal cable so their children can watch the educational shows and learn to read, and they are forced to use clandestine printing presses to teach them.

[–] mPony@kbin.social 13 points 2 years ago

what's this? an anti-corporate message that sneers at cable TV companies??? CANCEL THAT SHOW!!!

that show was so amazingly prescient: the theme of the first episode was how advertising literally kills its viewers and the news covers things up. No wonder they didn't get renewed. ;)


[–] Aielman15@lemmy.world 102 points 2 years ago (1 children)

I pirated 90% of the texts I used to write my thesis at university, because those books would have cost me hundreds of euros that I didn't have.

Fuck you, capitalism.

[–] puchaczyk@lemmy.blahaj.zone 12 points 2 years ago

I pirated texts for my thesis even when I had access to them through my university. A lot of journals are just too annoying to use.

[–] reverendsteveii@lemm.ee 87 points 2 years ago (1 children)

this is because the technocrats are allowed to steal from you, but when you steal from them what they've stolen from actual researchers that's a problem

[–] blazeknave@lemmy.world 20 points 2 years ago (2 children)

There are no technocrats. Just oligarchs, that titan newer industries. Same as the old boss. Don't give them more credit than that. It's evil capitalism. Lump them with bankers, not UX designers imho


[–] LWD@lemm.ee 84 points 2 years ago* (last edited 2 years ago) (7 children)

deleted

[–] givesomefucks@lemmy.world 47 points 2 years ago (11 children)

AFAIK the individual researchers who get their work pirated and put on Sci-Hub don’t seem to particularly mind.

Why would they?

They don't get paid when people pay for articles.

Back before everyone left Twitter, the easiest way to get a paywalled study was to hit up one of the authors. They can legally give a copy to anyone, and they make no money from paywalls.

[–] eager_eagle@lemmy.world 7 points 2 years ago

Also, no researcher would even exist if grad students had to pay for the papers they read and cite. A lot of people are not fortunate enough to have access to these publications through their uni. Heck, even when I had it, I'd still go to sci-hub just for the sake of convenience.

Like a lot of services nowadays, they offer a mediocre service and still charge for it.


[–] breakfastmtn@lemmy.ca 30 points 2 years ago

Academics don't care because they don't get paid for them anyway. A lot of the time you have to pay to have your paper published. Then companies like Elsevier just sit back and make money.

[–] brsrklf@jlai.lu 13 points 2 years ago (1 children)

I follow a few researchers with interesting youtube channels, and they often mention that if you ask them or their colleagues for a publication of theirs, chances are they'll be glad to send it to you.

A lot of them love sharing their work, and don't care at all for science journal paywalls.


[–] honey_im_meat_grinding@lemmy.blahaj.zone 7 points 2 years ago

I’m starting to think the term “piracy” is morally neutral. The act can be either positive or negative depending on the context. Unfortunately, the law does not seem to flow from morality, or even the consent of the supposed victims of this piracy.

The morals of piracy also depend on the economic system you're under. If you have UBI, the "support artists" argument is far less strong, because we're all paying taxes to support the UBI system that enables people to become skilled artists without worrying about starving or homelessness - as has already happened to a lesser degree before our welfare systems were kneecapped over the last 4 decades.

But that's just the art angle, a tonne of the early-stage (i.e. risky and expensive) scientific advancements had significant sums of government funding poured into them, yet corporations keep the rights to the inventions they derive from our government funded research. We're paying for a lot of this stuff, so maybe we should stop pretending that someone else 'owns' these abstract idea implementations and come up with a better system.


[–] ICastFist@programming.dev 65 points 2 years ago (1 children)

What really breaks the suspension of disbelief in this reality of ours is that fucking advertising is the most privacy invasive activity in the world. Seriously, even George Orwell would call bullshit on that.

[–] Maggoty@lemmy.world 57 points 2 years ago* (last edited 2 years ago) (19 children)

Oh OpenAI is just as illegal as SciHub. More so because they're making money off of stolen IP. It's just that the Oligarchs get to pick and choose. So of course they choose the arrangement that gives them more control over knowledge.


[–] erranto@lemmy.world 40 points 2 years ago

If you have enough money, you can do whatever you want!

[–] Fedizen@lemmy.world 37 points 2 years ago (1 children)

Make the AI folks use public domain training data or nothing and maybe we'll see the "life of the author + 75 years" bullshit get scaled back to something reasonable.


[–] Jknaraa@lemmy.ml 35 points 2 years ago

And people wonder why there's so much push back against everything corps/gov does these days. They do not act in a manner which encourages trust.

[–] TWeaK@lemm.ee 33 points 2 years ago (1 children)

OpenAI isn't really proven as legal. They claim it is, and it's very difficult to mount a challenge, but there definitely is an argument that they have no fair use protection - their "research" is in fact development of a commercial product.

[–] givesomefucks@lemmy.world 12 points 2 years ago* (last edited 2 years ago) (1 children)

Using it to train is a grey area, if you paid for the works. If you didn't, it's still illegal.

What it does is output copyrighted works, which is copyright infringement. That is the legal issue. It's very easy to prompt it into giving full copyrighted text they never even paid to look at, let alone give to other people.

"AI" can't even handle switching synonyms to make it technically different like a college kid cheating on an essay


[–] uriel238@lemmy.blahaj.zone 22 points 2 years ago (1 children)

The IP system, which goes to great lengths to block things like open-access scientific publications, is borked borked borked borked borked.

If OpenAI and other generative AI projects are the means by which we finally break it so we can have culture and a public domain again, well, we had to nail Capone with tax evasion.

Yes, industrialists want to use AI [exactly the way they want to use every other idea -- plausible or not] to automate more of their industries so they can pay fewer people less money for more productivity. And this is a problem in which generative AI figures centrally, but it's not really all that new, and eventually we're going to have to force our society to recognize that it works for the public and not money. I don't think AI is going to break the system and lead us to communist revolution ( The owning class will tremble...! ) But eventually it will be 1789 all over again. Or we'll crush the fash and realize the only way we can get the fash to not come back is by restoring and extending FDR's New Deal.

I am skeptical the latter can happen without piles of elite heads and rivers of politician blood.

[–] Maggoty@lemmy.world 9 points 2 years ago (5 children)

We need to ban the publishing business from academic stuff. Have the universities host a site that's free access. They can also better run the peer review system, and the journals would no longer control what research sees the light of day, even behind a paywall.


[–] Tathas@programming.dev 16 points 2 years ago (1 children)

Time to make OpenASci?

/rimshot

[–] BleatingZombie@lemmy.world 13 points 2 years ago (3 children)

More people need to think like you. Why isn't "Total War: Warhammer" just called "Total Warhammer"? These are the questions that keep me up at night


[–] hottari@lemmy.ml 14 points 2 years ago (2 children)

This is different. AI as a transformative tech is going to usher the US economy into the next boom of prosperity. The AI revolution will change the world and allow people to decide if they want to work for money or not (read UBI). In case you haven't caught on, I am being sarcastic.

All this despite ChatGPT being a total complete joke.


[–] UnderpantsWeevil@lemmy.world 14 points 2 years ago

Consider who sits on OpenAI's board and owns all their equity.

SciHub's big mistake was to fail to get someone like Sundar Pichai or Jamie Iannone with a billion-dollar stake in the company.

[–] rivermonster@lemmy.world 14 points 2 years ago

Kind of a strawman, I'd like everything to be FOSS, and if we keep Capitalism (which we shouldn't), it should be HEAVILY regulated not the laissez-faire corporatocracy / oligarchy we have now.

I don't want any for-profit capitalists to have any control of AI. It should all be owned by the public and all productive gains from it taxed at 100%. But open source AI models, right on.

And team SciHub--FUCK YEAH!

[–] Mango@lemmy.world 12 points 2 years ago (1 children)

What's scihub?

[–] sndrtj@feddit.nl 24 points 2 years ago (1 children)

A website where you can download paywalled scientific literature. Most scientific literature is paywalled by publishers, and costs a real significant amount to read (like 30-50$ per article if you don't have a subscription).

Scihub basically just pirates it, and it has been shut down several times. But as most scientific studies are already paid for with public money, scihub isn't that unethical at all.

[–] andros_rex@lemmy.world 14 points 2 years ago (1 children)

Lots of scientists will just send you their article if you email them. They don’t get the money when you pay to read it - often they pay to submit. Reviewing journal articles is a privilege and doesn’t get you paid. The prestige of a scientific article is from the number of times people have cited it. The only “harm” done is that the publisher doesn’t get to make 100% profit for doing nothing.

Journal publishing is mostly a way to extract money from universities. Elsevier and its ilk name whatever price they think a research university can afford.


[–] Bananigans 10 points 2 years ago (5 children)

If this ends with LLMs getting shut down to some degree, I wonder if it's going to result in something like a Pirate Bai.
