this post was submitted on 10 Jan 2024

925 points (96.5% liked)


"Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?" (fosstodon.org)

submitted 2 years ago by Star@sopuli.xyz to c/technology@lemmy.world


[–] webghost0101@sopuli.xyz 15 points 2 years ago* (last edited 2 years ago) (4 children)

The point that was being made was that publicly available data includes a whole lot of copyrighted data to begin with, and it's pretty much impossible to filter it out. Grand example: the Eiffel Tower in Paris is not copyright protected, but the lights on it are, so you can only use pictures of the Eiffel Tower taken during the day, and only if the picture itself isn't copyright protected by the original photographer. Copyright law has all these complex caveats and exceptions that make it impossible to tell at a glance whether or not something is protected.

This in turn means that if AI cannot legally train on copyrighted materials it finds online without paying huge sums of money, then effectively only mega corporations that can pay copyright fines as a cost of business will be able to afford training decent AI.

The only other option to produce any AI of this type is a very narrow, curated set of known materials with a public-use license, but that is not going to get you anything competent on its own.

EDIT: In case it isn't clear, I am clarifying what I understood from Grimy@lemmy.world's comment, not adding to it.

[–] be_excellent_to_each_other@kbin.social 18 points 2 years ago (1 children)

So then we as a society aren't ready to untangle the mess of our infancy in the digital age. ChatGPT isn't something we must have at all costs, it's something we should have when we can deploy it while still respecting the rights of people who have made the content being used to train it.

[–] assa123@lemmy.world 1 points 2 years ago

I would go even further and say that we shouldn't have it until we can be sure it will respect others' rights. All kinds of rights, not only copyright. Unlike Bing at the beginning, with all its bullying and menaces, or ChatGPT regurgitating private information gathered from God knows where.

The problem with waiting is the arms race with other governments. I feel it's similar to fossil fuels, but all governments need to take the risk of being disadvantaged. Damned prisoner's dilemma.

[–] RainfallSonata@lemmy.world 17 points 2 years ago (4 children)

I didn't want any of this shit. IDGAF if we don't have AI. I'm still not sure the internet actually improved anything, let alone what the benefits of AI are supposed to be.

[–] RememberTheApollo@lemmy.world 5 points 2 years ago (1 children)

It doesn’t matter what you want. What matters is if corporations can extract $ from you, gain an efficiency, or cut their workforce using it.

That’s what the drive for AI is all about.

[–] RainfallSonata@lemmy.world 2 points 2 years ago

No doubt.

[–] webghost0101@sopuli.xyz 4 points 2 years ago

A perfectly valid stance to take.

[–] myslsl@lemmy.world 3 points 2 years ago

Machine learning techniques are often thought of as fancy function approximation tools (i.e. for regression and classification problems). They are tools that receive a set of values and spit out some discrete or possibly continuous prediction value.

One use case is that there are a lot of really hard and important problems within CS that we can't solve both exactly and efficiently (look up TSP, SOP, SAT and so on) but that we can solve using heuristics or approximations in reasonable time. Often the accuracy of the heuristic even determines the efficiency of our solution.
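To make the exact-versus-heuristic tradeoff concrete, here is a minimal sketch for TSP. The function names (`tour_length`, `exact_tsp`, `nearest_neighbor_tsp`) are made up for illustration, and nearest neighbor is just one simple classical heuristic, not a machine learning method. The brute-force version always finds the shortest tour but takes factorial time; the greedy version runs in quadratic time and may return a longer tour.

```python
from itertools import permutations
from math import dist


def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))


def exact_tsp(points):
    """Brute force: try every tour that starts at city 0. O(n!) time."""
    rest = range(1, len(points))
    best = min(permutations(rest), key=lambda p: tour_length(points, (0,) + p))
    return (0,) + best


def nearest_neighbor_tsp(points):
    """Greedy heuristic: always move to the closest unvisited city. O(n^2) time."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        here = points[order[-1]]
        nxt = min(unvisited, key=lambda j: dist(here, points[j]))
        unvisited.remove(nxt)
        order.append(nxt)
    return tuple(order)


if __name__ == "__main__":
    cities = [(0, 0), (3, 0), (3, 4), (0, 4), (1, 1), (2, 3)]
    print("exact:    ", tour_length(cities, exact_tsp(cities)))
    print("heuristic:", tour_length(cities, nearest_neighbor_tsp(cities)))
```

With a handful of cities both finish instantly, but the brute-force search already becomes impractical at a few dozen cities, while the greedy one keeps scaling.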

Additionally, sometimes we want predictions for other reasons. For example, software that relies on user preference, that predicts home values, that predicts the safety of an engineering plan, that predicts the likelihood that a person has cancer, that predicts the likelihood that an object in a video frame is a human etc.

These tools have legitimate and important use cases; it's just that a lot of the hype now is centered around the dumbest possible uses and a bunch of idiots trying to make money regardless of any associated ethical concerns or consequences.

[–] Grimy@lemmy.world -1 points 2 years ago

You don't have to use it. You can even disconnect from the internet completely.

What's the benefit of stopping me from using it?

[–] TwilightVulpine@lemmy.world 12 points 2 years ago

It's not like all this data was randomly dumped at the AIs. For data sets to serve as good training materials they need contextual information so that the AI can discern patterns and replicate them when prompted.

We see this when you can literally prompt AIs with whose style you want them to emulate, meaning that the data they were fed had such information.

Midjourney is facing extra backlash from artists after a spreadsheet was leaked containing a list of artist styles their AI was trained on. Meaning they can keep track of it and they trained the AI with those artists' works deliberately. They simply pretend this is impossible to figure out so that they might not be liable to seek permission and compensate the artists whose works were used.

[–] givesomefucks@lemmy.world 5 points 2 years ago (1 children)

That's insane logic...

Like you're essentially saying I can copy/paste any article without a paywall to my own blog and sell adspace on it...

And you're still saying OpenAI is trying to make AI companies pay?

Like, do you think AI runs off free cloud services? The hardware is insanely expensive.

And OpenAI is trying to argue the opposite, that AI companies shouldn't have to pay to use copyrighted works.

You have zero idea what is going on, but you are really confident you do

[–] webghost0101@sopuli.xyz 4 points 2 years ago (3 children)

I clarified the comment above, which was misunderstood. Whether it makes a moral/sane argument is subjective and I am not covering that.

I am not sure why you think there is a claim that OpenAI is trying to make companies pay. On the contrary, the comment I was clarifying (so not my opinion/words) states that OpenAI is making an argument that anyone should be able to use copyrighted materials for free to train AI.

The cost of running an online service like ChatGPT is wildly beside the point of the argument presented. You can run your own open source large language models at home about as well as you can run Bethesda's Starfield on a same spec'd PC

Those open source large language models are trained on the same collections of data, including copyrighted data.

The logic being used here is:

If it becomes globally forbidden to train AI with copyrighted materials, or there is a large price or fine in order to use them for training, then the non-corporate, free, open source side of AI will perish or have to go underground, while the for-profit mega corporations will continue to exploit and train AI as usual because they can pay to settle in court.

The Ethical dilemma as i understand it is:

Allowing Ai to train for free is a direct threat towards creatives and a win for BigProfit Entertainment; not allowing it to train for free is a threat to public democratic AI and a win for BigTech merging with BigCrime.

[–] General_Effort@lemmy.world 4 points 2 years ago (1 children)

Allowing Ai to train for free is a direct threat towards creatives

No. Many creatives fear that AI allows anyone to do what they do, lowering the skill premium they can charge. That doesn't depend on free training.

Some seem to feel that paying for training will delay AI deployment for some years, allowing the good times to continue (until they retire or die?)

But afterward, you have to ask who's paying for the extra cost when AI is a normal tool for creatives? Where does the money come from to pay the rent to property owners? Obviously the general public will pay a part through higher prices. But I think creatives may bear the brunt, because it's the tools of their trade that are more expensive and I don't think all of that cost can be passed on.

[–] webghost0101@sopuli.xyz 1 points 2 years ago

I don't think lowering the skill level is something we will need to worry about, as over time this actually trickles up. A creative professional trained with AI tools will almost always top an amateur using the same tools.

The real issue is style. If you are an artist with a very recognizable, specific style, and you make your money through commissions, you are basically screwed. Many artists feature a personal style, and while borrowing people's style is common (Disney-esque), it's usually not a problem because within a unique and diverse human mind it rarely results in unintentional latent copying.

[–] Grimy@lemmy.world -1 points 2 years ago

That is very well put, I really wish I could have started with that.

Though I envision it as a loss for BigProfit Entertainment, since I see this as a real boon for the indie gaming, animation and eventually filmmaking industries.

It's definitely overall quite a messy situation.

[–] givesomefucks@lemmy.world -3 points 2 years ago* (last edited 2 years ago) (2 children)

You can run your own open source large language models at home about as well as you can run Bethesda’s Starfield on a same spec’d PC

...

Yes, you can download an executable of a chatbot lol.

That's different than running something remotely like even OpenAI.

The more it has to reference, the more the system scales up. Not just storage, but everything else.

Like, in your example of video games it would be more like stripping down a PS5 game of all the assets, then playing it on a NES at 1 frame per five minutes.

You're not only wildly overestimating chatbots' ability, you're doing that while drastically underestimating the resources needed.

Edit:

I think you literally don't know what people are talking about..

Do you think people are talking about AI image generators?

No one else is...

[–] Auzymundius@lemmy.world 4 points 2 years ago (1 children)

I think you're confusing training it with running it. After it's trained, you can run it on much weaker hardware.
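A back-of-the-envelope sketch of why that is, counting only the memory needed to hold the model state. The bytes-per-parameter figures are rule-of-thumb assumptions (about 16 bytes per parameter for mixed-precision training with an Adam-style optimizer, 2 for 16-bit inference, 0.5 for 4-bit quantized inference), the 13B size is just an example, and activations and caches are ignored entirely. `model_state_gb` is a made-up helper name.

```python
def model_state_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for model state in GB (1 GB = 1e9 bytes), ignoring activations."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 13e9  # example: a 13-billion-parameter model

# Assumed rule-of-thumb costs per parameter, not measured values.
SCENARIOS = {
    "training (weights + gradients + optimizer state)": 16.0,
    "inference, 16-bit weights": 2.0,
    "inference, 4-bit quantized weights": 0.5,
}

if __name__ == "__main__":
    for name, bytes_per_param in SCENARIOS.items():
        print(f"{name}: ~{model_state_gb(N_PARAMS, bytes_per_param):.1f} GB")
```

Under those assumptions the training state is tens of times larger than a quantized copy of the finished weights, which is the gap between a GPU cluster and a gaming PC.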

[–] givesomefucks@lemmy.world -4 points 2 years ago

The issue is it reproducing copyrighted works verbatim...

It can't do that unless it contains the entire text to begin with...

[–] webghost0101@sopuli.xyz 1 points 2 years ago* (last edited 2 years ago)

I am talking about generative AI, be it text or image; both have a challenge with copyrighted material.

"executable of a chatbot" lol, ain't you cute

"example of video games"

Are you referring to my joke?

I am far from overestimating capacity. Starfield runs mediocre on a modern gaming system compared to other games. The Vicuna 13b LLM runs mediocre on the same system compared with GPT-3.5. To this date there is no local model that I would trust for professional use, and ChatGPT 3.5 doesn't hit that level either.

But it remains a very interesting, rapidly evolving technology that i hope receives as much future open source support as possible.

"I think you literally don’t know what people are talking about" I hate to break it to you but you're embarrassing yourself.

I presume you must believe the following Lemmy community and resources to be typed up by a group of children; either that or you're just naive.

https://lemmy.world/c/fosai

https://www.fosai.xyz/

https://github.com/huggingface/transformers

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

https://huggingface.co/microsoft/phi-2 & https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/

https://www.theguardian.com/technology/2023/may/05/google-engineer-open-source-technology-ai-openai-chatgpt