Technology

This is a most excellent place for technology news and articles.

285

Congress Wants Tech Companies to Pay Up for AI Training Data (www.wired.com)

submitted 2 years ago by stopthatgirl7@kbin.social to c/technology@lemmy.world

38 comments

At a Senate hearing on AI’s impact on journalism, lawmakers backed media industry calls to make OpenAI and other tech companies pay to license news articles and other data used to train algorithms.

[–] Grimy@lemmy.world 55 points 2 years ago* (last edited 2 years ago) (15 children)

“What would that even look like?” asks Sarah Kreps, who directs the Tech Policy Institute at Cornell University. “Requiring licensing data will be impractical, favor the big firms like OpenAI and Microsoft that have the resources to pay for these licenses, and create enormous costs for startup AI firms that could diversify the marketplace and guard against hegemonic domination and potential antitrust behavior of the big firms.”

As our economy becomes more and more driven by AI, legislation like this will guarantee Microsoft and Google get to own it.

[–] Motavader@lemmy.world 28 points 2 years ago* (last edited 2 years ago) (7 children)

Yes, and they'll use legislation to pull up the ladder behind them. It's a form of Regulatory Capture, and it will absolutely lock out small players.

There are open source AI training datasets, but the question is whether LLMs can be trained as accurately with them.

[–] General_Effort@lemmy.world 2 points 2 years ago (2 children)

These open datasets are used to fine-tune LLMs for specific tasks. But first, LLMs have to learn the basics by being trained on vast amounts of text. At present, there is no chance of doing that with open source datasets.

If fair use is cut down, you can forget about it. It would arguably be unconstitutional, though.

That's not even considering the dystopian wishes to expand copyright even further. Some people demand that the model owner should also own the output. Well, some of these open datasets are made with LLMs like ChatGPT.

[–] wewbull@iusearchlinux.fyi 0 points 2 years ago (1 children)

If fair use is cut down...

It's not a case of cutting down fair use. It's a case of enforcing current fair use limits.

[–] General_Effort@lemmy.world 1 points 2 years ago

Can you give an example of something that is outside fair use?

Just in case there is confusion here: obviously there is no past precedent for exactly these new circumstances, but that does not put new technologies outside the law. E.g., the freedom of speech and of the press apply to the internet, even though there is no printing press involved.
