lvxferre

joined 4 years ago
[–] lvxferre@lemmy.ml 20 points 2 years ago (6 children)

I think that the RHEL example is out of place, since IBM ("Red Hat") is clearly exploiting a loophole in the GNU General Public License. Similar loopholes were later addressed by e.g. the AGPL and the GPLv3*, so I expect this one to be addressed too.

So perhaps, if the GPL is "not enough", the solution might be more GPL.

*Note that the kernel is licensed under the GPLv2. Cue Android (for all intents and purposes non-free software) using the kernel, but not the rest.

[–] lvxferre@lemmy.ml 0 points 2 years ago* (last edited 2 years ago) (5 children)

> This is exactly what is already happening with Scots and American English.

If American media affected the Anglic varieties spoken in Scotland that much, you'd expect SSE (Scottish Standard English) to become more rhotic under that influence. And yet the opposite is happening.

Granted, the example is from SSE, not Braid Scots; it works well here though, since any potential non-British pressure would affect SSE first, then Scots.

The example also shows which Anglic varieties are threatening Scots: RP and its spiritual successor, Southern Standard British English, both non-rhotic. That's because mere exposure to another variety is not enough to trigger variety shift; you need some sort of [soft or hard] power over the speakers. Such as attacking their local identity to sell them an alternative one (governments love to do this shit).

Now, back to the hypothetical "international English": what pressure do you think a standard built upon the speech of L2 English speakers, mostly in continental Europe, would exert on Scots? I don't think it would exert any; at most you'd get some entitled corporate drone from London or New York screeching that "learning a dialect for international communication is too hard!" (i.e. a fraction of what others already do).

Also note that the basic idea ("you aren't supposed to speak this natively") isn't too far off from how Esperantists promote Esperanto, except that it applies to a dialect instead of a full-fledged constructed language.

> There is a world of difference between [spontaneous consensus between members of a particular culture or ethnic group]#1 and [the top-down enforcement (as in the case of Scots speakers being physically punished in school for speaking their native language)]#2 or [promotion of a specific dialect]#3.

I numbered them for convenience. #1 is a specific case of #3, and rather close to my "hot take" proposal. Nobody is proposing #2; that sort of Vergonha-style linguicide is inhumane.

[I'd also like to reinforce that the idea is a "hot take". As in, I knew that it would be contentious, and I'm not exactly sure myself if it would be the best approach.]

[–] lvxferre@lemmy.ml 3 points 2 years ago (7 children)

> I think this has a lot to do with: [poor American education + political climate]

I'm not sure if this is true, but it does sound reasonable. Especially because this sort of defensiveness tends to become a habit, so the person might behave the same way even towards other subjects.

> Absolutely not. [...]

I don't think that some sort of "international English" would be a threat to the local varieties that you mentioned; the threat is usually the variety backed by the "upper caste", either implicitly or explicitly. So, for example, varieties in the UK would still be threatened by RP and SSB; things wouldn't get better for them, but not worse either.

> Linguistic prescriptivism is not the answer here.

I get where you're coming from, but note that some prescription will always be there. Not prescribing anything at all means implicitly agreeing with the prescriptions already in place, in this case the usage of RP and GA as standards.

[–] lvxferre@lemmy.ml 1 points 2 years ago* (last edited 2 years ago)

Now I get it. And yes, now I agree with you; it would give them a bit more merit to claim that the data used in the input was obtained illegally. (Unless Meta has the right to use The Pile.)

The link does not mention GPT (OpenAI, Microsoft) or LaMDA/Bard (Google, Alphabet), but if Meta is doing it, odds are that the others are doing it too.

Sadly this would be up to the copyright holders of that data. It does not apply to NYT content that you can freely access online; for the NYT, it has to be about the output, not the input.

[–] lvxferre@lemmy.ml 4 points 2 years ago (2 children)

I don't think that the content was illegally obtained, unless there's some law out there that prevents you from using a crawler to automatically retrieve text. And there's probably no law against using data intended for human consumption to feed AI.

As such, I might be wrong, but I think that the only issue is that this data is being used to output derivative works; according to the NYT, in some instances the tool "can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style."
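(To be clear about what I mean by "crawler": something as trivial as the Python sketch below. The URL is just a stand-in, and whether a site's robots.txt or ToS forbids this is a separate question.)

```python
import requests
from bs4 import BeautifulSoup

def fetch_text(url: str) -> str:
    """Download a page and strip it down to its visible text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator=" ", strip=True)

# Hypothetical usage; any publicly reachable article URL would do:
print(fetch_text("https://example.com/some-article")[:200])
```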

[–] lvxferre@lemmy.ml 7 points 2 years ago

> Communication is a 2-way street. By definition you can’t blame it on one person.

Even if communication works two ways, sometimes you can blame it on a single person. Shitty drawing time:

I'm representing the people as houses, and the bits of info as vehicles. "A" sent "B" a problematic bit (the orange car), and everything stopped - because now those cars/bits of info can't circulate further between "A" and "B", even though "B" did nothing wrong.

...that said, I agree with the core of your text; I think that it's reasonable.

[–] lvxferre@lemmy.ml 7 points 2 years ago (3 children)

The fun part is that the word is an abstract concept inside your head, not in the text. They're removing those spaces from "a lot", "as well", "no one" etc. because they're already functionally words for those speakers.

[–] lvxferre@lemmy.ml 10 points 2 years ago* (last edited 2 years ago) (9 children)

I think that there are three other potential components here. They're hypothetical as I don't have data to back me up, so take them with a grain of salt. Still, I think that they're worth sharing:

  1. Potential selection bias. The comparison being made is between highly educated L2+ speakers and an education-wise mixed bag of native speakers. (I sketch this below.)
  2. Multilingualism potentially improving communication skills. Perhaps the very fact that you speak 2+ languages allows you to express yourself better in all of them.
  3. Something in English itself, on a pragmatic level, might lead to poor communication. On the internet I see native English speakers fighting to understand each other all the fucking time, in a way that I see neither Portuguese (L1) nor Italian (L2) speakers doing... it's like they're really eager to rush to conclusions and put words into each other's mouths, almost always violating Gricean maxims. (Poor sampling, I know.)

I can go further on any of those if you want. Of course, they take the premise of the text as true, but it should actually be tested.
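To illustrate #1 with a toy simulation (every number here is made up, purely for illustration): if "communication skill" correlates with education at all, comparing a filtered group against a mixed one makes the filtered group look better regardless of language background.

```python
import random

random.seed(42)

# Made-up model: score tracks education level (0..1) plus noise.
def score(education: float) -> float:
    return education * 10 + random.gauss(0, 5)

natives = [score(random.uniform(0.0, 1.0)) for _ in range(10_000)]  # mixed bag
l2 = [score(random.uniform(0.7, 1.0)) for _ in range(10_000)]       # filtered: highly educated

print(sum(natives) / len(natives))  # ~5.0
print(sum(l2) / len(l2))            # ~8.5 - "better", purely from selection
```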

> Learning these things is a part of learning a language, and this article also applies to native speakers of any language.

Hot take: or perhaps the burden should be put on the native speakers instead. This can be achieved by detaching what's considered "proper international English" from their native dialects. (That's part of what your TL;DR misses from the text, as it leans towards the same conclusion.)

[–] lvxferre@lemmy.ml 7 points 2 years ago

That sounds reasonable.

[–] lvxferre@lemmy.ml 6 points 2 years ago

> 2 seconds later someone can train a new one

"Training" datasets:

Does this look like the amount of content that you'd get in two seconds???
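Back-of-envelope, assuming something around The Pile's size (~825 GiB) and a generous 1 Gbit/s link:

$$
t = \frac{825 \times 2^{30}\,\text{B} \times 8\,\text{bit/B}}{10^{9}\,\text{bit/s}} \approx 7.1 \times 10^{3}\,\text{s} \approx 2\,\text{hours}
$$

And that's just downloading the data, before a single second of actual training.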

> Maybe they should learn to code like those coal miners they pitied.

And maybe you should go back to Reddit.

[–] lvxferre@lemmy.ml 20 points 2 years ago (6 children)

Threads like this are why I discuss this shit on Lemmy, not on HN itself. The idiocy in the comments there is facepalm-worthy.

Plenty of users there are trapping themselves in the "learning" metaphor, as if LLMs were actually "learning" shit like humans do. It's a fucking tool, dammit, and it is being legally treated as such.

The legal matter here boils down to this: OpenAI picks content online, feeds it into a tool, the tool transforms it into derivative content, and the derivative content is served to users. Is the transformation deep enough to put said usage past copyright? Nobody has decided yet.
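Schematically, the disputed pipeline looks something like this. A deliberately oversimplified Python sketch; every function here is a stand-in of my own, not anybody's actual code:

```python
def crawl(urls: list[str]) -> list[str]:
    # Step 1: pick content online. (Stubbed out here.)
    return [f"text scraped from {url}" for url in urls]

def train(corpus: list[str]) -> dict:
    # Step 2: feed it into the tool. A real model fits billions of
    # parameters; this stub just counts words.
    counts: dict = {}
    for doc in corpus:
        for word in doc.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def generate(model: dict, prompt: str) -> str:
    # Step 3: the tool emits derivative content (prompt ignored in this
    # stub). The open legal question is whether the transformation from
    # input to output is deep enough.
    return " ".join(sorted(model, key=model.get, reverse=True)[:5])

model = train(crawl(["https://example.com/article"]))
print(generate(model, "summarize the news"))  # Step 4: served to users.
```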

[–] lvxferre@lemmy.ml 7 points 2 years ago (1 children)

Classical physics breaks down in three situations: when things are too fast, too massive, or too tiny. To address that, new theories appeared. Among them:

  • special relativity - handles fast stuff
  • general relativity - handles fast and massive stuff
  • quantum mechanics - handles tiny stuff
  • quantum field theory - handles tiny and fast stuff

What researchers are looking for is a theory that is able to handle all three things at the same time, superseding both the relativities and the quantum theories. That's the Theory of Everything that everyone is looking for. (Except me. I'm looking for my cup of coffee.)
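To put "superseding" in concrete terms: each newer theory has to reproduce the older one in the regime where the older one works. A textbook example, using special relativity's kinetic energy at low speeds:

$$
E_{\text{kin}} = (\gamma - 1)\,m c^2, \qquad \gamma = \frac{1}{\sqrt{1 - v^2/c^2}} \approx 1 + \frac{v^2}{2c^2} \ \ (v \ll c) \quad\Rightarrow\quad E_{\text{kin}} \approx \tfrac{1}{2} m v^2.
$$

A Theory of Everything would have to pull the same trick on the relativities and the quantum theories at once.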

And most people look for it in a specific way: they try to adapt relativity to quantum phenomena. These researchers, however, are doing something different: they're imposing a limit on the quantum theories, saying that they break under specific situations because spacetime would work more like it does in classical physics than in quantum mechanics - in other words, that quantum theories need to be fitted to relativity, not the opposite.

The researchers then devised a stupidly simple experiment to test their hypothesis.
