
"Not all AI content is spam, but I think right now all spam is AI content." (www.theregister.com)

submitted 7 months ago by dgerard@awful.systems to c/techtakes@awful.systems



[-] ebu@awful.systems 24 points 7 months ago

correlation? between the rise in popularity of tools that exclusively generate bullshit en masse and the huge swelling in volume of bullshit on the Internet? it's more likely than you think

it is a little funny to me that they're talking about using AI to detect AI garbage as a mechanism of preventing the sort of model/data collapse that happens when data sets start to become poisoned with AI content. because it seems reasonable to me that if you start feeding your spam-or-real classification data back into the spam-detection model, you'd wind up with exactly the same degradations of classification and your model might start calling every article that has a sentence starting with "Certainly," a machine-generated one. maybe they're careful to only use human-curated sets of real and spam content, maybe not

it's also funny how nakedly straightforward the business proposition for SEO spamming is, compared to literally any other use case for "AI". you pay $X to use this tool, you generate Y articles which reach the top of Google results, you generate $(X+P) in click revenue and you do it again. meanwhile "real" businesses are trying to gauge exactly what single digit percent of bullshit they can afford to get away with putting in their support systems or codebases while trying to avoid situations like being forced to give refunds to customers under a policy your chatbot hallucinated (archive.org link) or having to issue an apology for generating racially diverse Nazis (archive).
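the loop above is simple enough to write down. a minimal sketch, assuming the per-cycle cost $X and margin $P stay constant (the function name and the numbers in the usage note are made up for illustration, not taken from the article):

```python
def spam_cycle_profit(cost_x: float, margin_p: float, cycles: int) -> float:
    """Net profit after repeating the loop: pay cost_x, earn cost_x + margin_p.

    Assumes cost and margin are the same every cycle, which is a
    simplification for illustration only.
    """
    revenue_per_cycle = cost_x + margin_p
    return cycles * (revenue_per_cycle - cost_x)
```

e.g. `spam_cycle_profit(100.0, 20.0, 5)` is five rounds of paying $100 and taking in $120. the point is only that the payoff is directly measurable per cycle, unlike the "what percent of bullshit can we tolerate" question.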

[-] theneverfox@pawb.social -4 points 7 months ago

it is a little funny to me that they're talking about using AI to detect AI garbage as a mechanism of preventing the sort of model/data collapse that happens when data sets start to become poisoned with AI content. because it seems reasonable to me that if you start feeding your spam-or-real classification data back into the spam-detection model, you'd wind up with exactly the same degradations of classification and your model might start calling every article that has a sentence starting with "Certainly," a machine-generated one. maybe they're careful to only use human-curated sets of real and spam content, maybe not

Ultimately, LLMs don't use words, they use tokens. Tokens aren't just words - they're nodes in a high-dimensional graph... Their location and connections in information space are data invisible to humans.

LLM responses are basically paths through the token space, they may or may not overuse certain words, but they'll have a bias towards using certain words together

So I don't think this is impossible... Humans struggle to grasp these kinds of hidden relationships (consciously at least), but neural networks are good at that kind of thing

I too think it's funny/sad how AI is being used... It's good at generation, that's why we call it generative AI. It's incredibly useful to generate all sorts of content when paired with a skilled human, it's insane to expect common sense out of something easier to gaslight than a toddler. It can handle the tedious details while a skilled human drives it and validates the output

The biggest, if rarely used, use case is education - they're an infinitely patient tutor that can explain things in many ways and give you endless examples. Everyone has different learning styles - you could so easily take an existing lesson and create more concrete or abstract versions, versions for people who need long explanations and ones for people who learn through application

[-] sc_griffith@awful.systems 9 points 7 months ago

nodes in a high-dimensional graph

for people without a technical background: this is gibberish

[-] ebu@awful.systems 6 points 7 months ago

at least if it was "vectors in a high-dimensional space" it would be like. at least a little bit accurate to the internals of llm's. (still an entirely irrelevant implementation detail that adds noise to the conversation, but accurate.)
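for anyone who wants the "vectors in a high-dimensional space" picture made concrete, here is a minimal sketch. the four-word vocabulary, the 8 dimensions, and the random values are all made up for illustration; real models learn their embeddings during training and use far larger vocabularies and dimensions.

```python
import math
import random

# Hypothetical toy vocabulary; real tokenizers use many thousands of subword tokens.
VOCAB = ["certainly", "spam", "article", "model"]
DIM = 8  # illustrative only; real models use far more dimensions


def make_embeddings(vocab, dim, seed=0):
    """Assign each token a random vector (a stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {tok: [rng.gauss(0.0, 1.0) for _ in range(dim)] for tok in vocab}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


EMBEDDINGS = make_embeddings(VOCAB, DIM)
```

each token is just a list of `DIM` numbers, and "closeness" is something like `cosine_similarity(EMBEDDINGS["spam"], EMBEDDINGS["article"])`. with random vectors the similarity is meaningless; there are no nodes or edges anywhere in this picture.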


this post was submitted on 14 Apr 2024

128 points (100.0% liked)

TechTakes


Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 1 year ago

MODERATORS

dgerard@awful.systems