
Oops! We Automated Bullshit. | Department of Computer Science and Technology (www.cst.cam.ac.uk)

submitted 2 years ago by Masimatutu@mander.xyz to c/technology@lemmy.ml


[–] huginn@feddit.it 10 points 2 years ago (1 children)

I'm a few months out of date on the latest in the field, and I know it's changing quickly. What progress has been made towards solving hallucinations? Feeding the output into another LLM for evaluation never seemed like a tenable solution to me.

[–] mkhoury@lemmy.ca 6 points 2 years ago (2 children)

Essentially, you don't ask them to use their internal knowledge. In fact, you explicitly ask them not to. The technique is generally referred to as Retrieval Augmented Generation (RAG). You take the context/user input, retrieve relevant information from the net/your DB/vector DB/whatever, and give it to an LLM along with instructions on how to transform this information (summarize, answer a question, etc).

So you try as much as you can to "ground" the LLM with knowledge that you trust, and you instruct it to use only this information to perform the task.

So you get a system that can do a really good job at transforming the data you have into the right shape for the task(s) you need to perform, without requiring your LLM to act as a source of information, only a great data massager.
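A minimal sketch of that flow, under loud assumptions: the retriever here is a toy keyword-overlap ranker standing in for a real search index or vector DB, and the code stops at building the prompt rather than calling any particular LLM API. `Document`, `retrieve`, and `build_grounded_prompt` are hypothetical names made up for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # where the text came from (hypothetical label)
    text: str    # trusted content you want the LLM to work from


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[Document], k: int = 2) -> list[Document]:
    """Toy retriever: rank documents by how many words they share with the query.

    Assumption: a real system would use embeddings or a search engine instead.
    """
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(d.text)), d) for d in docs]
    # Drop documents with no overlap at all, then keep the k best matches.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_grounded_prompt(query: str, docs: list[Document]) -> str:
    """Build a prompt that tells the model to rely only on the retrieved text."""
    context = "\n".join(f"[{d.source}] {d.text}" for d in docs)
    return (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
    )


if __name__ == "__main__":
    knowledge_base = [
        Document("faq.txt", "The warranty period is two years."),
        Document("shipping.txt", "Orders ship within three business days."),
    ]
    question = "How long is the warranty period?"
    prompt = build_grounded_prompt(question, retrieve(question, knowledge_base))
    print(prompt)  # this string is what you would send to the LLM of your choice
```

The point of the design is that the model is only asked to reshape text you supplied, so the quality of the answer depends mostly on what `retrieve` returns.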

[–] sudoreboot@slrpnk.net 5 points 2 years ago (1 children)

That seems like it should work in theory, but having used Perplexity for a while now, it doesn't quite solve the problem.

The biggest fundamental problem is that it doesn't understand in any meaningful capacity what it is saying. It can try to restate something it sourced from a real website, but because it doesn't understand the content it doesn't always preserve the essence of what the source said. It will also frequently repeat or contradict itself in as little as two paragraphs based on two sources without acknowledging it, which further confirms the severe lack of understanding. No amount of grounding can overcome this.

Then there is the problem that LLMs don't understand negation. You can't reliably reason with one using negated statements. You also can't ask it to tell you about things that do not have a particular property. It can't filter based on statements like "the first game in the series, not the sequel", or "Game, not Game II: Sequel" (however you put it, you will often get results pertaining to the sequel sneaked in).

[–] BluesF@feddit.uk 2 points 2 years ago

Yeah, it's right back to the problem the article points out: refined bullshit is still bullshit. You still need to teach your LLM how to talk, so it still needs that vast bullshit input into its "base" before you feed it the "grounding" or whatever... And since it doesn't actually understand any of that grounding, it's just yet more bullshit.

[–] huginn@feddit.it 4 points 2 years ago

Definitely a good use for the tool: NLP is what LLMs do best, and restricting the task to rewording or compressing ground truth avoids hallucination.

I expect you could use a much smaller model than GPT to do that, though. Even Llama might be overkill, depending on how tightly scoped your DB is.