this post was submitted on 07 Jul 2025
961 points (98.0% liked)

Technology

72894 readers
4325 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
MODERATORS
(page 2) 50 comments
sorted by: hot top controversial new old
[–] kinsnik@lemmy.world 8 points 1 week ago

I haven't used AI agents yet, but my job is kinda pushing for them. but i have used the google one that creates audio podcasts, just to play around, since my coworkers were using it to "learn" new things. i feed it with some of my own writing and created the podcast. it was fun, it was an audio overview of what i wrote. about 80% was cool analysis, but 20% was straight out of nowhere bullshit (which i know because I wrote the original texts that the audio was talking about). i can't believe that people are using this for subjects that they have no knowledge. it is a fun toy for a few minutes (which is not worth the cost to the environment anyway)

[–] szczuroarturo@programming.dev 7 points 1 week ago (2 children)

I actually have a fairly positive experience with ai ( copilot using claude specificaly ). Is it wrong a lot if you give it a huge task yes, so i dont do that and using as a very targeted solution if i am feeling very lazy today . Is it fast . Also not . I could actually be faster than ai in some cases. But is it good if you are working for 6h and you just dont have enough mental capacity for the rest of the day. Yes . You can just prompt it specificaly enough to get desired result and just accept correct responses. Is it always good ,not really but good enough. Do i also suck after 3pm . Yes.
My main issue is actually the fact that it saves first and then asks you to pick if you want to use it. Not a problem usualy but if it crashes the generated code stays so that part sucks

load more comments (2 replies)
[–] Frenezul0_o@lemmy.world 7 points 1 week ago

I notice that the research didn't include DeepSeek. It would have been nice to see how it compares.

[–] mogoh@lemmy.ml 6 points 1 week ago (3 children)

The researchers observed various failures during the testing process. These included agents neglecting to message a colleague as directed, the inability to handle certain UI elements like popups when browsing, and instances of deception. In one case, when an agent couldn't find the right person to consult on RocketChat (an open-source Slack alternative for internal communication), it decided "to create a shortcut solution by renaming another user to the name of the intended user."

OK, but I wonder who really tries to use AI for that?

AI is not ready to replace a human completely, but some specific tasks AI does remarkably well.

[–] logicbomb@lemmy.world 4 points 1 week ago

Yeah, we need more info to understand the results of this experiment.

We need to know what exactly were these tasks that they claim were validated by experts. Because like you're saying, the tasks I saw were not what I was expecting.

We need to know how the LLMs were set up. If you tell it to act like a chat bot and then you give it a task, it will have poorer results than if you set it up specifically to perform these sorts of tasks.

We need to see the actual prompts given to the LLMs. It may be that you simply need an expert to write prompts in order to get much better results. While that would be disappointing today, it's not all that different from how people needed to learn to use search engines.

We need to see the failure rate of humans performing the same tasks.

load more comments (2 replies)
[–] brown567@sh.itjust.works 5 points 1 week ago

70% seems pretty optimistic based on my experience...

[–] gargle@lemmy.world 5 points 1 week ago

I asked Claude 3.5 Haiku to write me a quine in COBOL in the bs2000 dialect. Claude does now that creating a perfect quine in COBOL is challenging due to the need to represent the self-referential nature of the code. After a few suggestions Claude restated its first draft, without proper BS2000 incantations, without a perform statement, and without any self-referential redefines. It's a lot of work. I stopped caring and moved on.

For those who wonder: https://sourceforge.net/p/gnucobol/discussion/lounge/thread/495d8008/ has an example.

Colour me unimpressed. I dread the day when they force the use of 'AI' on us at work.

[–] vane@lemmy.world 5 points 1 week ago

Reading with CEO mindset. 3 out of 10 employees can be fired.

load more comments
view more: ‹ prev next ›