this post was submitted on 14 Sep 2025
Technology
Share interesting Technology news and links.
Rules:
- No paywalled sites at all.
- News articles have to be recent: no older than 2 weeks (14 days).
- No external video links; only native links (.mp4, etc.) under 5 minutes.
- Post only direct links.
To encourage more original sources and keep this space as commercial-free as possible, the following websites are blacklisted:
- Al Jazeera;
- NBC;
- CNBC;
- Substack;
- Tom's Hardware;
- ZDNet;
- TechSpot;
- Ars Technica;
- Vox Media outlets, with exception for Axios;
- Engadget;
- TechCrunch;
- Gizmodo;
- Futurism;
- PCWorld;
- ComputerWorld;
- Mashable;
- Hackaday;
- WCCFTECH;
- Neowin.
More sites will be added to the blacklist as needed.
Encouraged:
- Archive links in the body of the post.
- Linking to the direct source, instead of linking to an article talking about the source.
Misc:
Relevant Communities:
- Beehaw Technology - technology-related discussions.
- lemmy.zip Technology - hard tech news.
And the best microscopes can't effectively peel potatoes.
How is that relevant? If an AI model is marketed as multimodal, then reading a clock is one of the things you'd expect it to be able to do, because it's explicitly marketed as being able to understand images.
LLMs don't seem suited to tasks like OCR.
There are a lot of other edge cases. They don't work well with math either, whether the calculations are explicit or merely implied. I've had cases where the LLM gave copied text from news releases as a summary figure, instead of actually doing the calculation from the parts provided.
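One way around that failure mode is to never let the model do the arithmetic at all: pull the component numbers out and recompute the summary figure deterministically. A toy sketch, with invented segment names and revenues (values in millions, kept as integers to avoid float rounding):

```python
# Hypothetical example: rather than trusting a summary figure an LLM
# quotes from a press release, recompute it from the component parts.
# Segment names and revenues below are made up for illustration.

segments = {"hardware": 26_300, "services": 9_800, "wearables": 8_100}
reported_total = 44_200  # total as quoted in the (imaginary) release

computed_total = sum(segments.values())
print(computed_total == reported_total)  # True: the parts add up
```

If the check ever prints False, you know the quoted figure and the parts disagree, which is exactly the kind of mismatch an LLM summary can silently paper over.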
Another massive failure with LLMs was subtitle editing. It's still way faster and easier to use a specialized subtitle-editing suite. I honestly couldn't get the LLM to copy the source SRT file "to memory" (to then apply prompts to it).
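For comparison, the kind of deterministic edit a subtitle suite performs is trivial in ordinary code, because SRT timestamps are rigidly formatted. A minimal, hypothetical sketch (the sample cue and offset are invented) that shifts every timestamp by a fixed number of milliseconds:

```python
import re

# HH:MM:SS,mmm -- the timestamp format used by SRT subtitle files.
TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_srt(text: str, offset_ms: int) -> str:
    """Shift every SRT timestamp in `text` by offset_ms milliseconds."""
    def bump(m: re.Match) -> str:
        h, mnt, s, ms = map(int, m.groups())
        total = (h * 3600 + mnt * 60 + s) * 1000 + ms + offset_ms
        total = max(total, 0)  # clamp instead of producing negative times
        secs, ms = divmod(total, 1000)
        h, rem = divmod(secs, 3600)
        mnt, s = divmod(rem, 60)
        return f"{h:02}:{mnt:02}:{s:02},{ms:03}"
    return TS.sub(bump, text)

sample = "1\n00:00:01,000 --> 00:00:03,500\nHello.\n"
shifted = shift_srt(sample, 250)
# timestamps become 00:00:01,250 --> 00:00:03,750
```

No "memory" of the file is needed: the whole transformation is a single regex substitution, which is exactly why purpose-built tools beat prompting here.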
They are marketed as "AI" when they are really just producing plausible output (which can often be very helpful) in response to a prompt.
Are these marketers in this very room now? Do we really have any "AI" that is "explicitly marketed as being able to understand images"? Did you read all the fine print under the asterisks for any such "AI"s?
https://help.openai.com/en/articles/8400551-chatgpt-image-inputs-faq
Now read that FAQ. All I see is a list of limitations, not a claim of "I can read and correctly understand 100 percent of images."
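For what it's worth, "image input" in these products just means the image becomes another content part in the prompt. A hedged sketch of what a vision request body looks like in the style of the OpenAI chat completions API (the model name and image URL are placeholders, and no network call is made here):

```python
import json

# Illustrative only: builds the request body, does not send it.
payload = {
    "model": "gpt-4o",  # example model name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What time does this clock show?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/clock.jpg"}},
        ],
    }],
}
body = json.dumps(payload)  # what would be POSTed to the API
```

Nothing in that structure promises correct interpretation; the image is simply handed to the model alongside the text, and the FAQ's limitations apply to whatever comes back.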
I think there’s a vast difference between “I can take in images as input for prompts, with limitations” and “I’m using the wrong tool for a completely absurd use case”, which is what your microscope analogy implies.
An LLM is the wrong tool for image analysis, even if the providers say it's possible. Possibility doesn't mean effectiveness or even usefulness. Like a microscope and onions.
Are you sure about that?
It's been a while since I used a microscope (a bit less than 20 years), but I do remember having to peel and cut up onions to then look at them under the microscope.
Maybe these days there are some microscopes that come with a peeler thing built-in? :)
Yeah, the progress we deserve :)