Recent AI failures are cracks in the magic (www.theintrinsicperspective.com)

submitted 2 years ago by boem@lemmy.world to c/technology@lemmy.world

[–] wewbull@feddit.uk 4 points 2 years ago (2 children)

I don't know much about LLMs, but latent diffusion models already have "meaning" encoded into the model. The whole concept of the U-Net is that as it reduces the spatial resolution of the image, it increases the semantic resolution by adding extra dimensions of information. It came from medical image analysis, where labelling something as a tumor would be really useful.

This is why you get body dysmorphic results on earlier (and even current) models. The model has identified something as a human limb but isn't quite sure where the hand is, so it adds one onto what we know is a leg.
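
To make the resolution trade-off concrete, here's a rough sketch of how the feature-map shape changes down the encoder half of a U-Net-style network. The function name, the 64x64 input, the 32 starting channels and the "halve the size, double the channels" rule are all assumptions for illustration; real latent diffusion U-Nets use their own channel schedules.

```python
def encoder_shapes(height, width, base_channels, depth):
    """Return (height, width, channels) at each encoder stage.

    Hypothetical helper: assumes every downsampling step halves the
    spatial size and doubles the channel count, which is a common
    convention but not what every model does.
    """
    shapes = []
    h, w, c = height, width, base_channels
    for _ in range(depth):
        shapes.append((h, w, c))
        # Less "where" (fewer pixels), more "what" (more feature channels).
        h, w, c = h // 2, w // 2, c * 2
    return shapes


if __name__ == "__main__":
    for h, w, c in encoder_shapes(64, 64, 32, 4):
        print(f"{h}x{w} spatial, {c} channels")
```

Each step down keeps fewer positions but describes each one with more features, which is roughly the "semantic resolution" idea above.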

[–] lvxferre@mander.xyz 1 points 2 years ago* (last edited 2 years ago)

That's perhaps why image generators are comparatively better than text generators. But there's still something off: by your example, it seems that the model cannot reliably use clues like position to understand "this is a «leg»". And I don't know much about image generators, but I think that they're still statistics- and probability-based.

[–] FaceDeer@kbin.social 1 points 2 years ago

There was an interesting paper published just recently titled "Generative Models: What do they know? Do they know things? Let's find out!" (a lot of fun names and titles in the AI field these days :) ). It does a lot of work in actually analyzing what AI image generators "know" about what they're depicting. They seem to have an awareness of three-dimensional space, of light and shadow and reflectivity, lots of things you wouldn't necessarily expect from something trained just on 2-D images tagged with a few short descriptive sentences. This article from a few months ago also delved into this. It showed that when you ask a generative AI to create a picture of a physical object, the first thing the AI does is come up with the three-dimensional shape of the scene before it starts figuring out what it looks like. Quite interesting stuff.