

These Mini AI Models Match OpenAI With 1,000 Times Less Data. (singularityhub.com)

submitted 11 months ago by King@r.nf to c/technology@lemmy.world


[–] remotelove@lemmy.ca 68 points 11 months ago

> This kind of skill might help developers build AI agents that identify buttons or fields on a webpage ~~to handle tasks like making a reservation at a restaurant.~~

... to improve efficiency of click farms and to bypass captchas.

[–] simple@lemm.ee 41 points 11 months ago* (last edited 11 months ago)

This reads like an ad. They claim to use 1000 times less data than proprietary models, except nobody knows how much data they use or how big proprietary models actually are. Also there's a giant asterisk here they fail to mention: Molmo outperforms the competition at visual benchmarks, not actual text chat.

[–] pennomi@lemmy.world 14 points 11 months ago

Daaaang, Apache license AND open dataset + training tools.

[–] homoludens@feddit.org 11 points 11 months ago (1 children)

> but an order of magnitude smaller

I'm pretty sure that would be three orders of magnitude.

[–] FaceDeer@fedia.io 16 points 11 months ago (1 children)

They're not talking about the same thing.

> Last week, researchers at the Allen Institute for Artificial Intelligence (Ai2) released a new family of open-source multimodal models competitive with state-of-the-art models like OpenAI’s GPT-4o—but an order of magnitude smaller.

That's in reference to the size of the model itself.

> They then compiled a more focused, higher quality dataset of around 700,000 images and 1.3 million captions to train new models with visual capabilities. That may sound like a lot, but it’s on the order of 1,000 times less data than what’s used in proprietary multimodal models.

That's in reference to the size of the training data that was used to train the model.

Minimizing both of those things is useful, but for different reasons. Smaller training sets make the model cheaper to train, and a smaller model makes the model cheaper to run.
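
To make the arithmetic behind the two quotes concrete, here is a minimal sketch. The helper name `orders_of_magnitude` is made up for illustration, and the model-size ratio of 10 is simply the literal reading of "an order of magnitude", not a measured figure.

```python
import math


def orders_of_magnitude(ratio: float) -> float:
    """Return how many powers of ten a ratio spans (base-10 logarithm)."""
    return math.log10(ratio)


# "on the order of 1,000 times less data" (training set size)
data_ratio = 1_000
# "an order of magnitude smaller" read literally as a factor of 10 (model size)
model_ratio = 10

print("training data:", round(orders_of_magnitude(data_ratio)), "orders of magnitude")
print("model size:", round(orders_of_magnitude(model_ratio)), "order of magnitude")
```

So both readings can be right at once: three orders of magnitude for the training data, one for the model itself.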

[–] General_Effort@lemmy.world 1 points 11 months ago

After a quick skim, seems like the article has lots of errors. Molmo is trained on top of Qwen. The smallest ones are trained on something by the same company as Molmo.

[–] lunarul@lemmy.world 11 points 11 months ago

> Instead of writing captions, the team asked annotators to record 60- to 90-second verbal descriptions answering a list of questions about each image. They then transcribed the descriptions—which often stretched across several pages—and used other large language models to clean up, crunch down, and standardize them.

So those other LLMs are needed to train this one?

[–] chemical_cutthroat@lemmy.world 1 points 11 months ago

And a modern calculator has more computing power than the Apollo program... This is how tech works.