Qwen

50 readers

1 users here now

A community all about the Qwens! (LLMs, VLMs, WANs...)

Here their blog page and their free chat interface

Post are allowed to have any format.

It is advised to put "Qwen" into the title somewhere.

Da Rules

please be nice <3 🧸
no bigotry or general evil-doings please! 💖
no politics 🌏❌
please don't make me add more rules <3

founded 2 weeks ago

MODERATORS

Smorty@lemmy.blahaj.zone

new qwen architecture? :o (lemmy.blahaj.zone)

submitted 1 week ago* (last edited 1 week ago) by Smorty@lemmy.blahaj.zone to c/qwen@lemmy.blahaj.zone

3 comments fedilink hide all child comments

i jus wanted to get dis outta my system >v< ...

i dun like those boring linear model structures... they work... bt they dun look fun, nor intuitive. they jus produce output... which is boring!

pls, if some researcher with lotsa gpus sees this, maybsies try this kinda architecture... u dont evn have to credit me, just try it out n see where it goes ~ ~ ~

you are viewing a single comment's thread
view the rest of the comments

[–] pixxelkick@lemmy.world 5 points 1 week ago (1 children)

They do, that's called a Recurrant model.

And Recurrance is critical for a model to have "memory" of past inputs.

It was one of the key advancements awhile back in data processing for predictive systems, IE speech recognition.

Recurrence is pretty standard now in most neural networks, linear networks are your most basic ones that are mostly just used to demonstrate the basic 101 concepts of ML, they don't have a tonne of practical IRL uses aside from some forms of very basic image processing and stuff like that. Filter functions and etc.

[–] Smorty@lemmy.blahaj.zone 1 points 1 week ago (1 children)

yeaaaa you're right.. i was referring specifically to LLMs, but yes, recurrent models are essentially everywhere else.

i am just surprised we don't have many LLMs with recurrent blocks in them, like this model here did for example. i really hope we go that direction soon...

[–] pixxelkick@lemmy.world 4 points 1 week ago

Afaik all LLMs have very derp recurrance, as that's what provides their context window size.

The more recurrant params they have, the more context window they can store.