351
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/urgettingtallpip on 2024-09-30 22:33:03+00:00.


Has anybody heard of new Flux ControlNets being trained or coming out soon? The current ones released by XLabs and InstantX feel mediocre at best.

352
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/jenza1 on 2024-09-30 15:07:47+00:00.

353
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Angrypenguinpng on 2024-09-30 21:58:03+00:00.

354
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Striking-Long-2960 on 2024-09-30 19:51:34+00:00.


New versions of CogVideoX-Fun 5B and 2B have been released, including a new model that I believe is intended for animating humans.

  • Retrain the i2v model and add noise to increase the motion amplitude of the video. Upload the control model training code and control model. [ 2024.09.29 ]

5B

2B

The ComfyUI custom node CogVideoXWrapper has initial support for these new models.

355
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/idunno63 on 2024-09-30 15:47:13+00:00.

356
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/theroom_ai on 2024-09-30 12:28:23+00:00.

357
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/woadwarrior on 2024-09-30 13:39:26+00:00.

358
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/gpahul on 2024-09-30 12:39:16+00:00.

359
1
Flux-Ring Light (Lora) (www.reddit.com)
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Halodri88 on 2024-09-30 11:23:12+00:00.

360
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/hackerzcity on 2024-09-30 04:39:03+00:00.


This model was trained on many artificially degraded images (with added noise, blur, or compression artifacts). By learning from those degraded images, it can turn your blurry pictures into clearer ones.
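
As a rough illustration of what "artificially damaged" training data can look like, here is a minimal sketch that blurs a clean image and adds noise to build a (degraded, clean) pair. It is not this model's actual pipeline: the function names, the 3x3 box blur, and the `sigma` value are all assumptions made for the example, and images are plain 2D lists of floats in [0, 1].

```python
import random

def box_blur(img):
    """Apply a 3x3 box blur to a 2D list of floats, clamping at the edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out

def add_noise(img, sigma, rng):
    """Add Gaussian noise to every pixel and clip the result to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def make_training_pair(clean, sigma=0.05, seed=0):
    """Return (degraded, clean); a restoration model would learn degraded -> clean.

    The degradation recipe here (blur, then noise) is a hypothetical example.
    """
    rng = random.Random(seed)
    degraded = add_noise(box_blur(clean), sigma, rng)
    return degraded, clean
```

A restoration model trained on many such pairs sees the degraded image as input and the clean image as the target, which is the general idea the post describes.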

361
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/an303042 on 2024-09-30 08:31:46+00:00.

362
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Jonfreakr on 2024-09-30 08:17:15+00:00.

363
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/ninjasaid13 on 2024-09-30 05:21:42+00:00.


Paper: (pdf link is broken for some reason)

Project Page:

Code:

Model: (Apache License for all models) and the vision tokenizer

Disclaimer: I am not the author.

Overview

While next-token prediction is considered a promising path towards AGI, it has struggled to excel in multimodal tasks, which are still dominated by diffusion models (e.g., Stable Diffusion) and compositional approaches (e.g., CLIP combined with LLMs). In this work, we introduce Emu3, a new suite of state-of-the-art multimodal models trained solely with next-token prediction. By tokenizing images, text, and videos into a discrete space, we train a single transformer from scratch on a mixture of multimodal sequences.

Examples

They introduce Emu3, a new suite of state-of-the-art multimodal models trained solely with next-token prediction. By tokenizing images, text, and videos into a discrete space, they train a single transformer from scratch on a mixture of multimodal sequences.

Emu3 excels in both generation and perception

Emu3 outperforms several well-established task-specific models in both generation and perception tasks, surpassing flagship open models such as SDXL, LLaVA-1.6 and OpenSora-1.2, while eliminating the need for diffusion or compositional architectures.

Video Generation

Emu3 is capable of generating videos. Unlike Sora, which employs a video diffusion model to generate the video from noise, Emu3 simply generates a video causally by predicting the next token in a video sequence.
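
To make "predicting the next token" concrete, here is a minimal sketch of a causal generation loop over discrete tokens. It is not Emu3's code: `toy_next_token_logits` is a hypothetical stand-in for the transformer, the tokens are plain integers rather than outputs of a real vision tokenizer, and greedy decoding is assumed only to keep the example deterministic.

```python
def toy_next_token_logits(context, vocab_size):
    """Hypothetical stand-in for a transformer: score every candidate next token.

    This toy rule simply prefers (last token + 1) modulo the vocabulary size.
    """
    last = context[-1]
    return [1.0 if t == (last + 1) % vocab_size else 0.0
            for t in range(vocab_size)]

def generate(prompt_tokens, num_new_tokens, vocab_size):
    """Extend a token sequence one token at a time (greedy decoding)."""
    tokens = list(prompt_tokens)  # copy so the caller's prompt is not mutated
    for _ in range(num_new_tokens):
        logits = toy_next_token_logits(tokens, vocab_size)
        # Pick the highest-scoring token and append it to the context,
        # so the next step conditions on everything generated so far.
        next_token = max(range(vocab_size), key=lambda t: logits[t])
        tokens.append(next_token)
    return tokens

print(generate([0, 1], num_new_tokens=4, vocab_size=5))  # [0, 1, 2, 3, 4, 0]
```

In a real system the generated tokens would be decoded back into video frames by the tokenizer; the loop structure, where each new token is conditioned on all previous ones, is what distinguishes this approach from denoising a whole clip at once.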

Video Prediction

With a video in context, Emu3 can naturally extend the video and predict what will happen next. The model can simulate some aspects of the environment, people and animals in the physical world.

Vision-Language Understanding

Emu3 demonstrates strong perception capabilities to understand the physical world and provides coherent text responses. Notably, this capability is achieved without depending on CLIP or a pretrained LLM.

364
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Pretend_Potential on 2024-09-30 02:52:22+00:00.


Just going to post the link to the news article rather than quote the entire article.

365
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/stupidxthrowaway on 2024-09-30 01:05:21+00:00.

366
1
Punk generations (i.redd.it)
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/myf4pacc0unt on 2024-09-29 21:06:37+00:00.

367
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/nootropicMan on 2024-09-29 21:47:40+00:00.

368
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Devajyoti1231 on 2024-09-29 08:47:42+00:00.

369
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/os75 on 2024-09-29 02:53:10+00:00.


Hey guys, so I've been working on this thing I'm calling lorakit. It's just a little toolkit I threw together for training SDXL LoRA models. It is heavily based on DreamBooth from AutoTrain, but with a configuration style similar to ai-toolkit. Nothing fancy, but it's been pretty handy for quick experiments and prototyping. Thought some of you might wanna check it out:

370
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/smusamashah on 2024-09-29 13:06:03+00:00.

371
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/stbl_reel on 2024-09-29 12:15:43+00:00.

372
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/reditor_13 on 2024-09-29 08:49:35+00:00.

373
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/rolux on 2024-09-29 08:16:48+00:00.

374
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/kozakfull2 on 2024-09-29 00:02:25+00:00.

375
1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/ryanontheinside on 2024-09-28 21:51:32+00:00.
