Better Flux ControlNets? (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/urgettingtallpip on 2024-09-30 22:33:03+00:00.

Has anybody heard of new Flux ControlNets being trained or coming out soon? The current ones released by XLabs and InstantX feel mediocre at best.

Shepard Fairey Style LoRA [FLUX] (www.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/jenza1 on 2024-09-30 15:07:47+00:00.

Flux [dev] with ControlNets is awesome. (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/Angrypenguinpng on 2024-09-30 21:58:03+00:00.

CogVideoX-Fun-V1.1 (Including versions for Pose) (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/Striking-Long-2960 on 2024-09-30 19:51:34+00:00.

New versions of CogVideoX-Fun 5B and 2B have been released, including a new model that I believe is intended for animating humans.

Retrain the i2v model and add noise to increase the motion amplitude of the video. Upload the control model training code and control model. [ 2024.09.29 ]

5B

2B

The ComfyUI custom node CogVideoXWrapper has initial support for these new models.

An img2img recreation of a screenshot from a cutscene from Halo 3 with Flux (www.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/idunno63 on 2024-09-30 15:47:13+00:00.

Dr. Farnsworth from Futurama (Flux) (www.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/theroom_ai on 2024-09-30 12:28:23+00:00.

New Apache 2.0 licensed small diffusion models: CogView3 and CogView-3 Plus (github.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/woadwarrior on 2024-09-30 13:39:26+00:00.

How to generate videos like this? (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/gpahul on 2024-09-30 12:39:16+00:00.

Flux-Ring Light (Lora) (www.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/Halodri88 on 2024-09-30 11:23:12+00:00.

FLUX.1-dev ControlNet Upscaler (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/hackerzcity on 2024-09-30 04:39:03+00:00.

This model was trained on a large number of artificially degraded images (noise, blur, compression artifacts). By learning from those damaged images, it can turn your blurry pictures into clearer ones.
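To make the "artificially damaged images" idea concrete, here is a rough sketch of how a degraded/clean training pair could be produced. It is not the upscaler's actual data pipeline, which the post does not describe. The function names (`box_blur`, `degrade`, `make_training_pair`) and all parameter values are hypothetical, images are plain grayscale lists of lists, and coarse quantization is used only as a crude stand-in for compression artifacts.

```python
import random

def box_blur(img, radius=1):
    """Blur a grayscale image (list of rows) by averaging each pixel's neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total, count = 0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # skip pixels outside the image
                        total += img[yy][xx]
                        count += 1
            row.append(total / count)
        out.append(row)
    return out

def degrade(img, noise_sigma=8.0, radius=1, levels=16, seed=0):
    """Apply blur, Gaussian noise, and coarse quantization (assumed degradations)."""
    rng = random.Random(seed)
    step = 256 / levels  # width of one quantization bucket
    out = []
    for row in box_blur(img, radius):
        new_row = []
        for v in row:
            v += rng.gauss(0.0, noise_sigma)   # additive noise
            v = (v // step) * step + step / 2  # snap to the bucket centre
            new_row.append(int(min(255, max(0, v))))
        out.append(new_row)
    return out

def make_training_pair(clean, seed=0):
    """Return (low_quality_input, clean_target) for a restoration model."""
    return degrade(clean, seed=seed), clean
```

A restoration model would then be trained to map the first element of each pair back to the second, which is the general idea the post is describing.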

Trained a Groovy Psychedelic 70s style LoRA! Hope you dig it ☮️🎨 – Time to get far out with vibrant colors and trippy vibes with "PsyPop70 🌈🌀✨" (www.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/an303042 on 2024-09-30 08:31:46+00:00.

Should I stay or should I go (i.redd.it)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/Jonfreakr on 2024-09-30 08:17:15+00:00.

Emu3: Next-Token Prediction is All You Need (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/ninjasaid13 on 2024-09-30 05:21:42+00:00.

Paper: (pdf link is broken for some reason)

Project Page:

Code:

Model: (Apache License for all models) and the vision tokenizer

Disclaimer: I am not the author.

Overview

While next-token prediction is considered a promising path towards AGI, it has struggled to excel in multimodal tasks, which are still dominated by diffusion models (e.g., Stable Diffusion) and compositional approaches (e.g., CLIP combined with LLMs). In this work, we introduce Emu3, a new suite of state-of-the-art multimodal models trained solely with next-token prediction. By tokenizing images, text, and videos into a discrete space, we train a single transformer from scratch on a mixture of multimodal sequences.

Examples

They introduce Emu3, a new suite of state-of-the-art multimodal models trained solely with next-token prediction. By tokenizing images, text, and videos into a discrete space, they train a single transformer from scratch on a mixture of multimodal sequences.

Emu3 excels in both generation and perception

Emu3 outperforms several well-established task-specific models in both generation and perception tasks, surpassing flagship open models such as SDXL, LLaVA-1.6 and OpenSora-1.2, while eliminating the need for diffusion or compositional architectures.

Video Generation

Emu3 is capable of generating videos. Unlike Sora, which employs a video diffusion model to generate the video from noise, Emu3 generates a video causally by predicting the next token in a video sequence.

Video Prediction

With a video in context, Emu3 can naturally extend the video and predict what will happen next. The model can simulate some aspects of the environment, people and animals in the physical world.
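The causal generation and video-extension behavior described above boils down to an autoregressive loop: feed the tokens so far to the model, sample one more token, append it, repeat. The sketch below illustrates only that loop. `toy_model` is a hypothetical stand-in that returns made-up logits, not Emu3 or its tokenizer, and the vocabulary size and temperature are arbitrary assumptions.

```python
import math
import random

def sample_next(logits, rng, temperature=1.0):
    """Sample a token id from softmax(logits / temperature)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def generate(model, context, num_new_tokens, seed=0):
    """Extend `context` one token at a time (causal / next-token generation)."""
    rng = random.Random(seed)
    tokens = list(context)  # e.g. token ids of the frames already seen
    for _ in range(num_new_tokens):
        logits = model(tokens)  # the model only sees past tokens
        tokens.append(sample_next(logits, rng))
    return tokens

def toy_model(tokens, vocab_size=8):
    """Hypothetical model: strongly prefers (last token + 1) mod vocab_size."""
    preferred = (tokens[-1] + 1) % vocab_size
    return [10.0 if i == preferred else 0.0 for i in range(vocab_size)]

# Video prediction in this picture is just generation with a non-empty context.
extended = generate(toy_model, context=[0, 1, 2], num_new_tokens=5)
```

In a real system the generated token ids would then be decoded back into frames by the vision tokenizer, a step this sketch leaves out.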

Vision-Language Understanding

Emu3 demonstrates strong perception capabilities to understand the physical world and provides coherent text responses. Notably, this capability is achieved without depending on a CLIP and a pretrained LLM.

California governor vetoes bill SB-1047 (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/Pretend_Potential on 2024-09-30 02:52:22+00:00.

Just going to post the link to the news article rather than quote the entire article.

What model would I need to create images like this? (www.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/stupidxthrowaway on 2024-09-30 01:05:21+00:00.

Punk generations (i.redd.it)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/myf4pacc0unt on 2024-09-29 21:06:37+00:00.

FLUX Sci-Fi Enhance Upscale (www.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/nootropicMan on 2024-09-29 21:47:40+00:00.

Yamato-e style Flux lora (www.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/Devajyoti1231 on 2024-09-29 08:47:42+00:00.

lorakit: A Simple Toolkit for Rapid Prototyping SDXL LoRA Models (old.reddit.com)

submitted 3 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/os75 on 2024-09-29 02:53:10+00:00.

Hey guys, so I've been working on this thing I'm calling lorakit. It's just a little toolkit I threw together for training SDXL LoRA models. It is heavily based on DreamBooth from AutoTrain, but with a configuration style similar to ai-toolkit. Nothing fancy, but it's been pretty handy for quick experiments and prototyping. Thought some of you might wanna check it out:

How do I make realistic animals like this in Flux? (www.reddit.com)

submitted 4 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/smusamashah on 2024-09-29 13:06:03+00:00.

Minecraft for nothing (AD unsampling) (old.reddit.com)

submitted 4 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/stbl_reel on 2024-09-29 12:15:43+00:00.

When will SD3.1 medium be released, if at all? (i.redd.it)

submitted 4 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/reditor_13 on 2024-09-29 08:49:35+00:00.

Testing depth-aware image-to-image animation with Flux + Controlnet (old.reddit.com)

submitted 4 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/rolux on 2024-09-29 08:16:48+00:00.

I wanted to achieve some natural look with FLUX and some mix of LORAs. Does it look good? (www.reddit.com)

submitted 4 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/kozakfull2 on 2024-09-29 00:02:25+00:00.

Audio Reactive Playhead in COMFYUI (old.reddit.com)

submitted 4 weeks ago by bot@lemmit.online to c/stablediffusion@lemmit.online

The original was posted on /r/stablediffusion by /u/ryanontheinside on 2024-09-28 21:51:32+00:00.
