This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/lhg31 on 2024-09-23 15:19:58+00:00.


The original was posted on /r/stablediffusion by /u/erkana_ on 2024-09-23 15:06:15+00:00.


The original was posted on /r/stablediffusion by /u/Extraltodeus on 2024-09-23 11:10:54+00:00.


The original was posted on /r/stablediffusion by /u/Estylon-KBW on 2024-09-23 06:41:22+00:00.

fractalized (www.reddit.com)

The original was posted on /r/stablediffusion by /u/PixarCEO on 2024-09-23 02:48:39+00:00.


The original was posted on /r/stablediffusion by /u/blackmixture on 2024-09-22 23:00:14+00:00.

Simple Vector Flux LoRA (www.reddit.com)

The original was posted on /r/stablediffusion by /u/renderartist on 2024-09-22 21:28:43+00:00.


The original was posted on /r/stablediffusion by /u/an303042 on 2024-09-22 19:57:33+00:00.


The original was posted on /r/stablediffusion by /u/evelryu on 2024-09-22 20:15:55+00:00.


The original was posted on /r/stablediffusion by /u/44Beatzz on 2024-09-22 13:20:05+00:00.


The original was posted on /r/stablediffusion by /u/wonderflex on 2024-09-22 06:28:12+00:00.


The original was posted on /r/stablediffusion by /u/Angrypenguinpng on 2024-09-22 16:07:17+00:00.


The original was posted on /r/stablediffusion by /u/rolux on 2024-09-22 14:12:10+00:00.


The original was posted on /r/stablediffusion by /u/love1008 on 2024-09-22 09:49:41+00:00.


The original was posted on /r/stablediffusion by /u/Bronkilo on 2024-09-22 08:46:15+00:00.


Yes, OK, there are things that Midjourney does better than FLUX, but FLUX is practically at the same level. I can't even imagine FLUX 2, or 3, etc.

But my concern is this: I noticed there is one thing that Midjourney is really strong at and masters to perfection, and that is racial diversity. No matter the origin, country, or style, MJ respects the physical traits, while FLUX tends to generate identical characters (photorealistic humans).

Did you notice this too?


The original was posted on /r/stablediffusion by /u/ArchiboldNemesis on 2024-09-22 06:11:22+00:00.


The original was posted on /r/stablediffusion by /u/rolux on 2024-09-21 21:06:54+00:00.


The original was posted on /r/stablediffusion by /u/Envy_AI on 2024-09-22 00:36:54+00:00.

Original Title: I generated a fake video game poster in Flux for a hypothetical Japanese game called "Spirit Village". Out of curiosity, I used Google Translate on my phone to check if the Japanese text means anything, and it said it means "slap your friend". Can anyone confirm?


The original was posted on /r/stablediffusion by /u/lazyspock on 2024-09-21 23:28:34+00:00.


I've been using (and struggling with) the original Flux Dev FP16 since launch. It works on my 3060 12GB, but it takes 2 to 3 minutes to generate an image without any LoRA, makes the computer unusable for anything else during generation, and it's even worse with bigger LoRAs (when it needs to reload the model for each generation). But that was the price to pay for being able to use Flux Dev, right?

Wrong. After this marvelous post from u/Lory1998 (thanks again!), I decided to test Flux-Dev-Q5_1.gguf as he suggested and, man, what a difference! Now I can generate images considerably faster even with two LoRAs, because the model fits entirely in my VRAM. The model is never reloaded as long as I don't change the checkpoint, and even the LoRAs load in an instant. I can also use my computer for other non-intensive things like YouTube or Reddit while generating, without Windows almost choking and without slowing the generation down. And the best part is that there are no discernible quality differences in the generated images.

So, if you're also a 12GB VRAM person, try it. It's worth it.
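As a rough back-of-the-envelope check of why the quantized file fits where FP16 doesn't, here is a minimal sketch. The ~12B parameter count for Flux Dev and the ~6 effective bits/weight for Q5_1 are my assumptions, not figures from the post:

```python
# Rough VRAM estimate for the transformer weights alone (illustrative;
# real usage also includes activations, text encoders, and the VAE).
PARAMS = 12e9          # assumed ~12B parameters for Flux Dev
GIB = 1024**3

def weight_gib(bits_per_weight: float) -> float:
    """Approximate size in GiB of the weights at a given precision."""
    return PARAMS * bits_per_weight / 8 / GIB

fp16_gib = weight_gib(16)  # well over 12 GiB -> constant swapping on a 3060
q5_1_gib = weight_gib(6)   # Q5_1 is ~6 bits/weight incl. scales -> fits

print(f"FP16: {fp16_gib:.1f} GiB, Q5_1: {q5_1_gib:.1f} GiB")
```

This is why the FP16 checkpoint forces offloading/reloading on a 12GB card while the Q5_1 GGUF can stay resident.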


The original was posted on /r/stablediffusion by /u/ComprehensiveHand515 on 2024-09-21 22:51:32+00:00.


We’re launching ComfyAI.run, a cloud platform that allows you to run ComfyUI quickly from anywhere without the need to set up your own GPU machines. This is our first Alpha release.

Serverless Example Workflows: SD, SD with ControlNet, Flux

Key Features:

  • Access from anywhere: Just click the link to launch ComfyUI and start creating immediately.
  • No setup required: Get started right away without worrying about technical installations.
  • Free cloud GPUs: No need to manage your own local or cloud-based GPUs.
  • Shareable link to the cloud: Create a link for easy collaboration or sharing.

Alpha Version Limitations:

  • Supports only a limited number of SD15, SDXL, and Flux checkpoints.
  • Supports a limited number of Custom Nodes.
  • Free machine pools are shared. If many users are running jobs simultaneously, you may experience a wait time in the queue.

Goal:

We would like to enable anyone to participate in the image generation workflow with easy-to-access and shareable infrastructure.

Feedback

Feedback and suggestions are always welcome! I’m sharing to gather your input. Since it’s still early, feel free to share any feature requests you may have.

80s Film Flux LoRA (www.reddit.com)

The original was posted on /r/stablediffusion by /u/renderartist on 2024-09-21 18:30:21+00:00.


The original was posted on /r/stablediffusion by /u/VirusCharacter on 2024-09-21 20:58:17+00:00.


These are the only scheduler/sampler combinations worth the time with Flux-dev-fp8. I'm sure the other checkpoints will get similar results, but that is up to someone else to spend their time on 😎

I have removed the other sampler/scheduler combinations so they don't take up valuable space in the table.

🟢 = Good 🟡 = Almost good 🔴 = Really bad!

Here I have compared all sampler/scheduler combinations by speed for flux-dev-fp8, and it's apparent that the scheduler doesn't change much, but the sampler does. The fastest ones are DPM++ 2M and Euler, and the slowest one is HeunPP2.

Percentage speed differences between sampler/scheduler combinations

From the following analysis it's clear that the Beta scheduler consistently delivers the best images. The runner-up is the Normal scheduler!

  • SGM Uniform: This scheduler consistently produced clear, well-lit images with balanced sharpness. However, the overall mood and cinematic quality were often lacking compared to other schedulers. It’s great for crispness and technical accuracy but doesn't add much dramatic flair.
  • Simple: The Simple scheduler performed adequately but didn't excel in either sharpness or atmosphere. The images had good balance, but the results were often less vibrant or dynamic. It’s a solid, consistent performer without any extremes in quality or mood.
  • Normal: The Normal scheduler frequently produced vibrant, sharp images with good lighting and atmosphere. It was one of the stronger performers, especially in creating dynamic lighting, particularly in portraits and scenes involving cars. It’s a solid choice for a balance of mood and clarity.
  • DDIM: DDIM was strong in atmospheric and cinematic results, but that often came at the cost of sharpness. The mood it created, especially in scenes with fog or dramatic lighting, was a strong point. However, if you prioritize sharpness and fine detail, DDIM occasionally fell short.
  • Beta: Beta consistently delivered the best overall results. The lighting was dynamic, the mood was cinematic, and the details remained sharp. Whether it was the portrait, the orange, the fisherman, or the SUV scenes, Beta created images that were both technically strong and atmospherically rich. It’s clearly the top performer across the board.

When it comes to which sampler is best, it's not as easy, mostly because it's in the eye of the beholder. I believe this should be enough guidance to know what to try. If not, you can go through the tiled images yourself and be the judge 😉

PS. I don't get reddit... I uploaded all the tiled images and it looked like it worked, but when posting, they are gone. Sorry 🤔😥
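The percentage speed differences mentioned above boil down to normalizing each combination's generation time against the fastest one. A minimal sketch (the seconds-per-image values are made-up placeholders, not the author's measurements):

```python
# Rank samplers by speed and express each as a slowdown vs the fastest.
# The timings are hypothetical placeholders for illustration only.
timings = {
    "dpmpp_2m": 28.0,
    "euler": 28.5,
    "ddim": 31.0,
    "heunpp2": 55.0,
}

fastest = min(timings.values())
for sampler, t in sorted(timings.items(), key=lambda kv: kv[1]):
    slowdown_pct = (t / fastest - 1) * 100
    print(f"{sampler:10s} {t:5.1f}s  +{slowdown_pct:.0f}% vs fastest")
```

With real measurements per sampler/scheduler pair, the same normalization reproduces a table like the one the post describes.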


The original was posted on /r/stablediffusion by /u/HonorableFoe on 2024-09-21 20:56:13+00:00.

Original Title: My comfyui Cog video workflow with adtailer using the fun_5b model, with some examples of outputs. You need to really dive in with some prompting, describing clothing and objects being held helps a lot too. Comfy workflow in the comments.


The original was posted on /r/stablediffusion by /u/jenza1 on 2024-09-21 14:35:23+00:00.


The original was posted on /r/stablediffusion by /u/fpgaminer on 2024-09-21 18:37:01+00:00.


This is an update and follow-up to my previous post (). To recap, JoyCaption is being built from the ground up as a free, open, and uncensored captioning VLM model for the community to use in training Diffusion models.

  • Free and Open: It will be released for free, open weights, no restrictions, and just like bigASP, will come with training scripts and lots of juicy details on how it gets built.
  • Uncensored: Equal coverage of SFW and NSFW concepts. No "cylindrical shaped object with a white substance coming out on it" here.
  • Diversity: All are welcome here. Do you like digital art? Photoreal? Anime? Furry? JoyCaption is for everyone. Pains are being taken to ensure broad coverage of image styles, content, ethnicity, gender, orientation, etc.
  • Minimal filtering: JoyCaption is trained on large swathes of images so that it can understand almost all aspects of our world. Almost. Illegal content will never be tolerated in JoyCaption's training.

The Demo

WARNING ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ This is a preview release, a demo, alpha, highly unstable, not ready for production use, not indicative of the final product, may irradiate your cat, etc.

JoyCaption is still under development, but I like to release early and often to garner feedback, suggestions, and involvement from the community. So, here you go!

What's New

Wow, it's almost been two months since the Pre-Alpha! The comments and feedback from the community have been invaluable, and I've spent the time since then working to improve JoyCaption and bring it closer to my vision for version one.

  • First and foremost, based on feedback, I expanded the dataset in various directions to hopefully improve: anime/video game character recognition, classic art, movie names, artist names, watermark detection, male nsfw understanding, and more.
  • Second, and perhaps most importantly, you can now control the length of captions JoyCaption generates! You'll find in the demo above that you can ask for a number of words (20 to 260 words), a rough length (very short to very long), or "Any", which gives JoyCaption free rein.
  • Third, you can now control whether JoyCaption writes in the same style as the Pre-Alpha release, which is very formal and clinical, or a new "informal" style, which will use such vulgar and non-Victorian words as "dong" and "chick".
  • Fourth, there are new "Caption Types" to choose from. "Descriptive" is just like the pre-alpha, purely natural language captions. "Training Prompt" will write random mixtures of natural language, sentence fragments, and booru tags, to try and mimic how users typically write Stable Diffusion prompts. It's highly experimental and unstable; use with caution. "rng-tags" writes only booru tags. It doesn't work very well; I don't recommend it. (NOTE: "Caption Tone" only affects "Descriptive" captions.)

The Details

It has been a grueling month. I spent the majority of the time manually writing 2,000 Training Prompt captions from scratch to try and get that mode working. Unfortunately, I failed miserably. JoyCaption Pre-Alpha was turning out to be quite difficult to fine-tune for the new modes, so I decided to start back at the beginning and massively rework its base training data to hopefully make it more flexible and general. "rng-tags" mode was added to help it learn booru tags better. Half of the existing captions were re-worded into "informal" style to help the model learn new vocabulary. 200k brand new captions were added with varying lengths to help it learn how to write more tersely. And I added a LoRA on the LLM module to help it adapt.

The upshot of all that work is the new Caption Length and Caption Tone controls, which I hope will make JoyCaption more useful. The downside is that none of that really helped Training Prompt mode function better. The issue is that, in that mode, it will often go haywire and spiral into a repeating loop. So while it kinda works, it's too unstable to be useful in practice. 2k captions is also quite small and so Training Prompt mode has picked up on some idiosyncrasies in the training data.

That said, I'm quite happy with the new length conditioning controls on Descriptive captions. They help a lot with reducing the verbosity of the captions. And for training Stable Diffusion models, you can randomly sample from the different caption lengths to help ensure that the model doesn't overfit to a particular caption length.
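The length-randomization idea in the last paragraph can be sketched like this. The bucket names and the request strings are hypothetical stand-ins mirroring the demo's controls; JoyCaption's actual interface may differ:

```python
import random

# Hypothetical length options mirroring the demo's controls:
# an exact word count in [20, 260], a rough bucket, or no constraint.
ROUGH_LENGTHS = ["very short", "short", "medium-length", "long", "very long"]

def sample_caption_length(rng: random.Random) -> str:
    """Pick a random caption-length request for one training image."""
    mode = rng.choice(["words", "rough", "any"])
    if mode == "words":
        return f"{rng.randint(20, 260)} words"
    if mode == "rough":
        return rng.choice(ROUGH_LENGTHS)
    return "any"

rng = random.Random(0)
requests = [sample_caption_length(rng) for _ in range(5)]
print(requests)
```

Captioning each training image with an independently sampled length like this spreads the dataset across short and long captions, which is the anti-overfitting effect the post describes.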

Caveats

As stated, Training Prompt mode is still not working very well, so use it with caution. rng-tags mode is mostly just there to help expand the model's understanding; I wouldn't recommend actually using it.

Informal style is ... interesting. For training Stable Diffusion models, I think it'll be helpful because it greatly expands the vocabulary used in the captions. But I'm not terribly happy with the particular style it writes in. It very much sounds like a boomer trying to be hip. Also, the informal style was made by having a strong LLM rephrase half of the existing captions in the dataset; they were not built directly from the images they are associated with. That means that the informal style captions tend to be slightly less accurate than the formal style captions.

And the usual caveats from before. I think the dataset expansion did improve some things slightly, like movie, art, and character recognition. OCR is still meh, especially on difficult-to-read stuff like artist signatures. And artist recognition is ... quite bad at the moment. I'm going to have to pour more classical art into the model to improve that. It should be better at calling out male NSFW details (erect/flaccid, circumcised/uncircumcised), but accuracy needs more improvement there.

Feedback

Please let me know what you think of the new features, if the model is performing better for you, or if it's performing worse. Feedback, like before, is always welcome and crucial to me improving JoyCaption for everyone to use.
