1
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/_micah_h on 2024-10-26 07:25:35+00:00.

2

The original was posted on /r/stablediffusion by /u/metal079 on 2024-10-26 06:32:25+00:00.


Been messing around with Kling AI and so far it's pretty decent, but I'm wondering if there's anything better? Both closed-source and open-source options are welcome. I have a 4090, so hopefully running a model locally wouldn't be an issue.

3

The original was posted on /r/stablediffusion by /u/iLab-c on 2024-10-26 03:31:34+00:00.

4

The original was posted on /r/stablediffusion by /u/Pretend_Potential on 2024-10-25 23:11:34+00:00.


From the post:

"Target Audience: Engineers or technical people with at least basic familiarity with fine-tuning

Purpose: Understand the difference between fine-tuning SD1.5/SDXL and Stable Diffusion 3 Medium/Large (SD3.5M/L) and enable more users to fine-tune on both models.

Introduction

Hello! My name is Yeo Wang, and I’m a Generative Media Solutions Engineer at Stability AI and freelance 2D/3D concept designer. You might have seen some of my videos on YouTube or know about me through the community (Github).

The previous fine-tuning guide regarding Stable Diffusion 3 Medium was also written by me (with a slight allusion to this new 3.5 family of models). I’ll be building off the information in that post, so if you’ve gone through it before, it will make this much easier as I’ll be using similar techniques from there."

The rest of the tutorial is here:

5

The original was posted on /r/stablediffusion by /u/twistedgames on 2024-10-26 02:59:16+00:00.

6

The original was posted on /r/stablediffusion by /u/FennelFetish on 2024-10-25 21:52:23+00:00.


I've been working on a tool for creating image datasets.

Initially built as an image viewer with comparison and quick cropping functions, qapyq now includes a captioning interface and supports multi-modal models and LLMs for automated batch processing.

A key concept is storing multiple captions in intermediate .json files, which can then be combined and refined with your favourite LLM and custom prompt(s).
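To make the idea concrete, here's a hypothetical sketch of that multi-caption workflow in Python; the .json layout, keys, and prompt are illustrative assumptions, not qapyq's actual file format:

import json

# Several candidate captions for one image, e.g. from a tagger and a
# vision-language model (keys and values are made up for illustration):
captions = {
    "tags.wd": "1girl, outdoors, smiling, blue dress",
    "caption.minicpm": "A woman in a blue dress smiles in a sunny park.",
}

# Combine them into a single refinement prompt for your favourite LLM:
prompt = (
    "Merge the following tag list and caption into one concise, "
    "accurate training caption:\n" + json.dumps(captions, indent=2)
)
print(prompt)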

Features:

Tabbed image viewer

  • Zoom/pan and fullscreen mode
  • Gallery, Slideshow
  • Crop, compare, take measurements

Manual and automated captioning/tagging

  • Drag-and-drop interface and colored text highlighting
  • Tag sorting and filtering rules
  • Further refinement with LLMs
  • GPU acceleration with CPU offload support
  • On-the-fly NF4 and INT8 quantization (see the sketch below)

Supports JoyTag and WD for tagging.

InternVL2, MiniCPM, Molmo, Ovis, Qwen2-VL for automatic captioning.

And GGUF format for LLMs.
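As a rough illustration of the on-the-fly quantization mentioned in the feature list, this is what NF4 loading generally looks like with bitsandbytes through transformers; the model repo and loading code here are assumptions for the sketch, not qapyq's internals:

import torch
from transformers import AutoModel, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 weight format
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)
model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-V-2_6",    # one of the captioning models listed above
    quantization_config=bnb_config,
    device_map="auto",          # GPU first, CPU offload where needed
    trust_remote_code=True,
)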

Download and further information are available on GitHub:

Given the importance of quality datasets in training, I hope this tool can assist creators of models, finetunes, and LoRAs.

Looking forward to your feedback! Do you have any good prompts to share?

Screenshots:

  • Overview of qapyq's modular interface
  • Quick cropping
  • Image comparison
  • Apply sorting and filtering rules
  • Edit quickly with drag-and-drop support
  • Select one-of-many
  • Batch caption with multiple prompts sent sequentially
  • Batch transform multiple captions and tags into one
  • Load models even when resources are limited

7

The original was posted on /r/stablediffusion by /u/ZootAllures9111 on 2024-10-25 18:16:53+00:00.

8

The original was posted on /r/stablediffusion by /u/Pretend_Potential on 2024-10-25 17:12:40+00:00.


A lot of people put their LoRAs up on Hugging Face, and there are already quite a few for Stable Diffusion 3.5. You can find them all here.

As of the time/date of this post, there are already 28 of them; here's a screenshot of the top of the list.

Bookmark this link, as more will be added very rapidly.

9

The original was posted on /r/stablediffusion by /u/Why_Soooo_Serious on 2024-10-25 17:20:38+00:00.

10

The original was posted on /r/stablediffusion by /u/wyem on 2024-10-25 14:51:35+00:00.


  1. Anthropic announced computer use, a new capability in public beta. Available via the API, it lets developers direct Claude to use computers the way people do: by looking at a screen, moving a cursor, clicking buttons, and typing text. Anthropic also announced a new model, Claude 3.5 Haiku, and an upgraded Claude 3.5 Sonnet, which demonstrates significant improvements in coding and tool use. The upgraded Claude 3.5 Sonnet is now available for all users, while the new Claude 3.5 Haiku will be released later this month [Details].
  2. Cohere released Aya Expanse, a family of highly performant multilingual models that excels across 23 languages and outperforms other leading open-weights models. Aya Expanse 32B outperforms Gemma 2 27B, Mistral 8x22B, and Llama 3.1 70B, a model more than 2x its size, setting a new state-of-the-art for multilingual performance. Aya Expanse 8B, outperforms the leading open-weights models in its parameter class such as Gemma 2 9B, Llama 3.1 8B, and the recently released Ministral 8B [Details].
  3. Genmo released a research preview of Mochi 1, an open-source video generation model that performs competitively with the leading closed models and is licensed under Apache 2.0 for free personal and commercial use. Users can try it at genmo.ai/play, with weights and architecture available on HuggingFace. The 480p model is live now, with Mochi 1 HD coming later this year [Details].
  4. Rhymes AI released Allegro, a small and efficient open-source text-to-video model that transforms text into 6-second videos at 15 FPS and 720p. It surpasses existing open-source models and most commercial models, ranking just behind Hailuo and Kling. Model weights and code are available under Apache 2.0 [Details | Gallery].
  5. Meta AI released new quantized versions of the Llama 3.2 1B and 3B models. These models offer a reduced memory footprint, faster on-device inference, and portability, all while maintaining accuracy, quality, and safety for deployment on resource-constrained devices [Details].
  6. Stability AI introduced Stable Diffusion 3.5. This open release includes multiple model variants, including Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo. Additionally, Stable Diffusion 3.5 Medium will be released on October 29th. These models are highly customizable for their size, run on consumer hardware, and are free for both commercial and non-commercial use under the permissive Stability AI Community License [Details].
  7. Hugging Face launched Hugging Face Generative AI Services a.k.a. HUGS. HUGS offers an easy way to build AI applications with open models hosted in your own infrastructure [Details].
  8. Runway is rolling out Act-One, a new tool for generating expressive character performances inside Gen-3 Alpha using just a single driving video and character image [Details].
  9. Anthropic launched the analysis tool, a new built-in feature for Claude.ai that enables Claude to write and run JavaScript code. Claude can now process data, conduct analysis, and produce real-time insights [Details].
  10. IBM released new Granite 3.0 8B & 2B models, released under the permissive Apache 2.0 license that show strong performance across many academic and enterprise benchmarks, able to outperform or match similar-sized models [Details]
  11. Playground AI introduced Playground v3, a new image generation model focused on graphic design [Details].
  12. Meta released several new research artifacts including Meta Spirit LM, an open source multimodal language model that freely mixes text and speech. Meta Segment Anything 2.1 (SAM 2.1), an update to Segment Anything Model 2 for images and videos has also been released. SAM 2.1 includes a new developer suite with the code for model training and the web demo [Details].
  13. Haiper AI launched Haiper 2.0, an upgraded video model with lifelike motion, intricate details and cinematic camera control. The platform now includes templates for quick creation [Link].
  14. Ideogram launched Canvas, a creative board for organizing, generating, editing, and combining images. It features tools like Magic Fill for inpainting and Extend for outpainting [Details].
  15. Perplexity has introduced two new features: Internal Knowledge Search, allowing users to search across both public web content and internal knowledge bases, and Spaces, AI-powered collaboration hubs that allow teams to organize and share relevant information [Details].
  16. Google DeepMind announced updates for: a) Music AI Sandbox, an experimental suite of music AI tools that aims to supercharge the workflows of musicians. b) MusicFX DJ, a digital tool that makes it easier for anyone to generate music, interactively, in real time [Details].
  17. Microsoft released OmniParser, an open-source general screen parsing tool, which interprets/converts UI screenshot to structured format, to improve existing LLM based UI agent [Details].
  18. Replicate announced playground for users to experiment with image models on Replicate. It's currently in beta and works with FLUX and related models and lets you compare different models, prompts, and settings side by side [Link].
  19. Embed 3 AI search model by Cohere is now multimodal. It is capable of generating embeddings from both text and images [Details].
  20. DeepSeek released Janus, a 1.3B unified MLLM which decouples visual encoding for multimodal understanding and generation. It's based on DeepSeek-LLM-1.3b-base, with SigLIP-L as the vision encoder [Details].
  21. Google DeepMind has open-sourced their SynthID text watermarking tool for identifying AI-generated content [Details].
  22. ElevenLabs launched VoiceDesign - a new tool to generate a unique voice from a text prompt by describing the unique characteristics of the voice you need [Details].
  23. Microsoft announced that the ability to create autonomous agents with Copilot Studio will be in public preview next month. Ten new autonomous agents will be introduced in Microsoft Dynamics 365 for sales, service, finance, and supply chain teams [Details].
  24. xAI, Elon Musk’s AI startup, launched an API allowing developers to build on its Grok model [Details].
  25. Asana announced AI Studio, a No-Code builder for designing and deploying AI Agents in workflows [Details].

Source: AI Brews - Links removed from this post due to auto-delete, but they are present in the newsletter. It's free to join, sent only once a week, with bite-sized news, learning resources, and selected tools. Thanks!

11

The original was posted on /r/stablediffusion by /u/sktksm on 2024-10-25 14:10:44+00:00.

12

The original was posted on /r/stablediffusion by /u/diStyR on 2024-10-25 12:07:08+00:00.

13

The original was posted on /r/stablediffusion by /u/BigRub7079 on 2024-10-25 07:39:23+00:00.

14

The original was posted on /r/stablediffusion by /u/Angrypenguinpng on 2024-10-25 01:21:45+00:00.

15

The original was posted on /r/stablediffusion by /u/darkside1977 on 2024-10-25 08:32:32+00:00.

16

The original was posted on /r/stablediffusion by /u/diStyR on 2024-10-25 07:07:42+00:00.

17

The original was posted on /r/stablediffusion by /u/Hunting-Succcubus on 2024-10-24 21:06:53+00:00.


What are the possibilities of the 5090 having 48 GB of VRAM? With 3GB GDDR7 modules it should be possible.

With Samsung's 3GB, 40Gb/s GDDR7 modules, a 5090 with 16 modules on a 512-bit bus would have 48 GB and 2560 GB/s.
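A quick sanity check of those rumored numbers:

modules = 16
gb_per_module = 3                # 3GB GDDR7 per module
pin_speed_gbps = 40              # 40 Gb/s per pin
bus_width_bits = 512

capacity_gb = modules * gb_per_module                # 16 * 3 = 48 GB
bandwidth_gbs = bus_width_bits * pin_speed_gbps / 8  # 512 * 40 / 8 = 2560 GB/s
print(capacity_gb, bandwidth_gbs)                    # 48 2560.0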

NVIDIA RTX 5090 Founder's Edition rumored to feature 16 GDDR7 memory modules in denser design - VideoCardz.com

18

The original was posted on /r/stablediffusion by /u/Successful_AI on 2024-10-24 21:00:50+00:00.

19

The original was posted on /r/stablediffusion by /u/Presnobo on 2024-10-24 23:32:08+00:00.

20

The original was posted on /r/stablediffusion by /u/terminusresearchorg on 2024-10-24 21:33:58+00:00.


We used an industry-standard dataset to train SD 3.5 and quantify its trainability on a single concept, 1boy.

full guide:

example model:

huggingface:

Hardware: 3x 4090

Training time: a couple of hours

Config:

  • Learning rate: 1e-05
  • Number of images: 15
  • Max grad norm: 0.01
  • Effective batch size: 3
    • Micro-batch size: 1
    • Gradient accumulation steps: 1
    • Number of GPUs: 3
  • Optimizer: optimi-lion
  • Precision: Pure BF16
  • Quantised: No

Total usage was about 18GB of VRAM over the whole run; with int8-quanto it comes down to roughly 11GB.
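For context, int8-quanto presumably refers to weight quantization via the optimum-quanto library. A minimal sketch of that step, assuming the SD3.5 Large transformer loaded through diffusers (not the actual training code from this run):

from diffusers import SD3Transformer2DModel
from optimum.quanto import quantize, freeze, qint8

# Load the SD3.5 Large MMDiT (gated repo; requires accepting the license):
transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", subfolder="transformer"
)
quantize(transformer, weights=qint8)  # swap linear weights for int8 versions
freeze(transformer)                   # materialize the quantized weights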

LyCORIS config:

{
    "bypass_mode": true,
    "algo": "lokr",
    "multiplier": 1.0,
    "full_matrix": true,
    "linear_dim": 10000,
    "linear_alpha": 1,
    "factor": 12,
    "apply_preset": {
        "target_module": [
            "Attention"
        ],
        "module_algo_map": {
            "Attention": {
                "factor": 6
            }
        }
    }
}
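For reference, a rough sketch of how a LoKr config like this maps onto the LyCORIS library's Python API; the trainer handles this internally, so the direct usage below is an assumed translation, with `transformer` standing in for the SD3.5 model loaded as in the earlier sketch:

from lycoris import create_lycoris, LycorisNetwork

# Mirror the "apply_preset" section of the config above:
LycorisNetwork.apply_preset({
    "target_module": ["Attention"],
    "module_algo_map": {"Attention": {"factor": 6}},
})
lycoris_net = create_lycoris(
    transformer,
    1.0,                 # multiplier
    linear_dim=10000,    # effectively full-rank, as in the config
    linear_alpha=1,
    algo="lokr",
    factor=12,
    full_matrix=True,
)
lycoris_net.apply_to()   # inject the LoKr adapters into the model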

See the Hugging Face Hub link for more config info.

21

The original was posted on /r/stablediffusion by /u/Pretend_Potential on 2024-10-24 21:33:33+00:00.


Because I know there are some here that want the GGUFs and might not have seen this: they are located in this Hugging Face repo.

22

The original was posted on /r/stablediffusion by /u/twotimefind on 2024-10-24 21:29:55+00:00.

23

The original was posted on /r/stablediffusion by /u/ectoblob on 2024-10-24 18:13:35+00:00.

24

The original was posted on /r/stablediffusion by /u/ZootAllures9111 on 2024-10-24 20:46:46+00:00.

25
ROYGBIV Flux LoRA (www.reddit.com)

The original was posted on /r/stablediffusion by /u/renderartist on 2024-10-24 20:28:32+00:00.
