This week in AI - all the Major AI developments in a nutshell (old.reddit.com)

submitted 21 hours ago by bot@lemmit.online to c/stablediffusion@lemmit.online


This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/wyem on 2024-10-25 14:51:35+00:00.

Anthropic announced computer use, a new capability in public beta. Available on the API, it lets developers direct Claude to use computers the way people do: by looking at a screen, moving a cursor, clicking buttons, and typing text. Anthropic also announced a new model, Claude 3.5 Haiku, and an upgraded Claude 3.5 Sonnet, which demonstrates significant improvements in coding and tool use. The upgraded Claude 3.5 Sonnet is now available for all users, while the new Claude 3.5 Haiku will be released later this month [Details].
Cohere released Aya Expanse, a family of highly performant multilingual models that excels across 23 languages and outperforms other leading open-weights models. Aya Expanse 32B outperforms Gemma 2 27B, Mixtral 8x22B, and Llama 3.1 70B, a model more than 2x its size, setting a new state-of-the-art for multilingual performance. Aya Expanse 8B outperforms the leading open-weights models in its parameter class, such as Gemma 2 9B, Llama 3.1 8B, and the recently released Ministral 8B [Details].
Genmo released a research preview of Mochi 1, an open-source video generation model that performs competitively with the leading closed models and is licensed under Apache 2.0 for free personal and commercial use. Users can try it at genmo.ai/play, with weights and architecture available on HuggingFace. The 480p model is live now, with Mochi 1 HD coming later this year [Details].
Rhymes AI released Allegro, a small and efficient open-source text-to-video model that transforms text into 6-second videos at 15 FPS and 720p. It surpasses existing open-source models and most commercial models, ranking just behind Hailuo and Kling. Model weights and code are available under Apache 2.0 [Details | Gallery].
Meta AI released new quantized versions of the Llama 3.2 1B and 3B models. These models offer a reduced memory footprint, faster on-device inference, accuracy, and portability, all while maintaining quality and safety for deployment on resource-constrained devices [Details].
Stability AI introduced Stable Diffusion 3.5. This open release includes multiple model variants, including Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo. Additionally, Stable Diffusion 3.5 Medium will be released on October 29th. These models are highly customizable for their size, run on consumer hardware, and are free for both commercial and non-commercial use under the permissive Stability AI Community License [Details].
Hugging Face launched Hugging Face Generative AI Services a.k.a. HUGS. HUGS offers an easy way to build AI applications with open models hosted in your own infrastructure [Details].
Runway is rolling out Act-One, a new tool for generating expressive character performances inside Gen-3 Alpha using just a single driving video and character image [Details].
Anthropic launched the analysis tool, a new built-in feature for Claude.ai that enables Claude to write and run JavaScript code. Claude can now process data, conduct analysis, and produce real-time insights [Details].
IBM released new Granite 3.0 8B and 2B models under the permissive Apache 2.0 license. They show strong performance across many academic and enterprise benchmarks, outperforming or matching similar-sized models [Details].
Playground AI introduced Playground v3, a new image generation model focused on graphic design [Details].
Meta released several new research artifacts, including Meta Spirit LM, an open-source multimodal language model that freely mixes text and speech, and Meta Segment Anything 2.1 (SAM 2.1), an update to Segment Anything Model 2 for images and videos. SAM 2.1 includes a new developer suite with the code for model training and the web demo [Details].
Haiper AI launched Haiper 2.0, an upgraded video model with lifelike motion, intricate details and cinematic camera control. The platform now includes templates for quick creation [Link].
Ideogram launched Canvas, a creative board for organizing, generating, editing, and combining images. It features tools like Magic Fill for inpainting and Extend for outpainting [Details].
Perplexity introduced two new features: Internal Knowledge Search, which lets users search across both public web content and internal knowledge bases, and Spaces, AI-powered collaboration hubs that allow teams to organize and share relevant information [Details].
Google DeepMind announced updates for: a) Music AI Sandbox, an experimental suite of music AI tools that aims to supercharge the workflows of musicians. b) MusicFX DJ, a digital tool that makes it easier for anyone to generate music, interactively, in real time [Details].
Microsoft released OmniParser, an open-source general screen parsing tool that converts UI screenshots into a structured format to improve existing LLM-based UI agents [Details].
Replicate announced a playground for users to experiment with image models on Replicate. It's currently in beta, works with FLUX and related models, and lets you compare different models, prompts, and settings side by side [Link].
Embed 3 AI search model by Cohere is now multimodal. It is capable of generating embeddings from both text and images [Details].
DeepSeek released Janus, a 1.3B unified MLLM that decouples visual encoding for multimodal understanding and generation. It's based on DeepSeek-LLM-1.3b-base, with SigLIP-L as the vision encoder [Details].
Google DeepMind has open-sourced their SynthID text watermarking tool for identifying AI-generated content [Details].
ElevenLabs launched VoiceDesign - a new tool to generate a unique voice from a text prompt by describing the unique characteristics of the voice you need [Details].
Microsoft announced that the ability to create autonomous agents with Copilot Studio will be in public preview next month. Ten new autonomous agents will be introduced in Microsoft Dynamics 365 for sales, service, finance, and supply chain teams [Details].
xAI, Elon Musk’s AI startup, launched an API allowing developers to build on its Grok model [Details].
Asana announced AI Studio, a no-code builder for designing and deploying AI agents in workflows [Details].

Source: AI Brews. Links were removed from this post due to auto-delete, but they are present in the newsletter. It's free to join and is sent only once a week with bite-sized news, learning resources, and selected tools. Thanks!


this post was submitted on 25 Oct 2024
