
In human conversations, individuals can indicate relevant regions within a scene while addressing others, and the other person can respond by referring to specific regions when necessary. This natural referential ability in dialogue remains absent in current Multimodal Large Language Models (MLLMs). To fill this gap, this paper proposes an MLLM called Shikra, which can handle spatial coordinate inputs and outputs in natural language. Its architecture consists of a vision encoder, an alignment layer, and an LLM. The design is straightforward and simple: it needs no extra vocabularies, position encoders, pre-/post-detection modules, or external plug-in models, and all inputs and outputs are in natural language form. Referential dialogue is a superset of various vision-language (VL) tasks, so Shikra can naturally handle location-related tasks like REC and PointQA as well as conventional VL tasks such as Image Captioning and VQA. Experimental results showcase Shikra's promising performance. Furthermore, it enables numerous exciting applications, such as providing the coordinates of mentioned objects in chains of thought and comparing the similarity of user-pointed regions. Our code, model, and dataset are accessible at this https URL.
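Since coordinates live directly in the text stream with no special tokens, downstream code only needs plain string parsing to recover the referenced regions. The snippet below is an illustrative sketch: the bracketed normalized `[x1,y1,x2,y2]` format and the `extract_boxes` helper are assumptions for demonstration, not verified details of the released model.

```python
import re

# Hypothetical Shikra-style answer: box coordinates are embedded in the
# text as plain numbers, e.g. "The dog[0.32,0.51,0.61,0.95] chases ...".
# The normalized [x1,y1,x2,y2] format here is an assumption for illustration.
BOX = re.compile(r"\[(\d\.\d+),(\d\.\d+),(\d\.\d+),(\d\.\d+)\]")

def extract_boxes(answer: str):
    """Pull (phrase, box) pairs out of a model answer string."""
    boxes = []
    last_end = 0
    for m in BOX.finditer(answer):
        # take the word immediately preceding the box as its referent
        words = answer[last_end:m.start()].strip(" .,").split()
        phrase = words[-1] if words else ""
        boxes.append((phrase, tuple(float(v) for v in m.groups())))
        last_end = m.end()
    return boxes

print(extract_boxes("The dog[0.32,0.51,0.61,0.95] chases a ball[0.12,0.40,0.22,0.53]."))
```

Because parsing is ordinary text processing, the same model can emit boxes mid-sentence during a chain of thought without any detection head.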

submitted 1 year ago* (last edited 1 year ago) by Martineski@lemmy.fmhy.ml to c/singularity@lemmy.fmhy.ml

Amazon CEO Andy Jassy called generative A.I. “one of the biggest technical transformations of our lifetimes” in an interview with CNBC on Thursday. He also called many of today’s A.I. chatbots and other generative A.I. tools part of the “hype cycle,” declaring that Amazon was focused on the “substance cycle.”

Amazon’s bona fides in the space are well established: the company was a player in artificial intelligence and machine learning long before the ChatGPTs and Bards of the world were publicly released. Former Fortune editor Brian Dumaine wrote a book in 2020 about how Amazon founder Jeff Bezos realized early on that imbuing machine learning into every facet of the company would allow it to gather data to constantly improve itself.

Much as it did with Amazon Web Services, which practically birthed the cloud computing industry that now powers the internet’s biggest companies, including its competitors, Amazon’s A.I. strategy is focused on cementing its position as a major player across the entirety of the A.I. supply chain.

“Every single business unit inside of Amazon is working intensely and very broadly on generative A.I.,” Jassy says.

Jassy shed some light on Amazon’s A.I. game plan, outlining three macro layers: the computing capabilities, the underlying models, and what Jassy refers to as the “application layer,” for example, ChatGPT or Bard.


For 3D object manipulation, methods that build an explicit 3D representation perform better than those relying only on camera images. But explicit 3D representations like voxels come at a large computational cost, which hurts scalability. In this work, we propose RVT, a multi-view transformer for 3D manipulation that is both scalable and accurate. Key features of RVT include an attention mechanism that aggregates information across views and a re-rendering of the camera input from virtual views around the robot workspace. In simulations, we find that a single RVT model works well across 18 RLBench tasks with 249 task variations, achieving 26% higher relative success than the existing state-of-the-art method (PerAct). It also trains 36× faster than PerAct to reach the same performance and achieves 2.3× PerAct's inference speed. Further, RVT can perform a variety of manipulation tasks in the real world with just a few (∼10) demonstrations per task. Visual results, code, and the trained model are provided at this https URL.
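The re-rendering idea can be sketched in a few lines. This is a toy rendition of the concept only (not the authors' code): we orthographically project a point cloud onto three axis-aligned virtual image planes, producing one depth image per virtual view for a transformer to attend over.

```python
import numpy as np

def render_virtual_views(points: np.ndarray, res: int = 32) -> np.ndarray:
    """Toy re-rendering: points is an (N, 3) cloud in a unit workspace
    [0, 1]^3; returns (3, res, res) depth images for three virtual views."""
    views = np.zeros((3, res, res))
    # Each view keeps two axes as image coordinates (u, v) and uses the
    # third as depth, storing the nearest point per pixel.
    axes = [(0, 1, 2), (0, 2, 1), (1, 2, 0)]
    for k, (u, v, d) in enumerate(axes):
        iu = np.clip((points[:, u] * res).astype(int), 0, res - 1)
        iv = np.clip((points[:, v] * res).astype(int), 0, res - 1)
        for i, j, z in zip(iu, iv, points[:, d]):
            if views[k, i, j] == 0 or z < views[k, i, j]:  # 0 = empty pixel
                views[k, i, j] = z
    return views

pts = np.random.rand(1000, 3)        # stand-in for a sensed point cloud
imgs = render_virtual_views(pts)
print(imgs.shape)  # (3, 32, 32)
```

The appeal of this design is that the expensive 3D structure (a voxel grid) is replaced by a handful of cheap 2D images, which standard image transformers handle well.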


Covid-19 is said to cause long-term side effects in up to 67% of patients, and these health consequences can include chronic fatigue, loss of taste and smell and brain fog. Increasingly common too is Covid-related hair loss. Known as telogen effluvium, this phenomenon manifests as clumps of hair falling out after brushing or washing your hair.

It’s normal to shed hair daily – we lose about 100-150 hairs each day as hair drops from follicles to make way for new hair growth. This growth cycle occurs because 90% of the hair on our heads is in a growth phase (called anagen), while the remaining 10% is in a resting phase (called telogen). Anagen lasts for about three years before transitioning into the shorter telogen phase, following which hair is shed.
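A quick back-of-the-envelope check shows these figures are mutually consistent. The ~100,000 scalp-hair count below is an assumed typical figure, not from the article: if 90% of hairs are in a roughly three-year growth phase and each is shed once per cycle, the expected daily shed rate falls in the quoted range.

```python
# Assumed typical scalp follicle count (not stated in the article)
scalp_hairs = 100_000
anagen_fraction = 0.90   # from the article: 90% of hair is in anagen
anagen_days = 3 * 365    # anagen lasts about three years

# Each growing hair is shed roughly once per cycle
daily_shed = scalp_hairs * anagen_fraction / anagen_days
print(round(daily_shed))  # ~82, the same order as the quoted 100-150/day
```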

A stressful event like childbirth, certain medications, intense psychological stress and Covid-19 can trigger our bodies to shift a greater-than-normal proportion of growing anagen hairs into a resting telogen state, according to the University of Utah.

“Covid-related hair loss can affect up to 33% of symptomatic patients and 10% of asymptomatic patients,” says a plastic surgeon who deals with hair loss patients. “And this kind of hair loss seems to be different from that induced by stress or disease as cytokines (substances secreted by the body’s immune system) appear to cause direct damage to hair follicles,” she adds.

Covid-induced hair loss has also been reported to start earlier after the stressful event – in two months instead of the usual three.


Abstract:

Since the first laser was invented, high-energy lasers (HELs) have been pursued enthusiastically. The first revolution in HELs was driven by the fusion of laser science and aerospace in the 1960s, when chemical rocket engines gave fresh impetus to the birth of gas-flow and chemical lasers, which finally turned megawatt lasers from dream into reality. Today, the development of HELs has entered the age of electricity, as have rocket engines. The properties of current electric rocket engines align closely with HELs' goals, including electrical drive, effective heat dissipation, low medium consumption, and extremely small weight and size, which inspired a second fusion of laser science and aerospace and motivated the exploration of potential HELs. As an exploratory attempt, a new configuration of diode-pumped metastable rare-gas laser was demonstrated, with a gain generator resembling an electric rocket engine for improved power-scaling ability.


Original title: Focused Transformer: Contrastive Training for Context Scaling

Large language models have an exceptional capability to incorporate new information in a contextual manner. However, the full potential of such an approach is often restrained by a limitation in the effective context length. One solution to this issue is to endow an attention layer with access to an external memory, which comprises (key, value) pairs. Yet, as the number of documents increases, the proportion of relevant keys to irrelevant ones decreases, leading the model to focus more on the irrelevant keys. We identify a significant challenge, dubbed the distraction issue, where keys linked to different semantic values might overlap, making them hard to distinguish. To tackle this problem, we introduce the Focused Transformer (FoT), a technique that employs a training process inspired by contrastive learning. This novel approach enhances the structure of the (key, value) space, enabling an extension of the context length. Our method allows for fine-tuning pre-existing, large-scale models to lengthen their effective context. This is demonstrated by our fine-tuning of 3B and 7B OpenLLaMA checkpoints. The resulting models, which we name LongLLaMA, exhibit advancements in tasks requiring a long context. We further illustrate that our LongLLaMA models adeptly manage a 256k context length for passkey retrieval.
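The distraction issue and the contrastive remedy can be illustrated with a toy InfoNCE-style objective. This is our own hedged sketch of the idea, not the paper's code: a query should score its own document's keys above keys mixed in from unrelated documents, so the (key, value) space stays separable as the memory grows.

```python
import numpy as np

def distraction_loss(q, pos_keys, neg_keys, temp=1.0):
    """InfoNCE-style loss for one query.
    q: (d,); pos_keys: (P, d) same-document keys;
    neg_keys: (N, d) keys from other documents (the 'distractors')."""
    keys = np.concatenate([pos_keys, neg_keys])        # (P+N, d)
    logits = keys @ q / temp
    log_probs = logits - np.log(np.exp(logits).sum())  # log-softmax
    # average negative log-likelihood over the same-document keys
    return -log_probs[: len(pos_keys)].mean()

# A query with one matching key and one orthogonal "distractor" key:
q = np.array([10.0, 0.0])
same_doc = np.array([[10.0, 0.0]])
other_doc = np.array([[0.0, 10.0]])
print(distraction_loss(q, same_doc, other_doc))   # ~0: keys well separated
print(distraction_loss(q, other_doc, same_doc))   # ~100: query is distracted
```

Minimizing this loss pushes same-document keys toward the query and distractor keys away, which is the structural improvement of the (key, value) space that the abstract describes.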

hmmm (lemmy.fmhy.ml)

Link to the sublemmy: !hmmmtexts@lemmy.fmhy.ml

The other sublemmy from the hmmm category that I didn't make an announcement for: !hmmmgifs@lemmy.fmhy.ml

[-] Martineski@lemmy.fmhy.ml 16 points 1 year ago
[-] Martineski@lemmy.fmhy.ml 11 points 1 year ago

You can change it in the settings

[-] Martineski@lemmy.fmhy.ml 12 points 1 year ago

Oh boy, identity and copyright law will be chaotic as AI gets more and more advanced. I'm all in for abolishing copyright, but I have no idea what to think about your identity being duplicated or recreated. When is something your identity, and when does it stop being it? It will be obvious with 1:1 copies of popular people/actors, but what about cases where a copy is tweaked to resemble someone less, or where multiple people are mixed to create one person? What about people who aren't widely known? What if a virtual person resembles someone by accident?

[-] Martineski@lemmy.fmhy.ml 13 points 1 year ago

At least one person in the comments gave me good feedback. But to decide what to do, I will need to contact the person who works on tags on GitHub and have an in-depth discussion with him about how it will work when the first stage of the tags isn't implemented yet…

[-] Martineski@lemmy.fmhy.ml 15 points 1 year ago

Click on my nickname and you should see an option to block me there.

[-] Martineski@lemmy.fmhy.ml 15 points 1 year ago

Nope, it's an instance

[-] Martineski@lemmy.fmhy.ml 17 points 1 year ago

exploding-heads

:o

[-] Martineski@lemmy.fmhy.ml 13 points 1 year ago

The web interface for the instance was down to avoid being hacked like lemmy.world and blahaj.zone. Seeing how it's working again, they must have fixed it. Beehaw.org also fixed this problem just recently. Now we just need to wait for more instances to apply the fix to prevent further hacks.

[-] Martineski@lemmy.fmhy.ml 10 points 1 year ago

I posted it to only 3 communities, and 1 of them is a small local community. It's not like I'm spamming it everywhere.

