
Summary:

Focused Transformer: A new technique for long-context language modeling. The paper introduces the Focused Transformer (FOT), a training method that uses contrastive learning and external memory to improve the structure of the (key, value) space and extend the effective context length of transformer models. FOT can be used to fine-tune existing large models without changing their architecture, improving their performance on tasks that require long context.
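To make the mechanism concrete, here is a minimal sketch of a memory attention layer that augments local attention with (key, value) pairs retrieved from an external memory by k-nearest-neighbor lookup. All names, shapes, and the exact retrieval scheme here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def memory_attention(q, local_k, local_v, mem_k, mem_v, top_k=16):
    """Illustrative sketch: each query attends over its local context
    plus the top-k most similar (key, value) pairs retrieved from an
    external memory via inner-product kNN.

    q:                (n_q, d)     queries from the current context
    local_k, local_v: (n_local, d) local-context keys/values
    mem_k, mem_v:     (n_mem, d)   external memory of cached keys/values
    """
    n_q, d = q.shape
    # kNN retrieval: pick the top-k memory keys per query by inner product.
    top_idx = (q @ mem_k.T).topk(top_k, dim=-1).indices    # (n_q, top_k)
    k_ret, v_ret = mem_k[top_idx], mem_v[top_idx]          # (n_q, top_k, d)

    # Attend jointly over local keys and the retrieved memory keys.
    k_all = torch.cat([local_k.unsqueeze(0).expand(n_q, -1, -1), k_ret], dim=1)
    v_all = torch.cat([local_v.unsqueeze(0).expand(n_q, -1, -1), v_ret], dim=1)
    scores = (q.unsqueeze(1) @ k_all.transpose(1, 2)).squeeze(1) / d ** 0.5
    attn = F.softmax(scores, dim=-1)                       # (n_q, n_local + top_k)
    return (attn.unsqueeze(1) @ v_all).squeeze(1)          # (n_q, d)
```

Per the paper, only a subset of attention layers is given this memory access, which is why existing checkpoints can be fine-tuned without architectural changes.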

LONGLLAMA: Extending LLaMA's context length with FOT. The paper demonstrates FOT by fine-tuning OpenLLaMA models, open reproductions of LLaMA. The resulting memory-augmented models, called LONGLLAMAs, can handle context lengths of up to 256k tokens and show improvements on few-shot learning tasks such as TREC and WebQS.
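For readers who want to try the released checkpoints, the sketch below shows Hugging Face-style loading and generation. The checkpoint name mirrors the authors' release around the time of this post; treat it and the generation settings as assumptions to verify against the current repository.

```python
import torch
from transformers import LlamaTokenizer, AutoModelForCausalLM

# Checkpoint name as in the authors' Hugging Face release (verify it still exists).
tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_3b")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_3b",
    torch_dtype=torch.float32,
    trust_remote_code=True,  # memory-augmented model code ships with the checkpoint
)

# In practice the prompt would be a long few-shot context, e.g. many
# TREC-style labeled examples followed by the query to classify.
prompt = "question: What is the capital of France?\nanswer:"
inputs = tokenizer(prompt, return_tensors="pt")
generated = model.generate(input_ids=inputs.input_ids, max_new_tokens=16)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```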

Distraction issue: A key challenge for scaling context length. The paper identifies the distraction issue as a major obstacle to using large memory databases in multi-document scenarios: as memory grows, keys from irrelevant documents overlap with keys from relevant ones, making them hard to distinguish. FOT alleviates this by exposing the memory attention layer to both positive and negative examples during training, as sketched below.
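A hedged sketch of that cross-batch idea: during fine-tuning, the memory attention for each document attends over (key, value) pairs from that document's previous window (positives) mixed with pairs from other documents in the batch (negatives), and the ordinary next-token loss pushes queries to score relevant keys above irrelevant ones. Shapes and helper names are illustrative.

```python
import torch

def crossbatch_kv(prev_keys, prev_values):
    """Illustrative: mix each document's own cached (key, value) pairs
    (positives) with those of other documents in the batch (negatives).

    prev_keys, prev_values: (batch, n_ctx, d) cached from each
    document's previous context window.
    """
    b = prev_keys.size(0)
    mixed_k, mixed_v = [], []
    for i in range(b):
        others = torch.arange(b) != i
        # Own previous window first (positives), then everyone else's (negatives).
        mixed_k.append(torch.cat([prev_keys[i], prev_keys[others].flatten(0, 1)]))
        mixed_v.append(torch.cat([prev_values[i], prev_values[others].flatten(0, 1)]))
    # Each document's memory attention layer attends over this mixture;
    # per the paper, no loss term beyond next-token prediction is added.
    return torch.stack(mixed_k), torch.stack(mixed_v)  # (b, b * n_ctx, d)
```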

ELI5

Imagine you have a toy box with lots of toys inside. You want to find your favorite toy, but there are so many toys that it's hard to find. The Focused Transformer is like a helper that can look inside the toy box and find your favorite toy quickly, even when lots of other toys are in the way. It does this by learning which toys matter and which ones don't, so it can find the right toy faster.

Implications

The Focused Transformer (FOT) technique can improve language models by extending their effective context length. This means the models can incorporate new information even when it is spread across a large number of documents. The resulting LONGLLAMA models show significant improvements on tasks that require long-context modeling, such as retrieving information from large databases. This research has implications for natural language processing, code generation, quantitative reasoning, and theorem proving, among other areas, and it makes it easier to fine-tune existing large-scale models to lengthen their effective context.
