Content provided by EDGE AI FOUNDATION. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and delivered directly by EDGE AI FOUNDATION or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://no.player.fm/legal.

Tomorrow's Edge AI: Cutting-Edge Memory Optimization for Large Language Models with Seonyeong Heo of Kyung Hee University

30:29
 

Send us a text

Discover the cutting-edge techniques behind memory optimization for large language models with our guest, Seonyeong Heo from Kyung Hee University. Join us as we unlock the secrets of deploying 7-billion-parameter models on small devices with limited memory. This episode delves into the intricacies of key-value caching in decoder-only transformers, a crucial innovation that reduces computational overhead by storing and reusing the attention keys and values computed for earlier tokens. Seonyeong shares insightful strategies that tackle the high demands of memory management, offering a glimpse into how these models can be made more feasible and energy-efficient.
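
To make the caching idea concrete, here is a minimal NumPy sketch of single-head key-value caching during autoregressive decoding. The `KVCache` class, weight names, and toy dimensions are illustrative assumptions for this sketch, not the implementation discussed in the episode.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    """Grows by one key/value row per generated token, so each decoding step
    only projects the newest token instead of re-running the full sequence."""
    def __init__(self, d_model):
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def decode_step(x_new, W_q, W_k, W_v, cache):
    """One autoregressive step: project only the new token, reuse cached K/V."""
    q = x_new @ W_q                                    # (1, d)
    cache.append(x_new @ W_k, x_new @ W_v)             # extend the cache
    scores = q @ cache.keys.T / np.sqrt(q.shape[-1])    # attend over all cached keys
    return softmax(scores) @ cache.values                # (1, d)

# Usage: generate 5 tokens' worth of hidden states with a single-head toy model.
rng = np.random.default_rng(0)
d = 16
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
cache = KVCache(d)
for _ in range(5):
    x_new = rng.standard_normal((1, d))   # stand-in for the newest token's embedding
    out = decode_step(x_new, W_q, W_k, W_v, cache)
print(cache.keys.shape)   # (5, 16): cache memory grows linearly with sequence length
```

The trade-off the episode discusses follows directly from the last line: the cache avoids recomputation, but its memory footprint grows with every generated token, which is what the compression techniques below target.
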
Our conversation also ventures into the world of dynamic compression methods essential for optimizing memory usage. We unpack the challenges of compressing key-value arrays and explore the merits of techniques like quantization, pruning, and dimensionality reduction with autoencoders. Weighted quantization is highlighted as a standout method for achieving remarkable compression rates with minimal errors, provided it's fine-tuned effectively. This episode is a must-listen for those interested in the future of on-device LLMs, as we underscore the significance of efficient memory management in enhancing their performance, especially in resource-constrained settings. Tune in for this enlightening discussion paving the way for innovative advancements in the field.
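
As a rough illustration of the quantization direction, the sketch below applies plain symmetric per-channel int8 quantization to a toy key cache. This is a generic baseline under assumed shapes and names; the weighted quantization discussed in the episode presumably biases the scales by importance and is not modeled here.

```python
import numpy as np

def quantize_int8(x, axis=0):
    """Symmetric int8 quantization with one scale per feature dimension.
    Plain per-channel quantization only; not the episode's weighted variant."""
    scale = np.abs(x).max(axis=axis, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)            # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Usage: compress a toy key cache of 512 tokens x 64 dims from fp32 to int8.
rng = np.random.default_rng(0)
keys = rng.standard_normal((512, 64)).astype(np.float32)
q, scale = quantize_int8(keys)
err = np.abs(keys - dequantize(q, scale)).mean()
print(q.nbytes / keys.nbytes, err)   # ~0.25 compression ratio, small mean abs error
```

Pruning and autoencoder-based dimensionality reduction follow the same pattern of trading a small reconstruction error for a smaller cache; the episode compares how well each holds up as the compression rate increases.
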

Support the show

Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org

Chapters

1. Tomorrow's Edge AI: Cutting-Edge Memory Optimization for Large Language Models with Seonyeong Heo of Kyung Hee University (00:00:00)

2. Memory Optimization for on-Device LLM (00:00:24)

3. Memory Optimization Techniques for LLMs (00:11:50)
