LLM Inference Speed (Tech Deep Dive) | Thinking Machines: AI & Philosophy podcast

Tech · Machine Learning · Artificial Intelligence · Society · Philosophy · Daniel Reid Cahn · MLOps

Content provided by Daniel Reid Cahn. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Daniel Reid Cahn or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://no.player.fm/legal.

Published 1 year ago · 39:36

In this tech talk, we dive deep into the technical specifics around LLM inference.

The big question is: Why are LLMs slow? How can they be faster? And might slow inference affect UX in the next generation of AI-powered software?

We jump into:

Is fast model inference the real moat for LLM companies?
What are the implications of slow model inference on the future of decentralized and edge model inference?
As demand rises, what will the latency/throughput tradeoff look like?
What innovations on the horizon might massively speed up model inference?
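
The latency/throughput question can be made concrete with a back-of-envelope roofline model. This is a minimal sketch under stated assumptions, not the analysis from the episode: it assumes each decoding step is limited either by reading all model weights from memory once (shared by the whole batch) or by roughly 2 FLOPs per parameter per generated token, and it ignores KV-cache reads, attention cost, and scheduling overhead. The `Hardware` class, the helper functions, and all model and hardware numbers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """Hypothetical accelerator specs used only for this estimate."""
    mem_bandwidth: float  # bytes per second
    peak_flops: float     # floating-point operations per second


def decode_step_time(params: float, bytes_per_param: float, batch: int, hw: Hardware) -> float:
    """Roofline estimate, in seconds, of one decoding step.

    One step produces one new token for every sequence in the batch.
    Assumptions: all weights are read from memory once per step and shared
    by the whole batch; each generated token costs about 2 FLOPs per parameter;
    KV-cache traffic, attention FLOPs, and kernel overheads are ignored.
    """
    memory_time = params * bytes_per_param / hw.mem_bandwidth  # independent of batch size
    compute_time = batch * 2 * params / hw.peak_flops           # grows linearly with batch size
    return max(memory_time, compute_time)


def latency_and_throughput(params: float, bytes_per_param: float, batch: int, hw: Hardware) -> tuple[float, float]:
    """Return (seconds per token seen by each sequence, total tokens/s across the batch)."""
    step = decode_step_time(params, bytes_per_param, batch, hw)
    return step, batch / step


# Hypothetical numbers: a 7B-parameter model with 16-bit weights on an accelerator
# with 1 TB/s of memory bandwidth and 100 TFLOP/s of usable compute.
hw = Hardware(mem_bandwidth=1e12, peak_flops=1e14)
PARAMS, BYTES_PER_PARAM = 7e9, 2

for batch in (1, 16, 64, 256):
    latency, throughput = latency_and_throughput(PARAMS, BYTES_PER_PARAM, batch, hw)
    print(f"batch={batch:4d}  latency={latency * 1e3:6.2f} ms/token  throughput={throughput:8.1f} tokens/s")
```

With these placeholder numbers, batches of up to about 100 sequences are memory-bound: per-token latency stays at 14 ms while total throughput scales with batch size. Beyond that crossover each step becomes compute-bound, so serving more users at once costs every user extra latency.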

23 episodes