Teaching Large Language Models to Reason with Reinforcement Learning with Alex Havrilla - #680

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)


1,732 subscribers

Artificial Intelligence

Added seven years ago

Content provided by TWIML and Sam Charrington. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by TWIML and Sam Charrington or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://no.player.fm/legal.

Published about a year ago (46:24)

Today we're joined by Alex Havrilla, a PhD student at Georgia Tech, to discuss "Teaching Large Language Models to Reason with Reinforcement Learning." Alex discusses the role of creativity and exploration in problem solving and the opportunities that reinforcement learning algorithms present for improving reasoning in large language models. He also shares his research on the effect of noise on language model training, which highlights the robustness of LLM architectures. Finally, we delve into the future of RL and the potential of combining language models with traditional methods to achieve more robust AI reasoning.

The complete show notes for this episode can be found at twimlai.com/go/680.

733 episodes

#Artificial Intelligence #Tech News #Artificialintelligence #Machinelearning #Samcharrington #Technology #Thisweekinmachinelearning #Sam Charrington #Thetwimlaipocast #Twimlaipodcast #Tech #News #China #TWIML #Datascience #Science

All episodes

Evolving MLOps Platforms for Generative AI and Agents with Abhijit Bose - #714 58:08

7 days ago

Today, we're joined by Abhijit Bose, head of enterprise AI and ML platforms at Capital One, to discuss the evolution of the company’s approach to generative AI and its platform best practices. In this episode, we dig into the company’s platform-centric approach to AI and how they’ve been evolving their existing MLOps and data platforms to support the new challenges and opportunities presented by generative AI workloads and AI agents. We explore their use of cloud-based infrastructure (in this case, AWS) to provide a foundation upon which they then layer open-source and proprietary services and tools. We cover their use of Llama 3 and open-weight models, their approach to fine-tuning, their observability tooling for GenAI applications, their use of inference optimization techniques like quantization, and more. Finally, Abhijit shares his views on the future of agentic workflows in the enterprise, the application of OpenAI o1-style reasoning in models, and the new roles and skill sets required in the evolving GenAI landscape. The complete show notes for this episode can be found at https://twimlai.com/go/714.

Why Agents Are Stupid & What We Can Do About It with Dan Jeffries - #713 1:08:49

5 weeks ago

Today, we're joined by Dan Jeffries, founder and CEO of Kentauros AI, to discuss the challenges currently faced by those developing advanced AI agents. We dig into how Dan defines agents and distinguishes them from other uses of LLMs, explore various use cases for them, and discuss ways to create smarter agentic systems. Dan shares his “big brain, little brain, tool brain” approach to tackling real-world challenges in agents, the trade-offs in leveraging general-purpose vs. task-specific models, and his take on LLM reasoning. We also cover how he thinks about model selection for agents, along with the need for new tools and platforms for deploying them. Finally, Dan emphasizes the importance of open source in advancing AI, shares the new products they’re working on, and explores future directions in the agentic era. The complete show notes for this episode can be found at https://twimlai.com/go/713.

Automated Reasoning to Prevent LLM Hallucination with Byron Cook - #712 56:48

6 weeks ago

Today, we're joined by Byron Cook, VP and distinguished scientist in the Automated Reasoning Group at AWS, to dig into the underlying technology behind the newly announced Automated Reasoning Checks feature of Amazon Bedrock Guardrails. Automated Reasoning Checks uses mathematical proofs to help LLM users safeguard against hallucinations. We explore recent advancements in the field of automated reasoning and some of the ways it is applied, both broadly and across AWS, where it is used to enhance security, cryptography, virtualization, and more. We discuss how the new feature helps users generate, refine, validate, and formalize policies, and how those policies can be deployed alongside LLM applications to ensure the accuracy of generated text. Finally, Byron shares the benchmarks they’ve applied, the use of techniques like ‘constrained coding’ and ‘backtracking,’ and the future co-evolution of automated reasoning and generative AI. The complete show notes for this episode can be found at https://twimlai.com/go/712.

AI at the Edge: Qualcomm AI Research at NeurIPS 2024 with Arash Behboodi - #711 54:47

7 weeks ago

Today, we're joined by Arash Behboodi, director of engineering at Qualcomm AI Research, to discuss the papers and workshops Qualcomm will be presenting at this year’s NeurIPS conference. We dig into the challenges and opportunities presented by differentiable simulation in wireless systems, the sciences, and beyond. We also explore recent work that ties conformal prediction to information theory, yielding a novel approach to incorporating uncertainty quantification directly into machine learning models. Finally, we review several papers enabling the efficient use of LoRA (Low-Rank Adaptation) on mobile devices (Hollowed Net, ShiRA, FouRA). Arash also previews the demos Qualcomm will be hosting at NeurIPS, including new video-editing diffusion and 3D content generation models running on-device, Qualcomm's AI Hub, and more! The complete show notes for this episode can be found at https://twimlai.com/go/711.

AI for Network Management with Shirley Wu - #710 53:44

9 weeks ago

Today, we're joined by Shirley Wu, senior director of software engineering at Juniper Networks, to discuss how machine learning and artificial intelligence are transforming network management. We explore various use cases where AI and ML are applied to enhance the quality, performance, and efficiency of networks across Juniper’s customers, including diagnosing cable degradation, proactively monitoring for coverage gaps, and detecting faults in real time. We also dig into the complexities of integrating data science into networking, the trade-offs between traditional methods and ML-based solutions, the role of feature engineering and data in networking, the applicability of large language models, and Juniper’s approach to using smaller, specialized ML models to optimize speed, latency, and cost. Finally, Shirley shares some future directions for Juniper Mist, such as proactive network testing and end-user self-service. The complete show notes for this episode can be found at https://twimlai.com/go/710.

Why Your RAG System Is Broken, and How to Fix It with Jason Liu - #709 58:03

10 weeks ago

Today, we're joined by Jason Liu, freelance AI consultant, advisor, and creator of the Instructor library, to discuss all things retrieval-augmented generation (RAG). We dig into the tactical and strategic challenges companies face with their RAG systems, the signs Jason looks for to identify looming problems, the issues he most commonly encounters, and the steps he takes to diagnose them. We also cover the significance of building robust test datasets, data-driven experimentation, evaluation tools, and metrics for different use cases. Additionally, we touch on fine-tuning strategies for RAG systems, the effectiveness of different chunking strategies, the use of collaboration tools like Braintrust, and how future models will change the game. Lastly, we cover Jason’s interest in teaching others how to capitalize on their own AI experience via his AI consulting course. The complete show notes for this episode can be found at https://twimlai.com/go/709.

An Agentic Mixture of Experts for DevOps with Sunil Mallya - #708 1:15:09

11 weeks ago

Today we're joined by Sunil Mallya, CTO and co-founder of Flip AI. We discuss Flip’s incident debugging system for DevOps, which was built using a custom mixture of experts (MoE) large language model (LLM) trained on a novel "CoMELT" observability dataset. CoMELT combines traditional MELT data (metrics, events, logs, and traces) with code so the system can efficiently identify the root causes of failures in complex software systems. We discuss the challenges of integrating time-series data with LLMs and the multi-decoder architecture they designed for this purpose. Sunil describes their system's agent-based design, which focuses on clear roles and boundaries to ensure reliability. We examine their "chaos gym," a reinforcement learning environment used for testing and improving the system's robustness. Finally, we discuss the practical considerations of deploying such a system at scale in diverse environments, and much more. The complete show notes for this episode can be found at https://twimlai.com/go/708.

Building AI Voice Agents with Scott Stephenson - #707 1:01:44

12 weeks ago

Today, we're joined by Scott Stephenson, co-founder and CEO of Deepgram, to discuss voice AI agents. We explore the importance of perception, understanding, and interaction, and how these key components work together in building intelligent AI voice agents. We discuss the role of multimodal LLMs, as well as speech-to-text and text-to-speech models, in building AI voice agents, and dig into the benefits and limitations of text-based approaches to voice interactions. We also examine what’s required to deliver real-time voice interactions and the promise of closed-loop, continuously improving, federated learning agents. Finally, Scott shares practical applications of AI voice agents at Deepgram and provides an overview of their newly released agent toolkit. The complete show notes for this episode can be found at https://twimlai.com/go/707.

Is Artificial Superintelligence Imminent? with Tim Rocktäschel - #706 55:52

13 weeks ago

Today, we're joined by Tim Rocktäschel, senior staff research scientist at Google DeepMind, professor of Artificial Intelligence at University College London, and author of the recently published popular science book “Artificial Intelligence: 10 Things You Should Know.” We dig into the attainability of artificial superintelligence and the path to achieving generalized superhuman capabilities across multiple domains. We discuss the importance of open-endedness in developing autonomous and self-improving systems, as well as the role of evolutionary approaches and algorithms. Additionally, we cover Tim’s recent research projects such as “Promptbreeder,” “Debating with More Persuasive LLMs Leads to More Truthful Answers,” and more. The complete show notes for this episode can be found at https://twimlai.com/go/706.

ML Models for Safety-Critical Systems with Lucas García - #705 1:16:06

14 weeks ago

Today, we're joined by Lucas García, principal product manager for deep learning at MathWorks, to discuss incorporating ML models into safety-critical systems. We begin by exploring the critical role of verification and validation (V&V) in these applications. We review the popular V-model for engineering critical systems and then dig into the “W” adaptation that’s been proposed for incorporating ML models. Next, we discuss the complexities of applying deep neural networks in safety-critical applications, using the aviation industry as an example, and talk through the importance of factors such as data quality, model stability, robustness, interpretability, and accuracy. We also explore formal verification methods, abstract transformer layers, transformer-based architectures, and the application of various software testing techniques. Lucas also introduces the field of constrained deep learning and convex neural networks, along with their benefits and trade-offs. The complete show notes for this episode can be found at https://twimlai.com/go/705.

AI Agents: Substance or Snake Oil with Arvind Narayanan - #704 54:22

15 weeks ago

Today, we're joined by Arvind Narayanan, professor of Computer Science at Princeton University, to discuss his recent works, AI Agents That Matter and AI Snake Oil. In “AI Agents That Matter,” we explore the range of agentic behaviors, the challenges in benchmarking agents, and the ‘capability and reliability gap,’ which creates risks when deploying AI agents in real-world applications. We also discuss the importance of verifiers as a technique for safeguarding agent behavior. We then dig into the AI Snake Oil book, which uncovers examples of problematic and overhyped claims in AI. Arvind shares various examples of failed AI applications, outlines a taxonomy of AI risks, and shares his insights on AI’s catastrophic risks. We also touch on different approaches to LLM-based reasoning, his views on tech policy and regulation, and his work on CORE-Bench, a benchmark designed to measure AI agents' accuracy in computational reproducibility tasks. The complete show notes for this episode can be found at https://twimlai.com/go/704.

AI Agents for Data Analysis with Shreya Shankar - #703 48:24

16 weeks ago

Today, we're joined by Shreya Shankar, a PhD student at UC Berkeley, to discuss DocETL, a declarative system for building and optimizing LLM-powered data processing pipelines for large-scale and complex document analysis tasks. We explore how DocETL's optimizer architecture works, the intricacies of building agentic systems for data processing, the current landscape of benchmarks for data processing tasks, how these differ from reasoning-based benchmarks, and the need for robust evaluation methods for human-in-the-loop LLM workflows. Additionally, Shreya shares real-world applications of DocETL, the importance of effective validation prompts, and how to build robust, fault-tolerant agentic systems. Lastly, we cover the need for benchmarks tailored to LLM-powered data processing tasks and the future directions for DocETL. The complete show notes for this episode can be found at https://twimlai.com/go/703.

Stealing Part of a Production Language Model with Nicholas Carlini - #702 1:03:30

17 weeks ago

Today, we're joined by Nicholas Carlini, research scientist at Google DeepMind, to discuss adversarial machine learning and model security, focusing on his ICML 2024 best paper winner, “Stealing part of a production language model.” We dig into this work, which demonstrated the ability to successfully steal the last layer of production language models including ChatGPT and PaLM-2. Nicholas describes the current landscape of AI security research in the age of LLMs, the implications of model stealing, ethical concerns surrounding model privacy, how the attack works, and the significance of the embedding layer in language models. We also discuss the remediation strategies implemented by OpenAI and Google, and the future directions in the field of AI security. Plus, we cover his other ICML 2024 best paper, “Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining,” which questions the use and promotion of differential privacy in conjunction with pre-trained models. The complete show notes for this episode can be found at https://twimlai.com/go/702.

Supercharging Developer Productivity with ChatGPT and Claude with Simon Willison - #701 1:14:15

18 weeks ago

Today, we're joined by Simon Willison, independent researcher and creator of Datasette, to discuss the many ways software developers and engineers can take advantage of large language models (LLMs) to boost their productivity. We dig into Simon’s own workflows and how he uses popular models like ChatGPT and Anthropic’s Claude to write and test hundreds of lines of code while out walking his dog. We review Simon’s favorite prompting and debugging techniques, his strategies for sidestepping the limitations of contemporary models, how he uses Claude’s Artifacts feature for rapid prototyping, his thoughts on the use and impact of vision models, the role he sees for open source models and local LLMs, and much more. The complete show notes for this episode can be found at https://twimlai.com/go/701.

Automated Design of Agentic Systems with Shengran Hu - #700 59:30

20 weeks ago

Today, we're joined by Shengran Hu, a PhD student at the University of British Columbia, to discuss Automated Design of Agentic Systems (ADAS), an approach focused on automatically creating agentic system designs. We explore the spectrum of agentic behaviors, the motivation for learning all aspects of agentic system design, the key components of the ADAS approach, and how it uses LLMs to design novel agent architectures in code. We also cover the iterative process of ADAS, its potential to shed light on the behavior of foundation models, the higher-level meta-behaviors that emerge in agentic systems, and how ADAS uncovers novel design patterns through emergent behaviors, particularly in complex tasks like the ARC challenge. Finally, we touch on the practical applications of ADAS and its potential use in system optimization for real-world tasks. The complete show notes for this episode can be found at https://twimlai.com/go/700.
