Content provided by Brian Carter. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Brian Carter or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined at https://no.player.fm/legal.

OpenAI's o1 and Journey Learning

7:28
 

This paper details the authors' research journey to replicate OpenAI's o1 language model, which is designed to solve complex reasoning tasks. The researchers document their process in detail, including their insights, hypotheses, and the challenges they encountered. They present a novel paradigm called "Journey Learning," in which models learn the complete exploration process, including trial and error, reflection, and backtracking; they argue that this outperforms traditional "shortcut learning" methods. The authors also propose a multi-step evaluation approach that uses reasoning trees, reward models, and a human-AI collaborative annotation pipeline to generate high-quality long-form reasoning data.
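
To make the contrast between "shortcut learning" and "Journey Learning" concrete, here is a minimal sketch of how two kinds of training traces might be read off a reasoning tree. It is an illustration under stated assumptions, not the authors' pipeline: the `Node` class, the `correct` flag, both function names, and the wording of the reflection line are all hypothetical, and children are assumed to be listed in the order they were explored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One reasoning step in a hypothetical reasoning tree."""
    step: str
    correct: bool  # assumed label: does this step lead to the right answer?
    children: list["Node"] = field(default_factory=list)


def shortcut_trace(node: Node) -> list[str]:
    """Keep only the steps on the correct path from root to answer."""
    trace = [node.step]
    for child in node.children:
        if child.correct:
            trace += shortcut_trace(child)
            break
    return trace


def journey_trace(node: Node) -> list[str]:
    """Depth-first walk that also keeps failed attempts and backtracking."""
    trace = [node.step]
    for child in node.children:
        trace += journey_trace(child)
        if child.correct:
            break  # the correct branch was found, so stop exploring siblings
        # Hypothetical reflection text inserted after a failed branch.
        trace.append(f"Reflect: '{child.step}' fails; backtrack.")
    return trace


# Toy tree: one wrong attempt is explored before the right one.
root = Node("Solve x^2 = 4 with x > 0", True, [
    Node("Try x = -2", False),
    Node("Try x = 2", True),
])

print(shortcut_trace(root))
print(journey_trace(root))
```

On the toy tree, the shortcut trace contains only the problem and the correct step, while the journey trace also contains the failed attempt and the reflection that follows it.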

Read more: https://github.com/GAIR-NLP/O1-Journey/blob/main/resource/report.pdf

71 episodes
