Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
This paper analyzes preference alignment and jailbreaking in large language models, proposing E-RLHF as a cost-effective method to enhance safety without compromising performance.
https://arxiv.org/abs//2408.01420
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers
--- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support
1611 episodes