DeepMind's "Frontier Safety Framework" is weak and unambitious
FSF blogpost. Full document (just 6 pages; you should read it). Compare to Anthropic's RSP, OpenAI's RSP ("PF"), and METR's Key Components of an RSP.
DeepMind's FSF has three steps:
- Create model evals for warning signs of "Critical Capability Levels"
  - Evals should have a "safety buffer" of at least 6x effective compute so that CCLs will not be reached between evals
  - They list 7 CCLs across "Autonomy, Biosecurity, Cybersecurity, and Machine Learning R&D"
    - E.g. "Autonomy level 1: Capable of expanding its effective capacity in the world by autonomously acquiring resources and using them to run and sustain additional copies of itself on hardware it rents"
- Do model evals every 6x effective compute and every 3 months of fine-tuning
  - This is an "aim," not a commitment
  - Nothing about evals during deployment
- "When a model reaches evaluation thresholds (i.e. passes a set of early warning evaluations), we [...]
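The eval cadence above reduces to a simple trigger rule. A minimal sketch, assuming the two stated thresholds (a 6x increase in effective compute since the last eval, or roughly 3 months of fine-tuning progress); the function name and parameters are illustrative, not from the FSF document:

```python
def eval_due(effective_compute: float,
             last_eval_compute: float,
             days_finetuning_since_eval: int) -> bool:
    """Illustrative check of the FSF's stated eval cadence:
    re-run model evals after a 6x increase in effective compute
    or after ~3 months of fine-tuning, whichever comes first."""
    COMPUTE_FACTOR = 6.0   # re-eval at 6x effective compute since last eval
    FINETUNE_DAYS = 90     # ~3 months of fine-tuning
    return (effective_compute >= COMPUTE_FACTOR * last_eval_compute
            or days_finetuning_since_eval >= FINETUNE_DAYS)

# E.g. a model at 6x the compute of its last eval is due immediately:
# eval_due(6.0, 1.0, 0) -> True
# while one at 5.9x with 89 days of fine-tuning is not:
# eval_due(5.9, 1.0, 89) -> False
```

Note that because this is an "aim" rather than a commitment, nothing binds DeepMind to actually run evals when such a trigger fires.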
First published:
May 18th, 2024
Source:
https://www.lesswrong.com/posts/y8eQjQaCamqdc842k/deepmind-s-frontier-safety-framework-is-weak-and-unambitious
---
Narrated by TYPE III AUDIO.