DeepMind's "Frontier Safety Framework" is weak and unambitious

7:20
 
Del
 

Manage episode 419193869 series 3364758
Innhold levert av LessWrong. Alt podcastinnhold, inkludert episoder, grafikk og podcastbeskrivelser, lastes opp og leveres direkte av LessWrong eller deres podcastplattformpartner. Hvis du tror at noen bruker det opphavsrettsbeskyttede verket ditt uten din tillatelse, kan du følge prosessen skissert her https://no.player.fm/legal.
FSF blogpost. Full document (just 6 pages; you should read it). Compare to Anthropic's RSP, OpenAI's equivalent (the "Preparedness Framework," or PF), and METR's Key Components of an RSP.
DeepMind's FSF has three steps:
  1. Create model evals for warning signs of "Critical Capability Levels"
    1. Evals should have a "safety buffer" of at least 6x effective compute, i.e. warning-sign evals should trigger at least 6x effective compute before a CCL would actually be reached, so that (at the cadence below) a CCL cannot be crossed between evals
    2. They list 7 CCLs across "Autonomy, Biosecurity, Cybersecurity, and Machine Learning R&D"
      1. E.g. "Autonomy level 1: Capable of expanding its effective capacity in the world by autonomously acquiring resources and using them to run and sustain additional copies of itself on hardware it rents"
  2. Do model evals every 6x effective compute and every 3 months of fine-tuning (a minimal sketch of this trigger rule appears after this list)
    1. This is an "aim," not a commitment
    2. Nothing about evals during deployment
  3. "When a model reaches evaluation thresholds (i.e. passes a set of early warning evaluations), we [...]
---
First published: May 18th, 2024
Source: https://www.lesswrong.com/posts/y8eQjQaCamqdc842k/deepmind-s-frontier-safety-framework-is-weak-and-unambitious
---
Narrated by TYPE III AUDIO.
