Was ChatGPT a good idea? W04
Archived series ("Inactive feed" status)
When? This feed was archived on February 03, 2024 00:52. Last successful fetch was on March 31, 2023 18:30.
Why? Inactive feed status. Our servers were unable to retrieve a valid podcast feed for a sustained period.
In this week’s ML & AI Safety Update, we hear Paul Christiano’s take on one of OpenAI’s main alignment strategies, dive into the second-round winners of the Inverse Scaling Prize, and share the many fascinating projects from our mechanistic interpretability hackathon!
Opportunities (https://ais.pub/aistraining)
- The PIBBSS deadline is in 10 days: https://ais.pub/pibbss
- EAG London is coming up in May: https://ais.pub/eag
- Introduction to ML safety: https://ais.pub/gt2
- Alignment competitions: https://ais.pub/aawards
Sources
- RLHF 2015: https://ai-alignment.com/efficient-feedback-a347748b1557
- Christiano on RLHF: https://www.alignmentforum.org/posts/vwu4kegAEZTBtpT6p/thoughts-on-the-impact-of-rlhf-research
- Inverse scaling prize winners: https://www.lesswrong.com/posts/DARiTSTx5xDLQGrrz/inverse-scaling-prize-second-round-winners
- We discovered “ an” neuron: https://itch.io/jam/mechint/rate/1890024
- Identifying a preliminary circuit for predicting gendered pronouns in GPT-2 small with the automatic circuit identification algorithm: https://itch.io/jam/mechint/rate/1889871
- Automated identification of potential feature neurons: https://itch.io/jam/mechint/rate/1889215
- Soft prompts are a convex set: https://itch.io/jam/mechint/rate/1889669
- Mentaleap team: https://mentaleap.ai/
- Prompt tuning: https://arxiv.org/abs/2104.08691
- Results page: https://itch.io/jam/mechint/results
25 episodes