Reading list
Stack ranked
Backlog
- Open Problems in Cooperative AI.
- Density of States Estimation for Out-of-Distribution Detection.
- CICERO.
- Threat Model Literature Review - LessWrong.
- Geometric Rationality.
- AlphaFold 2 paper.
- AGI debate.
New
- Delay, Detect, Defend: Preparing for a Future in which Thousands Can Release New Pandemics by Kevin Esvelt
- Orthogonality Thesis - Arbital
- Musings on the Speed Prior - LessWrong
- TODO: Find writing on why we wouldn’t expect consequentialism to be convergent.
- TODO: Find writing on why uncertainty in values wouldn’t work.
- DeepMind: Generally capable agents emerge from open-ended play - LessWrong
- Mechanistic anomaly detection and ELK - LessWrong
- The limited upside of interpretability - LessWrong
- How can we develop transformative tools for thought?
- Staring into the abyss as a core life skill - LessWrong
- Shard Theory in Nine Theses: a Distillation and Critical Appraisal - AI Alignment Forum
- Shard Theory: An Overview - LessWrong
- The “Minimal Latents” Approach to Natural Abstractions - LessWrong
- Why have Sex? Information Acquisition and Evolution.
- The story of VaccinateCA - Works in Progress
- How to do theoretical research, a personal perspective - LessWrong
- [1906.01820] Risks from Learned Optimization in Advanced Machine Learning Systems
- [2210.01892] Polysemanticity and Capacity in Neural Networks
- Finite Factored Sets - LessWrong
- [2111.06206] Towards Axiomatic, Hierarchical, and Symbolic Explanation for Deep Models
- [2301.05062] Tracr: Compiled Transformers as a Laboratory for Interpretability