AI x-risk model

N.B.: This was a failed experiment and thus is stuck in perpetual draft. Leaving up for fun, but please don’t take this as my current opinions or even my opinions at the time - just an effort to explore the space.

This is my model of AI x-risk in the next 50 years.

My goal with this document is to build a gearsy model of AI x-risk. I plan to edit this document as I learn.

The percentages haven’t necessarily been very well thought through. If anything sounds way off, please let me know.

Model

Humans will become extinct due to superintelligence in the next 50 years. 30% (>95% and >95% and 50% and 65% and >95%; see the Key below for how these figures combine)

  1. It is possible to create a superintelligence. >95% (>95% and >95%)

    • Superintelligence: An AI system that is extremely competent at accomplishing its goals in the real world, whatever those goals may be. See defining superintelligence.
    • N.B.: Emphasis on the possible. This section tries to communicate that this is something that actually could happen: it’s not just sci-fi. Later sections will argue that this could happen soon.
      1. It's physically possible to make classical (i.e. non-quantum) human-level AI systems. 80% (>95% and 80%)
        1. The Computational theory of mind is true. >95%
        2. Quantum computing is not critical to intelligence. 80% - I’m very uncertain about this, as I’ve done little-to-no reading on the topic. This is subject to change massively.
      2. We can make human-level AI systems into superintelligent AI systems. >95% (90% or 70% or >95%)
        1. There’s nothing special about the level of human intelligence. >95% - We can already make systems that are better than humans in narrow domains (e.g. AlphaGo versus Lee Sedol). - There’s variance in intelligence across individuals: if there were a hard limit, why do we see human intelligence vary so much?
        2. Human-level AI systems can be made incrementally more intelligent. >95% (90% or 70% or 90%)
          1. Scaling AI systems up will increase intelligence. 90% - Training a system for longer, making it larger, or using more data will make it smarter. We see this all the time in smaller models.
          2. Increasing speed will increase intelligence. 70% - A researcher given ten years will achieve a lot more than a researcher given two years, and speeding a system up is equivalent to giving it more time. - However, this approach has a much clearer ceiling than the others: I’d bet the majority of people couldn’t come up with \(E=mc^2\) given all the time in the world. - Intelligence is also somewhat bottlenecked on interactions with the environment, for example waiting for physical experiments to be carried out, though I don’t think this bottleneck is significant.
          3. Running multiple instances will increase intelligence. 90% - A large group of researchers will achieve a lot more than a single researcher. - However, larger groups have more diversity in thoughts, approaches, and experience, which may be part of what improves their ability to make progress on a problem; it isn’t obvious that multiple instances of an AI system will offer the same diversity.
        3. Intelligence improvement has a positive feedback loop. 90% - Scaling up, speeding up, and running in parallel all benefit from additional research, so increasing intelligence will support increasing intelligence further.
        4. The positive feedback loop will take us all the way to superintelligence. 90%
  2. A superintelligence would have power over humans. >95% (>95% and >95%)

    • Power: If the superintelligence’s and the human’s goals conflict, then humans have no say; the superintelligence’s goals take preference.
      1. The superintelligence will operate in the world, as we won’t be able to sandbox it. >95% (>95% or >95% or 80%)
        1. There will be a way to break out of the sandbox, as humans can’t write bug-free code. >95%
        2. Humans will let a superintelligence out of the sandbox due to social engineering. >95%
        3. Humans will let a superintelligence out due to ignorance of the risks. 80%
      2. If humans try to do anything that compromises the superintelligence’s goals, it will be able to stop humans. >95%
  3. We will develop superintelligence in the next 50 years. 50% (70% and 90% and 75% and >95%)

    1. We will discover all the technical insights needed to build superintelligence in the next 50 years. 70% (50% or 35%)
      1. All key insights are already discovered, i.e. the scaling hypothesis is correct. 50%
      2. Key insights remain, but will be discovered. 35% (40%*80% + 10%*30%)
        1. Key insights remain within the current paradigm. 40% - By “current paradigm” I mean the research that has previously advanced the state of the art in machine learning (e.g. ReLU, Transformers).
        2. We will discover the remaining key insights within the current paradigm, given (1). 80%
        3. Key insights remain outside of the current paradigm. 10%
        4. We will discover the remaining key insights outside of the current paradigm, given (3). 30%
    2. We will have the compute resources to create superintelligence. 90%
    3. We will have the data resources to create superintelligence. 75%
    4. If we can create superintelligence, someone will. >95%
  4. Superintelligences will be unaligned. 65% (not(30% or (70% and 10%)))

    • Unaligned: Its values (i.e. goals) will not be the same as human values.
      1. The superintelligence is aligned by default. 30%
        • See John S. Wentworth’s post.
      2. The creators of the superintelligence try to align it. 70%
        • It looks like DeepMind and OpenAI are at least planning to try to align their AI systems. Facebook AI Research does not, and I’m very uncertain about other AI firms.
      3. The alignment strategy works. 10%
        • N.B.: The 10% figure comes from a general feeling of (1) how hard alignment is and (2) how much progress has been made recently. I plan to formalize this more and amend this section with an updated figure and more detailed explanation.
  5. Power + unaligned = Human extinction. >95%

    • The superintelligence has power over humans; there is no need to compromise.
    • The superintelligence is unaligned with humans, and a lot of things like resources are zero-sum; we will be in direct competition.
    • (It’s worth mentioning that if the superintelligence feels the need to let us survive and flourish in any meaningful way, then it is aligned.)

Key
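
The figures in parentheses show how each estimate is built up from its sub-claims. Roughly (this is the reading that reproduces the numbers above): “and” multiplies the sub-claim probabilities together (all of them must hold), “or” takes one minus the product of the complements (at least one must hold), and “not” takes the complement. Treating each “>95%” as roughly 0.95, the headline estimate works out as

\[
0.95 \times 0.95 \times 0.50 \times 0.65 \times 0.95 \approx 0.28 \approx 30\%,
\]

the estimate for discovering the technical insights in time (the first sub-claim of claim 3) as

\[
1 - (1 - 0.50)(1 - 0.35) \approx 0.68 \approx 70\%,
\]

and the unalignment estimate (claim 4) as

\[
(1 - 0.30)(1 - 0.70 \times 0.10) \approx 0.65 \approx 65\%.
\]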

Model philosophy

The goal is ambitious: Create a complete and succinct argument for AI x-risk. My test for completeness is that if a sceptic looks at this model, they should be able to identify which argument(s) they disagree with. They should be able to change a >95% into a <5% (or vice versa) and arrive at a much lower x-risk. If they have arguments that fall outside of this model, then it isn’t complete.
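
To make that test concrete, here is a minimal sketch (assuming the combination rules from the Key and treating “>95%” as 0.95; the function and argument names are arbitrary) of how flipping a single top-level estimate moves the headline figure:

```python
def p_and(*ps):
    """Sub-claims that must all hold: multiply their probabilities."""
    result = 1.0
    for p in ps:
        result *= p
    return result


def p_or(*ps):
    """Sub-claims where at least one must hold: 1 minus the product of the complements."""
    result = 1.0
    for p in ps:
        result *= 1.0 - p
    return 1.0 - result


def headline(possible=0.95, power=0.95, soon=0.50, unaligned=0.65, extinction=0.95):
    """Top-level claim: all five numbered claims in the Model section must hold."""
    return p_and(possible, power, soon, unaligned, extinction)


print(f"Baseline:                     {headline():.1%}")                # ~27.9%, rounded to 30% above
print(f"Sceptic: alignment works out: {headline(unaligned=0.05):.1%}")  # drops to ~2%
print(f"Sceptic: not in 50 years:     {headline(soon=0.05):.1%}")       # drops to ~3%
print(f"Insight discovery via 'or':   {p_or(0.50, 0.35):.1%}")          # ~67.5%, shown as 70% above
```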

This is very hard to do, and I’m certain I’ve fallen very, very short. But this feels like a good philosophy to have when building a model.

A non-philosophy is for this model to have infinite depth. Sure, I’d like to elaborate on sections as I learn more about certain parts of the problem, but I’m fine with e.g. the compute argument being very shallow.

Appendix

Defining superintelligence

A lot of AI x-risk arguments get stuck on defining superintelligence, or Artificial General Intelligence, or Transformative AI, or…

I’d like to sidestep these discussions by purposefully going for a circular definition: We should be worried about AI that is dangerous enough to make us worried. It’s sufficiently good at tasks that give it a lot of power (e.g. acquiring lots of money on the stock market, deceiving humans, building biological weapons…).

Have a bigger prior for weird things happening

This is a strong claim: most people have a very low prior on crazy stuff like AI killing everyone happening. But Holden Karnofsky’s writing on the Most Important Century is insightful here: We should expect things to get strange.

Known issues