AI Pulse

AlphaGo: The Move That Changed Everything

A 3,000-word deep dive into the 2016 Lee Sedol match. Exploring Move 37, Monte Carlo Tree Search, and the birth of Reinforcement Learning's golden age.

AI Historian
22 min read

In March 2016, 200 million people tuned in to watch a board game. On one side was Lee Sedol, an 18-time world champion and the undisputed king of Go—a game so complex that there are more possible positions than there are atoms in the observable universe. On the other side was a tower of servers in a Google data center running a program called AlphaGo.

Most experts predicted that Lee Sedol would win 5-0. They believed that Go required "intuition" and "soul"—things a machine could never have. They were wrong. AlphaGo won 4-1, but it was how it won that changed the course of AI history forever.


1. Why Go was the "Grand Challenge"

For decades, Chess was the benchmark for AI. When IBM's Deep Blue beat Garry Kasparov in 1997, many thought AI had "won." But Go is fundamentally different.

  • Tree Complexity: In Chess, there are roughly 35 legal moves per turn on average. In Go, there are about 250.
  • Evaluation: In Chess, you can "score" a position (e.g., a Queen is worth 9 points). In Go, the value of a stone depends on its relationship to every other stone on the board. You can't calculate a score; you can only "feel" the influence.

Traditional AI, which used "Brute Force" search, was useless in Go. To win, a machine had to learn to think like a human, or rather, think better than one.
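The gap is easy to see with back-of-the-envelope arithmetic. A game tree holds roughly branching_factor ** game_length positions; the figures below are the commonly cited averages, not exact counts:

```python
import math

# Commonly cited averages: ~35 legal moves per turn in Chess over ~80 plies,
# ~250 per turn in Go over ~150 plies. The game tree grows as b ** d.
chess_tree = 35 ** 80
go_tree = 250 ** 150

print(f"chess game tree ~ 10^{math.log10(chess_tree):.0f}")
print(f"go game tree    ~ 10^{math.log10(go_tree):.0f}")
# For comparison: atoms in the observable universe ~ 10^80.
```

No search, however fast, closes a gap of hundreds of orders of magnitude.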


2. The Move 37 Moment: Alien Intuition

The turning point of Game 2 became the most analyzed sequence in the history of AI.

At Move 37, AlphaGo played a shoulder hit on the 5th line—a move that no professional Go player would ever make in that context. Conventional Go wisdom holds that a shoulder hit belongs on the 4th line; the 5th line looks like it concedes far too much territory.

  • The Reaction: Lee Sedol stood up and left the room for several minutes. The commentators wondered whether the machine had "glitched."
  • The Reality: AlphaGo had estimated that only about 1 human player in 10,000 would have chosen the move, and that while it looked weak in the short term, its "influence" on the center of the board would be decisive dozens of moves later. The AI wasn't playing "against" Lee Sedol; it was playing a version of Go that humans hadn't discovered yet.

This was the birth of Computational Creativity. The machine had found a strategy that went against 2,500 years of human tradition.


3. The Tech: MCTS and the Two Networks

AlphaGo’s "brain" consisted of three distinct parts working in harmony:

I. The Policy Network

This network was trained on 30 million positions from human games. It acted like a "Shortcut Finder," telling the AI which handful of moves a strong human expert would actually consider.
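You can picture the Policy Network as a function mapping a board to a probability distribution over all 361 points, with the search keeping only the top few. The sketch below is a toy: the "network" is random numbers standing in for a trained convolutional net.

```python
import math
import random

random.seed(0)

def toy_policy(num_points=361):
    """Stand-in for the trained Policy Network: a probability distribution
    over the 361 intersections of a 19x19 board. The real network was a
    deep CNN trained on ~30 million human positions; here it is random."""
    logits = [random.gauss(0, 1) for _ in range(num_points)]
    z = sum(math.exp(l) for l in logits)          # softmax normaliser
    return [math.exp(l) / z for l in logits]

def top_k_moves(probs, k=4):
    """The 'Shortcut Finder': keep only the k most human-plausible moves,
    so the search explores ~4 branches per turn instead of ~250."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

probs = toy_policy()
print(top_k_moves(probs))   # indices of the 4 most promising points
```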

II. The Value Network

This network didn't look at moves; it looked at the board and predicted who would win from that position. It was the AI’s "Gut Feeling."

III. Monte Carlo Tree Search (MCTS)

This was the "Logical Engine." It used the Policy Network to guide thousands of simulated continuations and the Value Network to score the positions they reached, steering the search toward the "Future" with the highest probability of winning.
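Here is a minimal sketch of how the three parts interlock, with toy stand-ins for everything: uniform priors for the Policy Network, random guesses for the Value Network, and a pretend three-move game. The selection rule is the PUCT formula used by AlphaGo-style systems; the sign-flip between players is omitted for brevity.

```python
import math
import random

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy network
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}        # move -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def puct_score(parent, child, c_puct=1.5):
    # AlphaGo-style selection: exploit high value, explore high-prior,
    # rarely visited children.
    explore = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.value() + explore

# --- toy stand-ins for the game and the two networks ---
def legal_moves(state):
    return [0, 1, 2]              # a pretend game with three moves everywhere

def apply_move(state, move):
    return state + (move,)

def policy_net(state):
    moves = legal_moves(state)    # stand-in: uniform priors
    return {m: 1.0 / len(moves) for m in moves}

def value_net(state):
    return random.uniform(-1, 1)  # stand-in: random win-probability guess

def mcts(root_state, num_simulations=200):
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        node, state, path = root, root_state, [root]
        # 1. Selection: descend the tree by PUCT score.
        while node.children:
            parent = node
            move, node = max(parent.children.items(),
                             key=lambda mc: puct_score(parent, mc[1]))
            state = apply_move(state, move)
            path.append(node)
        # 2. Expansion: create children with policy-network priors.
        for move, prior in policy_net(state).items():
            node.children[move] = Node(prior)
        # 3. Evaluation: the value network replaces long random rollouts.
        value = value_net(state)
        # 4. Backup: propagate the estimate up the visited path.
        for n in path:
            n.visits += 1
            n.value_sum += value
    # Play the most-visited root move, as AlphaGo did.
    return max(root.children.items(), key=lambda mc: mc[1].visits)[0]

print(mcts(root_state=()))
```

With real networks plugged in, the same loop is what spent its simulation budget discovering moves like Move 37.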


4. The Response: Lee Sedol’s Move 78

In Game 4, Lee Sedol did the impossible: he "glitched" the machine. With Move 78 (often called "The Divine Move"), Lee played a wedge move in the center that AlphaGo had calculated as having a less than 1-in-10,000 probability of being played by a human. Because the AI had barely explored that branch of its search tree, its Value Network collapsed. It began making nonsensical moves, and Lee Sedol secured his only victory.

This proved that while AI is superior at pattern matching, it can be "blinded" by human novelty.


5. From AlphaGo to AlphaZero: Removing the Human

The most shocking development came in 2017 with AlphaGo Zero. Unlike the original, which was trained on human games, AlphaGo Zero was given only the rules of the game. It played against itself millions of times, starting from total randomness.

  • Three Days: It surpassed the version that beat Lee Sedol, defeating it 100-0.
  • 40 Days: It overtook AlphaGo Master, the strongest previous version, becoming the most powerful Go player in history.

By removing the "Human Training Data," DeepMind realized that human knowledge was actually a limitation. The machine was able to discover a purer, alien form of logic because it wasn't biased by human mistakes.
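The tabula-rasa loop can be caricatured in a few lines. Everything below is a labeled stand-in (an epsilon-greedy score table instead of a neural network plus MCTS, and a one-move toy "game" instead of Go), but the shape of the loop is the same: play yourself, score the result, nudge the model toward what won, repeat.

```python
import random

random.seed(0)

def choose_move(scores, moves, epsilon=0.2):
    """Stand-in for 'MCTS guided by the current network': usually the
    best-scored move, sometimes a random one (exploration)."""
    if random.random() < epsilon:
        return random.choice(moves)
    return max(moves, key=lambda m: scores.get(m, 0.0))

def play_and_score(move):
    """Toy one-move 'game': only move 2 wins. AlphaGo Zero, of course,
    had to discover entire strategies, not a single lucky move."""
    return +1 if move == 2 else -1

scores = {}                       # blank slate: no human data anywhere
for _ in range(500):              # Zero played millions of such games
    move = choose_move(scores, [0, 1, 2])
    outcome = play_and_score(move)
    # Stand-in for gradient descent: nudge the score toward the outcome.
    scores[move] = scores.get(move, 0.0) + 0.1 * outcome

print(max(scores, key=scores.get))   # the self-taught best move -> 2
```

Notice that no one told the program move 2 was good; it punished its own failures and reinforced its own wins until the answer emerged, which is the essence of removing the human from the loop.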


6. The Legacy in 2025: From Games to Science

In 2025, we can see that AlphaGo wasn't really about a board game. It was a Proof of Concept for General Intelligence.

  • AlphaFold: The deep-learning playbook DeepMind honed on Go (large neural networks, massive compute, and search) was redirected at protein folding. AlphaFold has now predicted the structures of 200 million proteins, a feat that would have taken human scientists centuries. (See our AlphaFold 3 Guide)
  • AlphaGeometry: DeepMind's latest system pairs a neural network with a symbolic deduction engine, the same "Search + Neural Net" philosophy, to solve Olympiad-level geometry problems that require human-like "leaps of logic."

Conclusion

AlphaGo was the event that woke the world up to the power of Reinforcement Learning. It showed us that we have created something that doesn't just "copy" us, but can "transcend" us.

Lee Sedol retired from professional Go in 2019, stating that AI is "an entity that cannot be defeated." However, for the rest of the world, AlphaGo was a message of hope. If a machine can find a "1-in-10,000" move on a Go board, it can find a "1-in-10,000" molecule to cure cancer or a "1-in-10,000" formula for clean energy.

The Go board was just the beginning. The world is the next game.

Subscribe to AI Pulse

Get the latest AI news and research delivered to your inbox weekly.