Netflix, Chaos Monkey, and Preparing for the Worst

By Stefan Auvache

On April 21, 2011, an entire Amazon Web Services (AWS) availability zone went down, taking a large chunk of the internet down with it. Companies like Reddit, Foursquare, and Quora went offline with no idea how long it would take to come back.

Netflix, one of Amazon's biggest clients, was largely unaffected by the outage. While other AWS customers were powerless to get their systems back online, Netflix went about business as usual.

A few years before the outage, the IT team at Netflix decided that it would be a good idea to prepare for the worst. They wanted to make Netflix more robust and secure. They wanted to know what they could do to prevent catastrophic, massively expensive system failures.

So, they built Chaos Monkey.

Chaos Monkey

Chaos Monkey is a program. It looks at all of the technical resources that Netflix uses to stay up and running and starts randomly shutting them down.
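
The idea is simple enough to sketch as a toy simulation. This is not Netflix's code and it touches no real infrastructure. The fleet, the instance names, and both function names are hypothetical, chosen only to show what "randomly shutting things down" looks like and how it exposes services with no backup.

```python
import random


def run_chaos_round(fleet, kill_count, rng=None):
    """Simulate shutting down `kill_count` randomly chosen instances.

    `fleet` maps a hypothetical instance name to the service it runs.
    Returns (survivors, terminated) as two new dicts; `fleet` is not mutated.
    """
    rng = rng or random.Random()
    kill_count = min(kill_count, len(fleet))  # cannot kill more than exist
    victims = set(rng.sample(sorted(fleet), kill_count))
    terminated = {name: svc for name, svc in fleet.items() if name in victims}
    survivors = {name: svc for name, svc in fleet.items() if name not in victims}
    return survivors, terminated


def services_down(fleet, survivors):
    """Return services that had instances before the round but have none left."""
    return sorted(set(fleet.values()) - set(survivors.values()))


# A made-up fleet: "search" runs on a single instance, so it has no backup.
fleet = {
    "api-1": "api", "api-2": "api",
    "search-1": "search",
    "billing-1": "billing", "billing-2": "billing",
}

survivors, terminated = run_chaos_round(fleet, kill_count=2, rng=random.Random(7))
print("terminated:", sorted(terminated))
print("services with no instances left:", services_down(fleet, survivors))
```

Run it a few times with different seeds and the pattern shows up quickly: any service that lives on a single instance will eventually be the one that disappears. That is the kind of weakness a tool like this is meant to surface.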

When the team fired up Chaos Monkey in their testing environment for the first time, the result was—as intended—chaos. The site went down. Several features stopped working. All manner of unforeseeable problems occurred on the company's technical systems.

By forcing the system to buckle and break, engineers at Netflix discovered a series of previously unknown weaknesses and vulnerabilities. They were then able to make specific improvements that would prevent the same chaos from ensuing if something went haywire with their actual, customer-facing website.

Years later, when the AWS outage occurred, Netflix was ready.

Premortems and Preparing for Chaos

Chaos Monkey was the result of mindful thinking about the future.

In project management, this often takes the form of a premortem—a planning exercise where a team tries to imagine all of the different ways that a project could fail. They then work backward to prevent and solve potential problems before they occur.

Premortems are useful on both a personal and a professional level.

While we can't run an automated program to detect all of the potential, unforeseen problems in our lives, we can think about the things that could go wrong in any given situation. For instance, what would happen if you lost your job? What if you weren't admitted to your chosen university or if the housing market crashed while you were trying to sell your home? Questions like these empower preparation. An ounce of temporary pessimism followed by problem-solving and fortification can help you weather future storms and protect the things that are most important to you.

Here is a practical example:

Imagine you are about to graduate from business school and are looking to start your career. You have a specific company that you really want to work for, and you send out your application. Instead of waiting and hoping for the best, you run a premortem. The project, in this case, is landing your dream job.

First, you make a list of everything that could go wrong:

What if there is a hiring freeze at your dream company?

What if a recruiter doesn't respond to your email after a few days?

What if someone else gets the job?

What if you don't have enough experience to get hired?

In asking these questions, you create a structure for problem-solving and risk mitigation. You can research similar companies to expand your employment options. You can network with alumni at your dream company to learn about hiring patterns and skills to develop for the job. You can apply to other positions at the company that provide an opportunity to move to your desired position down the line.

By running through potential problems and solutions ahead of time, you prepare yourself to handle adversity well. You reduce the risks associated with failure, give yourself options when things don't go exactly according to plan, and prevent potential catastrophe.

You Can Prevent Catastrophic Failures

Your life is the product of how you respond to the chaos around you. Just as Netflix did with Chaos Monkey, you can gain perspective and control by proactively preparing for the challenges that lie ahead.

We are capable of changing the future. Solving problems before they occur turns anxiety into confidence and panic into clarity. Use premortems to gain perspective. Find your weaknesses. Strengthen them before they cause problems. Prevent catastrophic failures by thinking about the future and preparing accordingly.

What things are out of your control? What can you do to prevent them from creating chaos?

