Don't run from failure... look for it in as many ways as you can

We can trace a lot of problems in business, as well as in life, back to one big, though all-too-human, mistake: our frequent failure to recognize that some things simply aren’t within our control. Rather than accept that unpleasant yet undeniable fact, many of us ignore or gloss over negative developments we can’t change… which, of course, only makes things far worse over time.

Health provides a good illustration of this. How many people have you known who developed a clearly worrying symptom or pain, yet put off seeing a doctor because they kept saying, “It’s probably nothing,” or simply didn’t want to talk about it at all? This type of response usually doesn’t end well.

So what does this have to do with digital business and customer service? Successful companies, it turns out, are the ones that don’t ignore or minimise problems. In fact, some of the most successful companies go out of their way to find problems, then work out how to resolve or manage them with as little disruption to customers as possible.

Netflix, for instance, is known for its “proactive failure testing,” which not only looks for problems to fix but deliberately creates small failures to see whether they cause “member pain.”

“Let’s say we are able to run 500 experiments in a day,” Netflix’s Kolton Andrus and Ben Schmaus wrote in a blog post earlier this year. “If we are potentially impacting 10 members each run, then the worst case impact is 5,000 members each day. But not every experiment results in a failure — in fact the majority of them result in success. If we only find a failure in one in ten experiments (a high estimate), then we’re actually impacting 500 members requests in a day, some of which are further mitigated by retries. When you’re serving billions of requests each day, the impact of these experiments is very small.”
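The arithmetic in that quote is a simple blast-radius estimate. As a minimal sketch, the Python below reproduces it using only the quote’s own figures: 500 experiments a day, 10 members per experiment, and a failure in one experiment out of ten, which Netflix itself calls a high estimate. The function names are hypothetical and this is not Netflix’s tooling.

```python
# Figures from the Netflix quote; "one in ten" is the quote's own
# high estimate of how often an experiment finds a failure.
EXPERIMENTS_PER_DAY = 500
MEMBERS_PER_EXPERIMENT = 10
FAILURE_ONE_IN = 10


def worst_case_impact(experiments: int, members_each: int) -> int:
    """Members affected if every single experiment caused a failure."""
    return experiments * members_each


def expected_impact(experiments: int, members_each: int, one_in: int) -> int:
    """Members affected when only one experiment in `one_in` fails.

    Integer arithmetic keeps the result exact; retries, which the quote
    says mitigate some failed requests, are not modelled here.
    """
    return experiments * members_each // one_in


print(worst_case_impact(EXPERIMENTS_PER_DAY, MEMBERS_PER_EXPERIMENT))  # 5000
print(expected_impact(EXPERIMENTS_PER_DAY, MEMBERS_PER_EXPERIMENT, FAILURE_ONE_IN))  # 500
```

Because retries are left out, the second figure is an upper bound under these assumptions; set against billions of daily requests, either number is a tiny fraction of traffic.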

The payoff, though, is better and faster ways to resolve unexpected problems with minimal impact on customers.

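Inspired by Netflix’s success with this method, another company, PagerDuty, created a tradition of its own: Failure Fridays, regular sessions in which its engineers deliberately inject failures into their own production infrastructure.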

“We’re big believers in the notion that you need to plan for things that will go wrong, especially those things that aren’t in your control,” said Tim Armandpour, PagerDuty’s vice president of engineering. “Building that super strong culture where you’re not panicking in moments of failure, which I think is fairly commonplace, you build a ton of trust and empathy inside your organization that I think is absolutely invaluable, especially as organizations grow and infrastructures get more complex.”

As counter-intuitive as it might sound, companies like Netflix and PagerDuty have realised that one of the best ways to ensure good service for customers is to try to discover as many ways as possible in which service can go wrong… and then prepare to deal with those problems as effectively as possible. Failure, especially in today’s highly complex and interconnected systems, is always a possibility… so don’t fear it: embrace it, as these companies do.