Fail Smart, Not Just Fast

By Rob Keefer

While the "fail fast" ethos is good for innovation, performing analysis as part of your project's planning phase can help you understand risks before they materialize and then plan to avoid them.

Lean Startup philosophy holds that innovators should not spend a lot of time getting something perfect on the first try. Instead, product pioneers achieve finished, feature-complete product designs much faster if they fail quickly, learn from failure, and then iterate.

Unfortunately, some failures are permanent. Sometimes innovators don't fail fast enough, or they fail too often, or they fail during the wrong stage of the project. The numbers bear this out: 14% of all IT projects fail permanently.

Although 14% doesn't seem like a huge number, failure is expensive: businesses collectively waste $97 million for every $1 billion that they invest in improvements. What's more, that figure does not capture the full impact of failure. The project that fails could have been the one project that would have modernized your company, improved your customer experience, and made the difference between success and failure as an organization. In short, it's best to get your failure rate as close to zero as possible.

How to Mitigate Risks of Project Failure

To understand how failure can get out of hand, and to learn how to keep failure from ending your projects, we will introduce a concept known as failure mode and effects analysis (FMEA). In a nutshell, FMEA starts by asking:

  • Which functions was your design intended to fulfill?
  • What kind of failure prevented your plan from achieving its function?
  • How can you overcome that failure in the next iteration?

Although it originated as a manufacturing discipline, FMEA can be applied in any industry and during every phase of the product lifecycle, from concept to production.

How Do You Undertake a Failure Mode Analysis?

There are two ways to understand failure modes. One way is to look at a risk assessment. The other way is to assess the qualitative and quantitative aspects of the project itself. It's best to do both.

In terms of risk assessment, the first step is to focus on the project's concrete, measurable goals, such as virtualizing a server, increasing website visitors, or launching a new product. Next, consider the constraints, which usually involve time, budget, and human resources. Finally, consider the potential blocks: the concrete things that may lead to failure or that stand in the way of success.

In the example above, you can see how the risk assessment framework allows clients to organize their risks and success requirements. Each project has both positive and negative outcomes. You may have an opportunity to improve team cohesion on the one hand, but the potential negative outcomes are vastly more numerous.

To achieve positive outcomes, consider success requirements that address every one of the potential negative outcomes. For example, if the rush to deliver jeopardizes quality, then implement the simplest possible product while ensuring that you test thoroughly. Any risks that you can't address should have a contingency plan.
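One way to keep this bookkeeping honest is to record the assessment as data. The following is a minimal sketch, not part of the Harmonics Way canvas: the class name RiskAssessment, its field names, and the "key engineer leaves" outcome are all hypothetical, chosen only for illustration. It lists every negative outcome that has neither a success requirement nor a contingency plan.

```python
from dataclasses import dataclass, field


@dataclass
class RiskAssessment:
    """Hypothetical container for one project's risk assessment."""

    goal: str
    constraints: list[str] = field(default_factory=list)
    negative_outcomes: list[str] = field(default_factory=list)
    # Maps a negative outcome to the success requirement that addresses it.
    success_requirements: dict[str, str] = field(default_factory=dict)
    # Maps a negative outcome to a fallback plan when it can't be addressed.
    contingency_plans: dict[str, str] = field(default_factory=dict)

    def unaddressed(self) -> list[str]:
        """Return outcomes with no success requirement and no contingency plan."""
        return [
            outcome
            for outcome in self.negative_outcomes
            if outcome not in self.success_requirements
            and outcome not in self.contingency_plans
        ]


# Illustrative data only; the second outcome is invented for this example.
assessment = RiskAssessment(
    goal="Launch a new product",
    constraints=["time", "budget", "human resources"],
    negative_outcomes=[
        "rush to deliver jeopardizes quality",
        "key engineer leaves",
    ],
    success_requirements={
        "rush to deliver jeopardizes quality":
            "implement the simplest possible product and test thoroughly",
    },
)

# Anything listed here still needs a requirement or a contingency plan.
print(assessment.unaddressed())
```

Any outcome the function returns is a gap in the plan: it needs either a success requirement or a contingency plan before the project starts.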

Conducting a Project Assessment

A project assessment breaks a project down into a series of tasks. Each task has a failure mode, a way that it can fail, which is rated on three factors: the severity of the failure, the probability of failure, and how readily the risk can be detected and mitigated. Each of these is scored numerically from one through eight.

There are already two example scenarios described above, so let's add one of our own. Let's say that your task is to lift and shift an application to the cloud. Put that in the first box. The second box describes a failure mode: let's say that the application might perform poorly in a cloud environment. This failure is a severe issue, so let's rate it a five.

Now let's move to the cause of failure. One cause of failure is that the application is under-resourced in its new environment. Since the application in our example is a bit newer, there's a lower probability of poor performance, so call it a two. Lastly, how likely is it that we'll be able to detect and fix the issue? Well, you'll know when the application doesn't meet expected standards, and you'll probably be able to fix it by allocating more resources. In this example, a problem that is easy to catch and correct gets a low score, so let's call that a two as well.

In this analysis model, we judge the overall risk of failure by multiplying the three scores across the row. Any project with a risk index greater than 30 needs particular attention. Here, 5 × 2 × 2 = 20, which falls below that threshold, so there's a comparatively low likelihood that the project will fail in the manner described.
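The arithmetic for a worksheet row can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the layout of the downloadable worksheet: the names WorksheetRow, risk_index, and needs_attention are hypothetical, and the one-through-eight range and the threshold of 30 are taken from the text above.

```python
from dataclasses import dataclass

ATTENTION_THRESHOLD = 30      # from the article: an index above 30 needs attention
SCORE_MIN, SCORE_MAX = 1, 8   # from the article: scores run from one through eight


@dataclass(frozen=True)
class WorksheetRow:
    """Hypothetical model of one row: a task, its failure mode, and three scores."""

    task: str
    failure_mode: str
    cause: str
    severity: int
    probability: int
    detection: int

    def __post_init__(self) -> None:
        # Reject scores outside the one-through-eight range.
        for name in ("severity", "probability", "detection"):
            score = getattr(self, name)
            if not SCORE_MIN <= score <= SCORE_MAX:
                raise ValueError(
                    f"{name} must be between {SCORE_MIN} and {SCORE_MAX}, got {score}"
                )

    @property
    def risk_index(self) -> int:
        # Multiply the three scores across the row.
        return self.severity * self.probability * self.detection

    @property
    def needs_attention(self) -> bool:
        return self.risk_index > ATTENTION_THRESHOLD


# The lift-and-shift example from the text.
row = WorksheetRow(
    task="Lift and shift an application to the cloud",
    failure_mode="Application performs poorly in a cloud environment",
    cause="Application is under-resourced in its new environment",
    severity=5,
    probability=2,
    detection=2,
)

print(row.risk_index, row.needs_attention)  # 20 False
```

Sorting a list of such rows by risk_index, highest first, gives a simple order in which to review them.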

Fail Smarter

The Harmonics Way point of view is that while the "fail fast" ethos is good for innovation, it's best to bake in quality from the start. By performing analysis as part of your project's planning phase, you can understand risks before they materialize and then plan to avoid them. Although you still have the opportunity to fail and iterate, your iterations are that much faster, and you can sidestep risks that have the potential to derail your project entirely (as opposed to merely creating a learning experience).


The Harmonics Way Systems Thinking Canvas [Download PDF]

The FMEA Worksheet [Download Excel Worksheet]