Selecting maintenance strategy for redundant equipment and stand-by equipment in Reliability Centred Maintenance RCM analysis.

Reliability Centered Maintenance (RCM) logic allows bad maintenance strategy selection for redundant equipment configurations. With stand-by units, RCM lets you run duty equipment to failure and happily put your company in jeopardy.



Hi Mike,

While preparing an RCM draft for Boiler Feed Water Pumps I found some ambiguity. Could you kindly suggest how to incorporate a stand-by configuration (two pumps operating, one on stand-by) into RCM, given that RCM focuses on component failure?



Hello Fahad,

RCM is a very popular method for investigating plant and equipment to choose its maintenance regime (though not the best maintenance strategy selection method). Unfortunately RCM logic will allow you to bring your plant to a stand-still and make you feel good about it. You can cause serious downtime losses when using Reliability Centered Maintenance for maintenance strategy selection.

In an RCM analysis you eventually select combinations of run-to-failure strategy (breakdown maintenance), predictive maintenance strategy (requiring condition monitoring to decide when to do on-condition maintenance), preventive maintenance strategy (replacing aging parts before end-of-life failure rates increase too much), failure-finding tests (to spot hidden failures in equipment that must work correctly when required), or re-engineering to remove failure causes.

One of the biggest downfalls of the Reliability Centred Maintenance methodology is with redundant equipment. Because you have a complete stand-by ready to start up and go, RCM says to run the duty equipment to failure. The RCM logic appears sound: one item of plant fails and the replacement takes over, so do not maintain the duty unit. But that is poor business logic, because once you lose redundancy you have increased the risk of a total operation shut-down.

Let us take a look at the scenario for a three pump configuration with two operating pumps, and the third as stand-by ready to start should one of the duty pumps fail.


If two pumps are in service and one duty pump fails the stand-by pump starts. The RCM decision logic says to not maintain any duty pumps and run them to failure, expecting the stand-by to replace the failed duty pump for long enough to fix the out-of-service pump.

For that to be true we make the assumption that the stand-by is in 100 percent condition to immediately go into use when needed. But how do you know that the not-in-use stand-by pump (and its complete operating system) is in good health? You could have a hidden failure condition with the stand-by and never know that it has already failed until it does not start. For you to have confidence in the stand-by pump you must be sure the stand-by is well maintained, that its complete system is in operating condition, and it will run faultlessly for as long as it takes to repair the failed duty pump. By RCM logic the stand-by pump gets all the maintenance and inspections to keep it in great condition, but it is not used until needed in an emergency event.
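To put a rough number on that hidden-failure concern, here is a minimal Python sketch. It assumes the idle stand-by fails at a constant random rate and is only proven healthy at each failure-finding test. Neither assumption comes from the pump data in this letter, and the function name and example figures are hypothetical.

```python
import math


def standby_unavailability(failures_per_year, test_interval_years):
    """Average fraction of time an idle stand-by sits failed but undetected.

    Assumes a constant failure rate (exponential model) and that each
    failure-finding test reveals and fixes any hidden failure.
    """
    if failures_per_year < 0 or test_interval_years <= 0:
        raise ValueError("rate must be non-negative and interval positive")
    exposure = failures_per_year * test_interval_years
    if exposure == 0:
        return 0.0
    # Mean of 1 - exp(-rate * t) taken over one test interval.
    return 1.0 - (1.0 - math.exp(-exposure)) / exposure


if __name__ == "__main__":
    # Hypothetical figure: one hidden failure every 10 years on average.
    for months in (1, 3, 12):
        u = standby_unavailability(0.1, months / 12)
        print(f"test every {months:2d} months: unavailable {u:.2%} of the time")
```

The shorter the test interval, the less time a failed stand-by can sit unnoticed, which is why the failure-finding regime on the stand-by matters so much to the run-to-failure argument.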

Is it a good business decision to not maintain the duty pump systems at all and instead maintain the non-operated stand-by system in a ready-to-run state of health? It all depends on the risk that the RCM decision brings to the operation. Before making that choice it is necessary to investigate the operating risk of acting on an RCM decision.

The spread sheet below is a risk model for the case where the RCM team chooses a run-to-failure strategy for the duty pumps. The costs and history of the situation are imaginary, used only so that the example model could be developed. In reality you would insert your own costs and equipment history so your model faithfully reflects your site's historic practices and the business-wide costs of an equipment failure.

Risk cost analysis example for Reliability Centered Maintenance RCM stand-by equipment decision

The spread sheet shows the business-wide cost if the pumping system fails during the repair of the failed duty pump. Should the system fail during the first hour after the pump failure you will lose $240,000. With each hour that goes by during the repair the money at risk decreases by $10,000. The Risk Cost column is not the money that will be lost. Rather, it is used for comparison between options. Once operating risk is modelled in detail you are more knowledgeable about the business impact of your choices.
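The arithmetic behind such a table can be sketched in a few lines of Python. This is an illustration only: it takes the $10,000 per hour loss and the one-day repair from this example, and it assumes the Risk Cost is the chance of failure multiplied by the money at risk, using the two percent worst-scenario chance for every hour. The spread sheet's own hourly probabilities may differ, and the function names are hypothetical.

```python
def money_at_risk(hour, loss_per_hour=10_000, repair_hours=24):
    """Business-wide loss if the system fails in the given repair hour (1-based)."""
    if not 1 <= hour <= repair_hours:
        raise ValueError("hour must lie within the repair window")
    remaining_hours = repair_hours - (hour - 1)
    return loss_per_hour * remaining_hours


def risk_table(loss_per_hour=10_000, repair_hours=24, failure_probability=0.02):
    """Rows of (hour, money at risk, risk cost).

    Assumption: risk cost = failure probability x money at risk, with the
    same probability applied to every hour of the repair.
    """
    rows = []
    for hour in range(1, repair_hours + 1):
        at_risk = money_at_risk(hour, loss_per_hour, repair_hours)
        rows.append((hour, at_risk, failure_probability * at_risk))
    return rows


if __name__ == "__main__":
    for hour, at_risk, risk_cost in risk_table():
        print(f"hour {hour:2d}: at risk ${at_risk:>9,.0f}  risk cost ${risk_cost:>8,.0f}")
```

Replace the default values with your own site costs, repair duration and failure history to build a comparison for your operation.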

During the day that it takes to return the failed pump to service there are two possible outcomes: the pumping system will keep operating or the system will fail. Based on the risk of a failure event identified in the model (i.e. assuming the chance of future pump failures stays the same as the historic rate) AND IF you can afford to lose as much as $240,000, you would go with the run-to-failure strategy for the duty pumps. If you cannot afford to lose $240,000 you could hope that the odds go your way (in the worst scenario there is a two percent chance of pump system failure and a 98 percent chance that the remaining duty pump will keep pumping). But if you cannot afford a single loss of $240,000 you would not use a run-to-failure strategy.

If instead of losing $10,000 per hour it was $100,000 lost each hour then the risk analysis below applies.

Risk cost analysis example for Reliability Centred Maintenance RCM redundant equipment strategy

The worst case scenario is still a two percent chance of failure, but the cost of failure in the first hour of repair is now $2,400,000. How confident are you that the odds will always be in your favour and that neither operating pump will stop during the repair?
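The question about the odds always going your way can be explored with a small Python sketch. It assumes each duty pump repair is an independent event carrying the same two percent worst-case chance of a system failure. The independence, the function name and the event counts are illustrative assumptions, not figures from the spread sheet.

```python
def chance_of_at_least_one_loss(repair_events, failure_chance_per_event=0.02):
    """Chance that at least one repair event ends in a system failure.

    Assumes repair events are independent and equally risky.
    """
    if repair_events < 0:
        raise ValueError("repair_events must be zero or more")
    if not 0 <= failure_chance_per_event <= 1:
        raise ValueError("failure chance must be between 0 and 1")
    chance_all_go_well = (1.0 - failure_chance_per_event) ** repair_events
    return 1.0 - chance_all_go_well


if __name__ == "__main__":
    # Hypothetical counts of duty pump breakdowns over the plant's life.
    for events in (1, 10, 35):
        print(f"{events:2d} repair events: "
              f"{chance_of_at_least_one_loss(events):.1%} chance of a loss")
```

Under these assumptions a two percent chance per repair grows steadily with every additional breakdown, so a run-to-failure strategy that looks safe for one event looks far less safe over the life of the plant.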

This risk situation also applies to a home owner. It is very unlikely that your house will burn down, yet the typical home owner carries house fire insurance. The odds of your home burning down are cited as 1 in 16,000. The odds are great that your house will never burn down in your lifetime. One could argue that you are crazy to pay $700 in annual insurance for 50 years of your life. But you gladly pay $700 a year because you cannot afford even one $500,000 bill to rebuild after a fire. How much would you spend on annual maintenance to protect against one $2,400,000 loss event during the operating life of your plant?
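The insurance arithmetic can be laid out as a short Python sketch. It reads the 1 in 16,000 odds as an annual chance, which this letter does not state, so treat the expected-loss figure as an assumption. The function names are hypothetical.

```python
def total_premiums(premium_per_year=700, years=50):
    """Total insurance paid over the period."""
    return premium_per_year * years


def expected_annual_loss(loss=500_000, odds_one_in=16_000):
    """Average yearly loss if the odds are read as an annual chance (assumption)."""
    if odds_one_in <= 0:
        raise ValueError("odds_one_in must be positive")
    return loss / odds_one_in


if __name__ == "__main__":
    print(f"premiums over 50 years: ${total_premiums():,}")
    print(f"expected annual fire loss: ${expected_annual_loss():,.2f}")
    print(f"single loss you cannot afford: ${500_000:,}")
```

On average the premium costs more than the expected loss, yet the insurance is still bought because a single $500,000 event is unaffordable. The same reasoning applies when weighing annual maintenance spend against one $2,400,000 plant loss.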

If your company can afford a $2,400,000 loss event I would be inclined to stay with the run-to-failure duty pump option. If your company cannot afford a $2,400,000 loss, then the run-to-failure strategy is the wrong one for the duty pumps. You would need to change to a predictive maintenance strategy and introduce condition monitoring on the duty pumps. The stand-by pump would still remain on a failure-finding maintenance regime. It would be used when a duty pump is stopped for corrective maintenance or if a duty pump suffers a surprise breakdown.
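The choice described above reduces to a simple affordability test, sketched here in Python. The function name and the dictionary layout are hypothetical, and the affordable loss figure is something only your company can set.

```python
def select_pump_strategies(worst_case_loss, affordable_loss):
    """Pick maintenance regimes for the duty pumps and the stand-by pump.

    Run-to-failure is kept for the duty pumps only when the company can
    absorb the worst-case loss. The stand-by stays on failure-finding
    tests in both cases.
    """
    if worst_case_loss < 0 or affordable_loss < 0:
        raise ValueError("loss values must be zero or more")
    if worst_case_loss <= affordable_loss:
        duty = "run-to-failure"
    else:
        duty = "predictive maintenance with condition monitoring"
    return {"duty pumps": duty, "stand-by pump": "failure-finding tests"}


if __name__ == "__main__":
    # Hypothetical affordable loss of $500,000 against both example scenarios.
    for worst_case in (240_000, 2_400_000):
        print(worst_case, select_pump_strategies(worst_case, 500_000))
```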

Reliability Centered Maintenance logic needs to be tempered with business risk reality. You cannot trust that RCM run-to-fail proposals are best for your company unless you model the true financial risk that such an RCM decision brings to your operation.

The RCM methodology does not ask you to build a robust risk model as part of the maintenance strategy selection process. Take it upon yourself to do a detailed risk analysis for all RCM run-to-fail maintenance strategy choices. If your operation cannot afford a breakdown in run-to-fail equipment, then select a more appropriate maintenance strategy that makes money for you.

I hope that the above information is helpful to you.


All the best to you,

Mike Sondalini
Managing Director
Lifetime Reliability Solutions HQ