What is the connection between equipment risk and reliability and how does it affect your maintenance strategy selection?

When it comes to risk, we have simplified it so much that we have failed Einstein’s advice: “Everything should be made as simple as possible, but not simpler.” Risk has been made too simple, and the connection between risk and reliability has been lost.

Equipment risk has a direct impact on equipment reliability. The risks you allow your production plant and equipment to suffer will reduce their reliability. About six months ago I was playing about with the risk equation and a mathematical connection between risk and reliability revealed itself.

The most commonly used form of risk equation is:

Risk ($/yr) = Consequence of Occurrence ($) x Frequency of Occurrence (/yr)

Risk is equal to the frequency of an event occurring multiplied by its cost, should it occur. Frequency is the number of times an event actually happens during a period, usually a year. An event that happens once every five years has a frequency of 0.2 times a year. The consequence of an occurrence is the total financial impact of the event on a business. By expressing the frequency of an event per year and the consequence of the occurrence in monetary value, the equation measures the annual cost of risk. It is a means to quantify the yearly cost to the organization of every event it may suffer, good or bad. It provides a figure to gauge one risk against another and so allows the setting of priorities for addressing risk.
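As a minimal sketch of the common form of the equation, the calculation can be written in a few lines of Python. The function name and the $50,000 consequence are hypothetical figures chosen for illustration only; the once-in-five-years frequency is the one used in the text.

```python
def annual_risk(consequence_dollars: float, frequency_per_year: float) -> float:
    """Risk ($/yr) = Consequence of Occurrence ($) x Frequency of Occurrence (/yr)."""
    if consequence_dollars < 0 or frequency_per_year < 0:
        raise ValueError("consequence and frequency must be non-negative")
    return consequence_dollars * frequency_per_year


# Hypothetical event: costs $50,000 each time and happens once every five years.
frequency = 1 / 5  # 0.2 occurrences per year
risk = annual_risk(50_000, frequency)
print(f"Annual risk: ${risk:,.0f}/yr")
```

Calculating this figure for each event puts every risk on the same dollars-per-year scale, which is what allows one risk to be ranked against another.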

The ‘Frequency of Occurrence’ divides further so that the full form of the risk equation becomes:

Risk ($/yr) = Consequence ($) x [No. of Opportunities to Fail (/yr) x Chance of Failure]

The ‘Number of Opportunities to Fail’ is how many times a year a situation arises that could lead to a failure. The ‘Chance of Failure’ (or probability) is the odds that a failure will happen when that situation arises. It is one (1) if there will definitely be a failure every time the situation arises, and zero (0) if there will never be a failure when the situation arises. It normally takes a value between 0 and 1 because there is usually some possibility, however small, of a thing going wrong.
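The split of frequency into opportunities and chance can be sketched as follows. This is an illustration under assumed numbers, not data from a real plant: the function name, the 100 opportunities a year, the 0.002 chance of failure and the $50,000 consequence are all hypothetical.

```python
def annual_risk_full(consequence_dollars: float,
                     opportunities_per_year: float,
                     chance_of_failure: float) -> float:
    """Risk ($/yr) = Consequence ($) x [Opportunities to Fail (/yr) x Chance of Failure]."""
    if not 0.0 <= chance_of_failure <= 1.0:
        raise ValueError("chance_of_failure must be between 0 and 1")
    # Frequency of occurrence is the product of the two bracketed terms.
    frequency_per_year = opportunities_per_year * chance_of_failure
    return consequence_dollars * frequency_per_year


# Hypothetical case: the risky situation arises 100 times a year and
# leads to a failure 0.2% of the time, i.e. about 0.2 failures a year.
risk = annual_risk_full(50_000, opportunities_per_year=100, chance_of_failure=0.002)
print(f"Annual risk: ${risk:,.0f}/yr")
```

With a chance of 0 the risk is zero however often the situation arises, and with a chance of 1 every opportunity becomes a failure.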

The full risk equation is much more meaningful than the common form because it links the risk from an event directly to the chance of the event. If you want low risk, you must remove the possibility of a failure event occurring. Otherwise, all that is left for you to do is react speedily after the event to minimize your consequential losses.

The full risk equation gives us great insight into how we can maximize production equipment uptime. For operating plant and equipment, the ‘chance of failure’ is the chance of equipment failure, which is the complement of equipment reliability (the chance of not having a failure): Chance of Failure = 1 – Reliability. The risk equation when applied to production machinery is:

Risk ($/yr) = Consequence ($) x [No. of Opportunities to Fail (/yr) x (1 – Reliability)]
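To see how the reliability form behaves, the sketch below evaluates it for a few reliability values. The function name and all of the figures (a $50,000 consequence, 100 opportunities to fail a year, and the three reliability levels) are assumptions made for illustration.

```python
def risk_from_reliability(consequence_dollars: float,
                          opportunities_per_year: float,
                          reliability: float) -> float:
    """Risk ($/yr) = Consequence ($) x [Opportunities to Fail (/yr) x (1 - Reliability)]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    chance_of_failure = 1.0 - reliability
    return consequence_dollars * opportunities_per_year * chance_of_failure


# Hypothetical machine: $50,000 per failure, 100 opportunities to fail a year.
for reliability in (0.90, 0.99, 0.999):
    risk = risk_from_reliability(50_000, 100, reliability)
    print(f"Reliability {reliability:.3f} -> annual risk ${risk:,.0f}/yr")
```

With consequence and opportunities held constant, each step up in reliability in this example cuts the annual risk tenfold.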

Now we can see a direct inverse connection between equipment risk and equipment reliability: as reliability rises, risk falls. The image below helps make the connection clear. Reliability and risk are inextricably connected. If a maintenance activity does not reduce equipment risk it is a waste of time, money and effort.

[Image: Your Risk Matrix is a business risk financial modelling tool]

Direct Link between Equipment Risk and Equipment Reliability

Your operational equipment reliability and resulting plant uptime is directly linked to the risks you allow your equipment and machinery to suffer. If you want high production uptime and low operational cost you need to create high equipment reliability by removing the causes of risk to each of your plant items.

There are great benefits available to businesses that reduce their risk of equipment failure. If the chance of a failure is reduced so that it happens less often, a great deal of money is saved over time because there are fewer failure events to spend it on.

The maintenance activities that pay off the most are those that reduce the frequency of a failure event. When you reduce failure frequency you increase equipment reliability. Activities that reduce consequence do not improve reliability, but they do save some maintenance costs.

Reduce the chance of an event occurring, and you reduce the risk. Remove the conditions necessary for an incident to happen and the incident cannot occur. The use of ‘chance reduction techniques’ is the prime principle of risk control in the Plant Wellness Methodology. Risk can also be reduced by decreasing the consequences of an incident. That is the purpose of such things as emergency plans, fire brigades and ambulances. If we react quickly, correctly and early enough, the consequences can be reduced. The use of consequence reduction techniques is a second risk control principle.

But in the risk equation the two factors, chance and consequence, are multiplied together. It would seem that either factor has an equal effect on the risk: halving the chance looks just as good as halving the consequence. Unfortunately, most organizations fall into this trap. They think it does not matter how they reduce their risk because either path produces the same result. That is not true. In reality the two ‘paths’ to reducing risk have totally different impacts on the prosperity of an organization.

Impact of Risk Management Strategy

Companies that use consequence reduction strategies minimize their losses by learning to fix breakdowns quickly. You do that by holding lots of spare parts in store, setting up a cache of parts by the machines, training your repair people to fix things speedily, or improving equipment maintainability so repairs are done faster. Reducing the downtime improves profit because your losses are smaller if the plant gets back into production quickly. Consequence reduction strategies do reduce the cost of risk.

What is interesting is that although consequence reduction reduces costs, there will still be much frantic activity and ‘fire-fighting’ in the operation. Minimizing risk by reducing its consequences accepts failure incidents as a normal way of doing business. In organizations that manage risk by consequence reduction, many things go wrong. Their people wait for the failures and then react to them. In this way management instills a reactive culture in the organization. Reducing only the consequences of risk still makes work for everyone. This work is all wasted time, money and effort because people and resources are spent fixing failures instead of improving the business. If you were to walk about in such a company you would see that everyone is busy, but little of their time and effort adds value to the operation; it only adds cost.

The alternative risk management strategy is to apply chance reduction techniques. Over the same period less profit is lost with chance reduction strategies than with consequence reduction strategies. Fewer failure incidents occur because chance reduction stops opportunities for failure from developing. Add up the savings from failure costs not spent and you get a very profitable operation. The lower cost strategy is clear: chance reduction delivers fewer failures because fewer defects are present to rob resources and waste money.
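The difference between the two paths can be illustrated with a small comparison. The sketch below is built on assumed numbers only: the `Scenario` class and every figure in it are hypothetical. It halves the chance in one case and the consequence in the other, then reports both the risk figure and the number of failure events per year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical inputs to the full risk equation."""
    name: str
    consequence_dollars: float
    opportunities_per_year: float
    chance_of_failure: float

    @property
    def failures_per_year(self) -> float:
        # Frequency of occurrence = opportunities x chance of failure.
        return self.opportunities_per_year * self.chance_of_failure

    @property
    def annual_risk(self) -> float:
        return self.consequence_dollars * self.failures_per_year


scenarios = (
    Scenario("baseline", 50_000, 100, 0.04),
    Scenario("halve chance", 50_000, 100, 0.02),
    Scenario("halve consequence", 25_000, 100, 0.04),
)

for s in scenarios:
    print(f"{s.name:<18} failures/yr = {s.failures_per_year:.1f}  "
          f"risk = ${s.annual_risk:,.0f}/yr")
```

Both strategies halve the risk figure given by the equation, but only chance reduction halves the number of failures the organization has to respond to. The sketch does not attempt to put a price on the reactive work that each remaining failure creates.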

Consequence reduction strategies expect failure to happen and then manage it so that the least time, money and effort is lost. They tolerate failure and loss as normal. They accept that it is only a matter of time before problems severely affect the operation. They come into play late in the life cycle when few risk reduction options are left.

In comparison, chance reduction strategies focus on identifying problems and making business system changes to prevent or remove the opportunity for failure. They view failure as avoidable and preventable. These methodologies rely heavily on improving business processes rather than improving failure detection methods. They expend time, money and effort early in the life cycle to identify and stop problems so the chance of failure is minimized.

Both risk reduction philosophies are necessary for optimal protection. But a business with a chance reduction focus will proactively prevent defects, unlike one with a consequence reduction focus, which will fix defects after they appear. Organizations that primarily apply chance reduction strategies have truly set up their business to ensure decreasing numbers of failures. As a result they get high equipment reliability and reap all the business performance benefits that world-class reliability brings.

It is in your organization’s best interest to focus strongly on the use of chance reduction strategies, because doing so will consistently generate the most profit for the least amount of work. Consequence reduction strategies are still important and necessary: once a failure sequence has initiated, you must find it quickly, address it and minimize its effects so you lose the least amount of money. But consequence reduction will not take your organization to world-class success and profit because it expends resources. Only chance reduction strategies reduce the need for resources because they proactively eliminate failure incidents through defect elimination and failure prevention.

Nothing is certain with risk; it changes with the circumstances. Controlling risk demands that an organization has the culture and practices to guarantee continuous, rigorous compliance with risk reduction practices. Otherwise the chance of failure rises over time as business processes and operational systems degrade, and eventually the worst will happen.

My best regards to you,

Mike Sondalini
Managing Director
Lifetime Reliability Solutions