What is Reliability?

“Reliability is the chance of success.”

“We get reliability by creating and building a thing that can do the duty, and preventing its failure during use.” (Lifetime Reliability Solutions consultants use the Plant Wellness Way EAM methodology to achieve world-class plant, equipment and machine reliability.)

PEW/PWW EAM Course Day 1 – Foundations Session 2 – Reliability | What is Reliability?
Duration 3:22

Machine Reliability = Sum of Parts’ Reliability
Duration 3:02

PLEASE NOTE THIS CLARIFICATION: The word ‘sum’ used in the slide heading does NOT mean addition. In this case ‘sum’ means that the combined effect of all the parts and components determines the machine reliability. Machine reliability is in fact the product of its parts’ reliabilities.
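To see why multiplying rather than adding matters, here is a minimal Python sketch of a machine treated as a series system. It assumes every part must work for the machine to work and that the parts fail independently of one another; the part count and the 0.99 figure are hypothetical.

```python
from math import prod

def series_reliability(part_reliabilities):
    """Machine reliability when every part must survive (series system),
    assuming the parts fail independently of one another."""
    values = list(part_reliabilities)
    for r in values:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reliability must lie between 0 and 1")
    return prod(values)

# Hypothetical machine: 10 parts, each with a 99% chance of surviving the period
parts = [0.99] * 10
print(round(series_reliability(parts), 4))  # 0.9044
```

Ten parts that are each 99% reliable give a machine that is only about 90% reliable. Adding the part reliabilities instead would give the meaningless figure of 9.9.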

A machine is made of parts with varying reliability that together produce a unique Rate of Occurrence of Failure (ROCOF) for the complete machine. Production processes are made up of machines, each with its own ROCOF. The reliability of the entire arrangement, whether a machine or a process, can be calculated using the mathematics of probability and statistics.

The slide shows how the failure of individual parts affects the availability of a machine. When many parts are failing, the machine’s rate of failure increases and there is less time for it to be operating. The ROCOF of a machine is driven by the failure rates of its parts. In a large group of such machines (e.g. diesel engines of the same model, jet engines of the same model), the times to failure of the parts will vary; if they are combined into an ‘average’ curve for such machines, a typical ‘failure curve’ for these machines emerges.
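The ‘average’ curve for a fleet can be estimated by pooling the part failures from all machines and counting them per time interval. This is a minimal Python sketch, assuming every machine was observed over the same period from time zero; the failure times and bin width are hypothetical.

```python
def empirical_rocof(failure_times_by_machine, bin_width, horizon):
    """Pooled fleet ROCOF estimate: failures per machine per unit time in each
    time bin. Assumes every machine was observed from time 0 to `horizon`."""
    n_bins = int(horizon // bin_width)
    counts = [0] * n_bins
    for times in failure_times_by_machine:
        for t in times:
            # Only count failures that fall inside a complete bin
            if 0 <= t < n_bins * bin_width:
                counts[int(t // bin_width)] += 1
    n_machines = len(failure_times_by_machine)
    return [c / (n_machines * bin_width) for c in counts]

# Hypothetical operating hours at which parts failed on three engines of one model
fleet = [[120, 900, 1500], [50, 1700], [300, 1100, 1900]]
rates = empirical_rocof(fleet, bin_width=500, horizon=2000)
for i, r in enumerate(rates):
    print(f"{i * 500:>5}-{(i + 1) * 500:<5} h: {r:.5f} failures per engine-hour")
```

With only three engines the estimate is rough; with a large fleet of the same model it settles into the typical failure curve described above.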

The shape of the failure curve can be changed by 1) renewing parts through Preventive Maintenance, 2) changing the applied stress level suffered by a part’s microstructure, and 3) controlling the quality of each part’s manufacture, assembly, operation and maintenance.
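The first lever, renewal on preventive maintenance, can be explored with a small Python sketch. It assumes the part’s life follows a Weibull distribution and that each PM renewal restores the part to as-good-as-new, and it uses the common textbook relation R_PM(t) = R(T)^n · R(t − nT), where n is the number of PM intervals of length T completed by time t. The β, η and interval values are hypothetical.

```python
import math

def weibull_reliability(t, beta, eta):
    """Chance a part survives to time t: R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))

def reliability_with_pm(t, beta, eta, pm_interval):
    """Reliability when the part is renewed (as good as new) every pm_interval.
    R_pm(t) = R(T)**n * R(t - n*T), with n completed PM intervals."""
    n = int(t // pm_interval)
    return (weibull_reliability(pm_interval, beta, eta) ** n
            * weibull_reliability(t - n * pm_interval, beta, eta))

# Hypothetical wear-out part (beta > 1): renewing every 500 h helps
print(weibull_reliability(1500, 3.0, 1000), reliability_with_pm(1500, 3.0, 1000, 500))
# Hypothetical early-life part (beta < 1): renewing every 500 h makes it worse
print(weibull_reliability(1500, 0.5, 1000), reliability_with_pm(1500, 0.5, 1000, 500))
```

Under these assumptions, renewal only lifts reliability when the failure rate rises with age (β > 1). For early-life failures (β < 1) each renewal brings back the early-life risk, leaving the stress and quality levers to do the work.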

What is the Chance of this Drinking Glass Breaking? Its Reliability is ‘The chance it will hold water next time you use it.’
Duration 0:47

If ‘reliability’ is the chance that a thing will work properly, we can ask what will stop the glass from ‘working properly’. There are numerous reasons why a glass will break (the ‘failure mechanisms’), and many of them are listed in the table on the slide. Each cause of failure can happen to a glass if the particular circumstances arise. This means the ‘chance’ of the glass breaking depends on how often those ‘bad’ circumstances arise. But for the glass to break, it must both be put in danger (the opportunity) AND have enough force applied to it (the failure mechanism).
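The ‘opportunity AND force’ logic can be written down directly. In this minimal Python sketch each cause contributes its yearly number of opportunities multiplied by the chance that enough force is applied when the opportunity occurs. The causes and figures are hypothetical, and the last step assumes breakages happen randomly in time (a Poisson process).

```python
import math

# Hypothetical household figures for three of the breakage causes:
# (opportunities per year, chance that enough force is applied given the opportunity)
causes = {
    "dropped while washing up": (300, 0.004),
    "knocked over on the table": (50, 0.01),
    "struck against the tap": (200, 0.001),
}

def annual_breakage_rate(causes):
    """Expected breakages per year. A break needs BOTH the opportunity AND
    enough force, so each cause contributes opportunities x P(force | opportunity)."""
    return sum(n * p for n, p in causes.values())

rate = annual_breakage_rate(causes)
# Assuming breakages occur randomly in time (a Poisson process),
# the chance of at least one breakage in the year is 1 - exp(-rate).
p_at_least_one = 1 - math.exp(-rate)
print(f"{rate:.2f} breakages/year, P(at least one) = {p_at_least_one:.2f}")
```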

Most often people say ‘failure modes’ rather than ‘mechanisms’.

Chance of Failure for a Drinking Glass
Duration 6:55

The activity is to draw the likely reliability curve (the Hazard curve) for a glass.

We do not have any real data, but using our experience we can visualise the shape of the probability-of-failure curve for the glass shown. For example, the likelihood of the glasses failing due to internal faults is effectively zero. But the likelihood of them failing due to mishandling is real, and people experience it when they break a glass. It is reasonable to expect that breakages will begin on the day of purchase and continue for as long as the glasses are used. The number failing each day is unknown, but our life experience tells us that glasses will be broken occasionally in every household.

There are 15 causes of drinking glass breakage shown in the list. I’m sure that you can come up with more. We can estimate the chance of breaking a glass in a year, i.e. the failure rate, by analysing the history of the glass. Let’s say it came from a manufacturing run of a million drinking glasses which were sold through shops around the world in carrier packs of twelve glasses. Each pack went to a household; one of them was your place and another was my place. That means 83,333 households (1,000,000 ÷ 12) had a set of glasses and put them on their shelves to use.

How many times a year does a glass get broken at your place? People have told me anything from one to five a year at their place. In my house about two glasses a year get broken, mostly by me, because I wash the plates and glasses after meals.

At the beginning only a few of the many causes of glass breakage can happen. When a new drinking glass is taken out of the glass carrier and put on a shelf, it is possible to drop it. As the glass is first moved into place on the shelf, it can hit something else on the shelf. So the chance of the glass being broken at the start of its ‘working’ life is not zero, because in some of the 83,333 households a glass will be broken when first stored. Over time more opportunities for failure arise. As the glass is used for different functions (family get-togethers, celebrations, special occasions, etc.), opportunities constantly arise for an accident or problem that results in a broken glass. Given enough time, the causes repeat endlessly.

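You can see on the slide how the annual failure rate of 0.167 was calculated for the group of 1,000,000 glasses (it matches two breakages a year per pack of 12, since 2 ÷ 12 ≈ 0.167). The same rate of 0.167 per glass applies to the 12 glasses at my home, which is why about two of them break each year. Dividing the rate by the number of households, 0.167 ÷ 83,333 = 0.000002, expresses my household’s two breakages as a share of the whole population: two in a million glasses each year.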
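The glass arithmetic can be checked step by step. This Python sketch assumes, as a simplification, that every household breaks two glasses a year like the author’s; the variable names are illustrative only.

```python
glasses = 1_000_000
pack_size = 12
households = glasses / pack_size             # about 83,333 households
breaks_per_household = 2                     # the author's household figure, assumed typical

# Annual failure rate per glass for the whole population
total_breaks = households * breaks_per_household
annual_rate = total_breaks / glasses         # about 0.167 breakages per glass per year

# The same per-glass rate applies to the 12 glasses at one home
breaks_at_my_home = annual_rate * pack_size  # about 2 a year

# Dividing by the number of households expresses one home's breakages
# as a fraction of the whole million-glass population
share_of_population = annual_rate / households  # about 0.000002

print(round(households), round(annual_rate, 3), round(breaks_at_my_home, 2), share_of_population)
```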

If you wanted to reduce the number of drinking glasses broken in a year what can you do?

Stop Breakage = Remove Failure Causes = Improved Reliability
Duration 3:53

Once the causes of failure are known, they can be targeted with solutions to prevent them. Glass breakages can be stopped by a design change, such as replacing glass with plastic, by changing to a stronger glass design, or by using a glass design that prevents a failure cause from arising. Procedural changes can be made, such as carrying glasses in locating trays. Improved instructions with training can be used to up-skill people and give them specialised knowledge and techniques.

Once failure causes are removed, there will be fewer failures and the failure rate curve falls. With fewer failures, less money is lost to DAFT Costs. Maintenance costs fall, operating profit improves, and people win back time to spend on improving the operation further.

Lift Machine Reliability = Remove the Chance of Parts Failure
Duration 1:35

Failure curves can be developed for each failure mode of a part. Data is collected for each type of part from many applications. For each failure mode, the life of the parts is measured and the number of parts failing from that mode in each time period is charted. The sum of the likelihoods for each mode gives the total chance of the part failing.

The curve combining all of a part’s failure modes shows the chance of the part failing in a particular time period.
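Adding the likelihoods works well when each mode’s chance in the period is small. This minimal Python sketch, assuming the failure modes act independently, compares the simple sum with the exact combination 1 − Π(1 − pᵢ); the modes and probabilities are hypothetical.

```python
from math import prod

def chance_of_failing(mode_probabilities):
    """Chance a part fails in a period from any of its failure modes,
    assuming the modes act independently: 1 - product of (1 - p)."""
    return 1 - prod(1 - p for p in mode_probabilities)

# Hypothetical chances of failing in one year, per failure mode
modes = {"fatigue": 0.02, "corrosion": 0.01, "overload": 0.005}

exact = chance_of_failing(modes.values())
approx = sum(modes.values())   # the simple "sum of the likelihoods"
print(f"exact {exact:.6f}, summed {approx:.6f}")
```

The sum slightly overstates the chance of failure, and the gap grows as the individual mode probabilities get larger.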

The Unreliability of Systems of Parts (i.e. a Machine) is the Sum of Its Parts’ Failure Rates | “Equipment reliability is malleable by choice of policy and the quality of practice.”
Duration 1:08

When components are combined into a machine or assembly, they form a system of parts. The system fails every time a component fails. Hence system reliability, i.e. machine reliability, is lower than individual component reliability, because numerous components are present that can fail, and any one that fails will cause the whole machine to fail.
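For parts with constant failure rates, the ‘sum of failure rates’ in the heading can be made concrete. This Python sketch assumes independent parts, each with a constant (random-failure) rate, arranged so that any part failure stops the machine; the three rates are hypothetical.

```python
import math

def system_failure_rate(part_rates):
    """For independent parts with constant failure rates, where any part
    failure fails the machine, the machine's failure rate is the sum."""
    return sum(part_rates)

def system_reliability(t, part_rates):
    """Chance the machine runs to time t without a failure."""
    return math.exp(-system_failure_rate(part_rates) * t)

# Hypothetical constant failure rates (per operating hour) for three parts
rates = [1e-5, 4e-5, 5e-5]
lam = system_failure_rate(rates)
print(f"machine failure rate {lam:.1e}/h, mean time between failures {1 / lam:,.0f} h")
print(f"chance of running 1,000 h without failure: {system_reliability(1000, rates):.3f}")
```

Summing the rates and multiplying the part reliabilities give the same answer here, because exp(−(λ₁ + λ₂ + λ₃)t) = exp(−λ₁t) · exp(−λ₂t) · exp(−λ₃t).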

To improve system reliability, e.g. machine reliability, it is necessary either to improve individual component reliability or to include redundancy, so that if one item fails a second one takes over and keeps the system operating. In all cases it is worthwhile to adopt system-wide best practices, as they benefit every part of the system.
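The two routes can be compared numerically. This Python sketch assumes independent failures and, for the redundant pair, perfect switch-over to the standby unit; the pump, motor and coupling reliabilities are hypothetical.

```python
from math import prod

def series(reliabilities):
    """All items must work (any failure stops the machine)."""
    return prod(reliabilities)

def parallel(reliabilities):
    """Redundant items: the function survives if at least one item works.
    Assumes independent failures and perfect switch-over."""
    return 1 - prod(1 - r for r in reliabilities)

# Hypothetical: pump 0.90, motor 0.95, coupling 0.98 over the same period
single_train = series([0.90, 0.95, 0.98])
# Option A: improve the weakest part (the pump) to 0.97
improved_part = series([0.97, 0.95, 0.98])
# Option B: duplicate the pump as a duty/standby pair
redundant_pump = series([parallel([0.90, 0.90]), 0.95, 0.98])
print(round(single_train, 4), round(improved_part, 4), round(redundant_pump, 4))
```

Either route lifts the machine’s reliability; the unimproved motor and coupling still cap how high it can go.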

The slide shows various strategies for reducing the chance of failure, depending on the stage of the equipment life cycle.

Equipment Life vs. Chance of Failure
Duration 5:25

In this slide, Wayne Bissett of OneSteel in Australia shows on a pie chart that half of the equipment repaired at OneSteel was repaired again within three months. This is evidence of a high incidence of early-life failure. He makes the point in the graph that we must hit the ‘sweet spot’ on all critical machinery parameters to make any real difference to the life of machines. He has developed a chant that makes clear what we need to deliver to our machines: “Only precise, smooth, tight and dry, clean and cool will do.”

Where does Failure Start in a Process?
Duration 7:28

Joint failure, loose fasteners and broken fasteners are inherent in a fastening process that relies on muscular feel. Torque is a poor means of ensuring proper fastener tension. Stopping fastener failures requires a process that delivers the required shank extension. It is the fastening process that must be changed to one that guarantees the necessary fastener stretch. Only after that management decision is made, and followed through by purchasing the necessary technology, quality-controlling the new method to limit variation, and training the workforce in the correct practice until they are competent, can the intended outcome always be expected. The use of operator feel when tensioning fasteners is a management decision that automatically leads to breakdowns. Any operation using people’s muscles to control fastener tension has failure built into its design; it is the nature of the process.
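The ‘required shank extension’ can be estimated from the target preload with the elastic stretch relation ΔL = F·L / (A·E). This is a simplified Python sketch, assuming a plain steel bolt with the hypothetical size, grip length and preload shown, and ignoring the thread and nut compliance that real tightening procedures account for.

```python
def required_extension_mm(preload_n, grip_length_mm, stress_area_mm2, youngs_modulus_mpa=205_000):
    """Elastic bolt stretch for a target preload: dL = F * L / (A * E).
    Simplified: treats the grip length as the stretched length and
    ignores thread and nut compliance. E of about 205 GPa assumes steel."""
    # MPa is N/mm^2, so N * mm / (mm^2 * N/mm^2) gives mm
    return preload_n * grip_length_mm / (stress_area_mm2 * youngs_modulus_mpa)

# Hypothetical M20 bolt: tensile stress area 245 mm^2, 80 mm grip, 100 kN target preload
stretch = required_extension_mm(100_000, 80, 245)
print(f"target extension: {stretch:.3f} mm")
```

A stretch of roughly 0.16 mm is far too small to judge by feel, which is why it has to be delivered by a method designed to control it.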

This is why W. Edwards Deming gave managers his famous warning, “Your business is perfectly designed to give you the results that you get.” Poor equipment reliability is the result of choosing business and engineering processes that have inherently wide variation. These processes are statistically incapable of delivering the required performance with certainty, so equipment failure is a normal outcome of their use and should be expected regularly. Failure is designed into the process, and it is mostly luck that keeps these companies in business.
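‘Statistically incapable’ can be put into numbers with a process-capability sketch in Python. It assumes fastener tension scatter is normally distributed and centred on the target. The two standard deviations (10% and 3% of target) are hypothetical stand-ins for a wide-variation method and a controlled one, and the ±10% band is the one reported for load-indicating washers.

```python
from statistics import NormalDist

def fraction_outside(sigma_pct, tolerance_pct=10.0):
    """Fraction of joints whose tension misses the target by more than
    +/- tolerance_pct, assuming tension scatter is normal and centred on
    the target with a standard deviation of sigma_pct (% of target)."""
    dist = NormalDist(mu=0.0, sigma=sigma_pct)
    return dist.cdf(-tolerance_pct) + (1 - dist.cdf(tolerance_pct))

def cp(sigma_pct, tolerance_pct=10.0):
    """Process capability index: tolerance width / (6 sigma)."""
    return (2 * tolerance_pct) / (6 * sigma_pct)

# Hypothetical scatter: a wide-variation method vs a tightly controlled one
for name, sigma in [("wide-variation method", 10.0), ("controlled method", 3.0)]:
    print(f"{name}: Cp = {cp(sigma):.2f}, outside +/-10%: {fraction_outside(sigma):.2%}")
```

Under these assumptions the wide-variation method leaves roughly a third of joints outside the band, so failures are a built-in, expected outcome rather than bad luck.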

Empirical evidence from the use of load-indicating washers suggests that when the final fastener tension is within ±10% of the ideal tension, loose and broken fastener failures stop happening.

We can imagine situations that will cause parts failure
Duration 0:35

The Bill of Material is a powerful document for deciding what maintenance to do on equipment parts. You take one part number at a time and ask how many ways it can fail (i.e. its failure modes) or be failed (i.e. the mechanisms that cause the failure modes). As you identify the causes of the causes of failure, you can make good maintenance strategy choices and identify what preventive and predictive actions to take.
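One way to record ‘the causes of the causes’ for a part is a small tree of failure modes, mechanisms and root causes. This Python sketch uses a hypothetical bearing part number and hypothetical causes; the nested-dictionary layout is an illustrative choice, not a prescribed format.

```python
# Hypothetical cause tree for one BOM part:
# failure mode -> failure mechanism -> root causes ("causes of the causes")
part = {
    "part_no": "BRG-6205",  # hypothetical part number
    "failure_modes": {
        "seizure": {
            "lubricant starvation": ["no regreasing schedule", "blocked grease line"],
            "contamination": ["worn shaft seal", "dirty grease gun"],
        },
        "raceway spalling": {
            "overload fatigue": ["shaft misalignment", "belt over-tension"],
        },
    },
}

def root_causes(part):
    """Walk the tree, yielding (mode, mechanism, root cause) triples."""
    for mode, mechanisms in part["failure_modes"].items():
        for mechanism, causes in mechanisms.items():
            for cause in causes:
                yield mode, mechanism, cause

for mode, mechanism, cause in root_causes(part):
    print(f"{part['part_no']}: {mode} <- {mechanism} <- {cause}")
```

Each root cause at the end of a branch is a candidate for a preventive or predictive action.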

Identify Equipment Assemblies and Parts at Risk of Failure
Duration 2:35

Simply mark up the Bill of Material with the failure types that can fail each part. As you collect and analyse the causes of failure, it becomes clear how to protect the equipment and its parts with the right practices and strategies. These are what you put into your standard maintenance and operating procedures and your planned maintenance work order system.
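A marked-up Bill of Material can be combined with recorded failure causes to show where to act first. This Python sketch uses hypothetical pump BOM lines and hypothetical work-order causes; the data layout and function names are illustrative only.

```python
from collections import Counter

# Hypothetical pump BOM lines, each marked up with the failure causes that can fail it
bom_markup = {
    "PMP-001 impeller": ["erosion", "cavitation"],
    "PMP-002 mechanical seal": ["dry running", "contamination"],
    "PMP-003 bearing": ["contamination", "misalignment"],
}

# Hypothetical causes recorded on past breakdown work orders
work_order_causes = ["contamination", "misalignment", "contamination", "dry running"]

def parts_at_risk(markup, cause):
    """BOM lines that a given failure cause can affect."""
    return [part for part, causes in markup.items() if cause in causes]

def causes_by_frequency(recorded):
    """Recorded causes, most frequent first, to show where to act first."""
    return Counter(recorded).most_common()

top_cause, count = causes_by_frequency(work_order_causes)[0]
print(top_cause, count, parts_at_risk(bom_markup, top_cause))
```

A cause that appears against several BOM lines, like contamination here, suggests a practice worth writing into the standard procedures rather than a part-by-part fix.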

World Class Plant, Equipment and Machine Reliability can Be a Total Certainty
Duration 0:31

Industrial and Manufacturing Wellness Book also explains plant, equipment and machine reliability

The new Industrial and Manufacturing Wellness book contains all the latest information, all the latest templates, and worked examples of how to design and build a Plant Wellness Way Enterprise Asset Management (PWWEAM) system-of-reliability. Get the book from its publisher, Industrial Press, and Amazon Books.

The PLANT WELLNESS WAY EAM TRAINING COURSE teaches you to use and master the Plant Wellness Way EAM methodology. Follow this link to read about Training for New Users in the Plant Wellness Way EAM Methodology for World Class Reliability.

You are welcome to go to the Plant Wellness Way Tutorials webpage and look at worked examples of Plant Wellness Way EAM techniques and read in-depth explanations of the latest version of many PWWEAM presentation slides.

Use the head office email address on the Contact Us page if you have questions about the Session 2 Equipment Reliability Explained videos.