Accurate maintenance history reporting

Living RCM Certified®

Video transcript

Scene 00-1 A systematic process for increasing physical asset reliability eludes many maintenance departments.

Scene 00-2 How can we increase equipment reliability? By capturing and then analyzing maintenance and repair history.

Reliability improvement depends on first performing reliability analysis.

Scene 00-3 Reliability analysis (RA), in turn, requires adequately captured historical maintenance event data, in conjunction with monitored condition data, such as sensor data, oil debris data, and other observed data sources relevant to an asset’s health.

Scene 01-1 What do we mean by historical event data?

Well, two types of equipment history can be recorded and tracked:

Descriptive, and
Quantitative.

Maintenance engineers typically analyze “Descriptive” history, consisting of textual narratives, sketches, and photographs that document operational problems encountered and proposed solutions.

A subject matter expert reviews this information and then proposes engineering modifications that will resolve the issue.

Maintenance engineers are less familiar with the second, “Quantitative” type of historical record, which is required for quantitative analysis.

This type of analysis, called “reliability analysis”, is seldom performed successfully in the maintenance organization. Why?

Scene 01-2 The primary reason for the paucity of quantitative analysis in maintenance is well known: the history of object parts and their ending events has not been recorded with sufficient precision, completeness, and accuracy in the Enterprise Asset Management (EAM) system database.

In this video we’ll look at a new type of data entry form that can fill the reliability gap. It will ensure the transfer of “analysis grade” data suitable for reliability analysis into the maintenance historical database.

The largely untraveled road to Condition Based Maintenance optimization depends on correlating failure mode history with condition monitoring data.

This multi-dimensional analysis of age and condition monitoring data, in support of asset performance, allows us to answer important questions such as:

Scene 01-3

What is the actual predictive capability of a particular set of monitored data?
What are the monitored variables that are most influential to the probability of failure in an upcoming calendar or age period?
What is the probabilistic relationship between those variables and an item’s remaining useful life?
What is the confidence with which a predictive decision is taken?
What is the return on investment of a given predictive maintenance strategy?
How can predictive performance be measured?

Scene 02-1 Predictive performance can be assessed by plotting the Conditional Probability Density (CPD) distribution. It is similar to the well-known probability density function except that the origin is not positioned at age “0” when the item was new.

Rather, it is located at the current moment in time, which is, of course, the moment at which a predictive decision must be made.

Scene 02-2 The quality of that decision is measured in terms of confidence, as reflected by how narrowly the curve is spread about its mean.

The mean, by definition, is the object part’s Remaining Useful Life Estimate or RULE.

Reporting the coefficient of variation σ/µ (the standard deviation divided by the mean) is a convenient way of tracking confidence in predictive decisions.
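To make the RULE and σ/µ concrete, here is a minimal Python sketch. It assumes, purely for illustration, a Weibull life distribution with hypothetical shape `beta` and scale `eta` and ignores condition indicators. It integrates the conditional survival of the remaining life numerically from the current age `t`. The function names are illustrative only.

```python
import math

def weibull_reliability(x: float, beta: float, eta: float) -> float:
    """Probability that the item survives beyond age x (Weibull model)."""
    return math.exp(-((x / eta) ** beta))

def residual_life_stats(t: float, beta: float, eta: float,
                        horizon: float, steps: int = 20_000):
    """RULE (mean), standard deviation and sigma/mu of the remaining life X,
    given that the item has survived to its current age t.

    For a non-negative X with conditional survival S(x) = R(t + x) / R(t):
        E[X]   = integral of S(x) dx
        E[X^2] = integral of 2 x S(x) dx
    Both integrals are truncated at `horizon` and evaluated with the
    trapezoidal rule.
    """
    r_t = weibull_reliability(t, beta, eta)
    dx = horizon / steps
    m1 = m2 = 0.0
    for i in range(steps + 1):
        x = i * dx
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end weights
        s = weibull_reliability(t + x, beta, eta) / r_t
        m1 += weight * s * dx
        m2 += weight * 2.0 * x * s * dx
    sd = math.sqrt(max(m2 - m1 * m1, 0.0))
    return m1, sd, sd / m1
```

For example, `residual_life_stats(t=8_000, beta=2.5, eta=10_000, horizon=40_000)` returns the RULE, its standard deviation, and σ/µ for a wearing-out item at 8,000 hours. Choose `horizon` large enough that the survival has effectively reached zero.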

The equation behind this curve is known as the Cox Proportional Hazard Model (PHM).

It predicts the conditional failure probability, or hazard h, in terms of the item’s current age t and the current values of the significant condition indicators. These indicators enter the model through the inner (dot) product of the coefficient vector gamma and the condition indicator vector Z.
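For reference, one common way of writing this model is shown below. It assumes a Weibull baseline hazard, which is the form whose parameters β, η and γ appear in the EXAKT results later in this transcript.

```latex
h\big(t, \mathbf{Z}(t)\big) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}
\exp\!\big(\boldsymbol{\gamma}\cdot\mathbf{Z}(t)\big),
\qquad
\boldsymbol{\gamma}\cdot\mathbf{Z}(t) = \sum_{i=1}^{m} \gamma_i\, Z_i(t)
```

Here t is the item's current age, β and η are the shape and scale parameters of the baseline Weibull hazard, Z_i(t) are the m condition indicators at age t, and γ_i are their coefficients. With every γ_i = 0 the model reduces to an ordinary Weibull hazard.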

Scene 02-3 The shape parameter β is a measure of the quality with which failure modes and their life ending events (either Functional Failure, Potential Failure, or Suspension) have been recorded by the technician. A suspension is the preventive renewal of an object part that has not failed.

Successful quantitative reliability analysis for predictive maintenance depends almost entirely on the reporting skills of the operators or technicians. How can we ensure that they record these events accurately?

Scene 03-1 The form ensures the quality of data needed for the continuous improvement process. The pane on the left contains the asset RCM tree view. The leaves represent failure modes that can affect asset performance. When a leaf node is selected in the tree view, the corresponding failure mitigating strategy appears in the center pane.

This is key, because it lets the technician confirm that the selected node precisely represents the failure mode he has observed and wants to record. He makes that check in light of the displayed object part, object damage, failure effects, and strategy.

He records the event in the rightmost pane. That pane requires the selection of an “Ending Event” for each object part renewed during work order execution. The ending event is one of functional failure (FF), potential failure (PF), or suspension (S), and it is a prerequisite for subsequent reliability analysis.
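The sketch below shows one possible way to structure the record that the rightmost pane captures. It is a hypothetical Python schema: the class names, field names, and validation rules are illustrative assumptions, not the form's actual data model. Only the three codes FF, PF and S come from the transcript.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class EndingEvent(Enum):
    """Life ending events named in the transcript."""
    FUNCTIONAL_FAILURE = "FF"
    POTENTIAL_FAILURE = "PF"
    SUSPENSION = "S"

@dataclass(frozen=True)
class RenewalRecord:
    """One object part renewed on a work order (hypothetical schema)."""
    work_order: str
    asset_id: str
    failure_mode: str      # leaf node selected in the RCM tree view
    object_part: str
    ending_event: EndingEvent
    event_date: date

    def __post_init__(self):
        # Reject records that would be unusable for reliability analysis.
        for name in ("work_order", "asset_id", "failure_mode", "object_part"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be blank")
        if not isinstance(self.ending_event, EndingEvent):
            raise TypeError("ending_event must be an EndingEvent (FF, PF or S)")

def parse_ending_event(code: str) -> EndingEvent:
    """Map a form entry such as ' ff ' to an EndingEvent; unknown codes raise ValueError."""
    return EndingEvent(code.strip().upper())
```

Because the ending event is an enumeration rather than free text, a record without a valid FF, PF or S cannot be saved. That is the property reliability analysis needs.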

Scene 03-2 Analysis requires precision in the selection of the object part and of its life ending event as one of potential failure, functional failure, or suspension.

The good news is that errors can be avoided given the completeness of the contextual information provided in the form’s central pane.

Scene 03-3 The form has one more essential feature. The maintenance strategy itself can be updated continuously and routinely by taking a technician’s on-the-spot observations into account. Text input areas in the “Feedback” pane, adjacent to each text box in the strategy pane, encourage the technician to suggest changes or additions to any of the strategic information. This includes Object Part, Object Damage, Effects, Consequences, and Mitigation, revised in light of his actual observations. Such valuable information, structured in an “RCM-like” way, drives incremental strategy improvement so that the maintenance plan responds better and better to observed reality.

Scene 03-4 A generally recognized management principle suggests that motivation increases when employees, such as operators and technicians, can contribute directly to the maintenance strategy.

Scene 03-5 Declaring a failure mode’s life ending event can be challenging. Operational context is a decisive factor when recording the event as either a potential failure (PF) or a functional failure (FF).

Fortunately, when completing the form’s third pane, the relevant facts are fresh. Discussion among engineers, supervisors, and technicians will naturally evolve into standards for the consistent identification and capture of the ending event.

For example, when the leakage rate exceeds a standard, a function such as “To contain” has by definition been lost, and the event would be reported as a “failure”.

But should we record it as a potential failure (PF) or a functional failure (FF)? The function might be backed up and, consistent with the equipment’s strategy, the consequences of the protected function’s failure may have been minimal.

In such a case, we might record the event as a potential failure to indicate that our maintenance strategy has avoided the direst consequences of failure.

The identical seal failure in some other context could be a functional failure if significant operational, maintenance, safety, or hidden consequences were incurred.

Declaring a failure as “functional” or “potential” is context dependent. The form allows us to update the equipment’s “Effects analysis” to reflect all reasonably likely circumstances that could arise as a result of this failure mode.

If the object part still has an indefinite amount of life remaining but is nevertheless replaced preventively, its life ending event should be reported as a suspension (S).
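The guidance above can be encoded as a simple decision rule. This is a hypothetical Python sketch. In practice, the two yes/no inputs would be replaced by the site-specific standards that emerge from the engineer, supervisor, and technician discussions described earlier.

```python
def classify_ending_event(performance_standard_exceeded: bool,
                          significant_consequences: bool) -> str:
    """Return 'FF', 'PF' or 'S' for a renewed object part (hypothetical rule).

    performance_standard_exceeded: the part no longer met its standard,
        e.g. a seal's leakage rate was above the agreed limit.
    significant_consequences: operational, maintenance, safety or hidden
        consequences were actually incurred rather than avoided by a
        backup or by the maintenance strategy.
    """
    if not performance_standard_exceeded:
        # Renewed preventively while life remained: a suspension.
        return "S"
    if significant_consequences:
        return "FF"
    # Failed, but the strategy (e.g. a backed-up function) avoided the
    # direst consequences.
    return "PF"
```

The same leaking seal would be classified `"FF"` in one operational context and `"PF"` in another. This mirrors the context dependence the transcript describes.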

Scene 03-6 A final important point to make is that this reliability enabling data entry form can be implemented easily using tools and methods already available in your Microsoft Office tool set.

Scene 04-1 Quantitative RA requires that we enforce a one-to-one relationship between catalog profile object parts and the failure modes identified in the RCM-derived strategy.
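A quick way to audit that one-to-one relationship is sketched below in Python. It assumes, hypothetically, that the links between catalog object parts and RCM failure modes can be exported as (object part, failure mode) pairs. It reports three things: catalog parts with no failure mode, parts linked to several modes, and modes linked to several parts.

```python
from collections import defaultdict

def check_one_to_one(catalog_parts, links):
    """Audit (object_part, failure_mode) links against the catalog.

    All three results are empty when the mapping is one-to-one and every
    catalog object part is covered by exactly one failure mode.
    """
    modes_by_part = defaultdict(set)
    parts_by_mode = defaultdict(set)
    for part, mode in links:
        modes_by_part[part].add(mode)
        parts_by_mode[mode].add(part)
    return {
        "unmapped_parts": sorted(set(catalog_parts) - set(modes_by_part)),
        "parts_with_many_modes": {p: sorted(m) for p, m in modes_by_part.items() if len(m) > 1},
        "modes_with_many_parts": {m: sorted(p) for m, p in parts_by_mode.items() if len(p) > 1},
    }

# Hypothetical example: a detailed catalog versus two RCM object parts.
catalog = {"Transmission", "Gear", "Bearing", "Seal"}
links = [("Transmission", "transmission fails"), ("Seal", "seal leaks")]
report = check_one_to_one(catalog, links)
```

In this example the catalog entries "Bearing" and "Gear" have no counterpart in the strategy. They are candidates for the kind of extraneous detail that technicians may propose eliminating.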

Here is an example where two object parts were pinpointed in the initial RCM analysis and stored in the Asset Performance Management System (APMS).

But the EAM catalog lists from which technicians must make their selections contain far more detail. Those profiles reflect the original equipment manufacturer’s engineering design, maintenance manuals, and bills of material.

When developing the catalog profiles, erring on the side of greater depth and more detail was considered cautious, and thus desirable.

However, when setting up the catalog profiles, scant attention was paid to the complexity of matching real situations encountered in the field to long lists of selection choices.

Scene 04-2 The equipment strategy, on the other hand, was built with the benefit of SME and front line experience using a structured RCM or similar process that addressed the reasonably likely failure modes, their effects, and consequences.

Given their separate development processes, discrepancies in detail and depth between two such unrelated lists are not surprising.

Scene 04-3 The technician should not be required to parse long lists of internal object parts that are irrelevant to the maintenance activities performed in actual practice on an asset.

In other words, the level of detail imposed by the catalog profiles often contradicts maintenance objectives for the asset in its operational context.

When the technician encounters such circumstances he may use the text areas on this Living RCM form to propose the elimination of extraneous detail.

Scene 04-4 For example, often we don’t know and don’t even care to know precisely which of the internal object parts listed in the catalog profile was responsible for the failure. It’s just not worth it. We simply discard the entire component and replace it with a new item. In such cases the technician may propose that the item itself be reassigned as the object part whose life ends for whatever internal reason.

Scene 05-1 The sole purpose of captured maintenance event history, specifically, the renewal of object parts, is to provide reliability engineers with the ability to perform Reliability Analysis on a sample of data.

Scene 05-2 A sample is bounded within a calendar window. Data “points” are the lifetimes of failure modes occurring entirely or partially within that window.

The lifetimes included in a sample are represented by arcs. Each arc connects two events: a beginning event B and an ending event, either ending by failure (EF) or ending by suspension (ES).

RA, in its basic sense, is the “counting” of the arcs in the sample. Each failure mode’s age and its life ending event will have been recorded in the maintenance history capture process. The dashed arcs represent suspended lifetimes, i.e. lifetimes that occur partially outside the sample window.
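The Python sketch below shows how arcs might be extracted from one object part position's event history for a given sample window. Two simplifying assumptions are made. First, event times are plain numbers, such as days or operating hours. Second, only the window's end is treated specially: a lifetime still running at the window end becomes a suspension there, while a lifetime that began before the window keeps its full recorded age. The names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lifetime:
    begin: float   # time of the beginning event B
    age: float     # age at the ending event, or at the window end if still running
    failed: bool   # True for EF; False for ES or for a lifetime cut off by the window end

def lifetimes_in_window(events, window_start, window_end):
    """Build the arcs of ONE object part position that fall wholly or
    partly inside [window_start, window_end].

    events: chronological list of (time, code), code in {'B', 'EF', 'ES'};
    each ending closes the most recent beginning.
    """
    arcs = []
    begin = None
    for time, code in events:
        if code == "B":
            begin = time
            continue
        if code not in ("EF", "ES") or begin is None:
            raise ValueError(f"unexpected event {code!r} at {time}")
        if time >= window_start and begin < window_end:
            if time <= window_end:
                arcs.append(Lifetime(begin, time - begin, code == "EF"))
            else:  # ended after the window closed: suspended at the window end
                arcs.append(Lifetime(begin, window_end - begin, False))
        begin = None
    if begin is not None and begin < window_end:
        # Still running when the window closes: a suspension at the window end.
        arcs.append(Lifetime(begin, window_end - begin, False))
    return arcs
```

"Counting" the sample then amounts to tallying `sum(a.failed for a in arcs)` failures against the remaining suspended arcs, each with its recorded age.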

Scene 05-3 Suspended data contributes to the uncertainty of an analysis. RA software algorithms manage the uncertainty associated with suspensions so that confidence in a decision can be stated and thereby considered by stakeholders.
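One standard way of letting suspensions contribute information without counting them as failures is the Kaplan-Meier (product-limit) estimator. The Python sketch below illustrates that general idea only. It is not a description of what any particular RA package does internally.

```python
def kaplan_meier(ages, failed):
    """Product-limit estimate of the survival function R(t).

    ages:   age at the end of each lifetime
    failed: True if that lifetime ended by failure, False if suspended
    Returns (age, R) steps at each distinct failure age. Suspended items
    stay in the at-risk count up to their age, then leave it.
    """
    data = sorted(zip(ages, failed))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        age = data[i][0]
        failures = ended = 0
        # Group all lifetimes that end at the same age.
        while i < len(data) and data[i][0] == age:
            failures += int(data[i][1])
            ended += 1
            i += 1
        if failures:
            survival *= 1.0 - failures / at_risk
            steps.append((age, survival))
        at_risk -= ended
    return steps
```

With four lifetimes ending at 100, 150, 200 and 250 hours, where the second and fourth are suspensions, the estimate is R = 0.75 after 100 hours and R = 0.375 after 200 hours. Treating the suspensions as failures would instead drive R down to about 0.25 at 200 hours.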

Scene 05-4 The EAM system can track an asset’s working age in calendar or in operational hours (or in any other units considered to be proportional to the accumulated stress on the asset, for example throughput, or energy consumed).

RA software calculates the age of a given failure mode (object part) at the moment of each event in order to develop a predictive relationship.
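The age calculation can be sketched in Python as follows. The sketch assumes, hypothetically, that the EAM system supplies timestamped readings of a cumulative operating-hours meter, and it interpolates linearly between readings. A throughput or energy meter could be handled in exactly the same way.

```python
from bisect import bisect_right
from datetime import datetime

def meter_at(when, readings):
    """Linearly interpolate a cumulative meter at time `when`.

    readings: chronologically sorted list of (datetime, meter_value).
    """
    times = [t for t, _ in readings]
    if not readings or when < times[0] or when > times[-1]:
        raise ValueError("time is outside the range of meter readings")
    i = bisect_right(times, when)
    if i == len(times):                   # exactly at the last reading
        return readings[-1][1]
    (t0, m0), (t1, m1) = readings[i - 1], readings[i]
    frac = (when - t0) / (t1 - t0)        # timedelta / timedelta -> float
    return m0 + frac * (m1 - m0)

def age_at_event(installed, event, readings):
    """Age of an object part at an event, in calendar days and operating hours."""
    calendar_days = (event - installed).total_seconds() / 86_400
    operating_hours = meter_at(event, readings) - meter_at(installed, readings)
    return calendar_days, operating_hours
```

Calendar age needs only the two dates. Operating age needs the meter history to cover both the installation (beginning event) and the ending event.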

Unlike the Living RCM form, conventional EAM procedures do not explicitly record life ending events.

Reliability engineers, consequently, cannot develop policies based on quantitative reliability analysis with the degree of confidence necessary for their adoption in an asset performance strategy. At best, suspensions are simply assumed to be failures.

Such an assumption results in low confidence and overly pessimistic analyses.
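The pessimism is easy to see in the simplest possible case. The sketch below assumes a constant failure rate (an exponential model, used here only for illustration). Under that assumption, the maximum-likelihood estimate of mean life is the total accumulated age of all lifetimes, failed and suspended, divided by the number of failures. The sample values are hypothetical.

```python
def exponential_mean_life(ages, failed):
    """MLE of mean life under an exponential model with suspensions:
    total time on test divided by the number of failures."""
    n_failures = sum(failed)
    if n_failures == 0:
        raise ValueError("no failures in the sample")
    return sum(ages) / n_failures

# Hypothetical sample: two failures and three suspensions (hours).
ages = [1200, 3000, 3100, 900, 2800]
failed = [True, False, False, True, False]

correct = exponential_mean_life(ages, failed)             # suspensions kept as suspensions
pessimistic = exponential_mean_life(ages, [True] * 5)     # suspensions counted as failures
```

Both estimates divide the same 11,000 hours. Treating the three suspensions as failures raises the denominator from 2 to 5, cutting the estimated mean life from 5,500 to 2,200 hours.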

Scene 06-1 Now that we have a method to accurately record an object part’s ending events, we’ll perform Reliability Analysis on a sample of that data.

Consider a fleet of Haul Trucks subject to the RCM-listed failure mode “transmission fails”. We’ll define the object part as the Transmission itself; in other words, we won’t concern ourselves with exactly which internal part failed. There is no right or wrong level of analysis. In some other context we might have focused on the failure of a particular gear, bearing, or seal, in which case those would have been defined as the object parts.

Scene 06-2 We’ll apply a multi-variate reliability analysis method to object part events and to condition monitoring data, both occurring in the same sample window. The method is known as EXAKT Condition Based Maintenance optimization developed at the University of Toronto by Professors Andrew Jardine and Dragan Banjevic.

Scene 06-3 Within the EXAKT application our sample appears in a table that combines condition monitoring and event history in a single virtual data structure.

The life ending events, sorted chronologically, are represented by the symbols EF for ending by failure and ES for ending by suspension.

We’ll run the proportional hazard algorithm, which parses each record. It incrementally builds a probabilistic model, or equation, relating the object part’s age and monitored condition data to its probability of failure, or hazard h.
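To illustrate what "building the model" means, here is a minimal Python sketch of the log-likelihood that a Weibull proportional hazards fit maximizes. It makes a strong simplifying assumption: each lifetime's condition indicators are treated as constant over that lifetime. Real condition monitoring data vary with age, and EXAKT handles that case, which this sketch does not.

```python
import math

def weibull_ph_loglik(beta, eta, gamma, lifetimes):
    """Log-likelihood of a Weibull proportional hazards model.

    lifetimes: iterable of (age, failed, z), where z holds the condition
    indicator values, ASSUMED constant over that lifetime.
    Failures contribute log h(age, z); every lifetime, failed or
    suspended, contributes minus its cumulative hazard H(age, z).
    """
    total = 0.0
    for age, failed, z in lifetimes:
        dot = sum(g * zi for g, zi in zip(gamma, z))   # gamma . Z
        if failed:
            total += math.log(beta / eta) + (beta - 1.0) * math.log(age / eta) + dot
        total -= (age / eta) ** beta * math.exp(dot)
    return total
```

Estimating β, η and γ then means maximizing this function, for example by passing its negative to a general-purpose optimizer such as `scipy.optimize.minimize` with β and η kept positive. Note that suspensions enter only through the cumulative-hazard term, which is why they must be recorded as suspensions rather than as failures.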

Scene 06-4 The results of the proportional hazard analysis are displayed in a report. The multi-dimensional regression has converged to estimated values for the parameters β, η, and γ in the equation for the hazard rate.
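Once β, η and γ have been estimated, the hazard can be evaluated for any age and set of readings. The Python sketch below uses made-up parameter values, not the results shown in the video. It assumes two indicators, iron ppm and lead ppm, to match the significant indicators named in the next scene.

```python
import math

def weibull_ph_hazard(age, z, beta, eta, gamma):
    """h(t, Z) = (beta/eta) * (t/eta)**(beta - 1) * exp(gamma . Z)."""
    dot = sum(g * zi for g, zi in zip(gamma, z))
    return (beta / eta) * (age / eta) ** (beta - 1) * math.exp(dot)

# Hypothetical fitted values, NOT results from the transcript.
BETA, ETA = 3.0, 20_000.0   # age measured in operating hours
GAMMA = [0.05, 0.2]         # coefficients for iron ppm and lead ppm

h_low = weibull_ph_hazard(8_000, [10, 2], BETA, ETA, GAMMA)
h_high = weibull_ph_hazard(8_000, [30, 6], BETA, ETA, GAMMA)
ratio = h_high / h_low      # exp(0.05*20 + 0.2*4) = exp(1.8), for any age
```

Because the model is proportional, raising iron by 20 ppm and lead by 4 ppm multiplies the hazard by the same factor, exp(1.8) ≈ 6, whatever the transmission's age.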

The most influential condition indicators were found by statistical testing to be the ppm of iron and lead.

Scene 06-5 Next we invoke a transition probability model to predict the rates at which these significant condition indicators move from one range of values to another.

An estimate of the probability of failure in the next observation interval is determined for each condition indicator range. These estimates may be viewed in a Transition Probabilities matrix.
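The idea behind a transition probability matrix can be sketched with a simple, time-homogeneous Markov chain. This is a deliberate simplification, because EXAKT's own transition model also depends on age. In the sketch, each state is a range of a condition indicator plus an absorbing "failed" state, and all probabilities are hypothetical.

```python
def step(dist, P):
    """Advance a state distribution by one observation interval:
    new[j] = sum over i of dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def failure_probability_within(start_state, P, failed_state, k):
    """Probability of having reached the failed state within k intervals."""
    dist = [0.0] * len(P)
    dist[start_state] = 1.0
    for _ in range(k):
        dist = step(dist, P)
    return dist[failed_state]

# Hypothetical one-interval transition matrix. States 0-2 are ranges of a
# condition indicator (e.g. iron ppm: low, medium, high); state 3 is
# "failed" and absorbing. Each row sums to 1.
P = [
    [0.90, 0.08, 0.01, 0.01],
    [0.00, 0.85, 0.12, 0.03],
    [0.00, 0.00, 0.85, 0.15],
    [0.00, 0.00, 0.00, 1.00],
]
```

Here `failure_probability_within(2, P, 3, 1)` reads the "high" range's probability of failure in the next interval (0.15) straight from the matrix. Larger `k` looks further ahead.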

Scene 06-6 A conditional probability density graph reveals the object part’s Remaining Useful Life, standard deviation, and confidence interval.
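For the age-only Weibull case, an interval for the remaining life has a closed form. Setting the conditional survival R(t + x) / R(t) equal to 1 − p and solving for x gives the p-quantile. The Python sketch below computes an equal-tailed interval this way. It is one reading of the "confidence interval" above, assumes no condition indicators, and uses hypothetical names.

```python
import math

def residual_life_quantile(p, t, beta, eta):
    """Remaining life x with P(X <= x | survived to age t) = p, Weibull model:
    x = eta * ((t/eta)**beta - ln(1 - p))**(1/beta) - t."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return eta * ((t / eta) ** beta - math.log(1.0 - p)) ** (1.0 / beta) - t

def residual_life_interval(t, beta, eta, level=0.90):
    """Equal-tailed interval containing the remaining life with probability `level`."""
    tail = (1.0 - level) / 2.0
    return (residual_life_quantile(tail, t, beta, eta),
            residual_life_quantile(1.0 - tail, t, beta, eta))
```

With β = 1 the median remaining life is η·ln 2 at any age, reflecting the memoryless exponential case. With β > 1 the interval shrinks as the item ages.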

Scene 07-1 This video revealed a new, quantitative dimension that we can apply to the challenge of equipment reliability.

We identified inaccurate event data in the conventional work order data entry process as the principal structural impediment to reliability improvement.

We proposed a living RCM data entry form based entirely on MS Office tools and integrated with the existing Enterprise Asset Management reporting process that resolves the event data inadequacy problem.

Finally, we demonstrated the proportional hazard modeling method, with which we can optimize the predictive performance of existing condition based maintenance initiatives.

Thank you

Video image credits
