Two kinds of decision making in maintenance

In this article we hope to clear up some vagueness in the types of decisions that maintenance managers are required to make. We distinguish between two separate types of decision procedures used in the maintenance management domain.

Tactical (short-term) decisions answer the question: What maintenance, if any, should we perform on a specified equipment unit now or in the immediate future, given its age and its condition as reflected by monitored sensor data and other condition or operational data? This is the type of decision taken routinely in day-to-day operation and maintenance.

Decisions that impact the longer term are “strategic decisions”. Strategic decisions respond to questions such as: What is the best allocation of capital and expense in the maintenance budget over a one-year, three-year, or five-year horizon? What design upgrades to equipment or procedures will best fulfill expected requirements? What spare parts should we stock, in what quantities, and where should we locate them? What is the most economic age at which to replace units in a fleet? Strategic decisions have little or no immediate impact on current maintenance activities or on (lagging) performance indicators such as availability, cost, or reliability.

All decisions, tactical or strategic, should be evidence based, or “model” based. A model can be a rule or a procedure for interpreting available information about the asset’s condition. When the model is executed it converts that information into a recommended action. When no model exists the decision derives from opinion rather than from an objective rule or calculation. Consider, for example, the overhaul of an engine when it reaches a working age of 12000 hours, either because it has always been done that way or because that is the vendor’s recommended overhaul age. That rule is, in effect, the active decision model. But has that model been examined in the current operating context? Can it be improved? What were the observed conditions of engines prior to scheduled overhaul at 12000 hours? Were they all on the verge of failing? Probably not. What were the observed Failure Modes? How many engines failed in service prior to scheduled overhaul? What sensor or monitored data patterns preceded the failure or preventive overhaul? Few maintenance departments will have analyzed such information in order to challenge the “model” with the intent to improve it. Those maintenance organizations cannot claim to employ a process of “continuous improvement” in regard to engine overhauls. Continuous improvement, nevertheless, is one of the foremost requirements of modern maintenance practice, mandated by such programs and standards as PAS 55 or ISO 55000.
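To make the idea of an “active decision model” concrete, here is a minimal Python sketch. It writes the 12000-hour rule as an explicit function and then audits it against removal history, which is one way to start answering the questions above. The record layout (age at removal, whether the removal was a failure) and the function names are assumptions made for illustration, not part of any particular CMMS.

```python
OVERHAUL_AGE_HOURS = 12_000  # overhaul age from the example in the text


def age_based_model(working_age_hours):
    """The legacy rule, written as an explicit decision model."""
    return "overhaul" if working_age_hours >= OVERHAUL_AGE_HOURS else "continue"


def audit_age_policy(history):
    """Summarize how the age rule performed on past engines.

    history: list of (age_at_removal_hours, ended_in_failure) tuples
    (a hypothetical record layout).
    """
    total = len(history)
    failed_early = sum(
        1 for age, failed in history if failed and age < OVERHAUL_AGE_HOURS
    )
    return {
        "engines": total,
        "failed_before_overhaul": failed_early,
        "fraction_failed_before_overhaul": failed_early / total if total else 0.0,
    }


# Illustrative (made-up) history: two scheduled overhauls, two in-service failures.
history = [(12_000, False), (8_000, True), (12_000, False), (11_000, True)]
print(age_based_model(12_500))
print(audit_age_policy(history))
```

An audit like this does not improve the model by itself, but it turns “it has always been done that way” into numbers that can be challenged.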

Could a maintenance organization gradually supplant a particular legacy age-based model in favor of a more aggressive policy based on both age and condition monitoring data? How could this be done systematically, based on the evidence? Such an endeavor would require more than simply collecting and observing sensor and oil analysis data. A rule or procedure needs to be formulated to transform that data into a decision that is verifiable. Relevant monitored data must supplement the engine’s Age data^[1] and both must be incorporated into a decision model. The decision model must be shown to render decisions that optimally satisfy one or more desired availability, profitability, or reliability objectives.
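The article does not prescribe a model form. As one illustration of how age and condition data might be combined in a single rule, the sketch below assumes a Weibull proportional hazards form, a choice often seen in condition-based maintenance work. The parameters `beta`, `eta`, `gammas` and the threshold `hazard_limit` are hypothetical. In practice they would be estimated from the fleet history, not chosen by hand.

```python
import math


def weibull_ph_hazard(age_hours, covariates, beta, eta, gammas):
    """Hazard h(t, z) = (beta/eta) * (t/eta)**(beta - 1) * exp(sum(gamma_i * z_i)).

    Assumes age_hours > 0. The covariates z_i are monitored condition values
    (for example, an oil analysis reading); gammas are their weights.
    """
    baseline = (beta / eta) * (age_hours / eta) ** (beta - 1)
    risk_multiplier = math.exp(sum(g * z for g, z in zip(gammas, covariates)))
    return baseline * risk_multiplier


def recommend(age_hours, covariates, *, beta, eta, gammas, hazard_limit):
    """Turn age plus condition data into a verifiable recommendation."""
    hazard = weibull_ph_hazard(age_hours, covariates, beta, eta, gammas)
    return "inspect" if hazard >= hazard_limit else "continue"


# Same engine age, different condition readings (all values made up).
params = dict(beta=2.0, eta=10_000.0, gammas=[1.0], hazard_limit=5e-4)
print(recommend(6_000.0, [0.0], **params))
print(recommend(6_000.0, [2.0], **params))
```

The point of the sketch is that two engines of the same age can receive different recommendations, and that the rule producing them is explicit enough to be tested against history.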

An optimized condition-based maintenance (CBM) decision model would monitor condition or operational data and, when warranted, trigger a more intrusive inspection of the asset. What factors and monitored data would comprise the significant risk variables? How should they be weighted and combined to trigger a tactical or short-term maintenance intervention? How can we verify whether the model is optimal? These questions are answered within a process known as LRCM (Living RCM), whose purpose is to assemble data and knowledge correctly and systematically for analysis and decision automation. The diagram below describes the components of a MESH LRCM process.

MESH LRCM Project Components

The column on the right hand side of the diagram lists the LRCM software components that have been designed to “mesh” with the EAM/CMMS:

Work order completion & data recording:

Reliability Analysis (RA) for maintenance decision modeling requires highly accurate and consistent recording of work order observations. Current CMMS/EAM work order procedures fail to record Failure Modes and their ending Event Types consistently. This software module integrates with the CMMS/EAM and ensures 100% accurate recording of that data.

Synchronization:

Accuracy in the CMMS historical database requires synchronization between the RCM knowledge base and the CMMS/EAM drop down menu choices of failure codes or catalog values. This LRCM module ensures that the work order form menu selections correspond precisely with the Failure Modes as they have been specified in the RCM knowledge base.
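The kind of check this synchronization implies can be sketched as a set comparison between the Failure Modes in the RCM knowledge base and the codes offered in the work order menus. This is not the LRCM module itself, only an illustration; the function name and the sample codes are hypothetical.

```python
def sync_report(rcm_failure_modes, cmms_menu_choices):
    """Compare RCM Failure Modes with CMMS/EAM drop-down choices.

    Both arguments are iterables of failure code strings.
    """
    rcm = set(rcm_failure_modes)
    cmms = set(cmms_menu_choices)
    return {
        # Failure Modes a technician cannot select on the work order form
        "missing_from_cmms": sorted(rcm - cmms),
        # Menu choices with no counterpart in the RCM knowledge base
        "unknown_to_rcm": sorted(cmms - rcm),
    }


# Made-up codes for illustration.
report = sync_report(
    ["hose worn", "bearing seized", "seal leaking"],
    ["hose worn", "seal leaking", "other"],
)
print(report)
```

Both lists would be empty when the menu selections correspond precisely with the knowledge base.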

Knowledge Builder:

This module constitutes the bridge between RCM and the EAM/CMMS database, ensuring that the work order history can yield analyzable samples of Failure Mode instances and their Event Types.
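What an “analyzable sample” might look like can be sketched as follows. The code groups work orders by Failure Mode and computes the life of each instance as the working age accumulated since the previous event for the same unit and Failure Mode. The field names, and the assumption that the item is renewed at every recorded event, are illustrative only.

```python
from collections import defaultdict


def build_samples(work_orders):
    """Turn work order history into per-Failure-Mode samples.

    work_orders: iterable of dicts with the hypothetical keys
    'unit', 'failure_mode', 'age_hours' (unit working age at the event)
    and 'event_type' ('failure' or 'suspension').
    Returns {failure_mode: [(life_hours, event_type), ...]}.
    """
    last_renewal = defaultdict(float)  # (unit, failure_mode) -> age at last renewal
    samples = defaultdict(list)
    for wo in sorted(work_orders, key=lambda w: w["age_hours"]):
        key = (wo["unit"], wo["failure_mode"])
        life = wo["age_hours"] - last_renewal[key]
        samples[wo["failure_mode"]].append((life, wo["event_type"]))
        last_renewal[key] = wo["age_hours"]  # assume renewal at each event
    return dict(samples)


# Made-up history for one unit and one Failure Mode.
orders = [
    {"unit": "E1", "failure_mode": "hose worn", "age_hours": 4_000, "event_type": "failure"},
    {"unit": "E1", "failure_mode": "hose worn", "age_hours": 9_000, "event_type": "suspension"},
]
print(build_samples(orders))
```

A sample of this shape is only as good as the Event Types recorded on the work orders, which is why consistent recording matters.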

Knowledge Update:

The initial RCM knowledge base is a starting point, or baseline, upon which to develop and improve. It was compiled initially from the best recollections and understanding of the RCM review group members. As such, the maintenance programs derived from the initial RCM analysis tend to be conservative so as to be safe. Since its inception at United Airlines in the late seventies, RCM has always been understood as a living process that is revisited often, continually approaching the true failure behavior of an asset. The Knowledge Update module assures a controlled yet simple process for reliability improvement based on repeated reality checks and dynamic RCM knowledge refinement.

Image gallery:

The reason for this feature is directly related to practical reliability analysis. It is unique to LRCM. It is essential that the technician distinguish Failures, Potential Failures, and Suspensions consistently. But how should one judge between a Potential Failure and a Suspension? Take, for example, a worn hose that is replaced preventively while it is still fully functional. Should this be reported as a Potential Failure or a Suspension? One technician might consider this level of wear as “Failed”. Another may consider its life as having been “Suspended”. Where to draw the line? The answer is that this must be an organizational standard arrived at progressively by consensus. By definition, a Suspension is the preventive replacement of an item (Failure Mode) that still has an indeterminate amount of useful life left in it. Because this call is subjective, the images in the knowledge base make it easy to establish standards for distinguishing between Suspension and Failure. Everyone will know what the corporate standard is because it is described through the images associated with each Failure Mode. Furthermore, anyone can suggest making the standard more or less stringent. Why is discriminating between Failure and Suspension so important? Because that is the only way Reliability Analysis software can do what it is designed to do: use the history of instances of Failures and Suspensions to develop and refine reliability analysis, prediction, and decision models.
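To see why the Failure versus Suspension call matters numerically, the sketch below uses the Kaplan-Meier estimator, a standard non-parametric method for samples that mix failures and suspensions. It is used here only as an illustration and is not claimed to be what any particular Reliability Analysis package does; the lifetimes are made up.

```python
def kaplan_meier(sample):
    """Estimate survival probability from failures and suspensions.

    sample: list of (life_hours, is_failure) tuples.
    Returns [(life_hours, survival)] with one entry per failure.
    Tied failures produce consecutive entries; the last one is the estimate.
    """
    at_risk = len(sample)
    survival = 1.0
    curve = []
    # At tied lives, count failures before suspensions (the usual convention).
    for life, is_failure in sorted(sample, key=lambda s: (s[0], not s[1])):
        if is_failure:
            survival *= (at_risk - 1) / at_risk
            curve.append((life, survival))
        at_risk -= 1  # the item leaves the risk set either way
    return curve


# Two failures and two suspensions, correctly reported.
reported = [(3_000, True), (5_000, False), (7_000, True), (9_000, False)]
# The same lives with every suspension misreported as a failure.
misreported = [(life, True) for life, _ in reported]

print(kaplan_meier(reported))
print(kaplan_meier(misreported))
```

With the suspensions recorded correctly, the estimated probability of surviving past 7000 hours is 0.375. Misreporting them as failures lowers it to 0.25, so the same history would support a more pessimistic model.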

^[1] Age data consists of the fleet history of instances of failure events and suspension events↩
