CMMS Impediments to Reliability Analysis

In the maintenance department of a reliability-obsessed and maintenance-intensive company, RCM (Reliability-Centered Maintenance) knowledge and work order information are stored in separate databases managed by different software packages.

CMMS vs. RCM

The two software applications have different purposes:

  1. RCM knowledge software describes likely failure problems accurately and in as much detail as necessary, while
  2. The work order system records actual failure problems, and its designers and users want the failure list to be as simple and easy to use as possible.


Because of these different focuses, the RCM knowledge software and the work order information system use different data structures. These different data structures invariably lead to two different codifications describing the failures that occur in a given maintenance organization. One set of codes is embodied in the drop-down lists that personnel access when closing a work order. The other is the Function, Failure, Failure Mode and Effects (FMEA) hierarchy contained in the RCM software. These two versions fall out of step because neither application provides a way to keep them synchronized as knowledge accrues from ongoing work order experience.
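To make the asynchrony concrete, here is a minimal Python sketch, assuming each RCM failure mode can be expressed as the same three selections (Object Part, Object Damage, Failure Cause) that a technician makes in the CMMS drop-down lists. The class ClosureCode, the function find_unsynchronized, and the sample codes are hypothetical illustrations, not part of any CMMS or RCM product. The sketch reports CMMS codes that match no RCM failure mode, and RCM failure modes that the technician cannot select.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosureCode:
    """Hypothetical CMMS closure selection made by the technician."""
    object_part: str
    object_damage: str
    failure_cause: str


def find_unsynchronized(cmms_codes, rcm_modes):
    """Compare CMMS drop-down codes with the RCM failure-mode hierarchy.

    cmms_codes: set of ClosureCode values offered in the CMMS lists.
    rcm_modes:  dict of RCM failure-mode ID -> equivalent ClosureCode.
    Returns (CMMS codes with no RCM failure mode,
             RCM failure-mode IDs with no selectable CMMS code).
    """
    mapped = set(rcm_modes.values())
    orphan_codes = {code for code in cmms_codes if code not in mapped}
    missing_modes = {fm for fm, code in rcm_modes.items() if code not in cmms_codes}
    return orphan_codes, missing_modes


# Illustrative data only.
cmms_lists = {
    ClosureCode("bearing", "worn", "lubrication"),
    ClosureCode("seal", "leaking", "age"),
}
rcm_hierarchy = {
    "FM-15": ClosureCode("bearing", "worn", "lubrication"),
    "FM-16": ClosureCode("hose", "burst", "fatigue"),
}
orphans, missing = find_unsynchronized(cmms_lists, rcm_hierarchy)
print(orphans)   # the seal code has no RCM failure mode
print(missing)   # FM-16 cannot be selected when closing a work order
```

Run periodically, a check like this shows how far the two codifications have drifted; it does not by itself decide which side should change.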

The foregoing is an example of the most critical data issue in a maintenance department’s information strategy. It causes work order records to represent instances of RCM failure modes only imperfectly. As a result of this disconnection between RCM and the CMMS, it is generally impossible to generate good samples for Reliability Analysis (RA). Without RA, it is impossible to develop, verify, and improve maintenance decision models. The diagram above illustrates the need for a symbolic “bridge” unifying the theory of RCM knowledge and day-to-day practice as recorded in the work order system.

The problem has been rectified in the maintenance department of Carbones de Cerrejón by using a unique “RCM Hierarchy Synchronization Tool”[1] for maintaining a dynamic link between the RCM knowledge base and the CMMS failure codes and category lists.

Age tracing

A second typical issue is related to the “age tracing” of components. RA is impossible unless the true working ages of important failure modes are known at the end of their lives. In the maintenance information system, a complex system such as a haul truck is grouped into “major” and “minor” components.

Major components such as the engine are “traced” by the CMMS (for example, Ellipse or SAP). A component ID assigned to a major component enables its history to be recorded. Minor components, on the other hand, are not traced: no ID is assigned to a minor component, and its age history is not recorded.

This difference in procedures causes a problem for a reliability engineer wishing to analyze significant failure modes that occur in an untraced component. The CMMS database structure can solve this problem by associating minor components with a traced parent component. However, CMMS implementers often overlook this capability, partially or entirely, and leave it unused, because the maintenance organization’s Reliability Engineers and Analysts have not emphasized its importance to RA.
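The parent-association idea can be sketched as follows, assuming each work order on a minor component records the ID of its traced parent and the parent’s cumulative operating-hours reading. The field names, the function minor_component_ages, and the sample data are hypothetical. The sketch also assumes the minor component was fitted when the parent’s meter read zero and stays on that parent for its whole life; a real implementation would need to handle parent change-outs.

```python
from collections import defaultdict


def minor_component_ages(work_orders):
    """Derive working ages of untraced minor components from a traced parent.

    Each work order is a dict with hypothetical keys:
      "parent_id"  - ID of the traced major component (e.g. an engine)
      "minor"      - name of the untraced minor component that was renewed
      "parent_hrs" - parent's cumulative operating hours at the work order
    Assumes each minor component was fitted when the parent meter read 0
    and stayed on that parent. Returns (parent_id, minor, age_hours) tuples.
    """
    last_renewal = defaultdict(float)  # (parent, minor) -> parent hours at last renewal
    ages = []
    for wo in sorted(work_orders, key=lambda w: (w["parent_id"], w["parent_hrs"])):
        key = (wo["parent_id"], wo["minor"])
        ages.append((wo["parent_id"], wo["minor"], wo["parent_hrs"] - last_renewal[key]))
        last_renewal[key] = wo["parent_hrs"]  # the renewal starts a new life
    return ages


# Illustrative data only.
orders = [
    {"parent_id": "ENG-042", "minor": "water pump", "parent_hrs": 9000.0},
    {"parent_id": "ENG-042", "minor": "water pump", "parent_hrs": 4200.0},
    {"parent_id": "ENG-042", "minor": "turbo seal", "parent_hrs": 6100.0},
]
for row in minor_component_ages(orders):
    print(row)
```

The second water pump life is measured from the first renewal (9000 − 4200 = 4800 hours), which is exactly the information lost when the minor component has no link to a traced parent.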

Event Type

A third major impediment to RA is that failure mode endings are not reliably classified in the CMMS as Failure, Potential Failure, or Suspension[2]. RA cannot be performed without a sample of failure mode lifetimes whose endings are identified distinctly as either “by failure” or “by suspension”. This definition of a sample for RA is illustrated in the chart below.


The left part of the diagram shows work orders issued in chronological order. Each work order refers to an RCM failure mode (e.g., 15 or 16) whose life ended by Functional Failure (FF), Potential Failure (PF), or Suspension. A sample cannot be generated from work orders in the form in which they are stored in the CMMS. Each work order must be transformed into two events, a beginning event and an ending event, as shown in the right-hand column of the diagram. In this form, it can be seen that the life cycle of a failure mode spans two work orders. RA is, simplistically speaking, the counting of life cycles that occur within a given calendar window[3]. RA accounts for both complete life cycles, represented by solid arcs, and partial (i.e., suspended) life cycles, represented by dashed arcs.[4]
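The transformation from stored work orders to a sample can be sketched in Python as below. It is a minimal illustration under stated assumptions, not the EXAKT/LRCM implementation: the work orders belong to a single failure mode, each carries the age counter (e.g., operating hours) at which the renewal took place and its ending type (FF, PF, or S), the first life begins at a known installation time, and a Potential Failure ending is counted as a failure unless told otherwise. The names Life and work_orders_to_lives are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Life:
    start: float   # age counter when the life began (a renewal)
    end: float     # age counter when the life ended or the window closed
    ending: str    # "failure" or "suspension"

    @property
    def age(self):
        return self.end - self.start


def work_orders_to_lives(install_time, work_orders, window_end, pf_is_failure=True):
    """Turn chronological work orders for ONE failure mode into a sample.

    work_orders: list of (time, event) with event in {"FF", "PF", "S"}.
    Every work order is the ending event of one life and the beginning
    event of the next, so each life spans two consecutive work orders.
    The life still running at window_end is recorded as a suspension.
    Assumption: PF endings count as failures when pf_is_failure is True.
    """
    failure_events = {"FF", "PF"} if pf_is_failure else {"FF"}
    lives, start = [], install_time
    for time, event in sorted(work_orders):
        ending = "failure" if event in failure_events else "suspension"
        lives.append(Life(start, time, ending))
        start = time  # the renewal begins the next life
    lives.append(Life(start, window_end, "suspension"))  # still in service
    return lives


# Illustrative data only: three work orders inside a window ending at 30.
lives = work_orders_to_lives(0, [(5, "FF"), (12, "S"), (21, "PF")], window_end=30)
for life in lives:
    print(life.age, life.ending)
failures = sum(life.ending == "failure" for life in lives)
print(failures, len(lives) - failures)   # complete vs suspended lives
```

Because each work order closes one life and opens the next, n work orders produce n + 1 lives; the last one, still in service when the window closes, corresponds to a dashed (suspended) arc.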

The Human Interface

The most difficult factor in the equation relating data to analysis is the human one. The technician is expected to describe a complex situation by selecting catalog values from a series of drop-down lists. Besides being tedious and error-prone, the codes are often ambiguous, too general, or too specific to represent the reality on the ground.[5] A radically different type of user interface is required, one that meets two criteria essential for achieving reliability from data. First, failure modes (Object Part, Object Damage, Failure Cause) must be selected accurately. Second, any divergence between the observed Failure Mode, its Effects and Consequences, and the knowledge base as encapsulated in the CMMS catalog lists must be reconciled quickly and systematically.
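One simple way to make that reconciliation systematic is to watch how often technicians fall back on a catch-all code. The sketch below is an illustrative assumption rather than a feature of any CMMS: it computes the share of closures coded “other” for each failure mode and flags those above a hypothetical threshold for review against the knowledge base. The function name flag_other_overuse and the sample data are hypothetical.

```python
from collections import Counter


def flag_other_overuse(closures, threshold=0.2):
    """Return {failure_mode: share_of_other} for modes above threshold.

    closures:  list of (failure_mode_id, cause_code) pairs from closed work orders.
    threshold: hypothetical maximum acceptable share of "other" codes.
    """
    totals = Counter(fm for fm, _ in closures)
    others = Counter(fm for fm, cause in closures if cause.lower() == "other")
    shares = {fm: others[fm] / totals[fm] for fm in totals}
    return {fm: share for fm, share in shares.items() if share > threshold}


# Illustrative data only.
closures = [
    ("FM-15", "wear"), ("FM-15", "wear"), ("FM-15", "other"),
    ("FM-15", "wear"), ("FM-15", "lubrication"),
    ("FM-16", "other"), ("FM-16", "fatigue"),
]
print(flag_other_overuse(closures))  # {'FM-16': 0.5}
```

A flagged failure mode suggests that the catalog lists no longer describe what technicians actually observe, which is the divergence the knowledge base should absorb.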

A major part of the success of a pilot LRCM project lies in uncovering issues such as those described here. Once this is done, RA becomes a practical tool for providing effective (optimal) maintenance at minimal cost. The EXAKT/LRCM solution removes all four of these barriers, and others, to Reliability Analysis and to successful maintenance decision optimization.

© 2012 – 2014, Oscar Hoyos Vásquez. All rights reserved.

  1. [1] The RCM-to-CMMS sync function is part of the MESH LRCM (Living Reliability Centered Maintenance) software package.
  2. [2] A suspension is a renewal of a component, part, or failure mode for any reason other than failure.
  3. [3] Dividing the fleet operating hours over a sufficiently wide calendar window (the sample) by the count of Failure instances in that same window yields the MTBF, the simplest and most immediately useful form of RA. The LRCM user interface includes a running calculation of MTBF next to each Failure Mode in the knowledge tree displayed when populating a work order.
  4. [4] Mistaking Suspensions for Failures will yield an incorrect analysis, resulting in an overly conservative decision model. Assume 6 items were renewed in the calendar window, 3 of which failed and 3 of which were suspended. The three failures occurred at 5, 7, and 9 months of age. The three non-failed items were renewed, that is, their lives were suspended, at 10, 12, and 17 months of age. If we considered only the failures and ignored the suspended data, we would have a “biased” sample from which we would predict a pessimistic MTBF = (5+7+9)/3 = 7 months. But the true MTBF must be greater than (5+7+9+10+12+17)/6 = 10 months, an error of at least 30% ((10-7)/10). If we were to mistakenly count the three Suspensions as Failures while the suspended items still had a combined unused residual operating life of 30 months, the MTBF would be underestimated by 33% ((15-10)/15), since (5+7+9+10+12+17+30)/6 = 15 months. It is preferable to report suspensions accurately and allow the statistical software to use probability calculations to estimate the reliability. For more information on how Suspensions are accounted for in the reliability computation, see “Real meaning of the RCM curves”.
  5. [5] A common complaint of Reliability Engineers is that technicians overuse the default “other”.
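The arithmetic in footnotes [3] and [4] can be checked with a few lines of Python. The helper mtbf is a hypothetical name that simply applies footnote [3]’s ratio (operating time divided by the failure count); the ages and the 30 months of combined residual life are the footnote’s own illustrative figures. The 10-month case counts the three suspensions as failures, and the 15-month case adds the residual life those suspended items still had.

```python
def mtbf(total_operating_time, failure_count):
    """Footnote [3]: operating time in the window divided by the failures in it."""
    return total_operating_time / failure_count


failures = [5, 7, 9]          # ages (months) at failure
suspensions = [10, 12, 17]    # ages (months) at suspension
residual = 30                 # footnote [4]'s hypothetical combined unused life (months)
renewals = len(failures) + len(suspensions)

# Ignoring the suspended items entirely (biased sample).
ignore_susp = mtbf(sum(failures), len(failures))
# Counting the three suspensions as if they were failures.
susp_as_fail = mtbf(sum(failures) + sum(suspensions), renewals)
# Adding the residual life the suspended items still had.
with_residual = mtbf(sum(failures) + sum(suspensions) + residual, renewals)

err_ignore = (susp_as_fail - ignore_susp) / susp_as_fail
err_susp_as_fail = (with_residual - susp_as_fail) / with_residual

print(ignore_susp, susp_as_fail, with_residual)    # 7.0 10.0 15.0
print(f"{err_ignore:.0%} {err_susp_as_fail:.0%}")  # 30% 33%
```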