Maintenance Decision Support Pilot at Orica

IV DATA COLLECTION & SCREENING

The phrase “rubbish in equals rubbish out” could not be more applicable than when analyzing data with a reliability analysis software tool. With PHM-based analysis, data quality is of great concern because the results are intended to be used day to day for practical decision making.

The user of the software must have a good understanding of the equipment under analysis, its failure modes, and the monitored variables likely to reflect the deterioration associated with each failure mode. The software confirms or refutes such assumptions using intensive computations based on statistical techniques. If the software confirms that a condition monitoring (CM) variable is significant (that is, the hypothesis that the variable has no influence is rejected at the 5% significance level), the reliability engineer will obtain a probabilistic relation among:

  1. significant variables,
  2. working age, and
  3. component failure probability.
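
The relation among these three quantities can be written compactly. The paper does not state the functional form it uses; the Weibull-baseline proportional hazards model below is the form commonly used in the PHM literature cited in the references (Jardine et al.), and every symbol is introduced here as an assumption rather than taken from the study:

```latex
h\bigl(t, Z(t)\bigr) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}
\exp\!\left(\sum_{i=1}^{m} \gamma_i Z_i(t)\right)
```

Here $t$ is the working age, $Z_1(t), \dots, Z_m(t)$ are the significant variables, $\beta$ and $\eta$ are the Weibull shape and scale parameters, and $\gamma_i$ is the coefficient estimated for variable $i$. A coefficient that cannot be distinguished from zero corresponds to a variable that is not significant.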

Subsequently, the method applies a “predictive” algorithm[1] in combination with the PHM to generate a Remaining Useful Life Estimation (RULE) model. Once developed and accepted, the model will be deployed as a “watchdog” agent that silently scans condition monitoring data as it appears in designated database locations. The agent writes its results into a database table accessible to the Reliability Engineer and the Maintenance Manager via the normal CMMS reporting system.
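
The transition counting described in footnote [1] can be sketched as follows. This is a minimal illustration, not the algorithm used by the software: it assumes equally spaced inspections, and the state names, the function `transition_matrix`, and the example histories are all hypothetical.

```python
from collections import Counter


def transition_matrix(histories, states):
    """Estimate P[i][j] = P(next state is j | current state is i) by counting.

    histories: one list of condition states per life cycle, in inspection order.
    Assumes equally spaced inspections (an assumption of this sketch).
    """
    counts = Counter()
    for history in histories:
        # Pair each inspection with the one that follows it.
        for current, following in zip(history, history[1:]):
            counts[(current, following)] += 1

    matrix = {}
    for i in states:
        row_total = sum(counts[(i, j)] for j in states)
        # A state that was never left gets a row of zeros.
        matrix[i] = {
            j: (counts[(i, j)] / row_total if row_total else 0.0) for j in states
        }
    return matrix


# Hypothetical vibration bands observed over two life cycles.
histories = [
    ["low", "low", "medium", "high"],
    ["low", "medium", "medium", "high"],
]
P = transition_matrix(histories, ["low", "medium", "high"])
```

With these made-up histories, two of the three observed transitions out of “low” go to “medium”. Combining such probabilities with the PHM is what yields the failure prediction described in the footnote.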

A. Failure data

At the Laverton site the CMMS is used to raise work orders, issue permit-to-work forms and report equipment failure history. Fields available for completion on work order closeout include “observations, cause, components and comments”. The data extraction process for the pumps found comments ranging from general statements such as “removed-damaged” to observations like “OK”. In many cases there was no attempt to identify failure modes or causes (e.g. dry running or cavitation) or to distinguish between a potential failure and a suspension. We at Orica hasten to point out that the technicians themselves are not to “blame” for such communication gaps. CMMS trainers focus on the mechanics of manipulating the software rather than on encouraging technicians to express field observations, in precise Reliability Centred Maintenance (RCM) style, as readable descriptions of the as-found equipment state. As a result, and given the pride technicians take in their work, their commentary consists largely of descriptions of “what I did” and contains fewer descriptions of “what I found”. Both, of course, are required for reliability analysis.

The data available from the CMMS was, therefore, unsuitable for loading directly into the reliability analysis software. Two obstacles were encountered. Firstly, the structure of the CMMS data is not the structure needed for generating a sample. A sample of life cycles (discriminating between those ending by failure and those ending by suspension) is necessary before reliability analysis can be performed. This issue was resolved relatively easily using data mapping and transformation algorithms (illustrated in Figure 5).

Figure 5 Transformation of CMMS data to a Sample for Reliability Analysis

Because a sample is a collection of life cycles, it is impossible to develop a sample directly from the CMMS’s structural representation of work order history. Figure 5 above indicates that the data in the CMMS must be transformed to a structure wherein life cycles are identifiable and countable. Both complete and partial (suspended) life cycles in the sample must be accounted for by the reliability analysis procedure or software. Furthermore, the best way to ensure an unbiased sample is to select two points in calendar time that define the sample window. One selects the window width such that there is a sufficient number of life cycles for analysis. Sufficiency depends on several factors, one of which is how closely the condition monitoring data reflects the true health state with respect to a given failure mode. External variables reflecting operating context within mixed populations should be identified and accounted for in the model.
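
A minimal sketch of such a transformation is given below, assuming each work order has already been classified as a failure (“F”) or a suspension (“S”). It is not the mapping used in the study; the `LifeCycle` record, the function `build_life_cycles`, and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LifeCycle:
    start: date
    end: date
    ended_by: str               # "F" = failure, "S" = suspension
    began_before_window: bool   # True if the cycle started before the window


def build_life_cycles(events, window_start, window_end):
    """Turn renewal events for one component into life cycles within a window.

    events: list of (date, "F" or "S") tuples, in any order.
    """
    ordered = sorted(events)
    # The latest renewal before the window, if known, starts the first cycle;
    # otherwise the window start is used as a placeholder beginning.
    prior = [when for when, _ in ordered if when < window_start]
    cycle_start = max(prior) if prior else window_start
    began_before_window = True

    cycles = []
    for when, kind in ordered:
        if not (window_start <= when <= window_end):
            continue
        cycles.append(LifeCycle(cycle_start, when, kind, began_before_window))
        cycle_start, began_before_window = when, False

    # The unit still in service at the end of the window is a suspension.
    if cycle_start < window_end:
        cycles.append(LifeCycle(cycle_start, window_end, "S", began_before_window))
    return cycles


# Hypothetical work order history for one pump.
events = [
    (date(2009, 3, 1), "F"),
    (date(2009, 9, 15), "S"),
    (date(2008, 11, 1), "F"),
]
cycles = build_life_cycles(events, date(2009, 1, 1), date(2010, 12, 31))
```

In this example the sample contains three life cycles: one ending in failure, one ending in suspension, and one still running (and therefore suspended) when the window closes.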

The second obstacle was far more daunting. In some cases it was difficult to determine whether the pump had failed or whether the work order represented a suspension. Mistaking a suspension for a failure misleads the analysis and modelling into associating preventive repair with failure. That is, the model will “try” to correlate, with a failure event, values of condition monitoring variables recorded at a time when the component was actually in good condition. The effect is to introduce scatter (i.e. to lower confidence) in the model’s predictive capability.

The most basic data requirement, therefore, of reliability analysis (Weibull, PHM, and most others) is to distinguish between failure and suspension when reporting the as-found condition of each significant failure mode encountered during the execution of a work order.

B. Working Age

In any reliability study the working age of the equipment is important. Working age is a reference line measuring the accumulated usage of, or stress on, a component. The engineering units selected for working age should reflect the accumulated normal wear and tear on the component. Calendar age is appropriate when the equipment operates more or less uniformly. Energy consumed or production units delivered often provide a better indication of true working age. Pump operating hours were not easily available and had to be estimated from the date of the work order and known operating practices for the pumps. For example, the two catholyte pumps shared the same duty and swapped from online to standby every two weeks. Knowing this, the working age could be estimated from calendar dates, average plant uptime, and 50% run time. The other two pumps ran continuously, and their working age was based directly on the work order’s date.
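
The estimate for the duty/standby pumps amounts to simple arithmetic, sketched below. The function name and the 95% plant uptime are assumptions made for illustration; the paper does not report the uptime figure it used.

```python
from datetime import date


def estimated_working_age_days(start, end, plant_uptime, duty_fraction):
    """Estimate working age (in days of running) from calendar dates.

    plant_uptime: fraction of calendar time the plant was running.
    duty_fraction: fraction of plant running time this pump was online,
                   e.g. 0.5 for a pump alternating with a standby unit.
    """
    calendar_days = (end - start).days
    return calendar_days * plant_uptime * duty_fraction


# Hypothetical catholyte pump: 200 calendar days between work orders,
# an assumed 95% plant uptime, and 50% run time from fortnightly changeover.
age = estimated_working_age_days(
    date(2009, 1, 1), date(2009, 7, 20), plant_uptime=0.95, duty_fraction=0.5
)
```

For the continuously running pumps the same function applies with `duty_fraction=1.0`.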

C. Vibration data

Condition monitoring (CM) has been used on site at Laverton for over 9 years. This includes vibration analysis (VA) of all pump sets, fans and compressors. The CM is conducted by a specialist external contractor. Critical drives that have standby redundancy are changed over regularly to ensure that they still run. Standby units are started up so that VA can be performed. The VA data is compiled by the contractor and an executive summary is forwarded to Laverton each fortnight.

The VA report assigns each rotating machine a performance rating from “1” to “5”. When a machine reaches level 3 we begin to monitor it closely, at level 4 we plan to replace it at the next opportunity, and at level 5 we replace it immediately. No “scoreboard” is kept to tally the hits, misses, and false alarms of this condition monitoring program. (Keeping one, as part of a “Living RCM”[2] project, is an important conclusion of this study.)
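
One way such a scoreboard could be tallied is sketched below. The definitions are assumptions for illustration: a rating of 4 or above is treated as an alarm, and each record pairs the last VA rating before a work order with whether a failure was confirmed on that work order. The function names and sample records are hypothetical.

```python
from collections import Counter


def score_call(alarm_raised, failure_confirmed):
    """Classify one condition monitoring call against the as-found state."""
    if alarm_raised and failure_confirmed:
        return "hit"
    if alarm_raised:
        return "false alarm"
    if failure_confirmed:
        return "miss"
    return "correct rejection"


def scoreboard(records, alarm_level=4):
    """Tally calls. records: iterable of (va_rating, failure_confirmed)."""
    return Counter(
        score_call(rating >= alarm_level, confirmed) for rating, confirmed in records
    )


# Hypothetical (VA rating, failure confirmed on the work order) pairs.
records = [(4, True), (5, False), (2, True), (1, False), (3, False)]
tally = scoreboard(records)
```

The tally depends entirely on the as-found condition being reported reliably on the work order, which is the data quality issue raised under “Failure data” above.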

If the VA-reported equipment condition is severe enough that a decision to replace is made, the production impact will be significant. An example is a magnetic drive pump motor on the Catholyte system that was exhibiting excessive noise. A decision was made to replace the motor rather than risk an unplanned trip (potentially occurring only hours later).

When faced with a decision to shut down and replace an item, the level of confidence in making that decision is, for the aforementioned reasons, not known. Some pumps have been known to run for extended periods at high vibration levels without needing replacement. This implies that factors other than those reported by VA influence failure probability. It is incumbent, then, upon the organization and its reliability engineers to identify, through observation and analysis, those internal and external factors likely to influence production and profitability.

D. Operational History

Another source of information was the alarm history logs of the plant Distributed Control System (DCS). These logs helped to confirm pump working ages by flagging stop and start events.
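
Accumulating running hours from stop and start events could look like the following sketch. The event format (`"START"` / `"STOP"` labels with timestamps) and the function name are assumptions; actual DCS alarm log formats vary and are not described in the paper.

```python
from datetime import datetime


def running_hours(events, until):
    """Sum running hours from start/stop events up to the time `until`.

    events: list of (timestamp, "START" or "STOP") tuples, in any order.
    A repeated START while running, or a STOP while stopped, is ignored.
    """
    total = 0.0
    started_at = None
    for stamp, kind in sorted(events):
        if kind == "START" and started_at is None:
            started_at = stamp
        elif kind == "STOP" and started_at is not None:
            total += (stamp - started_at).total_seconds() / 3600.0
            started_at = None
    # A pump still running at `until` contributes its hours so far.
    if started_at is not None and until > started_at:
        total += (until - started_at).total_seconds() / 3600.0
    return total


# Hypothetical log: 36 hours running, a stop, then 6 more hours.
events = [
    (datetime(2010, 1, 1, 0, 0), "START"),
    (datetime(2010, 1, 2, 12, 0), "STOP"),
    (datetime(2010, 1, 3, 0, 0), "START"),
]
hours = running_hours(events, until=datetime(2010, 1, 3, 6, 0))
```

Evaluating the function at the date of each work order gives the working age at that life event.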

V DATA CLEANSING & TRANSFORMATION

A. Data Cleansing

Before using reliability analysis software a number of steps are required to cleanse and transform the data.

  1. Prepare or update the Failure Mode and Effects Analysis (FMEA) for pump and motor. The FMEA constitutes a “knowledge base” each record of which describes a failure mode whose behavior is to be determined by the “counting up” (i.e. basic reliability analysis) of the work order instances of that failure mode.
  2. Identify the failure modes from the work orders and link them to the FMEA. Each link represents an ending and beginning event in the sample (see Figure 5).
  3. Correlate VA data to the pump failure and suspension events using a technique such as PHM. Refer to Table 3.
  4. Ensure that any PM activities are properly allocated to either suspension or failure events of the pumps.
  5. Use the DCS recorded stop/start events, if necessary, to determine pump working age at each life event (i.e. work order).
  6. Before modelling, use the data validation function in the software to locate, then repair or eliminate, erroneous and illogical data. A common example of illogical data is an Event or Inspection record whose working age is lower than the working age recorded at an earlier date.
  7. Create beginning events where life cycles began prior to the start date of the sample window.
  8. Ensure that failures and suspensions are accurately identified and distinguished. Confirmed “potential” failures should be counted as failures. Well-discussed maintenance department standards should distinguish failures from suspensions.
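
The working-age check in step 6 can be illustrated with a short sketch. It is not the validation function of the software used in the study; the function name and record layout are hypothetical, and the records are assumed to belong to a single life cycle (working age legitimately resets at a renewal).

```python
from datetime import date


def find_age_regressions(records):
    """Return consecutive record pairs where working age decreases over time.

    records: list of (record_date, working_age) for ONE life cycle.
    """
    ordered = sorted(records, key=lambda record: record[0])
    problems = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later[1] < earlier[1]:
            problems.append((earlier, later))
    return problems


# Hypothetical inspection records: the third has an illogical working age.
records = [
    (date(2009, 1, 10), 120.0),
    (date(2009, 2, 10), 150.0),
    (date(2009, 3, 10), 140.0),
    (date(2009, 4, 10), 200.0),
]
problems = find_age_regressions(records)
```

Each flagged pair still needs a human decision: whether to correct the age, correct the date, or drop the record.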

Surprisingly, FMEAs for the basic magnetic pump and even for the standard induction motor were difficult to find in the public domain. Many references to the method are available; however, no specific analysis could be found. The study therefore developed an FMEA model for the pumps and motors (by mining the work order history) and used it to identify the significant failure modes of interest. One of the surprising outcomes of the pump analysis was that pump failures were largely related to operational factors rather than to intrinsic mechanical defects.

The next step was to link the CMMS failure data to the failure modes from the FMEA. Refer to Table 2. One important step is to assign beginning and end dates for pump or motor events to the work order history, paying particular attention to whether each event is a failure or a suspension[3]. Refer to Table 4.

Table 2 Some failure mode data from the work orders
Table 3 Vibration data
Table 4 Some work order records with RCM reference and Event type indicated

VI RESULTS

The study found that the vibration variables were not strongly associated with the reported failures. In fact, the results indicate that most failures were due to operational techniques rather than to mechanical deterioration. This is considered a valuable finding because it indicates where to focus both asset management training and the CBM program itself. For the former, the lesson is to spend more time training operators in correct pump operation. For the latter, we may examine the value returned by the VA program. It is recommended to track (through a living RCM process) VA’s good and bad calls in order to evaluate the program’s predictive performance. Such an evaluation, consistent with continuous improvement, will result in a more effective CBM program. The objectives of improvement are:

  1. discrimination of failure and suspension, leading to more dependable decision models, and
  2. determination of vibration or other condition monitoring extracted features that reflect actually occurring failure modes.

VII CONCLUSIONS

This was a preliminary study based on a small sample of pumps operating over only a two-year period. The authors intend to expand the sample and to apply the lessons learned about the management of failure and suspension data, particularly the following:

  1. Improve the management of the work order to account for the RCM relationship by identifying both the failure mode and the Event type (potential failure PF, functional failure FF, or suspension S) on the work order.
  2. Report the failure mode as a reference to an FMEA record where it is defined in the context of the Function, Functional Failure and Effects.
  3. Include in the free text field of the work order both “what was found” and “what was done”.
  4. Update the FMEA dynamically, based on day-to-day observations surrounding the execution of a work order. The work order free text should, to the extent justified, support updates to the Effects text field of an FMEA/RCM record in a continuous process of knowledge refinement. The reliability engineer should examine the work order free text in order to expand the Effects text of the RCM knowledge base to cover all reasonably likely situations that may arise in the course of the enterprise’s operations. Feedback and exchange of these concepts with the technicians should occur regularly.

ACKNOWLEDGEMENT

The authors acknowledge:
the support and encouragement provided by M Wiseman and Dr D Lin (OMDEC),
the assistance of C Hill (CMA) and S Mustadanagic (Iwaki), and
the suggestions of Dr Naaman Gurvitz (Clockwork Solutions).

REFERENCES

  1. http://www.reliasoft.com/products.htm
  2. http://www.plant-maintenance.com/
  3. http://www.isograph-software.com/
  4. Weibull, W. (1951), “A statistical distribution function of wide applicability”, J. Appl. Mech.-Trans. ASME 18 (3): 293–297.
  5. Jardine A.K.S., Banjevic D., Wiseman M., Buck S. and Joseph T., “Optimising a Mine Haul Truck Wheel Motors’ Condition Monitoring Program: Use of Proportional Hazards Modelling”, http://www.omdec.com/moxie/About/cases/
  6. Living RCM and EXAKT, https://www.livingreliability.com/en
  7. Mixed Populations Mathematical Basis, Naaman Gurvitz, Clockwork Solutions Inc. http://www.clockwork-solutions.com
  8. http://www.omdec.com/wiki/tiki-index.php?page=The+elusive+PF+interval

© 2011, Ron Jenkins. All rights reserved.

  1. [1] From the historical records of condition monitoring data, the past transitions from each state to all other states can be compiled in a matrix, and the probability of each transition can thus be determined. These probabilities, when combined with the Proportional Hazards Model, will yield a failure prediction. For a detailed explanation and more information see Ref. 8.
  2. [2] “Living” RCM (LRCM) is a dynamic process whereby work orders are linked to RCM/FMEA knowledge records, each link constituting a data point in a sample for reliability analysis. In addition, the RCM/FMEA records should be updated as each work order reveals new knowledge about a failure and its effects. Ref. 6.
  3. [3] This step, including preparation of the Events table, should be automated through a “living” RCM process and supporting software. Ref. 6.