A Generalized Delay-timer for Alarm Triggering

Similar documents
An Introduction to Alarm Analysis and Design

Reduction of alerts in automated systems based on a combined analysis of process connectivity and alarm logs

Methodology and Application of Pattern Mining in Multiple Alarm Flood Sequences

Computers and Chemical Engineering

Microgrid Fault Protection Based on Symmetrical and Differential Current Components

ON-LINE SENSOR CALIBRATION MONITORING AND FAULT DETECTION FOR CHEMICAL PROCESSES

Detection of Temporal Dependencies in Alarm Time Series of Industrial Plants

Advanced Digital Signal Processing Part 4: DFT and FFT

Dynamic Simulation of Double Pipe Heat Exchanger using MATLAB simulink

Session Ten Achieving Compliance in Hardware Fault Tolerance

A Review of Hydronic Balancing of Cooling Water Circuit

CHAPTER 7 APPLICATION OF 128 POINT FFT PROCESSOR FOR MIMO OFDM SYSTEMS. Table of Contents

Real Time Pipeline Leak Detection on Shell s North Western Ethylene Pipeline

Rule-based reduction of alarm signals in industrial control 1

The Use of Fuzzy Spaces in Signal Detection

Effective Alarm Management for Dynamic and Vessel Control Systems

Temperature Control of Heat Exchanger Using Fuzzy Logic Controller

Implementing a Reliable Leak Detection System on a Crude Oil Pipeline

doi: / _57(

Smart Defrost Control for Refrigeration System

Intrusion Detection System: Facts, Challenges and Futures. By Gina Tjhai 13 th March 2007 Network Research Group

Fire Risks of Loviisa NPP During Shutdown States

False Alarm Analysis of the CATM-CFAR in Presence of Clutter Edge

NEW CD WARP CONTROL SYSTEM FOR THE CORRUGATING INDUSTRY

Design and Analysis of Safety Critical Systems

Module Features are-configurable, no module jumpers to set

Computer Modelling and Simulation of a Smart Water Heater

Fire Detection Using Image Processing

SIL DETERMINATION AND PROBLEMS WITH THE APPLICATION OF LOPA

Performance of an Improved Household Refrigerator/Freezer

COCB_ Circuit Breaker (2 state inputs/ 2 control inputs)

A Model of the Population of Fire Detection Equipment Based on the Geometry of the Building Stock

A Forest Fire Warning Method Based on Fire Dangerous Rating Dan Wang 1, a, Lei Xu 1, b*, Yuanyuan Zhou 1, c, Zhifu Gao 1, d

Official Journal of the European Union. (Non-legislative acts) REGULATIONS

SYNERGY IN LEAK DETECTION: COMBINING LEAK DETECTION TECHNOLOGIES THAT USE DIFFERENT PHYSICAL PRINCIPLES

Fuzzy Ventilation Control for Zone Temperature and Relative Humidity

Statistical Detection of Alarm Conditions in Building Automation Systems

Detection of Sensor Faults in Autonomous Helicopters *

White Paper: Video/Audio Analysis Technology. hanwhasecurity.com

AN ANALYSIS OF THE PERFORMANCE OF RESIDENTIAL SMOKE DETECTION TECHNOLOGIES UTILIZING THE CONCEPT OF RELATIVE TIME

Implementation of Auto Car Washing System Using Two Robotic Arms

C&I SYSTEM DIAGNOSTICS WITH SELF MONITORING AND REPORTING TECHNOLOGY (SMART)

Estimating the Level of Free Riders in the Refrigerator Buy-Back Program

Brief Contributions. Vulnerability of Sensor Networks to Unauthorized Traversal and Monitoring 1 INTRODUCTION 2 PROBLEM FORMULATION

Video Smoke Detection using Deep Domain Adaptation Enhanced with Synthetic Smoke Images

Numerical Stability Analysis of a Natural Circulation Steam Generator with a Non-uniform Heating Profile over the tube length

An improved Algorithm of Generating Network Intrusion Detector Li Ma 1, a, Yan Chen 1, b

Compression of Fins pipe and simple Heat pipe Using CFD

Performance Neuro-Fuzzy for Power System Fault Location

Performance Comparison of Ejector Expansion Refrigeration Cycle with Throttled Expansion Cycle Using R-170 as Refrigerant

Some Modeling Improvements for Unitary Air Conditioners and Heat Pumps at Off-Design Conditions

SIMULATION ANALYSIS ON THE FRESH AIR HANDLING UNIT WITH LIQUID DESICCANT TOTAL HEAT RECOVERY

Modeling of Ceiling Fan Based on Velocity Measurement for CFD Simulation of Airflow in Large Room

EC Plug Fans EC FanGrid Retrofits for High Efficiency Air Handling Units

Fault Isolation for Spacecraft Systems: An Application to a Power Distribution Testbed

Simulation Of Pneumatic Drying: Influence Of Particle Diameter And Solid Loading Ratio

Smart Combiners Installation Guide. For Obvius A89DC-08 sensor modules

Experimental Study on the Effects of Compression Parameters on Molding Quality of Dried Fish Floss

Applied Data Science: Using Machine Learning for Alarm Verification

Advanced Pattern Recognition for Anomaly Detection Chance Kleineke/Michael Santucci Engineering Consultants Group Inc.

Link loss measurement uncertainties: OTDR vs. light source power meter By EXFO s Systems Engineering and Research Team

WHITE PAPER. ANSI/AHRI Standard for Fan and Coil Evaporators - Benefits and Costs

Algorithmic Generation of Chinese Lattice Designs

Installer Manual KNX Touchscreen Thermostat

PACSystems* RX3i. Thermocouple Input Module, 12 Channels, IC695ALG412. GFK-2578B October 2011

Lesson 25 Analysis Of Complete Vapour Compression Refrigeration Systems

Finned Heat Sinks for Cooling Outdoor Electronics under Natural Convection

MODELLING AND OPTIMIZATION OF DIRECT EXPANSION AIR CONDITIONING SYSTEM FOR COMMERCIAL BUILDING ENERGY SAVING

Heat transfer and heating rate of food stuffs in commercial shop ovens

ZONE MODEL VERIFICATION BY ELECTRIC HEATER

Using Neural Networks for Alarm Correlation in Cellular Phone Networks

Event Detection System (EDS)

A Study on Process Capability on Connecting Rod in Assembly Line

COMMISSION DELEGATED REGULATION (EU) No /.. of

(Refer Slide Time: 00:00:40 min)

COMPARISON OF NFPA AND ISO APPROACHES FOR EVALUATING SEPARATION DISTANCES

The SIL Concept in the process industry International standards IEC 61508/ 61511

NFPA 85 COMPLIANCES OF BMS: A CASE STUDY OF BOILER CONTROL AT SBM OFFSHORE MALAYSIA COMPANY 1. AHMED ABOUELRISH 2 Universiti Teknologi Petronas

Research on the Monitor and Control System of Granary Temperature and Humidity Based on ARM

Research on Decision Tree Application in Data of Fire Alarm Receipt and Disposal

A Energy Efficient Approach to Processing Spatial Alarms on Mobile Clients

A Study on Improvement of Fire Protection Systems based on Failure Characteristics accodrding to Yearly Variation in Old Commercial Buildings

Fin-Fan Control System

Best Row Number Ratio Study of Surface Air Coolers for Segmented Handling Air-conditioning System

INDOOR CLIMATE IN HEATING CONDITION OF A LARGE GYMNASIUM WITH UNDER-FLOOR SUPPLY/RETURN SYSTEM

Ovation Alarm Management System

Reduced Order WECC Modeling for Frequency Response and Energy Storage Integration

Q&A Session from Alarm Management Workflow Webinar (Apr.24/2013)

SIMPLE METHODS FOR ALARM SANITATION

Modeling and Simulation of Axial Fan Using CFD Hemant Kumawat

COST-EFFECTIVE FIRE-SAFETY RETROFITS FOR CANADIAN GOVERNMENT OFFICE BUILDINGS

Study and Design Considerations of HRSG Evaporators in Fast Start Combined Cycle Plants. Govind Rengarajan, P.E.CEM, Dan Taylor

Fuzzy Logic Based Coolant Leak Detection

Numerical Simulation of Fluid Flow and Heat Transfer in a Water Heater

Simple Equations for Predicting Smoke Filling Time in Fire Rooms with Irregular Ceilings

EE04 804(B) Soft Computing Ver. 1.2 Class 1. Introduction February 21st,2012. Sasidharan Sreedharan

[Patil* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

Cooperation of a Steam Condenser with a Low-pressure Part of a Steam Turbine in Off-design Conditions

Optimized Residential Loads Scheduling Based On Dynamic Pricing Of Electricity : A Simulation Study

Supporting Information

Transcription:

2012 American Control Conference Fairmont Queen Elizabeth, Montreal, Canada June 27-June 29,2012 A Generalized Delay-timer for Alarm Triggering Naseeb Ahmed Adnan, Yue Cheng, Iman Izadi and Tongwen Chen Abstract-In process plants, alarms are configured to notify operators of any abnormalities or faults. However, in practice a majority of raised alarms are false or nuisance and create problems for operators as they face an increasing number of alarms to handle. Adding delay-timers is a simple technique that can reduce this problem and is widely exercised in industry. In this work we propose a generalized delay-timer framework where instead of consecutive n samples in the conventional case, nl out of n consecutive samples (nl :S n) are considered to raise an alarm. For the generalized delay-timer, two important performance indices, namely, the false alarm rate (FAR) and the missed alarm rate (MAR) are calculated using Markov processes. Also the performance of generalized delay-timers is compared with conventional delay-timers. Keywords: Alarm systems; Alarm management; Western electric rules; Fault detection; Markov processes; Delay-timers. I. INTRODUCTION The complexity and degree of automation of technical processes are continuously increasing with growing demands for plant reliability, higher efficiency, better performance and product quality. Due to significant advancement in communications and computer technologies nowadays a large number of process variables are monitored during plant operations. A downside of this advancement, however, is the large increase in the number of configured alarms and a complex alarm systems, which often causes difficulties to operators during abnormal events. For example, the Three- Mile Island nuclear power plant was designed with ability to provide alarms for every small problem [1]; but during the accident, operators were flooded with information, much of it irrelevant and misleading. Therefore an efficient alarm system is of paramount importance. Alarms indicating faults should be unmistakable and for each cause there should be no more than one alarm [2], [3]. According to ISA 18.2 and EEMUA standards, an operator should not receive more than six alarms per hour during normal operation of the plant [4], [5]. But this is hardly the case in practice and operators normally receive far more alarms than these standards suggests [3]. A single fault may give rise to a complex alarm situation due to consequences of the fault triggering other alarms [6]. There is no single solution to improve alarm systems; however a few improving techniques can be applied in different areas, e.g. multivariate process monitoring, state-based priority setting, model-based process monitoring, threshold design and data processing [2], [3], [7]. This work was supported by NSERC and Alberta Innovates - Technology Futures Naseeb Ahmed Adnan, Yue Cheng and Tongwen Chen are with Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada, T6G 2V4. naseeb@ualberta. ca, cheng5@ualberta.ca, tchen@ualberta.ca Iman Izadi is with Honeywell Process Solutions,# 1800, 10405 Jasper Ave, Edmonton, Alberta, Canada, T5J 3N4. iman@ualberta. ca We concentrate on the latter and focus in this paper a specific technique, namely the delay-timer. Fault detection and isolation (FDI) forms a very active area of research, mainly on the basis of mathematical or knowledge based models. Several model-based FDI methods were developed in the last decades [8], [9]. The main focus in model-based approach is to obtain a proper mathematical model which is relatively difficult to do for complex plants. An alternative to model-based methods of fault detection is the signal-based methods. In signal-based fault detection, process variables are directly monitored and compared with some thresholds (or alarm limits) [8], [9]. Signal-based methods of process monitoring are the most commonly used techniques in industry and are readily implementable on almost all modern distributed control systems (DCS). The signal based methods based on simple limit checking have the advantage of simplicity and ease of implementation. However, incorrect settings of the alarm limits or thresholds may result in problems of false, missed and nuisance alarms [2], [3], [10], [11]. An alarm that is raised in the faultfree operation of a plant is known as a false alarm. An alarm that is true but redundant as operator has already been notified of the fault by another alarm is a nuisance alarm. On the other hand the situation where an alarm is not raised in the presence of a fault is known as a missed alarm. For the univariate alarm systems, the performance can be assessed by the false alarm rate (FAR) and missed alarm rate (MAR), which provide a measure of alarm systems accuracy (sensitivity). In this work, the focus of study is signal processing based univariate alarm systems. For the proposed generalized delay-timers, alarm systems are modeled using Markov processes [3], [10]. Also false alarm rates and missed alarm rates for such systems are calculated. Comparing receiver operating characteristic (ROC) curves for the conventional and proposed generalized delay-timers, we also see that the proposed generalized delay-timer is less sensitive to changes in design parameters. Section II discusses the conventional delay-timers. Rules proposed by the Western Electric Company are discussed in section III. The generalized delay-timer is described in section IV. Section V is devoted to the algorithm to formulate the Markov process for the generalized case. Comparison of performance and sensitivity analysis are provided in section VI. Finally, some concluding remarks are given in section VII. II. CONVENTIONAL DELAY-TIMER Generally alarms are raised and cleared based on a single sample comparing with the threshold. For high alarms, if 978-1-4577-1096-4/12/$26.00 2012 AACC 6679

Fig. 1. Process data and corresponding PDFs. the sample is above the threshold, an alarm is raised; and when it goes below the threshold the alarm is cleared. A similar idea is applied in the case of low alarms. However, often this simple limit or threshold checking is not considered as a good design due to the problem of nuisance and false alarms [3]. To reduce nuisance and false alarms, techniques like delay-timers, deadbands, and filters are widely used in industry. The idea of delay-timers is very simple yet effective. In alarm systems two types of delay-timers are used: on-delay and off-delay [3], [10]. For on-delay timers, an alarm is only raised if n consecutive samples cross the threshold. In other words, the process measurement remains in the alarm state for n units of time before activating the alarm [4]. With off-delay configured, a raised alarm will only be cleared if a number of consecutive samples fall below the threshold. The Markov process for 3-sample on-delay and 3-sample off-delay is shown in Fig. 2. Here PI is the probability of one sample above the threshold in fault-free region of operation (we only consider the case of high alarm for simplicity). Therefore the probability of one sample falling below the threshold is, 1 - PI (Fig. 1). In cases where deadbands are applied with or without delay-timers, PI + P2 =1= 1 [10]. Assume initially the process is in fault-free region of operation and no alarm is raised, which is the state NA in the Markov process. If the next sample crosses the threshold with probability PI, the Markov process moves to the first intermediate N Al state and still no alarm is raised; it will require another 2 consecutive samples to raise the alarm. If the next sample is below the threshold (with probability 1- PI), the Markov process will return to the original NA state. In this case, the sample temporarily exceeding the threshold is treated as an outlier sample. On the other hand if the next sample is above the threshold (with probability PI), the Markov process would move to the second intermediate state (NA 2 ) from the first intermediate state (NAI). This process will carry forward until it reaches the alarm state (A) when an alarm will be activated. With on-delay configured, if N Ai (where i == 1, 2,...,n - 1) denotes intermediate states in the Markov process, any sample with probability 1- PI will cause the system to go back to the state N A. A similar process is followed in off-delay for clearing a raised alarm. Similar to the fault-free region, there will be another Markov process in the faulty region with probability q2 and ql (see Fig. 1 for definitions). In this paper, it is assumed that the probability density functions (PDFs) of the fault free and faulty data are known. Therefore enough relevant data points must be available to estimate these distributions that can be obtained from p= Fig. 2. Combined on and off-delay for n = m = 3 historical process data. Moreover, the time series must be stationary and the process data is independently and identically distributed (1.1.0.). Although these two assumptions are restrictive to some extent, they are not impractical [3], [10], [12]. Other cases of non-stationary distributions will be discussed elsewhere. A. False alarm rate The most common practice in process monitoring is to compare a variable with thresholds (also known as alarm limits or trip points) to detect out of range condition. For the process data shown in Fig. 1, some part of the probability distributions of the fault-free data falls over the threshold and will cause false alarms (also known as Type I error or false positive). Probability of false alarms for different cases of delay-timers are discussed below: 1) n-sample on-delay: For a conventional n samples or regular on-delay (in Fig. 2 only alarm state A will be present and intermediate alarm states Al and A 2 will be removed), Markov processes are used to calculate the probability of false alarms. The transition probabilities from one state to another is given in the following matrix I - PI PI 0 0 I - PI 0 PI 0 I - PI 0 0 P2 0 0 PI I - P2 Here P is of dimension (n + 1) x (n + 1). For the system operating in fault-free region, probability of false alarm is JID(A); which is the probability of alarm state in the steady state and can be calculated according to the theory of Markov chains [13]. It can be shown that the probability of false alarm (the steady-state probability of the alarm state A) is JID(False Alarm) = JID(A) = ----- Per: + P2 L pi i=o 2) m-sample off-delay: An alarm off-delay requires consecutive m samples below the threshold to clear an alarm (in Fig. 2 only no alarm state NA will be present and intermediate no alarm states N Al and N A 2 will be removed). Assume the system is in normal operation and any alarm raised will be a false alarm. According to the Markov process, probability of false alarm is the sum of all alarm states; i.e., PI L j=o JID(False Alarm) = JID(A + Al +... + A) = ----- +PI L j=o (1) (2) (3) 6680

3) Combined on and off-delay: The Markov process for both on and off-delay applied together is shown in Fig. 2 for n == m == 3. The false alarm rate with n-sample on-delay and m-sample off-delay is given by JID(False Alarm) = JID(A + Al +... + A) = -------- (4) L qi + q; L q{ i=o j=o PI L.J P2 P2 L.J PI i=o j=o B. Missed alarm rate Missed alarms (also known as type II error or false negative), occur when the system is in faulty state of operation but an alarm is not raised. From the process data in Fig. 1, it can be seen that by lowering the threshold, the missed alarm rate can be decreased. However it will increase the false alarm rate. Therefore it leads to the problem of balancing between the false alarm rate and the missed alarm rate [3], [10]. In section VI, this trade-off will be discussed. Calculation of missed alarm rates for different cases of delay-timers are similar to that of false alarm rates. To calculate the missed alarm rate, the probability density function of the faulty data should be used (Fig. 1, solid line). Here q2 is the probability of one sample above the threshold and ql (or 1- q2) is the probability of one sample below the threshold in the abnormal region. In the case of n-sample on-delay and m- sample off-delay, it can be shown that the probability of missed alarms is L qi i=o JID(Missed Alarm) = -------- (5) III. WESTERN ELECTRIC RULE Based on the Shewhart control limits the Western Electric Co. proposed the tests for instability in 1956. These tests are also known as the Western Electric Rules [14]. In the tests, the area between the centerline (mean) of the process data and one of the thresholds (either upper threshold or lower threshold which is 3a away from the mean, where a is the standard deviation) is divided into three zones; each zone has a width of a According to the rules, a pattern is unnatural and needs attention if any of the following combinations are formed: 1) One sample falls outside of the three sigma threshold. 2) Two out of three successive samples fall in zone A or beyond. 3) Four out of five successive samples fall in zone B or beyond. 4) Eight successive samples fall in zone C or beyond. The idea of test 1 is the same as a simple threshold checking. Here probability of false alarm is PI and probability of missed alarm is ql based on a single sample. Tests 2 and 3 are similar to the generalized delay-timer (to be discussed in section IV) for 2-out-of-3 and 4-out-of-5 on-delay timers. Test 4 is similar to the conventional delay-timer (as discussed in section II) for an 8-sample delay-timer. Tests 2 and 3 suggest that from the industrial point of view, application of a generalized delay-timer is not impractical. Rather it has certain advantages over the conventional case, e.g., in Fig. 3. Generalized on-delay with nl = 3, n = 4 section VI, it is discussed that the generalized delay-timer is less sensitive to changes in thresholds than the conventional one. IV. GENERALIZED DELAY-TIMER Sometimes conditions to raise an alarm in the conventional delay-timer are difficult to satisfy. For example, consider a system with 10 samples on-delay. When the system enters from the normal state to the faulty state of operation, consecutive 10 samples must cross the threshold to raise the alarm. It is very likely to have some delay in raising the alarm during this transition as first 10 samples may not satisfy the strict requirement. However if the condition is relaxed to for instance 8 out of 10 samples crossing the threshold, intuitively the probability of early detection is higher. Accordingly we propose the following two definitions: nl outofn samples generalizedon-delay: an alarm will be raised if nl out ofn consecutive samples cross the threshold. ml out of m samples generalized off-delay: an alarm will be cleared if ml out of m consecutive samples fall below the threshold. The Markov process for nl == 3 and n == 4 is shown in Fig. 3 where 3 out of 4 samples are required to raise an alarm and no off-delay is configured. It can be seen that in the generalized on-delay, the total number of intermediate no alarm states (NAi) is 5. Compared to the conventional 3 samples on-delay case (NAI, N A 2 in Fig. 2), this requires more intermediate states. The reason of this requirement can be explained by Table I. Assume initially the alarm is cleared and the Markov process is in the state N A. To raise an alarm the Markov process has to reach state A with the condition that 3 samples out of 4 have probabilities Pl. Unlike conventional on-delay where there is a single path to reach state A from state NA (Fig. 2), the Markov process here can achieve this in three different paths. Each row of Table I represents a path. The first element of each row is PI, which is required to move to the first intermediate state N AI. From N Al depending on probability of next sample it can either move to state NA 2 (with probability PI) or to state NA 3 (with probability 1- PI). This process will carry on until the Markov chain reaches state A. Since the number of paths is increased in the generalized on-delay compared to the conventional on-delay, the number of intermediate states has also been increased. TABLE I POSSIBLE PATHS TO REACH FROM STATE NA TO A. (*) DENOTES NOT APPLICABLE Sample 1 Sample 2 Sample 3 Sample 4 PI PI PI * PI P2 PI PI PI PI P2 PI 6681

p= Fig. 4. 1- PI PI 0 o 0 1 - PI 2 out of n generalized on-delay It is easy to see that, for nl out of n on-delay, there can be a maximum of (2 n 1 - - 1) intermediate states. However some of these states are unreachable and not shown in Fig. 3. Due to the possibility of several paths, generalized delaytimer calculation is more complex than the conventional case. Therefore with larger n, computational complexity increases significantly and it is difficult to provide a general representation of the Markov diagram for all combinations. Below some special cases of generalized delay-timers (2 out of n samples on-delay and 2 out of m samples off-delay cases) are discussed. In section V, a numerical algorithm to calculate the FAR and MAR in the general case is given. A. False alarm rate 1) 2 out ofn samples on-delay: Consider a 2 out of n ondelay, where n 2. The Markov process for the system is illustrated in Fig. 4. To calculate the probabilities of false alarms, Markov processes are used. The general Markov process for nl == 2 and any n is shown in Fig. 4. The corresponding transitional probability matrix is o 0 0 1- PI PI 1- PI 0 0 PI P2 0 0 0 1 - P2 Here P is of dimension (n + 1) x (n + 1). It can be shown that the probability of false alarm in the steady state is PI(1 - p ) JID(False Alarm) = I 2 I (7) PI(I- ) + P2(2 - ) 2) 2 out of m samples off-delay: Similar to the generalized on-delay, generalized off-delay will require ml out of m samples below the threshold to clear an alarm. As a special case, the Markov process for 2 out of m samples is considered (Fig. 5). In the fault-free region of operation, any 2 samples out of m will clear an already raised alarm. The probability of false alarm in steady state is PI(2-p m - l) JID(False Alarm) = I I I (8) PI (2 - p;n- ) + P2(1 - p;n- ) 3) Combined on and off-delay: Similarly, if both 2 out of n samples on-delay and 2 out of m samples off-delay are applied together, the probability of false alarm is (2 )(1 ) JID(False Alarm) = PI - PI - P2 PI(2 - p;)(1 - + P2(1 - p;)(2 _ (9) 6682 o o o PI (6) Fig. 5. 2 out of m generalized off-delay B. Missed alarm rate Calculation of the missed alarm rate is similar to that of the false alarm rate. But the probability density function of the abnormal data should be used. The probabilities of missed alarm for different cases of generalized delay-timers are as follows: 1) 2 out of n samples on-delay: ql(2- qn-l) JID(Missed Alarm) = I I I (10) ) 2) 2 out of m samples off-delay: q: (1 _ q) JID(Missed Alarm) = I 2 I (11) ) 3) Combined on and off-delay: (2 )(1 ) JID(Missed Alarm) = ql - ql - q2 q2(1 - - + ql(2 _ _ (12) C. Resetting Markov processes For the generalized delay-timer, in certain cases it may appear that there is a conflict of logic. Consider a 3 out of 4 samples on-delay timer for the series of samples shown in Fig. 6. The markov process for this example is shown in Fig. 3. Assume initially there is no alarm. Conditions to raise an alarm is satisfied at sample 4 and the Markov process moves to state A. Then the alarm is cleared at sample 5 since it has probability 1- PI to fall back within the threshold and the Markov process moves back to the state N A. Once it goes back to state N A, the next window to check for conditions of alarm raising will start from sample 6; we call this window shifting resetting of the Markov process. In this work resetting is important as otherwise it may cause conflict of logic. For example, if we consider the red rectangle in Fig. 6 it may appear that there should be an alarm at sample 6; however, it is not true as resetting is done. V. NUMERICAL CALCULATION OF FAR AND MAR The calculations of FAR and MAR are based on the information of the transition matrix (P). Except for some special cases discussed in earlier sections, there is no closed form solution so far for the general nl out of n samples on-delay and ml out of m samples off-delay. In this section we propose an algorithm that can numerically generate the Fig. 6. Resetting of Markov process

transition matrix, which then can be used to calculate FAR and MAR. It can generate and reduce the transition matrix for an nl out of n on-delay and ml out of m off-delay timer whose one sample above threshold probability is Pl. The algorithm comprises three phases: the generating phase, the discarding phase, and the lumping phase. In the generating phase, a full size transition matrix is generated, in the discarding phase, the transient states are discarded, while in the lumping phase, states are lumped to the coarsest partition and thus we obtain the reduced size transition matrix. A. Generating phase To generate the full size transition matrix, we firstly define the description sequence of a state. Definition: An Z == max(n, m) bits binary sequence is called a description sequence for the nl out of non-delay and ml out of m off-delay timer. The first bit of the sequence shows whether currently it is an alarm or no-alarm situation; while the rest bits provide the one sample above threshold information of the current sample and Z - 2 past samples. Obviously, there are totally 2 l states. Thus, we can obtain a Markov chain with 2 l states. For each state, it is easy to determine the next state to move to in the cases that the next sample is greater and smaller than the threshold, respectively. In other words, each state has two successor states with probability PI and 1 - PI, respectively. After this procedure, an 2 l x 2 l full size transition matrix is established. To make it clearer, we provide a simple example: A segment of the values of a process variable is [1.3,3.5,5.7,2.6,10.2,11.3]. The threshold is 8, and the delay timer rule is 2 out of 3 on-delay. In this case, we need a total of 2 3 == 8 states. At instant 4, the current and previous values, namely, 5.7 and 2.6, are both less than the threshold, and it is in the no-alarm situation. As a result, the description sequence is '000'. At instant 5, although the new value 10.2 exceeds the threshold, there is only 1 out of 3 values greater than the threshold. Hence it is still in the no-alarm situation. So the description sequence is ' 001'. When the last value 11.3 comes, 2 out of 3 values are greater than the threshold, which raises the alarm. Therefore, the description sequence changes to 'Ill'. One requirement we add on the alarm rule is that the Markov chain represented by the full size transition matrix should only have one ergodic set which is regular, and all the transient sets include only one state. This requirement is reasonable since the rule should guarantee that the probability that the Markov chain moves to the ergodic set in finite steps is 1. B. Discarding phase In this phase transient states are discussed. Based on the requirement we mentioned above, the discarding procedure is straight forward: Step 1: Search for such columns ik in the transition matrix that all the elements in the column are zero. Assume that K -# a columns, {ii, i 2,,i K }, are found, initialize the discarding states list to {i 1, i 2,...,iK }. Step 2: For an index i k that appears in the discarding states list, set the two non-zero entries in the ik-th row to zero, and check those two columns containing these two non-zero entries. If any of them is all-zero column, add its index into the discarding states list. Repeat step 2 for the new added indices in the list until no more such state can be found. Step 3: Remove all the columns and rows in the discarding list from the transition matrix. C. Lumping phase The goal of lumping is to group the states, i.e., make a partition, so that all the states in the same group have the probability of PI / (1 - PI) to move to states in a single group. The lumping phase can further decrease the size of the Markov chain. The Markov chain applied in our work is in a special case: each state has only two possible successor states, and there is a unique probability PI for all states. Because of this, the Markov chain is equivalent to a deterministic finite automaton (DFA) with 2 l states and 2 inputs that correspond to PI and 1 - Pl. The DFA minimization problem is a classical one in the field of computer science that has been extensively studied. Since the goal of lumping the states in our special Markov chain is exactly the same as the purpose of DFA minimization, we can directly adopt the algorithm for the lumping phase. The essence of the DFA algorithm is to first partition the states according to their outputs (in our case alarm or no-alarm). Then the groups are repeatedly refined based on the successor states on a given input for the states in the same group. If the successor states are in a single group, no further partition is required; otherwise partition the group to subgroups whose successor states are in a single group. When no further partition is needed, all states in the same group can be lumped to one state. By introducing the "process the smaller half' strategy, reference [15] speeded up the algorithm to an 0 (n log( n)) time bound, where n is the number of states of the automaton. The reason why we need to reduce the size of the transition matrix is that as the length of the delay-timer increases the size of the transition matrix increases exponentially. Large transition matrices cause great computational burden for invariant vectors, necessary for FAR and MAR calculation. We conjecture that the algorithm can reduce the size of the transition from 2 l to C:;l -1 + C: 1-1; where C represents combination. It is easy to find out the upper bound of the ratio of reduced size to full size by using Sterling's formula. The ratio converges to zero in the rate of 0(Z-0.5). Although it still means a large computational burden, the total computational complexity of the reduction algorithm and the eigenvector computation of the reduced transition matrix is greatly reduced compared with directly computing the eigenvector of the full size transition matrix. VI. COMPARATIVE ANALYSIS The receiver operating characteristic (ROC) curve is the plot of two alternatives when the decision variable is changed [16]. In our context, the ROC curve is the plot of FAR vs MAR when the threshold changes. It is widely used in signal detection theory for graphical sensitivity analysis. For 6683

Fig. 7. (a) Changes in ROC curves for corresponding changes in the threshold for n = 2 to 5, nl = 2; (b) ROC curves for smaller range of threshold changes simplicity of analysis only on-delay timers are considered in this section. However the analysis will be valid for the off-delay case as well. In Fig. 7(a), an ROC plot is shown for generalized delay-timers. Generally different costs are associated with false and missed alarm rates. Selection of the appropriate point on the ROC curve depends on these costs. Usually a weighted combination of false and missed alarm rates are used to include these associated costs in the design. For simplicity, if (FAR)2 + (MAR)2 is used as the performance index, then the optimum point on the ROC curve will be the point closest to the origin; and the corresponding threshold of the optimum point in the ROC curve will be the optimum operating threshold. In that case, if the threshold is set higher than the optimum point, false alarms will decrease at the cost of increased missed alarms. On the other hand, setting the threshold lower than the optimum point will result in smaller number of missed alarms but more false alarms. A process variable is considered with Gaussian distribution where fault-free data has mean 0, variance 2 and faulty data has mean 2, variance 2. The data length is 1000 for both fault-free and faulty section; 1 sec uniform sampling is used. The threshold is varied from minimum of the process data to the maximum with 0.05 increment. In Fig. 7(a), the corresponding ROC curves are obtained for different delaytimers (nl == 2, n == 2-5). It is normally desirable to minimize both FAR and MAR which account for accuracy of the alarm design; better accuracy indicates lower false and missed alarm rates. It can be seen from the figure that with use of the conventional 2 out of 2 samples delay-timer, the ROC curve is closer to the origin. But when generalized nl out of n idea is used, the curves move away from the origin. Which indicates that increasing of n with fixed nl decreases the accuracy of alarm systems. In other words, here conventional delay-timers cause lower FAR and MAR compared to the generalized cases, which is TABLE II COMPARISON OF SENSITIVITY FOR 2 OUT OF n CASE On-delay timer FAR/MAR Threshold Changes (%) 1 2 2 out of 2 FAR 5.6 0.6 89.3 + MAR 42.5 75.3 82.8 t 2 out of 3 FAR 8.4 1.1 86.9 + MAR 39.7 70.3 77.1 t 2 out of 4 FAR 10 1.5 85.0 + MAR 39.1 68.5 75.2 t 2 out of 5 FAR 11 1.8 83.6 + MAR 39 67.7 73.6 t expected. However, the following observation shows that a generalized nl out of n on-delay timer gives a less sensitive design compared to the conventional delay-timer. Consider the ROC curves in Fig. 7(b) which are obtained for the same set of data with the threshold changing from 1 to 2. It can be seen from Table II that as the threshold is moving from 1 to 2, for the 2 out of 2 samples on-delay, FAR is decreased by 89.3%; whereas for 2 out of 5 samples case, the decrease is 83.6%. This indicates the conventional delaytimer is more sensitive to changes in threshold compared to the generalized one. This sensitivity is important as small changes in the distribution of normal and abnormal data, which may be regarded as equivalent to small changes in the threshold, can cause a significant deviation in FAR and MAR; such sensitivity is not desirable in practice. VII. CONCLUSIONS AND FUTURE WORK Generalized delay-timers provide more flexibility in alarm design. In this work, two performance specifications of alarm delay-timers, namely, the FAR and MAR, are calculated for generalized delay-timers. Some preliminary sensitivity analysis and comparison of performance of the generalized and conventional frameworks are also discussed. As a future extension, further study on the sensitivity advantage of generalized delay-timers is required by a systematic approach. REFERENCES [1] c. Perrow, Normal Accidents: Living with High-Risk Technologies, Princeton University Press, New Jersey, 1999. [2] J. Ahnlund, T. Bergquist, and L. Spaanenburg, Rule-based reduction of alarm signals in industrial control, Journal of Intelligent and Fuzzy Systems, vol. 14, no. 2, pp. 73-84, 2003. [3] 1. Izadi, S. L. Shah, D. S. Shook, S. R. Kondaveeti, and T. Chen, A frameworkfor optimal design ofalarm systems, 7th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, pp. 651-656, June 30 - July 3, 2009, Barcelona, Spain. [4] International Society of Automation, Management of Alarm Systems for the Process Industries, ANSIIISA 18.2, 2009. [5] Engineering Equipment and Materials Users Association (EEMUA), Alarm Systems-A Guide to Design, Management and Procurement, EEMUA Publication 191, 2007. [6] F. Dahlstrand, Consequence analysis theory for alarm analysis, Knowledge-Based Systems, vol. 15, no. 1-2, pp. 27-36, 2002. [7] 1. Izadi, S.L. Shah, and T. Chen, Effective resource utilization for alarm management, 49th IEEE Conference on Decision and Control pp. 6803-6808, December 15-17, 2010, Atlanta, GA. ' [8] R. Isermann, Fault-Diagnosis Systems: An Introduction from Fault Detection to Fault Tolerance, Springer-Verlag, Berlin, 2006. [9] S. X. Ding, Model-Based Fault Diagnosis Techniques: Design Schemes, Algorithms, and Tools, Springer-Verlag, Berlin, 2008. [10] N. A. Adnan, 1. Izadi, and T. Chen, Computing detection delays in industrial alarm systems, American Control Conference, pp. 786-791, June 29 - July 1, 2011, San Francisco, CA, USA. [11] 1. Izadi, S.L. Shah, D. S. Shook, and T. Chen, An introduction to alarm analysis and design, 7th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, pp. 645-650, June 30 - July 3, 2009, Barcelona, Spain. [12] M. Bauer, J. W. Cox, M. H. Caveness, J. J. Downs, and N. F. Thornhill, Finding the direction ofdisturbance propagation in a chemical process using transfer entropy, IEEE Transactions on Control Systems Technology, vol.15, no.l, pp. 12-21,2007. [13] G. F. Lawler, Introduction to Stochastic Processes, 2nd Ed., Chapman & Hall/CRC, 2006. [14] Statistical Quality Control Handbook, 1st Ed., Western Electric Company, Indianapolis, Indiana, 1956. [15] J. E. Hopcroft, An n log n algorithm for minimizing states in a finite automaton, Theory of Machines and Computations, pp. 189-196, 1971. [16] B. C. Levy, Principles ofsignal Detection and Parameter Estimation, Springer, 2008. 6684