Benchmarking Industry Practices for the Use of Alarms as Safeguards and Layers of Protection


Todd Stauffer, PE
exida Consulting
64 N. Main Street, Sellersville, PA
tstauffer@exida.com

Dr. Peter Clarke, CFSE
exida Asia Pacific Pte Ltd
51 Goldhill Plaza, #21-08/09, Singapore
peter.clarke@exida.com

Copyright exida 2013, all rights reserved. Distributed by AIChE with permission of the authors.

Prepared for Presentation at American Institute of Chemical Engineers 2013 Spring Meeting, 9th Global Congress on Process Safety, San Antonio, Texas, April 28 - May 1, 2013

UNPUBLISHED

AIChE shall not be responsible for statements or opinions contained in papers or printed in its publications

Keywords: Alarm Management, ISA-18.2, Independent Protection Layers, Alarm Rationalization, Safety IPL Alarms, Operator PFD, Operator response to alarms, Safeguards, PHA, LOPA

Abstract

Operator response to alarms is a common risk reduction mechanism considered during layer of protection analysis (LOPA). Industry practices on how to treat alarms as independent protection layers can vary greatly. For example, some companies do not allow any credit to be taken for alarms during a LOPA (zero risk reduction), while others allow up to two orders of magnitude (risk reduction factor of 100, SIL 2) to be taken. This paper discusses current industry practices around the use of alarms as safeguards and layers of protection as established by a recent benchmark survey of over 200 safety practitioners from around the world. Areas explored in the survey include: typical and maximum claimed risk reduction, considerations used to determine whether an alarm can be credited with risk reduction, how often IPL alarms are determined to be invalid or ineffective in operation, and practices for display and annunciation through a Human-Machine Interface (HMI). Key results and conclusions are presented, as well as recommendations on where industry should focus its improvement efforts.

1. Introduction

Alarms and operator response to them are one of the first layers of protection in preventing a plant upset from escalating into a hazardous event. When alarms fail as a layer of protection, catastrophic accidents, such as Milford Haven (UK), Texas City (USA), and Buncefield (UK), can be the result. At the Buncefield Oil Depot, a failure of a tank level sensor prevented its associated high level alarm from being annunciated to the operator.
As the level in the tank reached its ultimate high level, a second protection layer, an independent safety switch, failed to trigger an alarm to notify the operator and failed to initiate a trip which would have automatically shut off the incoming flow. The tank overflow and ensuing fire resulted in a £1 billion (1.6 billion USD) loss [1].

Treatment of alarms used as safeguards and protection layers has become an increasingly important topic for companies and regulatory agencies alike. For example, OSHA's Refinery National Emphasis Program includes provision for citing a refinery if it claims an ineffective alarm as a safeguard or if the alarm design and implementation do not comply with RAGAGEP (Recognized and Generally Accepted Good Engineering Practice) [2].

The standard ANSI/ISA-18.2, Management of Alarm Systems for the Process Industries (ISA-18.2), provides guidance on how to design, engineer, implement and maintain an alarm system [3]. It is considered RAGAGEP by OSHA, so following its requirements and recommendations is critical for safety practitioners who want to use alarms as a layer of protection.

This paper documents the results from a survey that was conducted to benchmark the current practices used in industry for the management of safety-critical alarms (those that are used as safeguards and/or independent protection layers). The purpose of the paper is to allow companies to compare their own practices against industry benchmarks and best practices, as well as to highlight areas where companies can improve.

2. Survey Demographics

The survey was conducted from September 24 to October 5, 2012. A total of 225 respondents participated in the survey, which consisted of a series of 26 questions. Relevant results are analyzed and presented for the three largest demographic groups described below in order to highlight differences based on region or industry.

Table 1. Survey Demographics

#   Region          % of Respondents   Industry                   % of Respondents
1   North America   30%                Oil & Gas                  55%
2   Europe          25%                Chemical                   23%
3   Asia Pacific    18%                Engineering & Consulting   10%

3. Process Hazard Analysis (PHA)

Process Hazard Analysis (PHA) is a required activity of the IEC 61511 standard on functional safety and the OSHA Process Safety Management (PSM) regulation [4, 5]. There are numerous techniques that can be used to perform hazard analysis, including What-If, Checklist, Hazard and Operability Study (HAZOP), and Failure Modes & Effects Analysis (FMEA).
The HAZOP technique is one of the most commonly used in the process industry [6]. Some of the survey questions are specific to the use of the HAZOP method while others are generic in nature.

3.1 Alarms Identified as Safeguards

Survey respondents answered the following question: Estimate the number of different alarms in your system that are typically identified as a Safeguard or Recommendation during the Process Hazards Analysis (PHA) process?

[Figure 1. Number of Alarms that are Safeguards or Recommendations. Categories: None (0), <10, 11-50, 51-100, 101-500, >500. Value labels: 21.6%, 24.6%, 25.1%, 15.8%, 7.6%, 1.8%.]

[Figure 2. Alarms as Safeguards by Industry. Stacked percentage bars for Chemical, Engineering & Consulting, and Oil & Gas, using the same categories as Figure 1.]

Figure 1 shows that the majority of respondents (>65%) indicated that they have more than 50 alarms in their system that were identified as safeguards / recommendations during a PHA.

Figure 2 shows that the number of alarms identified as safeguards varies considerably by industry. In oil & gas, 73% of the respondents identify more than 50 alarms as safeguards in their system, compared with only 55% in chemical. This can be partly attributed to the size of the respective systems; the most common system size for respondents in the chemical industry was 2,000-5,000 I/O, whereas it was 5,000-10,000 I/O for those in oil & gas.

3.2 Analysis of HAZOP Cause / Consequence Pairs

Survey respondents answered the following question: Estimate what percentage of cause / consequence pairs (in a Hazard and Operability Study) call for the use of an alarm as safeguard or recommendation?

[Figure 3. Percent of Cause / Consequence Pairs that call for the Use of an Alarm. Categories: <5%, 5-15%, 16-25%, 26-50%, >50%. Value labels: 23.8%, 19.4%, 20.0%, 23.1%, 13.8%.]

[Figure 4. Percent of Cause / Consequence Pairs By Industry. Stacked percentage bars for Chemical, Engineering & Consulting, and Oil & Gas, using the same categories as Figure 3.]

Figure 3 shows that the responses were relatively evenly distributed among the five choices. This indicates that there is significant variation in the percentage of HAZOP cause / consequence pairs that call for the use of an alarm. About as many respondents answered <5% as answered 26-50%. One likely explanation for these results is variation in how the PHA / HAZOP process is carried out from company to company and in the rigor with which all potential safeguards are documented.

Figure 4 and Table 2 show that there is also significant variation by industry. At one end of the spectrum, 36% of the respondents in the chemical industry answered that <5% (a small minority) of cause / consequence pairs call for an alarm, compared to only 7% in engineering and consulting. At the opposite end of the spectrum, 33% of engineering and consulting respondents indicated that >50% (a majority) of cause / consequence pairs call for an alarm, versus only 3% for chemical.

Table 2. Disparity in Alarms as Percentage of Cause / Consequence Pairs

                           % of Cause / Consequence Pairs that call for an alarm
Industry                   < 5% (small minority)   > 50% (majority)
Chemical                   36%                     3%
Engineering & Consulting   7%                      33%
Oil & Gas                  20%                     15%

3.3 Steps to Ensure Alarms Identified in a PHA are Valid and Effective

Survey respondents answered the following question: When an alarm is identified as a safeguard or recommendation during a PHA, what steps are typically taken to ensure that it is a valid and effective alarm? Check all that apply.

Figure 5. Steps to Ensure Alarms are Valid and Effective

Step                                                                      % of Responses
Discuss / document the operator's response (action) to the alarm          83.5%
Discuss / document whether the operator has sufficient time to respond    70.6%
Define the basis for the alarm setpoint (limit)                           64.7%
Verify that the alarm is independent from the cause                       62.9%
Discuss / document operator training relative to the alarm                52.4%
Verify the operator response does not place him / her in danger           47.1%
Discuss / document alarm mechanical integrity requirements                34.1%
None                                                                      2.9%

Figure 5 documents the steps that are taken to ensure that an alarm which is identified as a safeguard or recommendation is valid and effective. Best practices such as those documented in ISA-18.2, by exida, and by the Center for Chemical Process Safety (CCPS) would suggest that the following activities at a minimum should be performed for an alarm that is used as a safeguard:

- Discuss / document the operator's response (action) to the alarm. According to ISA-18.2, if the alarm does not require an operator action, then it should not be considered a valid alarm. During the rationalization process, each alarm is subjected to this review [3].
- Discuss / document whether the operator has sufficient time to respond. This is another criterion which is reviewed during the rationalization process. If an operator does not have sufficient time to respond to prevent the consequences, then the alarm will not be effective and should not be considered a safeguard [6, 7, 8].
- Verify that the alarm is independent from the cause. This must be TRUE for the alarm to be considered a valid Independent Protection Layer, so it would make sense that it should also be applied to a safeguard when appropriate [6, 7, 8].
- Verify the operator response does not place him / her in danger. If the operator's response to the alarm places them in danger, then the alarm should not be considered a safeguard. The survey indicated that over half the respondents (52.9%) do not apply this criterion [8].

If the four criteria described above are accepted as best practice, then 100% of the respondents should have indicated that these steps are taken. Instead only 83.5%, 70.6%, 62.9% and 47.1% respectively indicated that they follow these best practices. Thus there is a gap between the practices actually used in industry and those that are recommended and accepted as best practices. When alarm management best practices are not applied upfront during the PHA / HAZOP process, it is more likely that some of the alarms identified as safeguards will prove to be invalid / ineffective during alarm rationalization or operation.

[Figure 6. Steps to Ensure Alarms are Valid and Effective By Region]

[Figure 7. Steps to Ensure Alarms are Valid and Effective By Industry]

Figures 6 and 7 present the results based on region and industry. In Figure 6 the percentage of North American respondents who indicated that they discussed mechanical integrity (MI) requirements (45%) was significantly higher than in Europe (28%) and Asia Pacific (38%). This is likely due to the strength of OSHA in the US in driving compliance with its Process Safety Management (PSM) regulation 1910.119, which includes requirements for the creation of a mechanical integrity program (a management system assuring equipment is inspected, maintained, tested and operated in a safe manner) [5, 7]. Of interest in Figure 7 is that engineering & consulting and chemical had higher response scores (greater compliance with best practices) than oil & gas for all categories except one. This indicates that the understanding, acceptance and adoption of best practices may be higher in these sectors than in oil & gas.

3.4 Treatment of PHA Results

Survey respondents answered the following question: After the PHA or HAZOP has been completed, what is done with the requirements for alarms identified as safeguards or recommendations? Check all that apply.

Figure 8. Treatment of PHA Results

Treatment                                                                        Response Percent
They are available for review during alarm rationalization and design           59.3%
Management of Change (MOC) process is initiated                                  51.5%
They are extracted manually by reviewing all PHA reports                         42.5%
They are transferred automatically to a Master Alarm Database so that they
  are available during alarm rationalization and design                          27.5%
They are automatically extracted into a spreadsheet                              18.6%
None                                                                             5.4%

The survey indicates that only 59.3% of respondents make the PHA results available during alarm rationalization. This figure should be 100%, as alarm rationalization revisits some of the topics that are covered during the PHA (establishing likely causes and consequences). Making the PHA results available improves the efficiency of the alarm rationalization and helps ensure consistency between the alarm design and the PHA.

4. Layer of Protection Analysis (LOPA)

Layer of Protection Analysis is one of the most commonly used techniques for risk analysis. It is a method of analyzing the likelihood (frequency) of a harmful outcome event based on initiating event frequency and on the probability of failure of a series of independent protection layers capable of preventing the harmful outcome [6].

The primary goal of a LOPA is to determine if there are adequate protective devices or features in the process to produce a tolerable risk level. These protective devices or features are called Protection Layers or Independent Protection Layers (IPLs). Examples of potential protection layers include the mechanical integrity of a vessel, control loops and trips within the basic process control system (BPCS), operator intervention, a safety instrumented function, and physical relief devices.

It is important to note the difference between a safeguard and a layer of protection. A safeguard is any device, system or action that would likely interrupt the chain of events following an initiating event. The benefit of some safeguards may not be easily quantified because of a lack of data, or because of uncertainty about whether the safeguard meets specific criteria such as independence, effectiveness, and auditability. An independent protection layer is a safeguard whose effectiveness can be quantified and which meets well-defined criteria. All IPLs are safeguards, but not all safeguards are IPLs [8].
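The LOPA calculation described above can be sketched in a few lines. This is a minimal illustration, assuming the common formulation in which the mitigated event frequency is the initiating event frequency multiplied by the PFD of each independent protection layer. The function name, the scenario values, and the tolerable frequency are hypothetical and are not taken from the survey.

```python
import math


def mitigated_frequency(initiating_event_frequency, ipl_pfds):
    """Return the mitigated event frequency (events per year).

    Assumes every layer is independent, so the PFDs simply multiply.
    """
    for pfd in ipl_pfds:
        if not 0.0 < pfd <= 1.0:
            raise ValueError(f"PFD must be in (0, 1], got {pfd}")
    return initiating_event_frequency * math.prod(ipl_pfds)


# Hypothetical scenario: an initiating event once every 10 years (0.1 per year),
# an operator response to an alarm (PFD 0.1) and a relief device (PFD 0.01).
frequency = mitigated_frequency(0.1, [0.1, 0.01])
tolerable = 1e-5  # hypothetical tolerable frequency, per year

print(f"mitigated frequency: {frequency:.1e} per year")
if frequency <= tolerable:
    print("existing layers are adequate")
else:
    print("additional risk reduction is needed")
```

In this made-up scenario the remaining gap is a factor of about 10, which is the kind of result that would prompt a team to consider another protection layer or a safety instrumented function.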
4.1 Origin of Alarms Identified in a LOPA

Survey respondents answered the following question: What percentage of the alarms that are considered during a Layer of Protection Analysis (LOPA) were identified during a PHA?

Figure 9. Percentage of LOPA Alarms Originating during a PHA

Percentage of LOPA alarms identified during a PHA   % of Respondents
All (approximately 100%)                            12.4%
75-99%                                              22.6%
50-74%                                              17.5%
<50%                                                33.6%

After the process hazards analysis has been completed, the results and recommendations are reviewed to determine which scenarios require further analysis to determine if there are adequate layers of protection, or if safety instrumented functions (SIF) will be needed to properly manage the risk. One industry reference defines a safeguard as a potential protection layer that has yet to be evaluated in a LOPA to determine effectiveness and independence [6]. Thus it would be expected that ideally 100% of the alarms that are considered in a layer of protection analysis would have first been identified as a safeguard / recommendation in the PHA.

Figure 9 shows that this is far from the case in practice. Only 12.4% of the respondents indicated that all (100%) of the alarms in the LOPA had come from the PHA. Furthermore, 33.6% indicated that fewer than 50% of the LOPA alarms had been identified during the PHA.

It is certainly possible that the LOPA may legitimately identify some alarms that were not considered during the PHA. Allowing for this, one could consider the 75-99% response as acceptable. This leaves 51% of the respondents (those who answered <50% or 50-74%) who appear to frequently identify new alarms during LOPA that were missed during the PHA. This would seem to indicate that poor PHA practices are being used. Failing to identify alarms during a PHA could signal various issues, such as a lack of thoroughness, a failure to document all safeguards in order to save time, or a lack of understanding of the process.

[Figure 10. Percentage of LOPA Alarms Originating during a PHA By Region. Stacked percentage bars for North America, Europe, and Asia Pacific. Categories: <50%, 50-74%, 75-99%, All (approximately 100%).]

The results in Figure 10 demonstrate a significant disparity in the quality of the PHAs being conducted based on region. The percentage of poor PHA applications ranged from 41% in North America to 79% in Asia Pacific (where poor is defined as those that answered <50% or 50-74%). Note that the variation was much less significant when analyzed by industry.

4.2 Typical Risk Reduction for a Safety IPL Alarm

Survey respondents answered the following question: What level of risk reduction (RRF) do you typically take for a Safety IPL alarm?

The effectiveness of an independent protection layer is typically characterized by assigning a probability of failure on demand (PFD), which is defined as the probability that it will fail to perform a specified function when called upon [8]. The risk reduction factor (RRF), which is a measure of how much a protective function reduces the frequency of the hazardous event, is the inverse of PFD [7].

RRF = 1 / PFD    [Eq. 1]
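As a small worked illustration of Eq. 1, the sketch below converts between RRF and PFD for a few of the RRF values that appear as answer choices in the survey. The function names are hypothetical and only exist for this example.

```python
def rrf_from_pfd(pfd):
    """Eq. 1: the risk reduction factor is the inverse of the PFD."""
    if not 0.0 < pfd <= 1.0:
        raise ValueError(f"PFD must be in (0, 1], got {pfd}")
    return 1.0 / pfd


def pfd_from_rrf(rrf):
    """Inverse of Eq. 1; an RRF of 1 means no risk reduction (PFD = 1)."""
    if rrf < 1.0:
        raise ValueError(f"RRF must be at least 1, got {rrf}")
    return 1.0 / rrf


# RRF values used as answer choices in the survey questions.
for rrf in (1.0, 2.0, 10.0, 100.0):
    print(f"RRF {rrf:g} corresponds to PFD {pfd_from_rrf(rrf):g}")
```

An RRF of 10, the most commonly cited value for operator response to an alarm, corresponds to a PFD of 0.1, meaning the layer is assumed to fail on one demand in ten.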

[Figure 11. Typical Level of Risk Reduction for a Safety IPL Alarm. Categories: 1.0 (no risk reduction), Up to 2.0, 2.0-9.9, 10.0, >10.0. Value labels: 43.0%, 20.0%, 10.4%, 14.8%, 3.0%.]

The level of risk reduction that can be taken for a Safety IPL Alarm (an alarm used as an independent protection layer) is an area of debate in the safety community. The debate arises from the significant disparity that exists from plant to plant, unit to unit and person to person in the ability of an operator to prevent a hazardous situation from developing into an accident.

Figure 11 shows that the most common risk reduction factor taken for a Safety IPL Alarm is 10.0 (with 43% of the respondents). This corresponds to the risk reduction that is most commonly cited in the literature [4, 8]. Table 3 shows the correspondence between RRF, PFD, and Safety Integrity Level (SIL). It should also be noted that 10% of the respondents claim no risk reduction for a Safety IPL alarm, while 20% claim an RRF between 2.0 and 9.9.

Table 3. Correspondence between RRF, PFD and SIL [4]

Risk Reduction Factor (RRF)   Probability of Failure on Demand (PFDavg)   Safety Integrity Level (SIL) potentially achievable
≤ 10                          10^-1 to 10^0                               SIL 0
> 10 to 100                   10^-2 to < 10^-1                            SIL 1
> 100 to 1,000                10^-3 to < 10^-2                            SIL 2
> 1,000 to 10,000             10^-4 to < 10^-3                            SIL 3
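The bands in Table 3 can be expressed as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, and it assumes the first row covers RRF values from 1 to 10, which follows from that row's PFD range of 10^-1 to 10^0.

```python
def sil_band(rrf):
    """Return the SIL band (0 to 3) from Table 3 for a given RRF.

    Returns None for RRF values above 10,000, which Table 3 does not cover.
    """
    if rrf < 1:
        raise ValueError(f"RRF must be at least 1, got {rrf}")
    if rrf <= 10:
        return 0  # PFD from 10^-1 to 10^0 (assumed to mean RRF 1 to 10)
    if rrf <= 100:
        return 1  # PFD from 10^-2 to < 10^-1
    if rrf <= 1000:
        return 2  # PFD from 10^-3 to < 10^-2
    if rrf <= 10000:
        return 3  # PFD from 10^-4 to < 10^-3
    return None


for rrf in (1, 10, 50, 100, 500):
    print(f"RRF {rrf}: SIL {sil_band(rrf)}")
```

Working from RRF rather than PFD keeps the band edges as exact integer comparisons, which avoids floating-point surprises at values such as 0.1 and 0.01.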

[Figure 12. Typical Level of Risk Reduction for a Safety IPL Alarm By Region. Stacked percentage bars for North America, Europe, and Asia Pacific. Categories: 1.0 (no risk reduction), Up to 2.0, 2.0-9.9, 10.0, >10.0.]

[Figure 13. Typical Level of Risk Reduction for a Safety IPL Alarm By Industry. Stacked percentage bars for Chemical, Engineering & Consulting, and Oil & Gas, using the same categories as Figure 12.]

Figures 12 and 13 show that the level of risk reduction varies considerably by region and by industry. For example, in North America a clear majority (72%) use an RRF of 10, while in Asia Pacific only 30% use an RRF of 10. In Asia Pacific, a large percentage of the respondents are either very conservative (26% claim no risk reduction) or very aggressive (13% claim a risk reduction greater than 10). It is also interesting to note that numerous respondents in the Engineering & Consulting sector claimed to use RRFs that are not powers of 10 (i.e. in the 2.0-9.9 range). This suggests that quantitative LOPA techniques, which can make use of such RRF values, may be used frequently within this sector.

Risk reduction factors greater than 10.0 (PFD < 0.1) should be used sparingly, if ever, for Safety IPL alarms. As shown in Table 4, there are very few situations when it would be appropriate to use such a value. When it is believed to be appropriate, it is necessary to document a sound technical basis for that conclusion.

Table 4. Simplified Technique for Estimating Operator Response [6]

Category 1: Normal Operator Response
  Probability that operator responds successfully: 90%   PFD: 0.1   RRF: 10
  In order for an operator to respond normally to a dangerous situation, the following criteria should be true:
  - Ample indications exist that there is a condition requiring a shutdown
  - Operator has been trained in proper response
  - Operator has ample time (> 20 minutes) to perform the shutdown
  - Operator is ALWAYS monitoring the process (relieved for breaks)

Category 2: Drilled Response
  Probability that operator responds successfully: 99%   PFD: 0.01   RRF: 100
  All of the conditions for a normal operator intervention are satisfied and a drilled response program is in place at the facility.
  - Drilled response exists when written procedures, which are strictly followed, are drilled or repeatedly trained by the operations staff.
  - The drilled set of shutdowns forms a small fraction of all alarms, where response is so highly practiced that its implementation is automatic.
  - This condition is RARELY achieved in most process plants.

Category 3: Response Unlikely / Unreliable
  Probability that operator responds successfully: 0%   PFD: 1.0   RRF: 1
  NOT ALL of the conditions for a normal operator intervention probability have been satisfied.
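The decision logic in Table 4 can be sketched as a small function. This is an illustrative reading of the table rather than a validated tool: the function, its parameter names, and the returned record are hypothetical, and a real assessment would rest on a documented technical basis rather than on boolean flags.

```python
from typing import NamedTuple


class OperatorCredit(NamedTuple):
    category: int
    description: str
    pfd: float
    rrf: int


def operator_response_credit(ample_indication, trained, minutes_available,
                             always_monitoring, drilled_response=False):
    """Pick the Table 4 category for an operator response to an alarm."""
    normal_conditions_met = (
        ample_indication
        and trained
        and minutes_available > 20  # Table 4: ample time is > 20 minutes
        and always_monitoring
    )
    if not normal_conditions_met:
        return OperatorCredit(3, "Response Unlikely / Unreliable", 1.0, 1)
    if drilled_response:
        return OperatorCredit(2, "Drilled Response", 0.01, 100)
    return OperatorCredit(1, "Normal Operator Response", 0.1, 10)


# Hypothetical alarm: well indicated, operator trained, 30 minutes to respond.
credit = operator_response_credit(True, True, 30, True)
print(credit.description, credit.pfd, credit.rrf)
```

Note that a drilled response program earns no credit here unless all of the normal-response conditions are also met, which mirrors the wording of Category 2.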

Some alarm management practitioners have proposed that even a risk reduction factor of 10 should not be applied blindly without ensuring that specific alarm management requirements are / will be met, such as the following:

- The alarm system must be rationalized.
- Alarm system performance must be measured and proven to be adequate (based on industry-accepted KPIs) [9].

4.3 Maximum Risk Reduction for a Safety IPL Alarm

Survey respondents answered the following question: In your experience, what is the maximum level of risk reduction (RRF) that has been taken for a Safety IPL alarm?

[Figure 14. Maximum Risk Reduction for a Safety IPL Alarm. Categories: 1.0 (no risk reduction), Up to 2.0, 2.0-9.9, 10.0, 100.0, >100.0. Value labels: 48.1%, 8.1%, 11.9%, 10.4%, 10.4%, 2.2%.]

Figure 14 shows that a risk reduction factor of 10 was again the most popular response (48%). It is interesting to note that the percentage of respondents who indicated that 10.0 was the maximum risk reduction taken (48%) was slightly greater than the percentage who indicated that 10.0 was the typical value taken (43%) in the previous question. It is also of interest to note that 12.6% of the respondents indicated a maximum RRF of 100.0 or greater.

4.4 Considerations for Determining When an Alarm Can be Credited with Risk Reduction

Survey respondents answered the following question: What considerations are used to determine whether an alarm can be credited with risk reduction? Check all that apply.

Figure 15. Considerations for Determining When an Alarm can be Credited with Risk Reduction

Consideration                                                                      % of Responses
The alarm is completely independent from the cause of the upset                   73.3%
The alarm is auditable (proof tested at appropriate frequency)                    67.9%
The operators have been trained on the causes, potential consequences, and
  corrective actions for the alarm                                                63.4%
The alarm is specifically designed to prevent the consequences under
  consideration by the operator                                                   59.5%
There is not more than one alarm credited with risk reduction per layer of
  protection                                                                      48.9%
The alarm is dependable (based on calculating the Probability of Failure on
  Demand for the annunciation of the alarm and successful ...)                    42.7%
Alarm system performance (# of alarms / per hour, nuisance alarms, alarm
  floods) is measured and determined to be acceptable                             38.9%
All alarms in the system (safety and non-safety) have been rationalized           32.1%

The general criteria for determining when a safeguard can be considered an IPL are well established in the literature and include the following:

Layer of Protection Analysis [8] | Practical SIL Target Selection [6] | Guidelines for Safe & Reliable Instrumented Protective Systems [7]
Independent | Independent | Independence
Auditable | Auditable | Auditability
Effective | Dependable | Reliability, Integrity
 | Specific | Functionality
 | | Access Security
 | | Management of Change
Table 5. Survey of General Criteria Used to Determine when a Safeguard can be used as an IPL

The generic criteria above have been used to create specific considerations that should be taken into account to ensure that an alarm can be credited with risk reduction (represented in Figure 15). A more detailed discussion about criteria applied to alarms can be found in Appendix A.

Nuisance alarms, which are alarms that annunciate excessively, unnecessarily, or do not return to normal after the correct response is taken, can interfere with the operator's ability to detect and respond to safety IPL alarms. Standing alarms (lasting > 24 hours) and chattering alarms (points that go needlessly in and out of alarm on a frequent basis) are nuisance alarms that clutter the operator's display, making it more difficult to detect a new alarm and increasing the chance that a critical alarm is missed.

Alarm rationalization, which is the process of reviewing potential or existing alarms to justify that they meet the criteria for being an alarm, is a technique for ensuring the integrity of the alarm system and eliminating problems such as nuisance alarms, alarm overload and alarm floods. It includes defining and documenting the design attributes (such as priority, limit, type and classification) as well as the cause, consequence, time to respond, and recommended operator response.
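The standing alarm definition above (an alarm lasting more than 24 hours) is simple enough to check automatically. The following is a minimal sketch, assuming the alarm system can export the time at which each currently active alarm was raised; the function name and the tag names are hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta

# "Standing alarms (lasting > 24 hours)" as defined in the text.
STANDING_THRESHOLD = timedelta(hours=24)

def standing_alarms(active_alarms, now):
    """Return the tags of alarms that have been active for more than 24 hours.

    active_alarms maps a (hypothetical) tag name to the time the alarm
    became active; alarms in the mapping have not yet returned to normal.
    """
    return sorted(
        tag for tag, since in active_alarms.items()
        if now - since > STANDING_THRESHOLD
    )

# Hypothetical snapshot of active alarms.
now = datetime(2024, 1, 3, 12, 0)
active = {
    "LAH-101": datetime(2024, 1, 1, 8, 0),   # active for 52 hours
    "PAH-202": datetime(2024, 1, 3, 9, 0),   # active for 3 hours
}
print(standing_alarms(active, now))
```

A list like this could feed the rationalization review, since each standing alarm is a candidate for redesign or removal.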
Since all of the criteria shown in Figure 15 have been cited as recommended best practices in the literature, it can be concluded that a large portion of safety practitioners are NOT following industry recommended practices (otherwise the scores would be close to 100% for each consideration).

4.5 Invalid & Ineffective Safety IPL Alarms

Survey respondents answered the following question: How often do you find that an alarm identified as an IPL is not valid, or is ineffective (does not provide the level of risk reduction expected)?

Figure 16. Frequency of Ineffective Safety IPL Alarms
Response categories: Never (0% of the time); Infrequently (< 1% of the Safety IPL Alarms); Sometimes (between 1 and 5% of the Safety IPL Alarms); Frequently (> 5% of the Safety IPL Alarms); Unknown
Chart values: 38.9%, 26.0%, 17.6%, 14.5%, 4.6%

Figure 16 shows how often a Safety IPL Alarm is found to be ineffective at providing the expected level of risk reduction. 65% of the respondents indicated that they sometimes or frequently find that an alarm is an ineffective IPL. This could create a situation where the actual risk reduction no longer meets or exceeds the company-defined tolerable risk level.

Figure 17. Risk Reduction through the use of multiple protection layers [10]

Figure 17 illustrates how the loss of risk reduction from an ineffective IPL alarm can have a ripple effect on the requirements for other layers of protection, such as a safety instrumented function in an SIS. The higher the SIL, the more complicated and expensive the Safety Instrumented System (SIS) becomes. A higher SIL may also require more frequent proof testing, which adds cost and can be burdensome in many plants [11].
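The ripple effect can be illustrated with simple arithmetic. The sketch below assumes the usual LOPA simplification that the risk reduction factors of independent layers multiply, and it uses the low-demand SIL bands of IEC 61511 (SIL 1 for an RRF above 10 up to 100, SIL 2 up to 1,000, and so on). The function names and the overall target of 5,000 are hypothetical values chosen for illustration, not survey results.

```python
import math

def required_sif_rrf(total_required_rrf, other_ipl_rrfs):
    """RRF the SIF must supply after crediting the other protection layers.

    Assumes independent layers whose risk reduction factors multiply.
    """
    credited = math.prod(other_ipl_rrfs)
    return total_required_rrf / credited

def sil_for_rrf(rrf):
    """Map a required RRF to a SIL band (low-demand mode)."""
    if rrf <= 10:
        return 0  # no SIL-rated function required
    if rrf <= 100:
        return 1
    if rrf <= 1_000:
        return 2
    if rrf <= 10_000:
        return 3
    return 4

TOTAL_REQUIRED_RRF = 5_000  # hypothetical scenario target

# Alarm credited as an IPL with an RRF of 10.
with_alarm = required_sif_rrf(TOTAL_REQUIRED_RRF, [10])
# Alarm found to be ineffective: it contributes no risk reduction (RRF of 1).
without_alarm = required_sif_rrf(TOTAL_REQUIRED_RRF, [1])

print(with_alarm, sil_for_rrf(with_alarm))
print(without_alarm, sil_for_rrf(without_alarm))
```

In this hypothetical case, losing the alarm's credited factor of 10 moves the SIF requirement up by one SIL band, which is the cost escalation the paragraph above describes.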

One could surmise that this finding is partly caused by the gap in following best practices illustrated by Figure 15. A detailed discussion of failure modes of Safety IPL alarms is the subject of another paper [9].

4.6 Prioritizing Safety IPL Alarms

Survey respondents answered the following question: What statement best describes how the priority of Safety IPL alarms is assigned?

30.2%  Based on company defined risk matrix, taking into consideration consequence to economic, safety, environmental and public image aspects
22.5%  Based on the ultimate consequence defined in the HAZOP / PHA
21.7%  Automatically set to the highest priority allowed in the system (e.g. Critical, Emergency, etc.)
17.1%  Based on the direct & immediate consequence (assuming all other layers of protection operate as expected) and the amount of time available for the operator to respond
4.7%  Not Applicable
3.9%  Based on the assumption that the associated SIF and other associated IPLs fail
Figure 18. Methodology for Prioritizing Safety IPL Alarms

Alarm priority represents the importance assigned to an alarm within the alarm system to indicate the urgency of response. It helps the operator to know which alarm to respond to first. Alarm priority is typically determined based on the severity of the potential consequences (in areas such as personnel safety, equipment damage, environmental impact and economic loss) and the time available to respond, as shown in Table 6. Analysis of the severity of consequences is an activity that is common within the safety lifecycle. For a safety IPL alarm it is important to work with the direct (proximate) consequences and not the ultimate consequences which could occur after a series of failures [12, 13].

Table 6. Example Alarm Priority Matrix

Figure 18 provides a view into how Safety IPL alarms are prioritized. As shown by Table 7, 48% of the respondents indicated that they use prioritization criteria which do not follow alarm management best practices.

Prioritization Criteria | % | Compliance with Best Practices
Based on company defined risk matrix, taking into consideration consequence to economic, safety, environmental and public image aspects | 30.2% | YES
Based on the ultimate consequence defined in the HAZOP / PHA | 22.5% | NO
Automatically set to the highest priority allowed in the system (e.g. Critical, Emergency, etc.) | 21.7% | NO
Based on the direct & immediate consequence (assuming all other layers of protection operate as expected) and the amount of time available for the operator to respond | 17.1% | YES
Based on the assumption that the associated SIF and other associated IPLs fail | 3.9% | NO
Table 7. Alarm Prioritization Results and Compliance with Best Practices
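The 48% figure follows directly from Table 7 by adding the shares of the criteria marked NO. The short sketch below reproduces that tally; the shortened row labels and the helper name are illustrative only.

```python
# Shares from Table 7 (percent of respondents) and whether the paper
# classes each criterion as compliant with best practices.
TABLE_7 = [
    ("Company defined risk matrix", 30.2, True),
    ("Ultimate consequence from HAZOP / PHA", 22.5, False),
    ("Automatically set to highest priority", 21.7, False),
    ("Direct & immediate consequence and time to respond", 17.1, True),
    ("Assumes associated SIF and other IPLs fail", 3.9, False),
]

def share(rows, compliant):
    """Sum the percentage of respondents for rows with the given flag."""
    return round(sum(pct for _, pct, ok in rows if ok is compliant), 1)

non_compliant = share(TABLE_7, compliant=False)
compliant = share(TABLE_7, compliant=True)

print(f"Not following best practices: {non_compliant}%")
print(f"Following best practices: {compliant}%")
```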

5. Human Machine Interface (HMI) Practices for Safety IPL Alarms

Safety IPL alarms are communicated to the operator through the Human Machine Interface (HMI). Once the alarm is annunciated, a series of steps must be performed by the operator to prevent escalation of the hazardous scenario and bring the process back to the normal operating range (reference) as shown in Figure 19.

Figure 19. Feedback Model of Operator Process Interaction [3]

For a successful outcome, the operator must proceed quickly through three stages of activity: a) the deviation from desired normal operation is detected, b) the situation is diagnosed and the corrective action determined, c) the action is implemented to compensate for the disturbance. The operator also continues to monitor the measurement as it returns to normal. A well designed HMI should support situation awareness and ensure that the operator is able to quickly and repeatably detect, diagnose, and respond within the operator response time. Operator response time represents the time from the activation of the alarm until the last moment the operator action will prevent the consequence (i.e., time available) [3]. Poor graphics, including alarm depiction deficiencies, have been identified as contributing factors to several major industrial accidents (such as Buncefield).

5.1 Display of Safety IPL Alarms

Survey respondents answered the following question: What statement(s) best describes your current practice for display of Safety IPL alarms? Check all that apply.

64.1%  They are annunciated through the same HMI as the BPCS
31.3%  They are annunciated through hardwired light boxes or panel boards
21.4%  They are annunciated through light boxes or panel boards and the same HMI as the BPCS
20.6%  They are annunciated through dedicated HMIs
18.3%  They are part of a standalone system
Figure 20. Display of Safety IPL Alarms

Safety IPL alarms and information can be presented to the operator in a number of different ways, including:
- Graphic displays on the basic process control system (BPCS) operator interface,
- Dedicated graphic displays on stand-alone video display units,
- Panel mounted graphic displays, and
- Panel mounted annunciators.

Figure 20 illustrates that a variety of architectures are used for the display of safety IPL alarms, with the most popular (64.1%) being annunciation through the same HMI as the BPCS. Selected recommended best practices for display design include the following:
- Lightbox alarms, which provide an independent alarm display that can typically be seen by multiple operators within the control room, should be replicated in the BPCS interface for acknowledgement and logging purposes.
- Lightbox annunciators should be located close to the operator's work station or work areas so that they are visible from all locations where their information would be considered important [7].
- Graphic displays should be designed to maximize operator situation awareness and "pattern recognition" to aid in operator response.
- Graphic displays should be designed so the visibility of information is related to its operational importance; background information should be given low visibility, normal plant measurements a medium visibility, and abnormal conditions (values and states) the highest visibility.
- It is important that alarm state indications represent the presence of an alarm using not only color, but also symbols, patterns and/or text (8-12% of the male population is color blind).
- Alarm colors should be reserved for alarms only and not used for other functions within the HMI (such as process piping or equipment status).
- Alarm color coding should reflect the priority of the alarm.

5.2 Alarm Response Procedures

Survey respondents answered the following question: Do you provide Alarm Response Procedures to the operator for safety IPL alarms? If Yes, please indicate the format:

Alarm Response Procedures for Safety IPL alarms | %
(Yes) - Provided | 74%
(No) - Not Provided | 26%
Format | %
Paper manuals | 54%
On screen display called up in context within the HMI | 28%
Call up files or displays on a dedicated computer (other than the HMI) | 18%
Table 8. Use of Alarm Response Procedures for Safety IPL Alarms

Alarm response procedures typically include the following information [9, 11]:
- Likely cause(s) of the alarm
- Potential consequences of inaction
- Corrective action that is required by the operator to prevent the consequence
- Time available to respond
- Confirmation / verification of the alarm condition

As shown in Table 8, 26% of the respondents do not provide alarm response procedures to the operator to help them respond to Safety IPL Alarms. This is inconsistent with the practices that should be followed to ensure that the operator response is effective and reliable / dependable [6, 7, 8]. Of those that do provide alarm response procedures, 54% indicated that they are provided in paper format. Printed (paper) manuals can be ineffective if they are not within immediate reach of the operator, are not kept up to date, or require significant time for the operator to locate the relevant procedure.
The ability to display the alarm response procedures in context within the HMI, which was selected by 28% of the respondents, is the most effective format and should be considered a best practice [9, 11]. Best practices assert that "Operator response integrity can be improved by displaying operator action on request" [7].

6. Conclusion

Operator response to alarms can be used to reduce risk as a safeguard or as an independent protection layer. Survey results indicate that there is significant variation in the practices employed within industry for the management of safety-critical alarms. In some cases these variations are more significant when analyzed by industry or region. Analysis of survey results also revealed that there is significant room for improvement when it comes to the adoption of, and compliance with, industry best practices. In particular the following areas were identified:
- Improving the rigor and thoroughness of PHAs so that, for example, all alarm safeguards are identified and documented
- Verifying that an alarm identified as a safeguard or recommendation is likely to be valid and effective
- Ensuring that alarms credited with risk reduction meet the criteria established for them to be independent protection layers as cited in industry best practices [6, 7, 8]
- Understanding the implications and guidelines for assigning a risk reduction factor or probability of failure on demand to a Safety IPL alarm
- Prioritizing safety IPL alarms based on the ISA-18.2 standard and alarm management best practices
- Providing operators with alarm response procedures for Safety IPL alarms in context and within the HMI

Safety practitioners are encouraged to compare their own practices against the benchmark survey results and the best practices cited in this paper. This should highlight areas of improvement that can help improve the safety of the people and the processes they work with. It is also recommended that safety practitioners increase their knowledge of alarm management best practices such as those in ISA-18.2.

7. References

[1] The Buncefield Incident; The final report of the Major Incident Investigation Board, Volume 2, Crown publishing, United Kingdom, (2008).
[2] Occupational Safety and Health Administration (OSHA), Petroleum Refinery Process Safety Management National Emphasis Program, Directive CPL-03-00-010, Washington, DC, (2009).
[3] ANSI/ISA 18.00.02-2009 Management of Alarm Systems for the Process Industries.
[4] ANSI/ISA-84.00.01-2004 Part 1 (IEC 61511-1 Mod) Functional Safety: Safety Instrumented Systems for the Process Industry Sector.
[5] OSHA, Process safety management of highly hazardous chemicals, 29 CFR 1910.119, Washington, DC, (1992).
[6] Hartmann, H., Scharpf, E., and Thomas, H., Practical SIL Target Selection: Risk Analysis per the IEC 61511 Safety Lifecycle, exida, Sellersville, PA, (2012).
[7] CCPS, Guidelines for Safe and Reliable Instrumented Protective Systems, Center for Chemical Process Safety, New York, NY, (2007).
[8] CCPS, Layer of Protection Analysis: Simplified Process Risk Assessment, Center for Chemical Process Safety, New York, NY, (2001).
[9] Stauffer, T. and Clarke, P., Using Alarms as a Layer of Protection, AIChE 8th Global Congress on Process Safety, Houston, TX, (2012).
[10] Hatch, D., and Stauffer, T., Operators on Alert: Operator response, alarm standards, protection layers keys to safe plants, Intech, (September 2009).
[11] Stauffer, T., Making the Most of Alarms as a Layer of Protection, Safety Control Systems Conference, IDC Technologies, (May 2010).
[12] Stauffer, T., Sands, N., and Dunn, D., Get a Life(cycle)! Connecting Alarm Management and Safety Instrumented Systems, ISA Safety & Security Symposium, (2010).
[13] Hollifield, B., and Habibi, E., Alarm Management: A Comprehensive Guide (2nd Edition), ISA, Research Triangle Park, NC, (2011).

Additional references not cited:
[14] EEMUA 191, Alarm Systems: A Guide to Design, Management and Procurement, Edition 2, The Engineering Equipment and Materials Users Association, (2007).
[15] Nimmo, I., The Operator as IPL, Hydrocarbon Engineering, (September 2005).
[16] Stauffer, T., Sands, N., and Dunn, D., Alarm Management and ISA-18 - A Journey, Not a Destination, Texas A&M Instrumentation Symposium, (2010).
[17] Suttinger, L. and Sossman, C., Operator Action within a Safety Instrumented Function, WSRC-MS-2002-00091, (2002).
[18] The Explosion and Fires at the Texaco Refinery, Milford Haven, 24 July 1994, HSE Books, Sudbury, U.K., (1995).
[19] BP America Refinery Explosion, U.S. Chemical Safety Board, www.chemsafety.gov/investigations, (2009).

Appendix A. Survey of Criteria for using Alarms as Layers of Protection

A.1 Guidelines for Safe and Reliable Instrumented Protective Systems [7]

Protection layers known as IPLs are designed and managed to meet the following seven core attributes:
- Independence: the performance of a protection layer is not affected by the initiating cause of a hazardous event or by the failure of other protection layers;
- Functionality: the required operation of the protection layer in response to a hazardous event;
- Integrity: related to the risk reduction that can reasonably be expected given the protection layer's design and management;
- Reliability: the probability that a protection layer will operate as intended under stated conditions for a specified time period;
- Auditability: ability to inspect information, documents and procedures, which demonstrate the adequacy of and adherence to the design, inspection, maintenance, testing and operation practices used to achieve the other core attributes;
- Access Security: use of administrative controls and physical means to reduce the potential for unintentional or unauthorized changes; and
- Management of Change: formal process used to review, document, and approve modifications to equipment, procedures, raw materials, processing conditions, etc., other than replacement in kind, prior to implementation.

Applying the seven core attributes to alarms allows definition of specific recommendations and best practices. Alarms should only be used when the operator is expected to take a specified action, which is covered by an operating procedure. Operators should be trained on how to respond to the alarm according to a written procedure. For most hazardous events, only one protective function can be claimed in the supervisory layer, irrespective of the number of indications or alarms.
For an alarm to be classified as an IPL, it must meet the following three criteria:
- The alarm is independent of the initiating cause and other protective layers addressing the identified hazardous event.
- The alarm function, including inputs and outputs, is designed to provide the allocated risk reduction.
- There is sufficient time for the operator to detect that a problem exists, to determine what to do, and to take the appropriate action necessary to return the process to normal operating limits.
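The third criterion can be checked numerically. The sketch below is a minimal illustration, assuming the operator response is broken down into detect, diagnose and act stages, and applying the guidance from [7] that the total operator response time should be less than one-half of the available process safety time. The function name and the example durations are hypothetical.

```python
def response_time_ok(detect_min, diagnose_min, act_min, process_safety_time_min):
    """Return True if the total operator response time is less than half
    of the process safety time (time from alarm to hazardous event).

    All arguments are in minutes.
    """
    total = detect_min + diagnose_min + act_min
    return total < 0.5 * process_safety_time_min

# Hypothetical response: 2 + 5 + 8 = 15 minutes in total.
print(response_time_ok(2, 5, 8, process_safety_time_min=40))  # 15 is below 20
print(response_time_ok(2, 5, 8, process_safety_time_min=25))  # 15 is above 12.5
```

In practice the stage durations would need to include travel time for field actions and an allowance for recovering from an incorrect first action.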

The total operator response time should be less than one-half of the available process safety time. For a protective alarm, the process safety time is the time between the alarm occurrence and the hazardous event occurrence.

A.2 Practical SIL Target Selection: Risk Analysis per the IEC 61511 Safety Lifecycle [6]

- The sensor and logic solver used to activate the alarm must be at least 90 percent reliable and independent of the initiating event and other IPLs (independent)
- The alarm must be part of a well-rationalized alarm annunciation system such that the operator is not overwhelmed with too many alarms
- The alarm setpoint must be within the operating range of the sensor and may not be changed without permission and a change management procedure (dependable and auditable)
- The alarm must not be capable of being bypassed or inhibited and it must be annunciated in a control room that is continually manned when the process is operating (dependable)
- The operator must have adequate time to respond to the alarm. This response time includes the time it takes the operator to detect the alarm, diagnose what should be done, physically move to the final elements to be manipulated and execute the manipulation (dependable). For example, a high level alarm on a compressor suction drum will require the control room operator to acknowledge the alarm, determine the need to drain the drum, call the field operator, and request the action. Then the field operator must stop their current activity and physically go to the compressor, locate the correct drain valve, and then open the valve. This response time must also include the time it takes the operator to recover from making an incorrect decision or process manipulation or come back into the control room to get a wrench to move a stuck valve!
- An alarm response procedure detailing the actions required by each type of operator (control room and field) must exist and be available to the operators.
- All operators must be trained, drilled and periodically audited on the procedure and its required actions (auditable)
- All operators must be capable, and willing, to make the correct intervention actions at least 90% of the time (dependable)
- The operators must have a final element to manipulate that is independent of the initiating event and other IPLs, including any SIFs (independent)
- The alarm must reveal the dangerous condition under all circumstances (specific)
- The proper functionality of the alarm must be periodically verified and documented (auditable)

- Alarm system performance must be measured and proven to be adequate (dependable). To ensure performance is acceptable it must be measured and compared to key performance metrics (targets) such as those defined in the ISA-18.2 standard.

Alarm Performance Metrics (based upon at least 30 days of data)

Metric: Annunciated Alarms per Time | Target Value: Very Likely to be Acceptable | Target Value: Maximum Manageable
Annunciated Alarms Per Day per Operating Position | ~150 alarms per day | ~300 alarms per day
Annunciated Alarms Per Hour per Operating Position | ~6 (average) | ~12 (average)
Annunciated Alarms Per 10 Minutes per Operating Position | ~1 (average) | ~2 (average)

Metric | Target Value
Percentage of hours containing more than 30 alarms | ~<1%
Percentage of 10-minute periods containing more than 10 alarms | ~<1%
Maximum number of alarms in a 10 minute period | 10
Percentage of time the alarm system is in a flood condition | ~<1%
Percentage contribution of the top 10 most frequent alarms to the overall alarm load | ~<1% to 5% maximum, with action plans to address deficiencies.
Quantity of chattering and fleeting alarms | Zero, action plans to correct any that occur.
Stale Alarms | Less than 5 present on any day, with action plans to address
Annunciated Priority Distribution | 3 priorities: ~80% Low, ~15% Medium, ~5% High or 4 priorities: ~80% Low, ~15% Medium, ~5% High, ~<1% highest. Other special-purpose priorities excluded from the calculation
Unauthorized Alarm Suppression | Zero alarms suppressed outside of controlled or approved methodologies
Unauthorized Alarm Attribute Changes | Zero alarm attribute changes outside of approved methodologies or MOC
Table 10. ISA-18.2 Alarm Performance Metrics [3]

A.3 Layer of Protection Analysis: Simplified Process Risk Assessment [8]

- The indication for action required by the operator must be detectable.
- The indication must always be:
  o Available for the operator,
  o Clear to the operator even under emergency conditions,
  o Simple and straightforward to understand.
- The time available to take action must be adequate. This includes the time necessary to decide that the action is required and the time necessary to take the action. The longer the time available for the action, the lower the PFD given for human action as an IPL.
- The decision making for the operator should require:
  o No calculations or complicated diagnostics,
  o No balancing of production interruption costs versus safety.
- The operator should not be expected to perform other tasks at the same time as the action required by the IPL, and the normal operator workload must allow the operator to be available to act as an IPL.
- The operator is capable of taking the action required under all conditions expected to be reasonably present. As an example, consider a proposed IPL where an operator is required to climb a platform to open a valve. If a fire (as the initiating event) could prevent this action, it would not be appropriate to consider the operator action as an IPL.
- Training for the required action is performed regularly and is documented. This involves drills in accordance with the written operating instructions and regular audits to demonstrate that all operators assigned to the unit can perform the required tasks when alerted by the specified alarm.
- The indication and action should normally be independent of any alarm, instrument, SIF or other system already credited as part of another IPL or initiating event sequence.
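Returning to the alarm system performance criterion in A.2, the alarm rate targets in Table 10 lend themselves to a simple automated check. The sketch below is illustrative only: it assumes the annunciation timestamps for one operating position are available as Python datetime objects, and the function names and labels are hypothetical. The thresholds of roughly 150 and 300 alarms per day are the Table 10 values.

```python
from datetime import datetime, timedelta

LIKELY_ACCEPTABLE_PER_DAY = 150  # Table 10: very likely to be acceptable
MAX_MANAGEABLE_PER_DAY = 300     # Table 10: maximum manageable

def classify_alarm_rate(timestamps, period_days):
    """Classify the average daily alarm rate for one operating position."""
    per_day = len(timestamps) / period_days
    if per_day <= LIKELY_ACCEPTABLE_PER_DAY:
        label = "very likely acceptable"
    elif per_day <= MAX_MANAGEABLE_PER_DAY:
        label = "manageable"
    else:
        label = "exceeds maximum manageable"
    return per_day, label

def peak_alarms_in_10_min(timestamps):
    """Largest number of alarms falling in any fixed 10-minute bucket."""
    buckets = {}
    for ts in timestamps:
        # Round each timestamp down to the start of its 10-minute bucket.
        key = ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)
        buckets[key] = buckets.get(key, 0) + 1
    return max(buckets.values(), default=0)

# Hypothetical data: one alarm every 10 minutes for 30 days.
start = datetime(2024, 1, 1)
alarms = [start + timedelta(minutes=10 * i) for i in range(144 * 30)]
print(classify_alarm_rate(alarms, period_days=30))
print(peak_alarms_in_10_min(alarms))
```

The 30-day period mirrors the minimum data window stated in Table 10. Fixed buckets are a simplification; a sliding window would catch bursts that straddle a bucket boundary.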