Real-Time Root Cause Analysis for Complex Technical Systems

Similar documents
Alarm Analysis with Fuzzy Logic and Multilevel Flow Models

ALARM SYSTEMS. Abstract

Experiences from Intelligent Alarm Processing and Decision Support Tools in Smart Grid Transmission Control Centers

GO BEYOND OTDR LEAVE NO FAULTY NETWORK BEHIND. intelligent Optical Link Mapper

Reduced Order WECC Modeling for Frequency Response and Energy Storage Integration

Practical Distributed Control Systems (DCS) for Engineers & Technicians. Contents

D Implementation of a DEMO system related to Smart Fault Management in LV network (normal/storm conditions)

Intelligent alarm management

XA ALARM HANDLING SYSTEMS AND TECHNIQUES DEVELOPED TO MATCH OPERATOR TASKS

Advanced Pattern Recognition for Anomaly Detection Chance Kleineke/Michael Santucci Engineering Consultants Group Inc.

Electric Power Group Presents. Welcome!

Martin Huber 26September 2017 F&G SOLUTIONS FOR THE PROCESS INDUSTRY

Because Safety is not found in a Box

Synchrophasor Technology in Control Centers Clemson University, SC March 12, 2014 Vikram S. Budhraja

Effective Alarm Management for Dynamic and Vessel Control Systems

Performance Neuro-Fuzzy for Power System Fault Location

SIMPLE METHODS FOR ALARM SANITATION

Oracle Real Application Clusters (RAC) 12c Release 2 What s Next?

Fire Risks of Loviisa NPP During Shutdown States

USING A REAL TIME SIMULATOR IN A LARGE AND COMPLEX ROAD TUNNEL FOR TIME AND COST SAVINGS

Alarm Management: EPRI Survey on the State of Technology

For Complete Fire and Gas Solutions

Product introduction Layers of Protection Layer 3: Safety System Instrumented & Mechanical. Layer 2: Alarms Manual action needed

KELTRON LS 7000 ALARM MANAGEMENT SYSTEM Keltron Alarm Monitoring, Dispatch, and Reporting Software

Table of Contents PART I: The History and Current Status of the Industrial HMI PART II: Fundamentals of HMI Design and Best Practices

intelligent fire alarm ID60series panels

BUSINESS PLAN CEN/TC 142 WOODWORKING MACHINES SAFETY EXECUTIVE SUMMARY

Gear Unit Condition Monitoring. Portfolio overview

Integrated Knowledge Base Tool for Acquisition and. Verification of NPP Alarm Systems

IEC61511 Standard Overview

ADVANCED MONITORING AND DIAGNOSTIC SYSTEMS FOR INDUSTRIAL GAS TURBINES

TROUBLESHOOTING AT THE SPEED OF LIGHT EMBEDDED OTDR FOR OPERATIONAL EXCELLENCE IN PASSIVE OPTICAL NETWORKS

Training Fees 4,000 US$ per participant for Public Training includes Materials/Handouts, tea/coffee breaks, refreshments & Buffet Lunch.

Enhancing Quality of Gas Distribution Tests in Electrostatic Precipitators

A new view on building protection. Cerberus DMS makes building protection smarter, easier and more flexible. Answers for infrastructure.

Ridley Terminals Canada. Site HMI Station. In Central Office. Prepared by: Wayne Leung Controls Specialist. QCA Systems Limited

Brazil. Current status in NPP I&C and Discussion Topics. 24th Meeting of the IAEA TWG-NPPIC

Architectural and Engineering Specification for a Security Management System. StarNet 2

PRODUCT REVIEW Honeywell HC900 Hybrid Vacuum Furnace Control System

AUTROSAFE 4 A new level in interactive fire detection systems

Statistical cluster analysis and visualisation for alarm management configuration. Butters, T D and Güttel, S and Shapiro, J L and Sharpe, T J

AUTROSAFE. An interactive fire detection system for larger vessels

Regional Training. Seminar. » EasyPower Hands-On» Protective Device Coordination» Arc Flash Hazard Analysis. March 12-16, 2018 Austin, TX

Dryer Controller M720

2012 Honeywell Pacific Users Group. Sus tain.ability.

Numerical Standards Listing

Proceedings Design, Fabrication and Optimization of a Silicon MEMS Natural Gas Sensor

AN10943 Decoding DTMF tones using M3 DSP library FFT function

A STUDY ON THE BEHAVIOR OF STEAM CONDENSATION IN U-SHAPED HEAT TUBE

United States Oil and Gas Pipeline Leak Detection System (LDS) Market (By Equipment, By Application, By Region) - An Analysis: 2017 Edition

ID50series. intelligent fire alarm panels

Risk Management of Electromagnetic Disturbances

Safety Instrumented Systems The Smart Approach

United States Engineered Quartz (E-Quartz) Market By Type, By End Use-Sector, By Region: Opportunities & Forecast ( )

Compact Product Suite Compact HMI 6.0 Overview ABB

Serie 3 Interaction Design Basics, HCI in the Software Process & Design Rules

Loss Prevention Standard

Overcoming the Challenge of Implementing Profit Controller on a Third Party DCS. Ramiro Andrés Cremaschi Julia Alejandra Perinetti Repsol YPF

Troubleshooting & faults guide

Licensing of FPGA-based Safety Platform RadICS: Case Study

5. How and Why the Blackout Began in Ohio

Basic Input Data Needed to Develop a High Performance Fan/Blower with Low Noise, Energy Saving and High Efficiency

RLDS - Remote LEAK DETECTION SYSTEM

02 21st Century Trends Resources 8/24/2017. Trends in Human Machine Interface (HMI) Old School Trends. The Future of HMI

C&I SYSTEM DIAGNOSTICS WITH SELF MONITORING AND REPORTING TECHNOLOGY (SMART)

TOOLS 8000 SOFTWARE FOR THE ENTIRE LIFE CYCLE OF YOUR FIRE DETECTION SYSTEMS

System Design & Diagnostic Analysis of Group 2 Systems

Field Products. Experion LX. Proven DCS for a wide range of industrial applications

SMB/4676/R STRATEGIC BUSINESS PLAN (SBP) Title of TC TC 62: Electrical equipment in medical practice

Liebert PPC Second Generation Power Distribution Cabinet

Fuzzy Controller for Adjust the Indoor Temperature and Preservation the Buildings

Next Generation Alarm Management With DynAMo Alarm and Operations Management

Computer-based Alarm Processing and Presentation Methods in Nuclear Power Plants

1S25. Arc Fault Monitor 4 Zones, 8 Sensors. Features. Introduction. ARC Fault Protection

To appear in the IEEE/IFIP 1996 Network Operations and Management. Symposium (NOMS'96), Kyoto, Japan, April TASA: Telecommunication

Process Control Systems Engineering

Where Technology Shapes Solutions. Alarm management : Wasn t that problem already solved years ago?

GAS DETECTOR LOCATION. Ø.Strøm and J.R. Bakke, GexCon AS, Norway

Different types of Fire Alarm System

TOP R4 Unknown Operating State

Compression of Fins pipe and simple Heat pipe Using CFD

Operational Overview and Controls Guide

AmpLight Lighting Management System

6367(Print), ISSN (Online) Volume (IJCET) 3, Issue 2, July- September (2012), IAEME

DS9400 Series. Release Notes for Firmware V2.07. Fire Alarm Control Panel

ABB Ability System 800xA Alarm Management

Failure Modes, Effects and Diagnostic Analysis

Improved Dryer Control

Optimizing Strategy for Boiler Drum Level Control

DeltaV Operate. DeltaV Operate. Introduction. DeltaV Product Data Sheet. Robust and secure plant operations

Numerical Standards Listing

FUNCTIONAL SAFETY: A PRACTICAL APPROACH FOR END-USERS AND SYSTEM INTEGRATORS

SIL DETERMINATION AND PROBLEMS WITH THE APPLICATION OF LOPA

A REPORT TO THE BOARD OF COMMISSIONERS OF PUBLIC UTILITIES. Electrical. Mechanical. Civil. i Protection & Control. b^n Telecontrol.

DeltaV Operate. Product Data Sheet DeltaV Operate December 2006 Page 1. Introduction. Benefits

Fire Engineering in High Rise 15 November 2013

WHITE PAPER FEBRUARY Oracle Retail: Optimize Performance and Reduce Business Risk

Alarm System Example

User Manual. Dryer Controller M720

Failure Modes, Effects and Diagnostic Analysis

Transcription:

Real-Time Root Cause Analysis for Complex Technical Systems Jan Eric Larsson, Joseph DeBor IEEE HPRCT Monterey, CA, August 29 th 2007

Handling Alarm Cascades Root cause analysis: find the initiating events in a complex fault situation 2

GoalArt s Solution We provide Software system For industrial plants and complex technical products Helps operators understand current fault situation In unexpected and dangerous situations Quick and correct actions Benefits Increased availability Less downtime Efficient diagnosis and repair Less risk for large blackouts Better environment for operators Power plant control room 3

Customers Control systems Nuclear power Medical equipment Power and heating Power and heating Power grid research Power grid Nuclear power Nuclear power Dialysis Nuclear research Power and heating Power and heating Vallviks Bruk Power grid Power grid Nuclear power Nuclear power Power and heating Power grid Nuclear power Pulp and paper Power and heating Vehicles Pulp and paper research 4

Multilevel Flow Models Multilevel Flow Models (MFM) Formal graphical representation with high abstraction level Well established and tested from research Describes goals and functions of industrial systems Goals: why do we run the system Functions: the capabilities of the system Functions describe flows of mass, energy, and information Describes to the process should work and not fault situations! Much less modeling work, handles unpredicted situations Most algorithms based on one single model Good real time properties, enables fast algorithms 5

Multilevel Flow Models Line 1 trips from internal fault Line 2 overloads Analysis Line 1 is a root cause Line 2 is a consequence Root cause analysis can reduce large alarm cascades to single root cause alarms 6

How Does It Work? Pump Tank Simple example Pump and closed tank Described as transport and storage Four consequence propagation rules are valid for this connection 7

Simple Cases 8

Principal Algorithm Consequence propagation rules defined for each legal MFM connection Diagnostic state is updated incrementally as new observations come in Look upstream and downstream to next observation only Gives low computational complexity Typical case is constant, independent of model size / inputs / diagnoses Worst case is linear in model size / inputs / diagnoses Complete analysis of 200 000 input system takes ~ 50 milliseconds We have several tricks to make the algorithm useful in practice Handle all types of loops and multiple root causes Switch causations on and off 9

Simple Power Grid Example 10

Needed Tables 11

Example Situation Analysis 12

Presentation GoalArt provides information Primary / consequences Dynamic priority / suppression Sensor faults Can be presented in different ways Alarm lists Process schematics Text messages Advanced graphics Sound GoalArt can help with design Some examples in demo 13

Vallvik 1998 Pulp and paper plant Alarm cascade Tube leak > 100 alarms during first minute Explosion ten minutes later GoalArt system MFM model with 500 inputs Built and tested in 4 days Comprises entire boiler Result Correctly identifies root cause alarms There would have been time (!) 14

August 14, 2003 Largest blackout in the world so far Started in First Energy area, Ohio Three independent lines sagged into trees Electric vegetation management Developed slowly 3:05 PM to 4:05 PM Large cascade over eastern US and Canada 4:05 to 4:15 PM Very costly 50 million people 61 800 MW Up to 4 days to restore everything Estimated 4 10 million USD GDP down 0.7 % in Canada 15

Conclusions from August 14 Demo Independent problems Three lines short-circuited First Energy s alarm system software stopped Root cause analysis would have helped to identify the overload Three 345 kv lines tripped Underlying 138 kv lines overloaded and tripped More 345 kv lines tripped When Sammis Star tripped, the cascade became irreversible Time to avoid the blackout The First Energy overload and trip development lasted 3:05 to 4:05 16

Nuclear Project Diagnostic station connected to training simulator Excellent results System size 6 500 signals On-line real-time to training simulator Possible to visit / get reference Barsebäck, Forsmark, Oskarshamn, Ringhals, Loviisa, Olkiluoto Total of 15 reactors from 500 2000 MW 17

Knowledge Engineering Knowledge acquisition process Major effort Good documentation Process expertise required IFE Halden valuable source Model construction Quite efficient Large task because of number of signals Testing and correction Large task All systems must be tested at least once Complex system needs simulator Formal testing and validation Not done in this project Current model size 6 500 signals used, of 9 600 total signals 60 % of signals handled 40 % of signals tested 12 000 MFM objects Largest MFM model ever built by hand Corresponding expert system Alarm project demo: 50 000 70 000 rules Authorizer s Assistant: 35 000 rules More than 100 man-years effort (!) Five people for permanent maintenance Toshiba s largest system: 5 000 rules We are already past the limit of world s largest knowledge-based system 18

Hambo Simulator Developed at HAMMLAB, IFE Halden, Norway True model of Forsmark 3, largest Swedish BWR Around 9500 status indications Several uses Operator training HMI design Human factors experiments Sponsored by all Scandinavian nuclear power plants 19

Hambo Overview 20

After Scram 21

Computational Efficiency I Midwest ISO power grid 21 400 stations, 32 300 lines, 230 000 signals, 42 000 faults Complete computation time ~ 1.4 seconds on standard PC (3 GHz) No other diagnostic method can handle this size of system 22

Computational Efficiency II Rule-based expert systems Charles Forgy: RETE III, Blaze Advisor: 2.5 million RPS, 300 % faster than competitors NASA SHINE: 230 million RPS according to Viaspace own press release Fuzzy logic hardware chip design (research, Australia): 1.2 billion RPS Midwest ISO knowledge base corresponds to ~ 2.5 million rules 230 000 signals => 2.5 million rules 42 000 faults => 42 000 executions Total calculation time ~ 3 seconds of CPU 2 500 000 * 42 000 / 3 = 35 000 000 000 Apparent GoalArt rule speed 35 billion RPS More than a factor of 100 10 000 compared to fastest rule-based systems? 23

Computational Efficiency III Well-known large rule-based systems Toshiba electric power plant, largest technical expert system: 6 000 rules American Express, Authorizer s Assistant, (> 200 man-years): 30 000 rules Classical forward or backward chaining New, web-based or database-like systems OpenCYC, world s largest general knowledge base, first version: 60 000 rules OpenCYC, current version: > 1 000 000 assertations World Wide Web? No reasoning engines or algorithms for advanced diagnosis available GoalArt knowledge bases Forsmark 3 nuclear power plant, largest hand-built (4 man-months): 65 000 rules Midwest ISO, largest auto-generated power grid knowledge base: 2 500 000 rules GoalArt root cause analysis algorithm 24

Integration GoalArt Diagnostic Station (GDS) Reads data from control system Performs diagnostic calculations GoalArt Engineering Station (GES) Design and maintenance of models Configuration of GDS Standard interfaces OPC for alarm and event data CIM for grid topology Benefits Well-defined interface to control system Minimal changes in control system Safe, because control system is not affected 25

Blackout September 23 rd, 2003 Large blackout in Scandinavia September 23 rd 2003, 12.35 PM Root causes 12:30 OKG 3 nuclear reactor trip (east) 12:35 Internal station short-circuit (west) Consequences Two lines for all of southern Sweden Southern Sweden collapsed (5-15 min) Eastern Denmark collapsed Lasted 1-5 hours Actions Cost Second root cause unknown for 4 hours Helicopters looking for line faults Lost ~ 10 000 000 kwh Cost ~ 500 000 000 USD Largest disturbance in 20 years 26

GoalArt Demo of Sep 23 Blackout Demo contains Auto-generated MFM model from Oracle databases Alarm cascade from control system Covers all main events Time order is approximately correct Some alarms arrive in wrong order Demo does not contain No alarm grouping/filtering Conclusions The cascade is correctly analyzed No manipulation of MFM model needed Proof of concept on real system Commercial delivery now ongoing 27

Presentation GoalArt provides information Primary / consequences Dynamic priority / suppression Sensor faults Can be presented in different ways Alarm lists Process schematics Text messages Advanced graphics Sound GoalArt can help with design Some examples in demo 28

Alarms Lines kv Hz 29

Alarms Lines kv Hz 30

Alarms Lines kv Hz 31

Alarms Lines kv Hz 32

Summary Alarm cascades tend to obliterate situation awareness This happens exactly when larger incidents develop Root cause analysis can solve this problem Makes the alarm system useful during incidents Now there is a new technology for root cause analysis Plug-and-play solution a realistic solution for alarm cascades 33