7th June, 2002

Notes from the 2nd Meeting on LHC Alarms/Logging/Post-Mortem Relationships
Held on 22nd May 2002

Present
Ronny Billen, Robin Lauckner, Maciek Marczukajtis, Maciej Peryt, Nicole Polivka, Mark Tyrrell

Summary
Mark Tyrrell presented the CERN Alarm System Databases; copies of his transparencies are appended.

The alarm team established the user s s () to create a homogeneous view of the equipment for the alarm system. There are 22 s, corresponding to 22 identified users who have been asked to maintain their respective data. Inside the s there are 45 systems, which correspond to the alarm fault families of the alarm naming scheme. This is driven by the hierarchical structure of alarms within the system, which relates essentially to the functional grouping of alarm events. The provides static alarm information such as geographic locations, responsible people and telephone numbers. There is a main table that is not normalized and contains a row for every alarm message known to the system. There are about 100 000 of these.

Maintaining this information causes problems. MT pointed out that this arises because of the lack of data management effort provided by the alarm users. The information sources are extremely fragmented and often erroneous. He cited the new SPS Vacuum PVSS application: configuration information is kept in Excel and is used to configure PVSS as well as the alarm system.

The need for this information is not restricted to the alarm system. During the SPS start-up other software systems also suffered from the poor management of equipment data. For example, the Power Converter description is also needed by the Transfer Line software (C. Niquille) and the ROCS diagnostic software (C. Arimatea). All 3 systems build a private copy of the power converter information.

RB stated that each group should be given an Oracle for their equipment together with an appropriate tool to manage the data. The groups must be responsible for the contents of these descriptions.
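As an illustration of the normalisation issue raised above, the following minimal sketch (Python) splits a flat alarm table, in which every row repeats the static contact data, into one static record per equipment plus slim alarm rows. The column names (family, member, fault_code, location, responsible, phone) and the sample values are assumptions made for the example; they are not the actual table layout.

```python
# Hypothetical static columns that a flat table repeats on every alarm row.
STATIC_COLUMNS = ("location", "responsible", "phone")

# Hypothetical flat rows: two alarms on one equipment, one alarm on another.
FLAT_ROWS = [
    {"family": "FAMILY_A", "member": "MEMBER_1", "fault_code": 1,
     "location": "Building X", "responsible": "Person P", "phone": "0001"},
    {"family": "FAMILY_A", "member": "MEMBER_1", "fault_code": 2,
     "location": "Building X", "responsible": "Person P", "phone": "0001"},
    {"family": "FAMILY_B", "member": "MEMBER_7", "fault_code": 1,
     "location": "Building Y", "responsible": "Person Q", "phone": "0002"},
]


def normalize(flat_rows):
    """Split flat rows into static data per equipment and slim alarm rows.

    Raises ValueError when two rows for the same equipment disagree on the
    static data, which is the kind of error a flat table lets through.
    """
    equipment = {}
    alarms = []
    for row in flat_rows:
        key = (row["family"], row["member"])
        static = {name: row[name] for name in STATIC_COLUMNS}
        stored = equipment.setdefault(key, static)
        if stored != static:
            raise ValueError(f"conflicting static data for {key}")
        alarms.append({"family": row["family"],
                       "member": row["member"],
                       "fault_code": row["fault_code"]})
    return equipment, alarms
```

With the static data stored once per equipment, a change of responsible person or telephone number is made in one place rather than on every alarm row.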
RL pointed out that a well-designed description of the equipment would not fit current alarm system practice.

The accdb is built from the 22 s. During amalgamation of the 22 s the internal coherency of the information is verified. The alarm system will not accept unknown [fault family, fault member] sets or wrong identification of alarm transmission processes. The alarm accdb then contains all the static information needed by the distributed alarm managers and the central alarm servers.

The central alarm servers access the static information and feed an online local archive that records faults as they occur and are cleared. Once per day the online archive is exported to a long-term Oracle 8i data warehouse for permanent storage. It is here that the history of the static information is tracked, so that historical alarms can be completely reconstructed. The online archive contains a single record for each alarm, with 2 timestamps indicating the times at which the alarm was generated and cleared. This is unlike a typical logging record, which contains a single timestamp. The data warehouse is structured to achieve good performance, taking into account the long-term nature of this repository.

Conclusions
1. The equipment description necessary for integration and hardware commissioning is being assembled by EST-ISS. We need to check whether this service will include all alarm clients.
2. The on-line and permanent archives could be considered as logging clients.
3. A definition of Equipment Names for hardware access and Control Room displays is needed. In the Control Room the list will include abstract identities.
4. A standard template should be provided to capture information to be logged or alarmed.

Reported by R.J. Lauckner

Appendix
Copies of Mark Tyrrell's slides
Talk given to the LHC-CP, CP, Logging Group, May 21st, 2002
M. W. Tyrrell
3 June 2002

Talk Outline:
- What am I talking about?
- Naming for SPS (a reference to the PS)?
- What we found for LEP?
- The solution adopted for LEP exists today
- Comments concerning SPS start-up 2002!
- What Chris Roderick found (Ronny Billen's section)?
- Conclusion - LHC?
- Logical and physical names, need both

What am I talking about?
- Where do I find the Fault State (FS) definitions?
- Where do I find all the static information about a FS?
- Who is responsible for these definitions/information?
- Where do I find the FS relationships for reduction?
- Who is responsible for these definitions?
- These are questions about FS!
- WHAT ABOUT CONTROL INFORMATION?
- WHERE IS THE CONTROL DATABASE?
- Equipment group's data? Equipment group's DBs?

Naming for SPS (a reference to the PS)?
- Used the control name:
    - (SPS): Data Module Name, N value
    - (not the actual names of the equipment!)
    - (e.g. control name: AUXPS, 12033; real name: MDSH1197)
    - (LEP): Family, Member
    - (LHC): Class, Device
    - (PS): Equipment Module, N value
    - Class, Device

The alarm display polls regularly a set of equipment in front-end DSC. This is done by calling the alarm property for the equipment in the Control Module (CM). The set of equipment to be scanned is given by the Working Set (WSet), which is a structured list of CM classes and equipment numbers in these classes.
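The polling of a Working Set described on the naming slide can be pictured with a minimal sketch (Python). The representation of the WSet as a mapping from CM class to equipment numbers, the function name poll_working_set and the read_alarm callback are all assumptions made for the example; the real call interface of the Control Module is not given in the slides.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical Working Set: CM class name mapped to its equipment numbers.
WorkingSet = Dict[str, List[int]]

# Hypothetical reader standing in for a call to the alarm property of one
# equipment; it returns a fault description, or None when there is no fault.
AlarmReader = Callable[[str, int], Optional[str]]


def poll_working_set(wset: WorkingSet,
                     read_alarm: AlarmReader) -> List[Tuple[str, int, str]]:
    """Scan every equipment in the Working Set once and collect the faults."""
    active = []
    for cm_class in sorted(wset):
        for number in wset[cm_class]:
            fault = read_alarm(cm_class, number)
            if fault is not None:
                active.append((cm_class, number, fault))
    return active
```

A display would call such a function at a regular interval and redraw the list of active faults from the result.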
What we found for LEP:
- Equipment groups that had Oracle s:
    - VAC, CV, SL-PO, ST-MO, ST-EL, SL-NET, RAD
- Equipment groups that did not have s:
    - SL-CO, SL-EA, (SL-BI), (SL-BT), SL-RF
    - 45 different systems / equipment types
- The existing DBs were ALL different
- We required alarm information from ALL these systems!
- What did we do?

The solution adopted for LEP exists today
[Diagram. Labels: Information layer of the Equipment and operation groups (Databases, Oracle, IT's ACCESS, Excel, Files); Specialised Oracle scripts between the user layer and their s; New Alarm Layer: the alarm team created a new User () for each alarm user (22); IT's ONE standard Oracle script between the s and the alarm DB; Alarm Database]

ALARM DATABASE LAYOUT
[Diagram. Labels: Control system (front-end HP servers, 35 in total, 130 data driven processes); pcrsrv1 data files; Haifa development; Test CAS; user level: Oracle s, Excel, user s; hpslz30 test CAS for reduction; user specific scripts; 22 s; 45 systems; Alarm accdb; static archive; havana; user s; Oracle8 static archive; merge (24hrs.); archive; hamar; Data Warehouse (1997-2002); Webdb (statistics) (24hrs.)]

Comments concerning SPS start-up 2002!
- SL-PO:
    - It took several meetings and 8 people to get it right!
    - They're in a mess!
- LHC-VAC:
    - Lots of our time, 2 DB updates, and still not right!
    - NEW SPS PVSS pressure data source is Excel!
    - They're in a mess!
- ST-EL:
    - Given up the Oracle DB, now changing the IT directly!
    - The new EFACEC SCADA uses INGRES -> ST / DB
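The minutes state that, when the user data are merged into the alarm accdb, internal coherency is verified and that unknown [fault family, fault member] sets or wrongly identified alarm transmission processes are not accepted. A minimal sketch of such a check follows (Python). The function name merge_user_data, the dictionary keys (family, member, process) and the idea of validating against two reference sets are assumptions made for the example; the actual check is done by an Oracle script whose details are not given here.

```python
def merge_user_data(user_data, known_pairs, known_processes):
    """Merge per-user alarm definitions, rejecting incoherent entries.

    user_data:       hypothetical mapping of user name to a list of
                     definitions with keys "family", "member", "process"
    known_pairs:     set of accepted (fault family, fault member) pairs
    known_processes: set of accepted alarm transmission process names
    Returns (accepted, rejected); each rejection carries its reason.
    """
    accepted = []
    rejected = []
    for user in sorted(user_data):
        for definition in user_data[user]:
            pair = (definition["family"], definition["member"])
            if pair not in known_pairs:
                rejected.append((user, pair, "unknown fault family/member"))
            elif definition["process"] not in known_processes:
                rejected.append((user, pair, "unknown transmission process"))
            else:
                accepted.append({"user": user, **definition})
    return accepted, rejected
```

Returning the rejected entries together with a reason, rather than silently dropping them, gives each user a list of what has to be corrected in their own data.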
What Chris Roderick found (Ronny Billen's section)?
- The SL equipment groups' DBs are fragmented!
- Produced a report on SL/PO. In a very bad state.
- SL/BI: made a new scheme, waiting for comments.
- Currently working on DB for SL/MS:
    - Made a new scheme
    - Apathy
- The current s are cracking:
    - Above, already mentioned LHC-VAC.
    - Current alarm maintenance put out to contract:
        - in 1½ years, passed through 4 hands
        - each pass, lose expertise

Conclusion - LHC?:
- Looking at logging seems to be jumping in at the deep end.
- We do not have any control LHC DB infrastructure or even guidelines!
- Each time someone has something new, they create a new DB solution: SPS vacuum PVSS Excel, LHC cryo. Excel / Word.
- What about QRL? Juan asked for help last week!
- Herve is also looking for a common approach for cryo. He will emphasise this at the June meeting. At the moment they use Excel and Word.
- We need an equipment / control DB structure. EDMS?
- I understand Josi and Thomas Pettersson are working together.
- With the new re-organisation we need a: CONTROLS DATABASE SUPPORT CENTER
- Check out Josi's Survey of PS/SL DB.. report, CERN/PS 99-015 (CO)
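Returning to conclusion 2 of the minutes, which suggests that the alarm archives could be treated as logging clients: the archive holds one record per alarm with 2 timestamps, whereas a typical logging record holds one. A minimal sketch of the conversion this implies follows (Python). The record types AlarmRecord and LogEntry, their field names and the 1/0 encoding of active/cleared are assumptions made for the example and do not describe an existing interface.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class AlarmRecord(NamedTuple):
    """One archive record per alarm, with generation and clearing times."""
    family: str
    member: str
    generated: datetime
    cleared: Optional[datetime]  # None while the alarm is still active


class LogEntry(NamedTuple):
    """One logging record with a single timestamp."""
    timestamp: datetime
    name: str
    value: int  # assumed encoding: 1 = alarm active, 0 = alarm cleared


def to_log_entries(record: AlarmRecord) -> List[LogEntry]:
    """Turn one two-timestamp alarm record into one or two log entries."""
    name = f"{record.family}:{record.member}"
    entries = [LogEntry(record.generated, name, 1)]
    if record.cleared is not None:
        entries.append(LogEntry(record.cleared, name, 0))
    return entries
```

An alarm that has been cleared yields two log entries and one that is still active yields a single entry, so the archive can be replayed into a logging system without losing the clearing time.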