2013 10th IEEE International Conference on Advanced Video and Signal Based Surveillance

Real-Time Video Fire Detection Using Spatio-Temporal Consistency Energy

Panagiotis Barmpoutis
Information Technologies Institute, ITI-CERTH, 1st Km Thermi-Panorama Rd, Thessaloniki, 57001, Greece
panbar@iti.gr

Kosmas Dimitropoulos
Information Technologies Institute, ITI-CERTH, 1st Km Thermi-Panorama Rd, Thessaloniki, 57001, Greece
dimitrop@iti.gr

Nikos Grammalidis
Information Technologies Institute, ITI-CERTH, 1st Km Thermi-Panorama Rd, Thessaloniki, 57001, Greece
ngramm@iti.gr

Abstract

In this paper a new algorithm is proposed for detecting fire from video data in real time, based on a combination of features including a new spatio-temporal consistency feature. A significant challenge in flame detection systems is to discriminate between actual fire and false alarms caused by fire-colored objects. Towards this aim, we propose an algorithm with significant advantages: a) it is fast, since rectangular non-overlapping blocks are used as basic elements (instead of arbitrary-shaped regions), b) a spatio-temporal consistency feature is used in addition to color probability, spatial energy, flickering and spatio-temporal energy features and c) an improved rule-based classification approach is proposed, after evaluating seven different classification approaches. Experimental results are presented that confirm the efficiency of the proposed approach.

1. Introduction

The evolution of imaging technology has positively affected several areas of computer vision. In the case of fire detection for security applications, video-based techniques based on image processing have significant advantages. Specifically, visual cameras, as opposed to thermal (or smoke) sensors, have a small response time, since there is no need to wait for the heat (or smoke) to diffuse before it can be detected.
Furthermore, they have low cost and the ability to cover large areas, especially if rotating (PTZ) cameras are used. Important properties of a vision-based detection system are its response time and its probability of detection (true positives) versus the number of false alarms (e.g. caused by fire-colored moving objects). However, a big challenge is to decrease the high false alarm rates, which are often due to a) natural objects which have the same characteristics as flame, b) large variations of flame appearance in videos and c) environmental changes that complicate fire detection, including clouds, sun, shadows, dust particles, light reflections, etc. [1].

Toreyin et al. [2] proposed an algorithm in which flame and fire flickering are detected by analyzing the video in the wavelet domain, while in [3] a hidden Markov model was used to mimic the temporal behavior of flame. Zhang et al. [4] proposed a contour-based forest fire detection algorithm using FFT and wavelets, whereas Celik and Demirel [5] presented a rule-based generic color model for fire flame pixel classification. Ko et al. used hierarchical Bayesian networks for fire flame detection [6] and a fire flame detection method using fuzzy finite automata [7]. More recently, Dimitropoulos et al. [8] proposed a video flame detection algorithm using a combination of five spatio-temporal features to detect fire. Furthermore, an algorithm that uses covariance matrix descriptors for feature extraction from video is proposed in [9], while in [10] an algorithm based on feature fusion using a fuzzy classifier is presented.

In this paper, a new block-based video fire detection algorithm is presented with significant improvements: a) it is faster, since blocks are used as basic elements instead of blobs (arbitrary-shaped regions),
b) a spatio-temporal consistency feature is added, which is seen to significantly improve performance, and c) an improved rule-based classification approach is proposed, after evaluating seven different classification approaches. More specifically, background subtraction and color analysis using a non-parametric model are first used in order to select candidate fire blocks. Then, four additional features are extracted from each block to reduce false alarm rates, namely color probability, spatial energy using 2D wavelet analysis, spatio-temporal energy and a flickering factor. In addition, a novel feature is introduced, which measures spatio-temporal consistency by taking into account the existence of a) neighboring candidate blocks in the current and previous frames and b) neighboring fire-labeled blocks in the previous frames. As a final step, each candidate block is classified into one of two classes (fire or non-fire). In order to maximize efficiency, seven different classifiers are investigated (two rule-based approaches, a Support Vector Machine (SVM), a Bayes classifier, a Neural Network, a Decision Tree and a Random Tree classifier) and the best-performing approach is used.

978-1-4799-0703-8/13/$31.00 ©2013 IEEE
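At a high level, the per-frame processing just outlined can be sketched as follows. This is a minimal illustration under our own assumptions; the helper names (`extract_features`, `classify`) and the exact mask handling are hypothetical stand-ins for the steps described in the paper, not the authors' published code.

```python
import numpy as np

# Minimal sketch of the block-based pipeline: background subtraction and color
# analysis select candidate 16x16 blocks, five features are extracted for each
# candidate, and a classifier makes the final fire / non-fire decision.
# All callables passed in are hypothetical stand-ins.
BLOCK = 16               # block size N used in the paper's experiments
CANDIDATE_RATIO = 0.125  # fraction of fire-colored moving pixels required

def detect_fire_blocks(moving_mask, fire_color_mask, extract_features, classify):
    """Return (row, col) indices of blocks classified as fire.

    moving_mask, fire_color_mask : H x W boolean arrays
    extract_features : callable (r, c) -> 5-element feature vector
    classify         : callable (features) -> bool
    """
    h, w = moving_mask.shape
    fire_blocks = []
    for r in range(0, h - h % BLOCK, BLOCK):
        for c in range(0, w - w % BLOCK, BLOCK):
            patch = moving_mask[r:r+BLOCK, c:c+BLOCK] & fire_color_mask[r:r+BLOCK, c:c+BLOCK]
            if patch.mean() < CANDIDATE_RATIO:
                continue  # too few fire-colored moving pixels: not a candidate
            if classify(extract_features(r, c)):
                fire_blocks.append((r // BLOCK, c // BLOCK))
    return fire_blocks
```

Because whole blocks, rather than arbitrary-shaped blobs, are the basic unit, the per-frame cost is a fixed grid scan plus feature extraction only on candidate blocks.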
The rest of the paper is organized as follows: Section 2 describes in detail the different processing steps of the proposed approach. In Section 3, experimental results with fire and non-fire video sequences are presented. Finally, conclusions are drawn in Section 4.

2. Methodology

Every frame of the video sequence is divided into NxN blocks (in our experiments, N=16). As seen in Figure 1, the proposed methodology consists of several steps. Initially, background subtraction is performed to detect moving blocks. Color analysis is then applied to each moving block to decide whether it will be considered as a candidate fire block or not. For each candidate fire block, a vector of five features is then computed: a) fire color probability, b) spatial wavelet energy, c) spatio-temporal energy, d) a flickering feature and e) spatio-temporal consistency energy. Finally, a classifier is used to determine whether the block is actually a fire block or not.

Figure 1: Overview of the proposed approach.

2.1. Background Subtraction

Background subtraction is used as a first step to identify moving objects in the video. Based on the evaluation of thirteen background extraction algorithms in [8], we chose to use the Adaptive Median algorithm, which is fast and very efficient. However, if there are moving objects in the background (e.g. moving branches), other more robust algorithms may be used.

2.2. Color Analysis

In the second processing step, only blocks that contain a large number of fire-colored moving pixels are selected as candidate fire blocks. To filter out non-fire moving pixels, we compare their values with a predefined RGB color distribution that is created by non-parametric estimation from a number of real fire samples from video sequences captured by different cameras, similar to [8]. If the percentage of fire-colored pixels within a block is over a certain threshold (12.5% in our experiments), the block is considered a candidate for the next steps. For distant fires, we keep the same percentage but either use smaller blocks or modify the zoom factor. Furthermore, for each candidate block, a color probability feature is then computed by averaging the color probability of each pixel within this block.

2.3. Spatial Wavelet Analysis

As pointed out in [8], regions containing real fires exhibit a higher spatial variation than those containing fire-like colored objects. Spatial wavelet analysis is a powerful image processing technique that has considerable potential to quantify this property. Therefore, a two-dimensional wavelet filter is applied on the red channel of each frame and the spatial wavelet energy at each pixel is calculated by the following formula:

E_s(i,j) = HL(i,j)^2 + LH(i,j)^2 + HH(i,j)^2    (1)

where HL, LH and HH are the high-frequency subbands of the wavelet decomposition. For each block, the spatial wavelet energy is estimated as the average of the energy of the pixels in the block:

E_b = (1/M) Σ_(i,j) E_s(i,j)    (2)

where M is the number of pixels in a block.

2.4. Spatio-Temporal Analysis

The shape of a flame changes irregularly due to the airflow caused by wind or due to the type of burning material. Spatio-temporal analysis is therefore useful for discriminating between real fire and fire-like colored objects. The temporal variance of the spatial energy of a pixel (i,j) within a temporal window of the N last frames is:

V(i,j) = (1/N) Σ_(t=1..N) (E_s(i,j,t) - Ē_s(i,j))^2    (3)

where E_s(i,j,t) is the spatial energy of the pixel at time instance t and Ē_s(i,j) the average value of this energy over the window. For each block, the total spatio-temporal energy E_st is estimated by averaging the individual energies of the pixels belonging to the block:

E_st = (1/M) Σ_(i,j) V(i,j)    (4)

2.5. Temporal Processing

The flickering property yields a very powerful feature for discriminating between actual fire and fire-like objects. This feature for a specific pixel can be mathematically expressed as a function of the number of changes c(i,j) from flame to non-flame status and vice-versa within a
specific time interval. The following formula was found to be efficient (after experimental tests):

F(i,j) = 2^c(i,j) - 1    (5)

The flickering feature for each block is calculated as the average of the individual flickering contributions of the pixels in the block.

2.6. Spatio-Temporal Consistency Energy

To measure the spatio-temporal consistency of a block we define an energy cost consisting of two terms: one for smoothness and one for data consistency:

E_consist = E_smooth + E_data    (6)

Figure 2: Calculation of the E_smooth term using neighboring candidate and fire-labeled blocks from the current and previous frames.

Specifically, to define E_smooth for each candidate fire block, we use the number of neighboring candidate fire blocks in the current frame as well as the number of neighboring blocks (either fire candidates or blocks permanently labeled as fire) in the previous frames of the video sequence (Figure 2). Thus, E_smooth consists of two terms: one depending on whether the neighboring blocks are fire candidates or not and another on whether they are actually labeled as fire blocks or not:

E_smooth = E_cand + E_label    (7)

Figure 3: Calculation of the candidate energy (E_cand) using neighboring candidate blocks from the current and previous frame.

If we consider n previous frames for candidate blocks and m previous frames for fire-labeled blocks (n=1 and m=3 in our experiments), (7) can be re-written as:

E_smooth = Σ_(k=0..n) a_k E_cand(t-k) + Σ_(k=1..m) b_k E_label(t-k)    (8)

where E_cand is the candidate consistency energy, which is calculated as the weighted average of the number of candidate fire blocks out of a) the 8 neighboring blocks in the current frame and b) the 9 neighboring blocks in the previous frame (Figure 3). Use of the candidate blocks from the previous frame, in addition to those in the current frame, was seen to improve efficiency.
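As an illustration, the candidate consistency term can be sketched as follows. This is a minimal sketch under our own assumptions: the candidate maps are taken to be 2-D boolean grids over the block positions, and the weighted average is normalized by the number of valid (in-bounds) neighbors, a detail the text does not fully specify.

```python
# Hypothetical sketch of the candidate consistency energy E_cand: a weighted
# average of candidate fire blocks among the 8 neighbors in the current frame
# and the 9 spatio-temporal neighbors in the previous frame (Eq. (8), n=1).
def candidate_energy(cand_now, cand_prev, r, c, a0=1.0, a1=1.0):
    """cand_now, cand_prev: 2-D boolean grids (lists of lists) of candidate blocks."""
    def count_neighbors(grid, include_center):
        rows, cols = len(grid), len(grid[0])
        hits, total = 0, 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0 and not include_center:
                    continue  # the block itself is excluded in the current frame
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    total += 1
                    hits += bool(grid[rr][cc])
        return hits, total

    hits_now, n_now = count_neighbors(cand_now, include_center=False)    # 8-neighborhood
    hits_prev, n_prev = count_neighbors(cand_prev, include_center=True)  # 9-neighborhood
    return (a0 * hits_now + a1 * hits_prev) / (a0 * n_now + a1 * n_prev)
```

The result lies in [0, 1]: it is 1 when every neighboring block in both frames is a fire candidate and 0 when none is.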
Similarly, the labeling consistency energy E_label is calculated as the weighted average of the number of spatio-temporal neighboring blocks (out of the 9 neighboring blocks) in each of the three previous frames which have already been labeled by the algorithm as fire blocks (Figure 4). In our experiments, the coefficients for the weighted average, after experimental tests on the training set, are chosen as a_0 = a_1 = 1, while b_1 = 2, b_2 = 1.5 and b_3 = 1. By allocating a larger weight (b_i > a_i) to the blocks that are permanently labeled by the algorithm as fire blocks, we improve the efficiency and reduce the number of false alarms.

Figure 4: Calculation of the label energy (E_label) using neighboring blocks identified as fire in the three previous frames.

Finally, to measure the data consistency E_data of the block neighborhood, we use a weighted average of the other four computed features (Subsections 2.2-2.5):

E_dblock = c_1 f_1 + c_2 f_2 + c_3 f_3 + c_4 f_4    (9)

where c_i is a factor multiplied with each feature f_i of the data cost function (after experimental tests on the training set, c_1 = c_2 = c_3 = 1 and c_4 = 6; the flickering feature has a larger weight because it performs better). E_data is then calculated as the average value of E_dblock over this block and its 8 neighboring blocks. Using this spatio-temporal consistency energy feature is advantageous, since it takes into account the combination of a) neighboring candidate blocks in the current and in
the previous frames, b) already fire-labeled neighboring blocks in the previous frames and c) the computed fire features in the block neighborhood. Moreover, the results obtained for this block are also propagated to neighboring blocks in future frames, if they are candidate fire blocks.

Figure 5: Experimental results for videos containing actual fires.

Figure 6: Experimental results for videos containing fire-colored objects.

2.7. Classification

As a last step, classification is used to obtain the final decision about whether a block is a fire block or not, based on the features extracted from each block. For this step, we evaluated a number of different classifiers: a Support Vector Machine (SVM) classifier, a Bayes classifier, a Neural Network classifier, a Decision Tree classifier and a Random Tree classifier. These classifiers were implemented using OpenCV [11]. In addition, two rule-based classification approaches were also evaluated. For these, a threshold th_i is empirically defined for each feature i after conducting a number of experiments (color probability: th_1 = 0.0024, spatial wavelet energy: th_2 = 68, temporal energy: th_3 = 44, spatio-temporal variance: th_4 = 48, spatio-temporal consistency energy: th_5 = 108). Then, the value of a metric C is computed for each block by the following equation:

C = Σ_(i=1..5) F(f_i, th_i)    (10)

where F is a function defined as follows:

F(f_i, th_i) = { 0, if f_i < th_i ; 1, if f_i > th_i }    (11)

Based on the above metric, two rule-based classification techniques were defined. In NR-2, a block is classified as fire if C >= 3 and as non-fire otherwise. In NR-1, additional emphasis is given to the spatio-temporal consistency energy, by classifying a block as fire only when both C >= 3 and f_5 > T hold (T = 75 in our experiments), and as non-fire otherwise. In our experiments, the training of the classifiers was based on 42240 blocks from fire and non-fire video sequences.
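The two rule-based classifiers are simple enough to sketch directly. The following is an illustrative Python sketch of Eqs. (10)-(11) using the thresholds reported above, not the authors' implementation; the assumed feature ordering matches the order in which the thresholds are listed.

```python
# Illustrative sketch (not the authors' code) of the NR-1 and NR-2 rules.
# Assumed feature order: [color probability, spatial wavelet energy,
# temporal energy, spatio-temporal variance, spatio-temporal consistency].
THRESHOLDS = [0.0024, 68, 44, 48, 108]  # th_1..th_5 from the text
T_CONSIST = 75                          # extra NR-1 threshold on f_5

def metric_c(features):
    """Eq. (10): count how many features exceed their thresholds."""
    return sum(1 for f, th in zip(features, THRESHOLDS) if f > th)

def classify_nr2(features):
    """NR-2: fire if at least 3 features exceed their thresholds."""
    return metric_c(features) >= 3

def classify_nr1(features):
    """NR-1: the NR-2 rule plus a stricter consistency-energy check."""
    return metric_c(features) >= 3 and features[4] > T_CONSIST
```

For example, a block with features [0.01, 70, 50, 40, 120] has C = 4 and is classified as fire by both rules, while a block with a low consistency energy f_5 is rejected by NR-1 even when C >= 3, which is what drives NR-1's lower false alarm rate.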
3. Experimental results

In this section we present indicative experimental results of the proposed method. Eleven videos containing actual fires and ten videos containing flame-colored moving objects were used for the evaluation. The test videos can be downloaded from the FIRESENSE video database [12].

Comparing the results of the seven classifiers, the best overall results are obtained using the first rule-based approach (NR-1), as analyzed above. The results are summarized in Figures 5 and 6. Figure 5 illustrates the ratio of the number of frames in which fire was correctly detected to the total number of frames in each fire test video sequence. The average recognition rates are: 96.18% for NR-1, 95.79% for NR-2, 98.99% for the SVM classifier, 90.11% for the Bayes classifier, 98.67% for the Neural Network classifier, 98.50% for the Decision Tree classifier and 98.92% for the Random Tree classifier. Similarly, Figure 6 shows the ratio of the number of frames in which fire was erroneously detected (false alarms) to the total number of frames in each non-fire test video. The respective average false alarm ratios are: 0.07% for the first rule-based approach (NR-1), 4.53% for the second rule-based approach (NR-2), 29.48% for the SVM classifier, 9.84% for the Bayes classifier, 29.49% for the Neural Network classifier, 23.09% for the Decision Tree classifier and 26.2% for the Random Tree classifier.

When considering all videos (both fire and non-fire), the best accuracy (the proportion of the total number of samples that were correctly classified) is 98.05%, obtained using the rule-based NR-1 approach. Screenshots of video sequences from the results of the proposed algorithm are shown in Figure 9.

Figure 9: Example results of the proposed algorithm, showing true fire detections and true rejections of fire-colored moving objects.

To evaluate the performance of the proposed method, its results are also compared with those of the rule-based and SVM-based approaches proposed in [8], the covariance matrix descriptors algorithm [9] and the feature-fusion approach using a fuzzy classifier [10].

Figure 7: Comparison of the proposed method with other methods [8], [9], [10] on videos containing actual fires.

Figure 8: Comparison of the proposed method with other methods [8], [9], [10] on videos containing fire-colored objects.

As seen in Figures 7 and 8, the results obtained using the proposed rule-based method (NR-1) are superior to those obtained by the other methods. For fire videos, the average fire recognition rate is 82.43% for the covariance matrix-based algorithm [9], 96.31% for the rule-based approach of [8], 99.64% for the SVM-based approach of [8] and 92.76% for the feature-fusion approach [10], while it is 96.18% for the proposed rule-based approach (NR-1). For non-fire videos containing fire-colored objects, the average false alarm rates are 0.4% for the covariance matrix-based algorithm [9], 13.8% for the blob-based rule approach of [8], 41.14% for the blob-based SVM approach of [8] and 44.82% for the feature-fusion approach [10], while it is only 0.08% for the proposed NR-1 approach. In total, the proposed rule-based approach (NR-1) achieves an overall accuracy of 98.05%, which is superior to all other methods.
Finally, in terms of speed, since blocks are used as basic elements, the proposed algorithm is 2.1% faster (in terms of frames per second) than the approach in [8], which used blobs (arbitrary-shaped regions) instead.

4. Conclusions

In this paper, we have presented a novel framework for real-time video fire detection. The framework consists of several processing steps involving background subtraction, color analysis, spatial wavelet analysis, spatio-temporal analysis, a flickering feature, a spatio-temporal consistency energy and classification. To address the difficult problem of discriminating between fire-colored moving objects and actual fire, we used novel features that improve performance and decrease detection errors. Specifically, we introduced a new feature (spatio-temporal consistency energy), which exploits prior knowledge about the possible existence of fire in neighboring blocks in the current and previous video frames. We applied our framework to twenty-one videos containing actual fire and moving flame-colored objects. As seen from the experimental results, the method compares favorably with four other recent techniques reported in the literature [8], [9], [10]. In the future, we will employ a similar framework to detect smoke, with the final objective of implementing a reliable flame and smoke detection platform that is very accurate and robust to environmental factors.

References

[1] Stipaničev, D., Vuko, T., Krstinić, D., Štula, M., Bodrožić, Lj., "Forest Fire Protection by Advanced Video Detection System - Croatian Experiences", Third TIEMS Workshop on Improvement of Disaster Management Systems, Trogir, 26-27 September 2006.
[2] Töreyin, B., Dedeoglu, Y., Gudukbay, U., Cetin, A.E., "Computer vision based method for real-time fire and flame detection", Pattern Recognition Letters, Vol. 27, pp. 49-58, 2006.
[3] Töreyin, B., Dedeoglu, Y., Gudukbay, U., Cetin, A.E., "Flame detection in video using hidden Markov models", IEEE Int. Conf. on Image Processing, pp. 1230-1233, 2005.
[4] Zhang, Z., Zhao, J., Zhang, D., Qu, C., Ke, Y., Cai, B., "Contour Based Forest Fire Detection Using FFT and Wavelet", Proceedings of CSSE (1), pp. 760-763, 2008.
[5] Celik, T., Demirel, H., "Fire detection in video sequences using a generic colour model", Fire Safety Journal, Vol. 44, pp. 147-158, 2009.
[6] Ko, B.C., Ham, S.J., Nam, J.Y., "Modeling and Formalization of Fuzzy Finite Automata for Detection of Irregular Fire Flames", IEEE Transactions on Circuits and Systems for Video Technology, May 2011.
[7] Ko, B., Cheong, K., Nam, J., "Fire detection based on vision sensor and support vector machines", Fire Safety Journal, Vol. 44, pp. 322-329, 2009.
[8] Dimitropoulos, K., Tsalakanidou, F., Grammalidis, N., "Flame Detection For Video-Based Early Fire Warning Systems and 3D Visualization of Fire Propagation", IASTED International Conference on Computer Graphics and Imaging, 10.2316/P.2012.779-011, 2012.
[9] Habiboglu, H., Gunay, O., Cetin, A.E., "Covariance matrix-based fire and flame detection method in video", Machine Vision and Applications, Vol. 23, No. 6, pp. 1103-1113, November 2012.
[10] Dimitropoulos, K., Gunay, O., Kose, K., Erden, F., Chaabene, F., Tsalakanidou, F., Grammalidis, N., Cetin, E., "Flame Detection for Video-Based Early Fire Warning for the Protection of Cultural Heritage", 4th International Euro-Mediterranean Conference on Cultural Heritage (EuroMed 2012), Lemesos, Cyprus, 29 October-3 November 2012.
[11] OpenCV documentation, http://docs.opencv.org
[12] FIRESENSE video database, http://www.firesense.eu/index.php?option=com_remository&itemid=136&func=fileinfo&id=1406&lang=en (registration is required)