Supporting Information

Similar documents
Applied Data Science: Using Machine Learning for Alarm Verification

1066. A self-adaptive alarm method for tool condition monitoring based on Parzen window estimation

Video Smoke Detection using Deep Domain Adaptation Enhanced with Synthetic Smoke Images

A Statistical Analysis of a Liquid Desiccant Dehumidifier/Regenerator in an Air Conditioning System

ON-LINE SENSOR CALIBRATION MONITORING AND FAULT DETECTION FOR CHEMICAL PROCESSES

Development of Bionic Air Cooler Used in High Temperature Coal Mine

EE04 804(B) Soft Computing Ver. 1.2 Class 1. Introduction February 21st,2012. Sasidharan Sreedharan

Best Row Number Ratio Study of Surface Air Coolers for Segmented Handling Air-conditioning System

Research on Decision Tree Application in Data of Fire Alarm Receipt and Disposal

solution, due to the partial vapor pressure difference between the air and the desiccant solution. Then, the moisture from the diluted solution is rem

STUDY ON THE CONTROL ALGORITHM OF THE HEAT PUMP SYSTEM FOR LOAD CHANGE

Ecography. Supplementary material

AS a principal element of a large-scale power plant, it is essential

Shortcut Model for Predicting Refrigeration Cycle Performance

The effect of write current on thermal flying height control sliders with dual heater/insulator elements

Performance-based Fire Design of Air-supported Membrane Coal Storage Shed

Research on Fault Detection and Diagnosis of Scrolling Chiller with ANN 1

Alarm Correlation Research and Implementation Based on Similar Data Sources

Performance Neuro-Fuzzy for Power System Fault Location

Soil Classification and Fertilizer Recommendation using WEKA

Prediction of Soil Infiltration Rate Based on Sand Content of Soil

CROWD-SOURCED REMOTE ASSESSMENTS OF REGIONAL-SCALE POST-DISASTER DAMAGE

Heat Transfer in Evacuated Tubular Solar Collectors

A comparison of numerical algorithms in the analysis of pile reinforced slopes

Performance of Water-in-Glass Evacuated Tube Solar Water Heaters

Conference Paper Decision Support System for Alarm Correlation in GSM Networks Based on Artificial Neural Networks

Intrusion Detection System: Facts, Challenges and Futures. By Gina Tjhai 13 th March 2007 Network Research Group

A Forest Fire Warning Method Based on Fire Dangerous Rating Dan Wang 1, a, Lei Xu 1, b*, Yuanyuan Zhou 1, c, Zhifu Gao 1, d

Improving rail network velocity: A machine learning approach to predictive maintenance

2012 International Symposium on Safety Science and Technology Investigation on compressed air foams fire-extinguishing model for oil pan fire

Real time Video Fire Detection using Spatio-Temporal Consistency Energy

Supporting Information

Australian Journal of Basic and Applied Sciences. Leak Detection in MDPE Gas Pipeline using Dual-Tree Complex Wavelet Transform

Benefits of Enhanced Event Analysis in. Mark Miller

Two-Pipe Fiber Bragg grating Temperature Sensor Applied in Cable Trench and Cable Pit

Optimized Finned Heat Sinks for Natural Convection Cooling of Outdoor Electronics

Harmful Intrusion Detection Algorithm of Optical Fiber Pre-Warning System Based on Correlation of Orthogonal Polarization Signals

MODELLING AND OPTIMIZATION OF DIRECT EXPANSION AIR CONDITIONING SYSTEM FOR COMMERCIAL BUILDING ENERGY SAVING

An improved Algorithm of Generating Network Intrusion Detector Li Ma 1, a, Yan Chen 1, b

Heat Transfer Enhancement using Herringbone wavy & Smooth Wavy fin Heat Exchanger for Hydraulic Oil Cooling

THE EFFECTS OF HUMIDITY, COMPRESSOR OPERATION TIME AND AMBIENT TEMPERATURE ON FROST FORMATION IN AN INDUSTRIAL FRIDGE *

Flexibility, scalability andsecurity

Andrzej Kesy Kazimierz Pulaski University of Technology and Humanities, Radom, Poland, and

Computer Modelling and Simulation of a Smart Water Heater

Measure Lifetime Derived from a Field Study of Age at Replacement

Finned Heat Sinks for Cooling Outdoor Electronics under Natural Convection

Modeling of Ceiling Fan Based on Velocity Measurement for CFD Simulation of Airflow in Large Room

A Study on the 2-D Temperature Distribution of the Strip due to Induction Heater

Improved occupancy detection accuracy using PIR and door sensors for a smart thermostat

INFLUENCE OF DIFFERENT TEMPERING PERIOD AND VACUUM CONDITIONS ON THE RICE GRAIN BREAKAGE IN A THIN LAYER DRYER

Development of Small-diameter Tube Heat Exchanger: Circuit Design and Performance Simulation

Stability of Inclined Strip Anchors in Purely Cohesive Soil

Data Mining. Brad Morantz PhD.

Research on the Monitor and Control System of Granary Temperature and Humidity Based on ARM

The Effect of Quantity of Salt on the Drying Characteristics of Fresh Noodles

Statistical Detection of Alarm Conditions in Building Automation Systems

STACK EFFECT IN LIGHT WELL OF HIGH RISE APARTMENT BUILDING

Australian Journal of Basic and Applied Sciences

Crop Management Details Start from Parameters settings Emergence Planting Seed brand: Settings Parameter settings Hybrid-specific Generic

INDOOR HUMAN THERMAL COMFORT OPTIMAL CONTROL WITH DESICCANT WHEEL COOLING SYSTEM

ESTIMATING DRYING TIME FOR A STOCK PEANUT CURING DECISION SUPPORT SYSTEM

FIRE DYNAMICS IN FAÇADE FIRE TESTS: Measurement, modeling and repeatability

EFFECTS OF AIR VELOCITY AND CLOTHING COMBINATION ON HEATING EFFICIENCY OF AN ELECTRICALLY HEATED VEST (EHV): A PILOT STUDY

Experimental Study on Performance of Double pipe Length on Instantaneous Air Source Heat Pump Water Heater. Yin Shaoyou 1, a

Simulation of Evacuation Process in a Supermarket with Cellular Automata

Thermodynamic analysis of air cycle refrigeration system for Chinese train air conditioning

Optimization of Moisture condensation rate of Liquid Desiccant Dehumidifier through Genetic Algorithm

Efficiency of Application the Condensing Heat Utilizers in the Existing Boiler's Unit in Heat Power Station

SIMULATION ANALYSIS ON THE FRESH AIR HANDLING UNIT WITH LIQUID DESICCANT TOTAL HEAT RECOVERY

The Experimental Study and Simplified Model of. Water Mist Absorbing Heat Radiation

An Alarm Correlation Algorithm for Network Management Based on Root Cause Analysis

PULLOUT CAPACITY OF HORIZONTAL AND INCLINED PLATE ANCHORS IN CLAYEY SOILS

Curriculum Vitae. Dr.Congfu Xu

Effect of Water and Nitrogen Stresses on Correlation among Winter Wheat Organs

Design Procedure for a Liquid Dessicant and Evaporative Cooling Assisted 100% Outdoor Air System

Assessment of aluminium stress tolerance of triticale breeding lines in hydroponics

2. HEAT EXCHANGERS MESA

A New Approach for Modeling of Hot Air-microwave Thin Layer Drying of Soybean

Link loss measurement uncertainties: OTDR vs. light source power meter By EXFO s Systems Engineering and Research Team

Implementation and testing of a model for the calculation of equilibrium between components of a refrigeration installation

Test One: The Uncontrolled Compartment Fire

Predicting Drying Efficiency during Solar Drying Process of Grapes Clusters in a Box Dryer using Artificial Neural Network

Performance Prediction of Refrigeration Systems by Artificial Neural Network

The design of automatic remote monitoring system for the temperature and humidity based on GPRS Hui ZHANG 1, a, Honghua WANG 2,a

International Forum on Energy, Environment Science and Materials (IFEESM 2015)

Reducing energy consumption of airconditioning systems in moderate climates by applying indirect evaporative cooling

Compression of Fins pipe and simple Heat pipe Using CFD

Experimental Study on the Effects of Compression Parameters on Molding Quality of Dried Fish Floss

FIMD: Fine-grained Device-free Motion Detection

Performance of solid desiccant-based evaporative cooling system under the climatic zones of India

Speech parameterization using the Mel scale Part II. T. Thrasyvoulou and S. Benton

Design and Research of the Commercial Digital VRV Multi-Connected Units With Sub-Cooled Ice Storage System

PRELIMINARY ANALYSIS OF THE NUMBER OF OCCUPANTS, FIRE GROWTH, DETECTION TIMES AND PRE-MOVEMENT TIMES FOR PROBABILISTIC RISK ASSESSMENT

Development of TRNSYS Models for Predicting the Performance of Water-in-Glass Evacuated Tube Solar Water Heaters in Australia

Thin-layer drying of some Sri Lankan paddy varieties under low humid conditions

Corresponding author, Dept. of Urban & Rural Planning, Zhejiang Gongshang University, Hangzhou , CHINA,

ISO 7183 INTERNATIONAL STANDARD. Compressed-air dryers Specifications and testing. Sécheurs à air comprimé Spécifications et essais

TSI AEROTRAK PORTABLE PARTICLE COUNTER MODEL 9110

Review of progress in soil inorganic carbon research

Available online at Energy Procedia 6 (2011) MEDGREEN 2011-LB

Transcription:

Electronic Supplementary Material (ESI) for RSC Advances. This journal is The Royal Society of Chemistry 2016 Supporting Information Modeling and Optimizing Performance of PVC/PVB Ultrafiltration Membranes Using Supervised Learning Approaches Lina Chi a,b, c*, Jie Wang b, Tianshu Chu b, Yingjia Qian a, Zhenjiang Yu a, Deyi Wu a, Zhenjia Zhang a, Zheng Jiang c, James O. Leckie b a School of Environmental Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China b The Center for Sustainable Development and Global Competitiveness (CSDGC), Stanford University, Stanford, CA 94305, U.S.A c Faculty of Engineering and the Environment, University of Southampton, Southampton, SO17 1BJ, UK * Correspondent author. Tel: +86 13816632156; E-mail address: lnchi@sjtu.edu.cn Text S. Comparison of four supervised learning algorithms S.1 Linear regression (LR) LR is a simple yet powerful statistical model that approximates the response as a linear combination of predictors. It estimates the corresponding coefficient for each predictor, so that m the mean squared error (MSE) 1 m i = 1 ( y (i) y (i) ) 2 is minimized over the training set. LR has some typical advantages. (1) It accepts mixed types of predictors, such as numerical, categorical 1, or even missing values 2. (2) It is robust for predictors with different scaling ranges, or even irrelevant predictors. (3) It is interpretable regarding how the response is related to each predictor, and here, LR can be combined easily with other statistical tests and analysis. However, major disadvantage of LR is that it uses the very strong assumption that the response has a linear relationship on the predictors and an identically, independently, and normally distributed noise. Thus, our predictions could have a very high bias if the data is not distributed in this way. S1

S.2 Multiple additive regression tree (MART) MART is an improved tree-based model that combines several classification and regression trees (CARTs) with a stochastic gradient boosting technology 3. Similar to LR, CART estimates the response based on the contributions of each predictor. However, instead of assuming the response as a linear function of predictors, CART performs multiple partitions in the space of predictors. Then for each new predictor vector, its response is estimated as the mean of all responses of the training set in the same region of the predictor space 4. However, a single CART usually has high bias due to the piecewise partitions, as well as high variance due to the greedy search strategy. Therefore, instead, we apply MART to improve the prediction accuracy. MART has considerable advantages. (1) It can naturally handle mixed predictor types. (2) It is robust to irrelevant predictors, different scales, and outliers. (3) It is interpretable from the trained and selected decision tree. However, MART has some disadvantages, such as high bias and variance for data with low order of correlations. S.3 Multiple additive regression tree (NN) NN is a popular brain-analog SL model structured with a hierarchical set of logistic regressions 5. The prediction of logistic regression is simply a sigmoid function of that of LR, so that each prediction is inside the range (0,1). An example n-3-1 NN is shown in Fig. S1, where n is the number of predictors, a and b are weights for the input and hidden units, and S and + denote the sigmoid function and linear combination, respectively. Formally, the predicted response is: y (i) = N j = 1 b j S(a 0j + n k = 1 a kj x (i) k ) + b 0, i = 1,,m (S1) where N is the number of hidden units and x is the predictor array of each training data point. The NN can be represented as all weights in the network, which can be trained with an appropriate number of epochs. During each epoch, we apply stochastic gradient descent on weights over the training set 3,6. The unique advantage of NN is that it can handle multiple correlated responses together and find complex relationships. Also, NN is more complicated than LR and MART, and we can adjust several controlling parameters, such as the number and size of hidden layers, to achieve the optimal performance. However, NN has more limits than either LR or MART. (1) It cannot accept S2

categorical or missing predictors 7. (2) It is a black box, and we do not know how each predictor contributes to responses from the model parameters. Fig. S1 Structure of an n-3-1 NN S.4 Support vector machine (SVM) SVM is a popular SL model using the kernel method. In particular, it replaces the predictor array with an appropriate basis function, and then to predict the response with a kernel function, indicating the similarity between the new basis function and all critical measurements (support vectors) in the training set, which can determine the decision hyper-plane in the classification problem 8. SVM is represented as the set of all support vectors, which can be estimated by minimizing the regularized loss function, defined with a certain kernel function 9. Vapnik et al. first introduced SVM to regression problems using the -insensitive loss function 10. The advantage of SVM is that it can solve complex problems by mapping the predictors to the space with higher dimension, leading to a low bias of predictions. SVM can handle both numerical and categorical predictors, but it has some limitations, such as that it is not interpretable and that it is sensitive to irrelevant predictors and different scales. S3

Table S1 Fabrication process parameters and performances of the resulting PVC/PVB membranes Sam Concentrati ple No. Wt% of Polyme rs Wt% of PVC Wt% of PVB Wt% of DMAc Wt% of Additive s Type of Additives Temperatur e of Casting Solution Evaporatio n Time Temperature of Blade Type of Coagulat ion Bath on of Solute in Coagulation Rejectio n Rate of BSA Flux Bath 1 15 13.5 1.5 85 0 none 60 5 60 water 0 15.4 112.4 2 15 12 3 85 0 none 60 5 60 water 0 12.6 221.52 3 15 10.5 4.5 85 0 none 60 5 60 water 0 18.3 267.67 4 15 9 6 85 0 none 60 5 60 water 0 20 308.97 5 15 7.5 7.5 85 0 none 60 5 60 water 0 22.6 356.86 6 16 14.4 1.6 84 0 none 60 5 60 water 0 35 17.8 7 16 12.8 3.2 84 0 none 60 5 60 water 0 41.6 20.43 8 16 11.2 4.8 84 0 none 60 5 60 water 0 52.4 82.3 9 16 9.6 6.4 84 0 none 60 5 60 water 0 58.5 148.54 10 16 8 8 84 0 none 60 5 60 water 0 58 151.2 S4

11 17 15.3 1.7 83 0 none 60 5 60 water 0 61.3 5.78 12 17 13.6 3.4 83 0 none 60 5 60 water 0 62.6 13.64 13 17 11.9 5.1 83 0 none 60 5 60 water 0 65.7 82.5 14 17 10.2 6.8 83 0 none 60 5 60 water 0 70 109.4 15 17 8.5 8.5 83 0 none 60 5 60 water 0 82.4 112.7 16 18 16.2 1.8 82 0 none 60 5 60 water 0 90.8 3.86 17 18 14.4 3.6 82 0 none 60 5 60 water 0 91.6 11.67 18 18 12.6 5.4 82 0 none 60 5 60 water 0 92.1 76.57 19 18 10.8 7.2 82 0 none 60 5 60 water 0 93.8 101.22 20 18 9 9 82 0 none 60 5 60 water 0 94.3 106.73 21 13 9.1 3.9 82.5 4.5 PEG600 60 5 60 water 0 90 46.2 22 15 10.5 4.5 80.5 4.5 PEG600 60 5 60 water 0 95 114.8 23 17 11.9 5.1 78.5 4.5 PEG600 60 5 60 water 0 92.1 63.5 24 18 12.6 5.4 77.5 4.5 PEG600 60 5 60 water 0 85.2 118.1 25 20 14 6 75.5 4.5 PEG600 60 5 60 water 0 80.3 65.4 26 18 18 0 77.5 4.5 PEG600 60 5 60 water 0 100 80 27 18 14.4 3.6 77.5 4.5 PEG600 60 5 60 water 0 95.2 90.8 S5

28 18 12.6 5.4 77.5 4.5 PEG600 60 5 60 water 0 86.7 120.6 29 18 9 9 77.5 4.5 PEG600 60 5 60 water 0 88.1 104.8 30 18 5.4 12.6 77.5 4.5 PEG600 60 5 60 water 0 58.5 196.6 31 18 1.8 16.2 77.5 4.5 PEG600 60 5 60 water 0 12.1 387.7 32 18 0 18 77.5 4.5 PEG600 60 5 60 water 0 10.3 578.4 33 18 12.6 5.4 77.5 4.5 PVPk90 60 5 60 water 0 82.4 26.5 34 18 12.6 5.4 77.5 4.5 Ca(NO3) 2 60 5 60 water 0 92.5 13.3 35 18 12.6 5.4 81 1 PEG600 60 5 60 water 0 94.95 33.85 36 18 12.6 5.4 79 3 PEG600 60 5 60 water 0 99.7 46.32 37 18 12.6 5.4 77 5 PEG600 60 5 60 water 0 99.3 118.92 38 18 12.6 5.4 81 1 PVPk90 60 5 60 water 0 91.58 14.25 39 18 12.6 5.4 79 3 PVPk90 60 5 60 water 0 96.6 11.32 40 18 12.6 5.4 77 5 PVPk90 60 5 60 water 0 83.35 17.82 41 18 12.6 5.4 81 1 Ca(NO3) 2 60 5 60 water 0 98.63 17.8 42 18 12.6 5.4 79 3 Ca(NO3) 60 5 60 water 0 95 117.1 S6

2 43 18 12.6 5.4 77 5 Ca(NO3) 2 60 5 60 water 0 87 19.35 44 18 12.6 5.4 77 5 PEG600 30 5 60 water 0 95.1 30.6 45 18 12.6 5.4 77 5 PEG600 40 5 60 water 0 94.7 74.3 46 18 12.6 5.4 77 5 PEG600 50 5 60 water 0 85.4 35.7 47 18 12.6 5.4 77 5 PEG600 60 5 60 water 0 82.3 71.8 48 18 12.6 5.4 77 5 PEG600 70 5 60 water 0 87.64 82.5 49 18 12.6 5.4 77 5 PEG600 80 5 60 water 0 93.53 12.6 50 18 12.6 5.4 77 5 PEG600 40 10 60 DMAC 30 43.58 140.8 51 18 12.6 5.4 77 5 PEG600 40 30 60 DMAC 30 15.84 156.7 52 18 12.6 5.4 77 5 PEG600 40 50 60 DMAC 30 19.2 178.5 53 18 12.6 5.4 77 5 PEG600 40 70 60 DMAC 30 16.74 180.9 54 18 12.6 5.4 77 5 PEG600 40 80 60 DMAC 30 18.77 208.9 55 18 12.6 5.4 77 5 PEG600 40 120 60 DMAC 30 4.28 182.7 56 18 12.6 5.4 77 5 PEG600 40 10 60 DMAC 50 66.57 85.3 57 18 12.6 5.4 77 5 PEG600 40 10 60 DMAC 10 88.9 53.1 S7

58 18 12.6 5.4 77 5 PEG600 40 10 60 DMAC 20 82.87 56.4 59 18 12.6 5.4 77 5 PEG600 40 10 60 DMAC 40 86.97 58.6 60 18 12.6 5.4 77 5 PEG600 40 10 60 DMAC 60 84.97 68.6 61 18 12.6 5.4 77 5 PEG600 40 10 60 DMAC 70 83.33 45.2 62 18 12.6 5.4 77 5 PEG600 40 10 60 DMAC 80 82.07 122.7 63 18 12.6 5.4 77 5 PEG600 40 10 30 water 0 89.03 78.2 64 18 12.6 5.4 77 5 PEG600 40 10 40 water 0 90.07 80.6 65 18 12.6 5.4 77 5 PEG600 40 10 50 water 0 88.66 63.5 66 18 12.6 5.4 77 5 PEG600 40 10 60 water 0 87.69 101.6 67 18 12.6 5.4 77 5 PEG600 40 10 70 water 0 87.07 68.7 68 18 12.6 5.4 77 5 PEG600 40 10 80 water 0 90.67 53.8 S8

References for supporting information 1 J. S. Long and J. Freese, Stata Press books, 2006. 2 R. J. A. Little, Journal of the American Statistical Association, 1992, 87, 1227 1237. 3 J. H. Friedman, Computational Statistics & Data Analysis, 2002, 38, 367 378. 4 D. Steinberg and P. Colla, The Top Ten Algorithms in Data Mining, 2009, 179 201. 5 J. Sargolzaei, M. H. Asl and A. H. Moghaddam, Desalination, 2012, 284, 92 99. 6 M. Mohri, A. Rostamizadeh and A. Talwalkar, Foundations of Machine Learning, Mit Press, 2012. 7 C. M. Ennett, M. Frize and C. R. Walker, Stud Health Technol Inform, 2001, 84, 449 453. 8 T. Hofmann, B. Schölkopf and A. J. Smola, Ann. Statist., 2008, 36, 1171 1220. 9 K. R. Müller, S. Mika, G. Rätsch, K. Tsuda and B. Schölkopf, IEEE Trans Neural Netw, 2001, 12, 181 201. 10 V. Vapnik, S. E. Golowich and A. Smola, Neural Information Processing Systems, MIT Press, Cambridge, 1997, 169-184. S9