Impact Of Machine Learning Models In Pneumonia Diagnosis With Features Extracted From Chest X-Rays Using VGG16

Pneumonia is a viral, bacterial, or fungal infection that causes pus or fluid to accumulate in the alveoli of the lungs, leading to breathlessness, lung abscess, or, at later stages, death. It affects a large population across the globe, and the number of recorded child deaths due to pneumonia is significantly greater than the number due to AIDS, malaria, and measles. Pneumonia diagnosis is therefore considered a high-priority research area in biomedicine. In this paper, a detailed comparative study is performed using three machine learning algorithms, namely Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM). These models are trained with features extracted by VGG16, a pre-trained deep convolutional neural network (DCNN), for the diagnosis of pneumonia from chest x-rays. Combining VGG16 with machine learning models yielded a considerable improvement in accuracy and a reduction in training time compared with using DCNN models for prediction. The machine learning models are tuned by varying their hyperparameters. In the comparison, SVM with the RBF kernel is found to perform better than the other classifiers.


Introduction
Pneumonia is a lung infection in which fluid or pus accumulates in the alveoli (air sacs) of the lungs. Alveoli are the structures in the lungs where carbon dioxide and oxygen are exchanged during exhalation and inhalation. Accumulation of pus or fluid in the alveoli causes difficulty in breathing, chest pain, headaches, fatigue, vomiting, fever, and loss of appetite. Pneumonia is caused by microorganisms such as viruses, bacteria, or fungi. Bacterial and viral pneumonia are contagious, as they can spread from person to person. More than 250,000 people are hospitalized in the United States (US) each year due to pneumonia, of whom 50,000 die [1]. The child mortality due to pneumonia is greater than that of malaria, AIDS, and measles combined [Rudan et-al 2008] [Adegbola RA]. According to one study, an estimated 1.7 million (17 lakh) children will die of pneumonia by 2030 [4]; the same study stated that around 4 million lives could have been saved had a proper diagnosis been available. An accurate and precise diagnosis is therefore important for reducing the death toll due to pneumonia. Pneumonia can be diagnosed with chest x-rays, blood tests, bronchoscopy, pulse oximetry, CT scans, and lung ultrasound [5].
Chest x-rays are more widely used than CT scans for the diagnosis of pneumonia [6 2001]. They are preferred over chest CT scans because x-ray imaging takes less time than CT imaging and because high-resolution CT scanners may not be available in all regions across the globe. X-ray imaging, in contrast, is widely used and plays an important role in epidemiological studies and clinical care [Cherian et-al]. In this paper, a study is performed on machine learning algorithms, namely random forest (RF), logistic regression (LR), and support vector machine (SVM), trained with features extracted by a DCNN named VGG16 to diagnose pneumonia from chest x-ray images. The chest x-ray dataset used in this study consists of 4273 x-ray images diagnosed with pneumonia and 1583 x-ray images of healthy people. The classifiers are implemented while varying specific parameters: the kernel function (for SVM), the penalty (for LR), and the number of trees (for RF). In this study, SVM with the RBF kernel reported the highest performance among the model configurations.

Related Works
With the increasing need for accurate and precise diagnosis in medicine, machine learning and deep learning methods are being used more and more widely. Several methodologies have been proposed by researchers across the globe to cater to this need. With this motivation, a comprehensive study of ML algorithms, namely Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM), with different parameter values and trained with features extracted by the VGG16 DCNN is presented in this paper.

Dataset Description and Image Preprocessing
The chest x-ray images used for this study are taken from the ChestXRay2017 dataset developed by Daniel Kermany et-al [Kermany et-al 2018]. This dataset consists of 5863 images classified into two classes, namely pneumonia and normal. There are 4273 chest x-ray images diagnosed with pneumonia and 1583 chest x-rays of healthy persons. The chest x-ray images are pre-processed by converting them to PNG (Portable Network Graphics) format, converting them to a grey-scale representation, and resizing them to 224 x 224 pixels. After preprocessing, the dataset is split in the ratio 90:10 for training and testing the classifiers. A few images from the dataset are shown in Figure 1.
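The preprocessing and 90:10 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `preprocess_image` and `stratified_split`, the use of Pillow, the fixed random seed, and the choice to hold out 10% of each class separately (a stratified split) are all assumptions, since the text only states the split ratio.

```python
import random
from collections import Counter


def preprocess_image(src_path, dst_path, size=(224, 224)):
    """Convert one x-ray to grey-scale, resize it, and save it as PNG."""
    # Pillow is imported lazily so the split helper below works without it.
    from PIL import Image

    with Image.open(src_path) as img:
        img.convert("L").resize(size).save(dst_path, format="PNG")


def stratified_split(labels, test_fraction=0.10, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class counts as reported in the text (assumed split: stratified, seed 0).
labels = ["pneumonia"] * 4273 + ["normal"] * 1583
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), dict(test_counts))
```

Splitting per class keeps the roughly 73:27 pneumonia-to-normal ratio the same in the training and test sets, which matters for an imbalanced dataset like this one.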

Methods
In this paper, the performance of machine learning classifiers trained with features extracted by the VGG16 DCNN for the diagnosis of pneumonia from chest x-ray images is studied. This section is divided into two subsections: subsection 4.1 describes the feature extraction methodology and subsection 4.2 describes the classification algorithms used for the study.

Feature Extraction
In this study, the VGG16 DCNN with weights trained on the ImageNet database is used for extracting features from the images. The ImageNet database consists of more than 14 million images; the pre-trained weights come from its 1000-class classification task. Since pre-trained models like VGG16, InceptionV3, etc. have already learned to extract features from images and to distinguish images of different classes, they have shown strong performance when applied to datasets of similar domains. The VGG16 DCNN has shown better performance than other pre-trained models in medical imaging [Yadav et-al 2019], so it is used for feature extraction in this study. VGGNet or VGG16 DCNN contains 13 convolutional layers with receptive fields of size 3 x 3, 5 max-pooling layers of pool size 2 x 2 for spatial pooling, and 3 fully connected layers, the last of which is activated with the softmax function, giving 16 weight layers and about 138 million parameters. In this DCNN the hidden layers are activated with the rectified linear unit (ReLU) activation function, and dropout regularization is used in the fully connected layers. The structure of the VGG16 architecture is shown in Figure 2. For this study, in order to use VGG16 as a feature extractor, the fully connected layers of the DCNN are removed (highlighted in red in Figure 3). The features extracted by VGG16 in the form of feature vectors are used for training the ML classifiers, namely Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM). The structure of the proposed study is shown in Figure 3.
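To make the effect of removing the fully connected layers concrete, the sketch below works through the layer arithmetic of the standard VGG16 configuration (13 convolutional and 3 fully connected layers) for a 224 x 224 input. The function names are illustrative. The optional `extract_features` helper assumes TensorFlow/Keras is installed and that the final feature maps are simply flattened; the text does not say how the authors turned feature maps into vectors, so that step is an assumption.

```python
# Standard VGG16 layout: output channels of each 3x3 convolution, grouped into
# five blocks; every block is followed by one 2x2 max-pooling layer.
VGG16_BLOCKS = [[64, 64], [128, 128], [256, 256, 256],
                [512, 512, 512], [512, 512, 512]]
FC_UNITS = [4096, 4096, 1000]  # fully connected head (1000 ImageNet classes)


def conv_base_summary(input_size=224, in_channels=3):
    """Return (n_conv_layers, n_conv_params, output_shape) of the conv base."""
    size, channels, params, n_layers = input_size, in_channels, 0, 0
    for block in VGG16_BLOCKS:
        for out_channels in block:
            # 3x3 kernel weights plus one bias per output channel.
            params += (3 * 3 * channels + 1) * out_channels
            channels = out_channels
            n_layers += 1
        size //= 2  # 2x2 max pooling halves height and width
    return n_layers, params, (size, size, channels)


def head_params(n_inputs):
    """Parameter count of the fully connected layers that are removed."""
    params = 0
    for units in FC_UNITS:
        params += (n_inputs + 1) * units  # weights plus biases
        n_inputs = units
    return params


def extract_features(images):
    """Sketch: run images through the ImageNet-trained VGG16 conv base.

    `images` is assumed to be a float array of shape (n, 224, 224, 3) with
    pixel values in 0..255; grey-scale x-rays would need their single channel
    repeated three times first. Flattening the output is an assumption.
    """
    import tensorflow as tf  # lazy import; requires TensorFlow and the weights

    base = tf.keras.applications.VGG16(
        weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    x = tf.keras.applications.vgg16.preprocess_input(images)
    return base.predict(x).reshape(len(images), -1)


n_conv, conv_params, out_shape = conv_base_summary()
feature_length = out_shape[0] * out_shape[1] * out_shape[2]
total_params = conv_params + head_params(feature_length)
print(n_conv, out_shape, feature_length, conv_params, total_params)
# prints: 13 (7, 7, 512) 25088 14714688 138357544
```

Under these assumptions each image yields a 25088-dimensional feature vector, and most of the network's parameters sit in the fully connected layers that are discarded.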

Support Vector Machine (SVM) is a machine learning algorithm developed by Cortes and Vapnik [Cortes C, Vapnik V 1995] that is used to solve regression and classification problems. In this study, SVM is used for binary classification. To solve a classification problem, SVM represents the data items as points in an N-dimensional space, where N is the number of attributes or features in the data, and then finds an optimal hyperplane (a line in two dimensions) that separates the two classes. SVM works well for linearly separable data, but the image data used in this study are not linearly separable, so the SVMs have to be used with kernel functions.
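Kernels are mathematical functions that map training data that are not linearly separable into a space in which the classes can be separated. The kernel functions used in this study are 'linear', 'RBF', 'polynomial', and 'sigmoid'. The equations of these functions are given in equation 1.

$$
\begin{aligned}
K_{\text{linear}}(x, y) &= x^{T} y \\
K_{\text{RBF}}(x, y) &= \exp\left(-\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}}\right) \\
K_{\text{polynomial}}(x, y) &= \left(\alpha\, x^{T} y + c\right)^{d} \\
K_{\text{sigmoid}}(x, y) &= \tanh\left(\alpha\, x^{T} y + c\right)
\end{aligned}
\tag{1}
$$

In the above equations, $K$ is the kernel function, $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ are two feature vectors from the training data, $\sigma$ is an adjustable parameter greater than zero, $\alpha$ is the slope, $d$ is the degree of the polynomial, and $c$ is a constant.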
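As a small worked example, the four kernels can be written in plain Python using their commonly used textbook forms: the linear kernel as a dot product, the polynomial kernel as (alpha * x.y + c)^d, the RBF kernel as exp(-||x - y||^2 / (2 sigma^2)), and the sigmoid kernel as tanh(alpha * x.y + c). The function names and default parameter values below are assumptions chosen for illustration, not the settings used in the study.

```python
import math


def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, alpha=1.0, c=1.0, d=3):
    return (alpha * dot(x, y) + c) ** d


def rbf_kernel(x, y, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(x, y, alpha=1.0, c=0.0):
    return math.tanh(alpha * dot(x, y) + c)


x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y))           # 1*3 + 2*4 = 11.0
print(polynomial_kernel(x, y, d=2))  # (11 + 1)^2 = 144.0
print(rbf_kernel(x, x))              # identical vectors give 1.0
print(rbf_kernel(x, y))              # exp(-8 / 2) = exp(-4)
```

The RBF value depends only on the distance between the two vectors, so it decays from 1 towards 0 as the vectors move apart; sigma controls how quickly.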

Random Forest (RF), or random decision forest, is an ensemble learning algorithm developed by Tin Kam Ho [Ho, T.K 1995] that is used for regression and classification tasks. RF is trained using a technique known as bootstrap aggregation. The algorithm constructs many decision trees at training time. For a regression task it outputs the mean of the values predicted by the individual trees in the forest, and for a classification task it outputs the mode of the classes predicted by the individual trees. For this study, the performance of RF constructed with 50, 100, 200, and 400 trees and trained with features extracted by the VGG16 DCNN is studied.
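The two aggregation rules and the bootstrap sampling step can be illustrated with a short sketch. Training the individual decision trees is omitted; the helper names and the example votes are hypothetical and only show how per-tree outputs would be combined.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def aggregate_classification(tree_predictions):
    """Majority vote: the mode of the classes predicted by the trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Mean of the values predicted by the trees."""
    return mean(tree_predictions)


rng = random.Random(0)
sample = bootstrap_sample(10, rng)  # indices used to train one tree

# Hypothetical outputs of five trees for a single x-ray.
votes = ["pneumonia", "normal", "pneumonia", "pneumonia", "normal"]
print(aggregate_classification(votes))         # pneumonia (3 of 5 votes)
print(aggregate_regression([1.0, 2.0, 6.0]))   # 3.0
```

Because each tree sees a different bootstrap replicate, the trees make partly independent errors, which the vote or average then smooths out.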

Logistic Regression
Logistic Regression (LR) is a statistical model, described by Tolles and Meurer [Tolles J, Meurer WJ 2016], that is used for regression and classification. The algorithm is based on the logistic function. The logistic function, or sigmoid function, is an S-shaped curve that takes a real-valued input and maps it to a value between 0 and 1, but never exactly 0 or 1. The equation of the sigmoid function is shown in equation 2.

$$
f(x) = \frac{L}{1 + e^{-k(x - x_0)}}
\tag{2}
$$

In the above equation, $k$ represents the steepness, $L$ represents the curve's maximum value, $x$ is a real number, and $x_0$ is the $x$ value of the sigmoid midpoint. In this study, LR with the 'L1' and 'L2' penalties is studied. The only difference between LR with L1 regularization and LR with L2 regularization is the penalty term added to the loss function: L2 regularization adds the squared magnitude of the coefficients, $\sum_{j=1}^{p} \beta_j^{2}$, whereas L1 regularization adds the absolute value of the magnitude of the coefficients, $\sum_{j=1}^{p} |\beta_j|$, where $\beta_1, \ldots, \beta_p$ are the model coefficients.
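The logistic function and the two penalty terms are easy to check numerically. The sketch below assumes the general form L / (1 + exp(-k (x - x0))) for the logistic function; the coefficient values are hypothetical and the regularization strength that would multiply each penalty is left out.

```python
import math


def logistic(x, L=1.0, k=1.0, x0=0.0):
    """General logistic curve; with the defaults it is the standard sigmoid."""
    return L / (1.0 + math.exp(-k * (x - x0)))


def l1_penalty(coefficients):
    """Sum of the absolute values of the coefficients."""
    return sum(abs(b) for b in coefficients)


def l2_penalty(coefficients):
    """Sum of the squared coefficients."""
    return sum(b * b for b in coefficients)


coefficients = [1.0, -2.0, 3.0]  # hypothetical model coefficients
print(logistic(0.0))             # midpoint of the curve: 0.5
print(l1_penalty(coefficients))  # 1 + 2 + 3 = 6.0
print(l2_penalty(coefficients))  # 1 + 4 + 9 = 14.0
```

The L2 term grows quadratically, so it penalizes large coefficients more heavily, while the L1 term tends to drive some coefficients exactly to zero.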

Experiments and Results
In this study, the performance of DCNN transfer learning for the diagnosis of pneumonia from x-ray images is evaluated and analyzed. The VGG16 DCNN is used for extracting features from the chest x-ray images. To use the VGG16 DCNN as a feature extractor, its fully connected layers are removed. For classification, ML algorithms, namely Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM), are employed. These algorithms were chosen for classification because they have shown promising results in medical diagnosis. For an extensive study, specific parameters of the classifiers are varied within reasonable limits. For SVM, the experiments compare the 'RBF', 'linear', 'polynomial', and 'sigmoid' kernels; for RF, the number of trees is varied from 50 to 400; and LR is compared based on the 'L1' and 'L2' penalties. Except for the parameters stated above, the other parameters of the classifiers are fixed at their default values. The base model configurations evaluated in this study for the diagnosis of pneumonia are labelled with the parameter that was changed enclosed in parentheses, for example TL+SVM(RBF). These models are evaluated based on accuracy, error rate, sensitivity, F1-measure, precision, and specificity. The performance of these classifiers is shown in Table 1 and is pictorially represented in Figure 4. The three best-performing classifiers selected for further investigation are TL+LR(L2), TL+SVM(RBF), and TL+RF(200). Among these three classifiers, the Support Vector Machine (SVM) with the RBF kernel trained with features extracted by the VGG16 DCNN reported the highest performance, followed by Logistic Regression with the L2 penalty and Random Forest with 200 trees. This is further confirmed by the area under the ROC curve (AUC), the ROC curve, Intersection over Union (IoU), and the Kappa score.
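One possible way to enumerate the ten configurations (four SVM kernels, the four tree counts 50, 100, 200 and 400 listed for RF in the Methods section, and two LR penalties) is sketched below with scikit-learn. The paper does not name its software library, so the use of scikit-learn, the label spellings other than TL+SVM(RBF), TL+LR(L2) and TL+RF(200), and the `liblinear` solver are assumptions. The solver is changed because scikit-learn's default solver does not support the L1 penalty, and recent scikit-learn releases may prefer other ways of selecting the penalty.

```python
# Varied parameter values; display name -> scikit-learn kernel identifier.
SVM_KERNELS = {"RBF": "rbf", "linear": "linear",
               "polynomial": "poly", "sigmoid": "sigmoid"}
RF_TREES = [50, 100, 200, 400]
LR_PENALTIES = ["l1", "l2"]

# (label, estimator family, varied parameter) for each configuration.
CONFIGS = (
    [(f"TL+SVM({name})", "svm", kernel) for name, kernel in SVM_KERNELS.items()]
    + [(f"TL+RF({n})", "rf", n) for n in RF_TREES]
    + [(f"TL+LR({p.upper()})", "lr", p) for p in LR_PENALTIES]
)


def build_classifiers():
    """Instantiate one estimator per configuration (requires scikit-learn)."""
    # Imported lazily so that CONFIGS can be inspected without scikit-learn.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    models = {}
    for label, family, value in CONFIGS:
        if family == "svm":
            models[label] = SVC(kernel=value)
        elif family == "rf":
            models[label] = RandomForestClassifier(n_estimators=value)
        else:
            # liblinear handles both "l1" and "l2" (assumed solver choice).
            models[label] = LogisticRegression(penalty=value, solver="liblinear")
    return models


labels = [label for label, _, _ in CONFIGS]
print(len(labels), labels)
```

Each estimator returned by `build_classifiers()` would then be fitted on the training feature vectors with `fit(X_train, y_train)` and scored on the held-out 10%.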
The confusion matrices of the three best classifiers are shown in Figure 5. The AUC, IoU, and Kappa scores of the selected classifier configurations are shown in Table 2, and the ROC curves are shown in Figure 6. The comparison of these models based on positive predictive value (PPV), negative predictive value (NPV), and balanced accuracy is shown in Table 3. Thus the AUC, IoU, Kappa score, accuracy, sensitivity, F1-measure, and error rate of TL+SVM(RBF) seem satisfactory for this study.
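All of the threshold-based metrics named in this section can be derived from the four cells of a binary confusion matrix, as the sketch below shows. The counts used in the example are hypothetical and are not results from this study. IoU is computed here as the Jaccard index of the positive class, TP / (TP + FP + FN), which is an assumption about how the paper defines it.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common metrics from confusion-matrix counts (pneumonia = positive)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value (PPV)
    npv = tn / (tn + fn)           # negative predictive value
    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "iou": tp / (tp + fp + fn),
        "kappa": (accuracy - expected) / (1.0 - expected),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy and the Kappa score are useful here because the dataset contains far more pneumonia images than normal ones, so plain accuracy alone can look better than the classifier really is.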

Conclusion
Pneumonia is a pulmonary infection affecting the alveoli of one or both lungs. It causes severe complications such as shortness of breath, bacteremia (bacteria in the bloodstream), pleural effusion (fluid accumulation around the lungs), lung abscess, and even death. An accurate and precise diagnosis of pneumonia is therefore very important. With this as the motivation, a comparative study was performed using Logistic Regression, Support Vector Machine, and Random Forest trained with features extracted by the VGG16 DCNN for the diagnosis of pneumonia. For an extensive study, the performance of the machine learning algorithms is compared by varying specific parameters within reasonable limits: the kernel function for SVM ('linear', 'polynomial', 'sigmoid', and 'RBF'), the penalty term for LR ('L1' and 'L2'), and the number of trees for RF (50, 100, 200, and 400). The study found that SVM with the RBF kernel trained with features extracted by the VGG16 DCNN reported the highest performance among the classifiers.