An Efficient Ensemble of Brain Tumour Segmentation and Classification using Machine Learning and Deep Learning based Inception Networks

In recent times, Brain Tumor (BT) has become a common condition affecting people of almost all age groups. Identification of this deadly disease using computed tomography (CT) and magnetic resonance imaging (MRI) is now very popular. Developing a Computer Aided Diagnosis (CAD) tool for the diagnosis and classification of BT has therefore become vital. This paper focuses on designing such a tool using Deep Learning (DL) models. It involves a series of steps: acquiring the image, pre-processing, segmenting, extracting features using SIFT and a DL based Inception network model, and classifying to identify the type of tumor. The proposed model uses the fuzzy C means algorithm to segment the area of interest from the acquired BT image. Gaussian Naïve Bayes (GNB) and logistic regression (LR) are used for classification. A benchmark dataset was used to assess the efficiency of all the techniques. The simulation outcome showed that the proposed method attains a maximum sensitivity of 100%, specificity of 97.41%, and accuracy of 97.96%.


Introduction
In the human body, the brain is a vital organ that acts as the control center of the central nervous system. It controls and directs the body so that it functions properly. Because the brain is such an important organ, it has to be protected from harm and ailments. Common brain tumors include Meningioma, Glioma, and Pituitary tumors. Meningiomas are prominent; however, they are a non-cancerous type of tumor that develops in the narrow walls around the brain tissues and cells (Aruna Kiruthika, 2020; Fu.J, 2012). Brain Tumors (BTs) are considered among the most dreadful diseases, since they can shorten a person's life within a short span of time. Early prediction of BT is therefore essential to extend the patient's lifespan. This is accomplished using Magnetic Resonance Imaging (MRI) scanning, which radiologists apply extensively to examine BT. The scan report shows whether the brain is healthy or unhealthy and, when a disorder is present, it also identifies the class of tumor. When Machine Learning (ML) is applied, the MRI scans must provide a precise image for predicting BT. Developers initially assumed three stages: pre-processing of the MRI, feature generation and extraction, and classification.
In the pre-processing phase, a Median Filter (MF) has been applied to enhance image quality while conserving edges (Talo. M 2019). Image segmentation is then performed with techniques such as K-Means and Fuzzy C Means (FCM), which yield more useful features from the input images. Segmentation is one of the most important phases in image examination and interpretation. It is also employed extensively in brain imaging tasks such as tissue classification, tumor localization, tumor volume estimation, blood cell inclination, surgical planning, and matching. In (Alqazzaz S, 2019), BT segmentation was performed by applying a Convolutional Neural Network (CNN) to 3D MRI. Automated prediction of the brain's anatomical structure using a Deep Neural Network (DNN) was proposed in (Sugimori H, 2019). In (Garikapati, P., 2020), a voting scheme was developed for an ensemble of transparent structures, such as intensity and adaptive shape models, integrated with discrete Gaussian and higher-order patterns such as Markov-Gibbs random field classification. A hybrid of a deep auto-encoder with a Bayesian fuzzy clustering based segmentation mechanism was established in (Balamurugan, K, 2018).
In (Gumaste PP, 2020), a 2D MRI is divided into left and right hemispheres, and statistical properties are estimated for an SVM classification approach. Because the number of features is large, feature extraction is performed on valid data using Principal Component Analysis (PCA), Scale Invariant Feature Transform (SIFT), and Speeded-Up Robust Features (SURF) descriptors. In a related study, after computing hybrid feature extraction and the covariance matrix, a regularized extreme learning approach was employed to classify the brain disorder. An Evolutionary Algorithm (EA), namely Particle Swarm Optimization (PSO), was utilized in (Hong KS, 2018) to decide the combination of features. Moreover, well-known ML approaches are applied for image analysis.
This study introduces a novel BT diagnosis model using SIFT and Inception networks. The presented model consists of pre-processing, segmentation, feature extraction, and classification. It uses FCM as the segmentation model to determine the affected tumor regions in the brain area. The SIFT and Inception v3 models are employed to perform feature extraction. Finally, Gaussian Naïve Bayes (GNB) and Logistic Regression (LR) are employed as classifier models to determine the distinct class labels. To validate the proposed model, a series of experiments is carried out on the benchmark test dataset.

Literature Review
This section presents a short survey of the ML and deep learning (DL) based BT diagnosis models available in the literature. In (Sharif M, 2020), feature extraction was applied to a brain system interface, followed by classification using a support vector machine (SVM) and Linear Discriminant Analysis (LDA). In recent times, the CNN has become one of the most popular mechanisms for feature extraction in fields such as clinical imaging, video examination, and natural language processing (NLP). The key objective of a CNN is to learn the chief patterns from the training images. For example, VGGNet, GoogLeNet, and AlexNet are effective architectures for image classification that are also employed for BT prediction.
In (Ezhilarasi, T. P.,2020), pre-processing and data preparation using 3D filters and a CNN with multipath and cascaded structures were presented. In PixelCNN, a CNN structure was utilized to generate diverse portraits of the same person in distinct poses. In (Seetha J, 2018), a pretrained CNN was employed for BT classification with a DNN and an SVM. Then, in (Ranjeeth, S., 2020), a cascade CNN produced a room decoration. Because CNNs are computationally expensive, developers have concentrated on cost-effective methods that still classify tumors accurately. A common technique is to apply an ensemble of small collaborative learners rather than a single heavy network, in order to obtain robust training and convergence. The learning processes of the peer networks can therefore be autonomous.
In (Zhang Y, 2018), the Kullback-Leibler divergence was applied to match the probability estimates of peers in supervised learning. Besides, in (Kushibar K, 2018), multipath learners are attached to the outputs of shared layers. The main aim of this model is to detect the disorder robustly and to keep tumor development within a limited extent. A major challenge in ML models is estimating the data distribution. For instance, hard-coded associations between every image pixel and its neighbours are difficult to identify without prior knowledge. Autoregressive approaches are data-driven estimators used to identify these associations from typical information. The results they produce are enhanced images with limited noise and outliers. A density estimator can address various tasks, including classification, regression, and missing-data problems. In (Loganathan, J., 2016), a quantum variational Auto Encoder (AE) was presented in which the latent generative process acts as a quantum Boltzmann machine. In estimating BT from MRI, small training sets, varied tumor shapes, and irregular information could be identified for every class. Neural Autoregressive Distribution Estimation is a density estimator that evolved from Restricted Boltzmann Machines (RBM). It is used to estimate the density of binary and real-valued data, and with alternate network structures such as CNNs. Finally, a DNN is capable of handling nonlinear transformations, sequence modelling, and representation learning, and it is flexible enough to learn from data in real-time classification and recommender systems.

The Proposed Method
Fig. 1 depicts the block diagram of the presented model and its sub-processes. First, the input image is pre-processed to strip the skull, remove noise, and increase the contrast level. Then, an FCM based segmentation technique is employed to identify the diseased portions of the image. Afterward, the SIFT and Inception v3 models are applied to extract a useful set of feature vectors. Finally, the GNB and LR models are used for classification.

Image Pre-processing
Initially, the input images are pre-processed in three stages: skull stripping, noise removal using bilateral filtering (BF), and contrast enhancement based on contrast limited adaptive histogram equalization (CLAHE). After pre-processing, segmentation is performed to identify the affected tumor regions. Segmentation is carried out as a separate step, as indicated. A detailed explanation of the pre-processing is given in the earlier module.
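To make the noise-removal stage concrete, the following is a minimal NumPy sketch of a bilateral filter on a 2-D grayscale image. It is an illustration only, not the authors' implementation: the function name `bilateral_filter` and the parameter values (`radius`, `sigma_space`, `sigma_range`) are assumptions. Since the paper lists OpenCV among its packages, a practical pipeline would more likely call `cv2.bilateralFilter` for this step and `cv2.createCLAHE` for contrast enhancement.

```python
import numpy as np


def bilateral_filter(img, radius=2, sigma_space=2.0, sigma_range=25.0):
    """Naive bilateral filter for a 2-D grayscale image (illustrative parameters)."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Image of neighbours at offset (dy, dx) for every pixel at once.
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            # Spatial weight: closer neighbours count more.
            w_space = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_space ** 2))
            # Range weight: neighbours with similar intensity count more,
            # which is what keeps edges sharp.
            w_range = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_range ** 2))
            weight = w_space * w_range
            num += weight * shifted
            den += weight
    return num / den  # den >= 1 because the centre pixel always has weight 1
```

Unlike a plain Gaussian blur, the range weight suppresses averaging across strong intensity jumps, so a sharp step between two regions is left essentially unchanged.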

Fuzzy C Means based Segmentation
The Fuzzy C Means (FCM) technique is applied to segment the pre-processed image. FCM is a well-known unsupervised ML method that is used extensively for image segmentation. Fuzzy clustering has proven flexible enough to overcome the inaccuracy of geographical data in remote sensing.
It is widely applied in large-scale data analysis, Data Mining (DM), Vector Quantization (VQ), image segmentation, and pattern detection with real-time and theoretical values.

Figure 1. Working process of the presented model
In this mechanism, fuzzy clustering assigns each image pixel a membership value between 0 and 1 for every cluster, which measures the degree to which the pixel belongs to that cluster. Various optimization schemes for fuzzy clustering have been proposed: random projection and independent component analysis have been employed to enhance the efficiency of FCM, and metaheuristic approaches have been integrated with FCM to maximize clustering performance. Fig. 2 shows the flowchart of the FCM model.
Consider that $X = \{x_1, x_2, \dots, x_n\}$ is a collection of $n$ data points. The objective function of the FCM model, in its standard form, is expressed below:

$$J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^{m} \, d_{ik}^{2}, \qquad d_{ik} = \lVert x_k - \nu_i \rVert$$

where $c$ denotes the cluster count and $u_{ik}$ is the membership degree of $x_k$ in the $i$-th cluster. The value of $u_{ik}$ lies in $[0, 1]$, $m$ is the weighting exponent on all fuzzy memberships with a value of 2, $\nu_i$ represents the $i$-th cluster center, $d_{ik}$ is the Euclidean distance between cluster center $\nu_i$ and object $x_k$, and $\lVert \cdot \rVert$ signifies the Euclidean norm. The membership function expresses how strongly a pixel belongs to a cluster: pixels far from a cluster center receive low membership values, pixels in the local neighbourhood of a cluster center receive high membership values, and the minimization condition is then attained (Loganathan, J., 2017). The FCM approach depends on the initial parameter set and lowers the objective function $J_m(U, V)$ in every iteration. $U$ and $V$ are updated as follows:

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik} / d_{jk} \right)^{2/(m-1)}}, \qquad \nu_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} \, x_k}{\sum_{k=1}^{n} u_{ik}^{m}}$$

where $u_{ik}$ and $\nu_i$ are the membership function and cluster centers, respectively.
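The alternating update of memberships and cluster centers can be sketched in a few lines of NumPy. This is a minimal illustration of standard FCM with the weighting exponent m = 2 mentioned above, not the authors' code; the function name `fuzzy_c_means`, the random initialisation, the stopping tolerance, and the iteration cap are all assumptions.

```python
import numpy as np


def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM on n samples; returns memberships U (c x n) and centers V (c x d)."""
    X = np.asarray(X, dtype=np.float64)
    if X.ndim == 1:
        X = X[:, None]  # treat a flat list of pixel intensities as n x 1
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each column (one data point) sums to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Centers are membership-weighted means of the data.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Euclidean distance between every center i and every point k.
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        D = np.maximum(D, 1e-12)  # avoid division by zero at a center
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute centers so that V matches the final memberships.
    Um = U ** m
    V = (Um @ X) / Um.sum(axis=1, keepdims=True)
    return U, V
```

For image segmentation, `X` would be the flattened pixel intensities of the pre-processed image, and a hard label map can be obtained with `U.argmax(axis=0)` reshaped to the image size.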

Feature extraction
This section explains the two main feature extraction techniques, namely the SIFT and Inception v3 models.

Scale Invariant Feature Transform
SIFT is a local feature extraction technique that uses a local, invariant, fast key point detection process to extract the key points of the image features.
The major phases of SIFT feature extraction are as follows:
• Scale-space extrema detection: interest points that are invariant to scale and rotation are searched for using the Difference of Gaussian (DoG) function.
• Key point localization and filtering: the position and scale of the candidate interest points are identified, and key points are retained according to their robustness to image distortion.
• Orientation assignment: the dominant orientation is assigned to every key point position according to the local image-gradient directions.
• Feature description: local image gradients are estimated at the selected scale in a neighbourhood of each key point, yielding a 128-D feature descriptor.
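The first SIFT phase can be illustrated with a small NumPy sketch of the Difference of Gaussian response for a single pair of scales. It covers only this one step and is not a full SIFT implementation; the helper names and the values `sigma=1.6` and `k=sqrt(2)` are assumptions chosen for illustration. In practice, a complete detector and descriptor such as OpenCV's `cv2.SIFT_create` would normally be used.

```python
import numpy as np


def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at about 3 sigma and normalised to sum to 1."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding; output has the input's shape."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    padded = np.pad(np.asarray(img, dtype=np.float64), r, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def difference_of_gaussians(img, sigma=1.6, k=2 ** 0.5):
    """DoG response: blur at scale k * sigma minus blur at scale sigma."""
    return gaussian_blur(img, k * sigma) - gaussian_blur(img, sigma)
```

Candidate key points are the local extrema of this response across position and scale; flat regions give a response close to zero, while blob-like structures give a strong response at their center.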
In past decades, CNNs have attracted considerable attention from developers owing to their effective performance in image classification. They are commonly combined with Transfer Learning (TL) and hyper-parameter tuning (Fu, J., 2012). Moreover, AlexNet, VGGNet, GoogLeNet, and ResNet have been employed in the literature as advanced Deep CNNs (DCNN), with TL used to classify MRI datasets. Studies based on ICH prediction are also reported in this literature. In TL, the FC layers of an existing trained CNN are removed and the remaining network is used as a feature extractor. The hyper-parameters are not learned by the system itself; they have to be tuned and optimized on the basis of the simulation outcome obtained from the MRI computation in order to improve performance. Fig.4 shows the key points extracted by the SIFT feature extraction algorithm on various ICH images.

Inception v3 model:
A CNN is a multi-layer interconnected NN in which low-, intermediate-, and high-level features are extracted hierarchically. A common CNN model is composed of two kinds of layers, convolutional and pooling layers, which are jointly called the convolutional base of the system (Latchoumi, T. P., 2010). Some models, such as AlexNet and VGG, additionally include Fully Connected (FC) layers. The convolutional layer is applied to extract spatial characteristics from the images. Typically, the initial convolutional layers extract low-level features such as edges and corners, whereas the final convolutional layers extract high-level features such as image structures. This underlies the efficiency of CNNs in learning spatial hierarchical patterns. A convolutional layer is characterized by two elements: the convolution patch size and the depth of the output feature map, which corresponds to the filter count.
A non-linearity such as the Rectified Linear Unit (ReLU) is typically employed as an element-wise activation function on each element of the feature map.
Because neighbouring pixels are processed with the identical function, the resulting feature maps contain redundant details. Pooling layers are therefore employed after convolutional layers to reduce the variance of the extracted features by applying typical procedures such as max and average pooling. Max- and average-pooling layers compute the maximum and mean values, respectively, over a fixed-size sliding window that moves with a given stride across the feature maps, and are thus conceptually similar to a convolutional layer. Unlike in the convolutional layers, a stride of 2 is used in the pooling layers to down-sample the feature map. The pooling (sub-sampling) layer thereby generalizes the output of the convolutional layer to a higher level and selects robust, abstract features for the subsequent layers. Hence, the pooling layer reduces the processing complexity during training by shrinking the feature maps.
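As a small worked illustration of the pooling step, the sketch below applies max and average pooling with a 2*2 window and a stride of 2 to a single feature map. The function name `pool2d` and its parameters are illustrative assumptions; deep learning frameworks provide optimized equivalents.

```python
import numpy as np


def pool2d(fmap, size=2, stride=2, mode="max"):
    """Max or average pooling over a 2-D feature map with a sliding window."""
    fmap = np.asarray(fmap, dtype=np.float64)
    h, w = fmap.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Window covered by output cell (i, j).
            window = fmap[i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[i, j] = reduce_fn(window)
    return out
```

With a 4*4 input holding the values 0 to 15 row by row, max pooling returns [[5, 7], [13, 15]] and average pooling returns [[2.5, 4.5], [10.5, 12.5]], so each spatial dimension is halved.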
Some systems include FC layers before the classifier layer, linking the final output of the stacked convolutional and pooling layers to the classifier layer. Because the FC layers hold most of the parameters, they are prone to over-fitting. Dropout is an effective regularization approach for reducing over-fitting. During training, it randomly drops some neurons and their connections, which prevents neurons from excessive co-adaptation and encourages useful independent features. The classifier layer then computes the posterior probabilities of all classes. The softmax classifier, also known as the normalized exponential, is the one typically employed in DL models. In the Inception module, the channels produced by the convolution operations on the output of the former layer are collected, and a nonlinear combination is performed. In this way, the representational capacity of the network and its adaptability to different scales are enhanced, and over-fitting can be reduced. Fig. 3 depicts the overall structure of Inception. Inception v3 is a network structure available in Keras. The default input image size is 299*299 with 3 channels. The Inception v3 network structure applied in this study is depicted in Fig. 4.
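The two mechanisms described in this paragraph, dropout and the softmax classifier layer, can be sketched as follows. This is a hedged NumPy illustration, not the Keras internals used in the study: the function names are hypothetical, and the "inverted dropout" rescaling by 1 / (1 - rate) is an assumption about the variant used (it is the common one, since it leaves the expected activation unchanged).

```python
import numpy as np


def softmax(logits):
    """Normalized exponential: maps class scores to posterior probabilities."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of units during training."""
    a = np.asarray(activations, dtype=np.float64)
    if not training or rate == 0.0:
        return a  # dropout is disabled at inference time
    keep = rng.random(a.shape) >= rate
    # Rescale the kept units so the expected activation stays the same.
    return a * keep / (1.0 - rate)
```

For a two-class problem such as benign versus malignant, `softmax` applied to the two output scores yields probabilities that sum to 1, and the predicted class is the one with the larger probability.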

Figure 6. Different sizes of Inception
Compared with earlier Inception models, the Inception v3 architecture applies a convolution kernel splitting scheme that divides large convolutions into small ones. For instance, a 3*3 convolution is divided into 3*1 and 1*3 convolutions. With this splitting scheme, the parameter count is reduced; thus, network training is accelerated while spatial features are still extracted effectively. In addition, Inception v3 optimizes the Inception network module with three different grid sizes, as illustrated in Fig. 5.
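The parameter saving from kernel splitting can be checked with simple arithmetic. The sketch below counts convolution weights (biases ignored) for a full n*n kernel and for its n*1 followed by 1*n factorization; the function names and the assumption that the channel width stays the same between the two factorized layers are illustrative choices, not details taken from the paper.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in one convolution layer, ignoring biases."""
    return kernel_h * kernel_w * in_channels * out_channels


def factorized_weights(n, channels):
    """Weights of an n x 1 convolution followed by a 1 x n convolution."""
    return (conv_weights(n, 1, channels, channels)
            + conv_weights(1, n, channels, channels))


# Example with 64 input and 64 output channels (an arbitrary illustrative width).
full = conv_weights(3, 3, 64, 64)   # 3*3 kernel: 36864 weights
split = factorized_weights(3, 64)   # 3*1 + 1*3 kernels: 24576 weights
saving = 1 - split / full           # about one third fewer weights
```

For a 3*3 kernel the factorized form needs 6 weights per channel pair instead of 9, a reduction of one third; the saving grows for larger kernels.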

Image Classification
At last, the extracted feature subsets are fed as input to the Gaussian Naïve Bayes and Logistic Regression models to perform the classification process.

GNB Model:
A Naive Bayes (NB) classification model measures the probability that a given sample belongs to a specific class (Zhang, H., 2016). An instance $X$ is described by its feature vector $(x_1, \dots, x_n)$ and class target $y$. Under the naive independence assumption, Bayes' theorem expresses the conditional probability $P(y \mid X)$ as a product of simple probabilities:

$$P(y \mid X) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(X)}$$

In this model, the target $y$ takes two values: $y = 1$ indicates the presence of BT and $y = 0$ implies the absence of BT. Next, $X$ for a single residue is defined as a fixed-size feature vector built from the high-frequency modes produced by GNM. When three high-frequency modes $u_1$, $u_2$, and $u_3$ are considered, the vector for residue $i$ in a protein sequence is $X = (u_{1,i}, u_{2,i}, u_{3,i})$. Additionally, when the window size is 3 in terms of residues, $X$ is taken as $(u_{1,i-1}, u_{1,i}, u_{1,i+1}, u_{2,i-1}, u_{2,i}, u_{2,i+1}, u_{3,i-1}, u_{3,i}, u_{3,i+1})$.
Since $P(X)$ is constant for a given instance, the following rule is applied to classify an instance of unknown class:

$$\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

where "arg max" returns the value of $y$ that maximizes the expression; that is, if $P(y = 1) \prod_i P(x_i \mid y = 1)$ is greater than $P(y = 0) \prod_i P(x_i \mid y = 0)$, then $\hat{y} = 1$; otherwise, $\hat{y} = 0$. Furthermore, if the likelihoods of the features $P(x_i \mid y)$ are assumed to be Gaussian, the NB classification model is called GNB. Because of its simplicity and robust processing compared with more sophisticated models, GNB is employed extensively for prediction problems in bioinformatics. The central premise of GNB here is to train the presented method using high-frequency modes for the purpose of identifying BT.
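The decision rule with Gaussian feature likelihoods can be written compactly in NumPy. The class below is a minimal sketch for illustration; the name `TinyGaussianNB` and the small variance floor `var_smoothing` are assumptions. Since the paper lists sklearn among its packages, the experiments presumably relied on a library classifier such as `sklearn.naive_bayes.GaussianNB`, although the text does not say so explicitly.

```python
import numpy as np


class TinyGaussianNB:
    """Gaussian Naive Bayes: per-class, per-feature normal likelihoods."""

    def fit(self, X, y, var_smoothing=1e-9):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Class priors P(y) estimated from class frequencies.
        self.prior_ = np.array([np.mean(y == c) for c in self.classes_])
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small floor keeps the variance strictly positive.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_smoothing
        return self

    def log_joint(self, X):
        """log P(y) + sum_i log P(x_i | y) for every sample and class."""
        X = np.asarray(X, dtype=np.float64)
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2.0 * np.pi * self.var_)[None, :, :]
                          + diff ** 2 / self.var_[None, :, :]).sum(axis=2)
        return np.log(self.prior_)[None, :] + log_lik

    def predict(self, X):
        # arg max over classes; P(X) is omitted because it is constant per sample.
        return self.classes_[np.argmax(self.log_joint(X), axis=1)]
```

Working in log space replaces the product of likelihoods by a sum, which avoids numerical underflow when the feature vector is long, as it is for deep features.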

Logistic Regression Classifier:
LR is a commonly employed classifier used to predict a binary dependent variable, which takes the value 0 or 1. The conditional probability of the dependent variable is given by:

$$\pi(X) = P(Y = 1 \mid X) = \frac{e^{\beta' X}}{1 + e^{\beta' X}}$$

where $\beta' X = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$, and $k$ is the number of independent variables. $\pi(X)$ is an S-shaped function of the independent variables. The probability distribution of the dependent variable is therefore expressed as:

$$P(Y = y \mid X) = \pi(X)^{y} \, \left(1 - \pi(X)\right)^{1 - y}, \qquad y \in \{0, 1\}$$

The likelihood function is the product of these probabilities over the $N$ training samples, and the logarithm of the likelihood function is:

$$\ln L(\beta) = \sum_{j=1}^{N} \left[ y_j \ln \pi(X_j) + (1 - y_j) \ln\left(1 - \pi(X_j)\right) \right]$$

The parameters of LR are estimated by maximizing the log-likelihood function, for which nonlinear optimization methods are employed. Moreover, an issue in LR is the selection of independent variables (Tunç, T., 2012). Consequently, step-wise, backward, and forward selection methodologies were preferred in this study.
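Maximizing the log-likelihood has no closed-form solution, so an iterative optimizer is needed. The sketch below uses plain gradient ascent in NumPy as one simple choice; this is an assumption made for illustration, since the paper does not state which nonlinear optimizer was used. The function names, learning rate, and iteration count are likewise hypothetical.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones so that beta[0] plays the role of beta_0."""
    X = np.asarray(X, dtype=np.float64)
    return np.hstack([np.ones((X.shape[0], 1)), X])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_likelihood(beta, Xb, y):
    """Bernoulli log-likelihood of labels y under pi(X) = sigmoid(Xb @ beta)."""
    p = sigmoid(Xb @ beta)
    eps = 1e-12  # guards against log(0)
    return float(np.sum(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps)))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Estimate beta by gradient ascent on the mean log-likelihood."""
    Xb = add_intercept(X)
    y = np.asarray(y, dtype=np.float64)
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        # Gradient of the log-likelihood: X^T (y - pi(X)).
        grad = Xb.T @ (y - sigmoid(Xb @ beta))
        beta += lr * grad / len(y)
    return beta


def predict_proba(beta, X):
    """P(Y = 1 | X) for new samples."""
    return sigmoid(add_intercept(X) @ beta)
```

A sample is assigned to class 1 when `predict_proba` exceeds 0.5, which corresponds to a positive value of the linear term.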

Performance Validation
In this section, the simulation results of the presented model are discussed. The simulations were run on a machine with a GeForce 1050Ti 4GB GPU, 16GB RAM, a 250GB SSD, and a 1TB HDD. The simulation tool used is Python 3.6.5 with the following packages: tensorflow (GPU-CUDA enabled), keras, numpy, pickle, matplotlib, sklearn, pillow, and OpenCV-python. The dataset involved, the measures, and the results are discussed in the subsequent sections.

Dataset used:
To test the classifier results, a benchmark MRI brain image dataset comprising a total of 147 tumor images is used. Of these, 34 images belong to the benign class and 113 to the malignant class. The image size varies between 192*192 and 630*630 pixels. A few sample benign and malignant class images are shown in Fig. 6. Fig. 8 illustrates the pre-processing and segmentation results: the original input image is depicted in Fig. 8a, and the resultant preprocessed and segmented images are displayed in Figs. 8b and 8c, respectively. The figure shows that the presented model effectively preprocesses the image and identifies the tumor regions properly.
The confusion matrices produced by the four proposed models are shown in Table 1. It shows that the SIFT-LR model classified no images as benign and 113 images as malignant. Similarly, the SIFT-GNB model correctly classified a total of 18 images as benign and 101 images as malignant. The DLIM-GNB model achieved near-optimal results, classifying a total of 34 images as benign and 109 images as malignant. Finally, the DLIM-LR model classified 31 images as benign and 113 images as malignant. Table 1 and Fig. 9 summarize the classifier results of the four proposed models in terms of distinct evaluation parameters. The table shows that the SIFT-LR model yielded the lowest specificity of 76.83%, with a precision of 87.44%, an F-score of 54.21%, and a sensitivity of 74.78%. The SIFT-GNB model surpassed the SIFT-LR model with a sensitivity of 60%, specificity of 86.32%, accuracy of 80.95%, precision of 52.23%, and F-score of 53.25%. The DLIM-GNB model exhibited a satisfactory classification outcome with a high sensitivity of 89.47%, specificity of 100%, accuracy of 97.28%, precision of 100%, and F-score of 94.44%. The DLIM-LR model, however, showed the best performance, with the maximum sensitivity of 100%, specificity of 97.41%, accuracy of 97.96%, precision of 91.18%, and F-score of 95.38%. Table 2 and Figs. 10-11 present the comparative results of the DLIM-LR method against previous approaches (Toğaçar, M., 2020; Çinar, A, 2020; Gupta, T., 2017; Selvapandian, A. 2018; Gupta, S., 2020) in terms of various metrics. Fig. 10 examines the classifier results of the DLIM-LR approach with respect to sensitivity, specificity, and accuracy.
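The evaluation measures quoted above all follow from the four cells of a binary confusion matrix. The sketch below computes them from the counts of true positives, false positives, false negatives, and true negatives. The function name `binary_metrics` is hypothetical, and the example counts (tp=31, fp=3, fn=0, tn=113) are an assumption inferred from the reported DLIM-LR figures with benign as the positive class; they are not read from Table 1.

```python
def binary_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, precision, accuracy, and F-score from counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    sensitivity = ratio(tp, tp + fn)   # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    f_score = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_score": f_score,
    }


# Assumed DLIM-LR confusion counts (see the note above); 147 images in total.
dlim_lr = binary_metrics(tp=31, fp=3, fn=0, tn=113)
percentages = {name: round(100 * value, 2) for name, value in dlim_lr.items()}
```

Under these assumed counts, `percentages` gives a sensitivity of 100.0, specificity of 97.41, precision of 91.18, accuracy of 97.96, and F-score of 95.38, which agrees with the values reported for the DLIM-LR model.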
The experimental outcome shows that the proposed DLIM-LR model achieves a sensitivity of 100%, specificity of 97.41%, accuracy of 97.96%, F-score of 95.38%, and precision of 91.18%, which is markedly better than the performance of the existing models.