Feature Extraction Of Ultrasound Prostate Image Using Modified Vgg-19 Transfer Learning

Feature extraction plays a vital role in classification, clustering, diagnosis, and recognition. For classification in particular, the extracted features must represent the content of the images accurately. Extracting features from Transrectal Ultrasound (TRUS) images is difficult because they contain speckle noise, low contrast, and a fuzzy region between the object and the background. To address this problem, Ultrasound (US) prostate images are preprocessed and segmented by the Ant Colony Optimization-Boundary Complete Recurrent Neural Network (ACO-BCRNN) method. Researchers have applied many techniques to extract relevant features, and transfer learning methods have recently been used for this purpose. Transfer learning is a machine learning technique in which a model trained on one problem is reused in the development of a model for another, related problem. Available pre-trained models include VGG16 (Visual Geometry Group), VGG19, ResNet50, InceptionV3, Xception, MobileNetV2, DenseNet, and ResNetV2. A modified VGG-19 model is proposed to extract the features of the prostate ultrasound image. The proposed method is compared with the other pre-trained models, and its performance is evaluated using Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Grid Search. The experimental results demonstrate that the modified VGG19 model is superior to the other models in terms of Precision, Recall, Accuracy, and F1 Score.


INTRODUCTION
Prostate cancer is the second most common cancer among men in the United States. It is more likely to develop in older men, and the average age at diagnosis is about 66. The American Cancer Society estimated about 191,930 new cases of prostate cancer in 2020, with a fatality rate of 17.3% [1]. Prostate cancer is produced by the continuous deposition of protein in the prostate gland. Ultrasound (US) is the most widely used technique for prostate biopsy. Usually, the symptoms of prostate cancer can be identified only by the enlargement of the prostate gland. Prostate cancer is treatable if it is diagnosed early, but it becomes life-threatening once the tumor has extended beyond the prostate gland [2]. Computer-based automated image analysis involves several steps: image enhancement, segmentation, feature extraction, feature selection, and classification [3]. To distinguish between cancerous and normal images, it is important to identify the feature, or set of features, that accurately quantifies the characteristics of tumors [4]. Since each image has its own features, such as shape, color, texture, and edges [5], the extraction and selection of effective features play a vital role in classification [6]. The feature extraction process acquires the essential and relevant information needed to achieve the highest accuracy.
Traditional quantitative techniques such as GLCM, GLRLM, GLDM, histogram, and shape descriptors have been applied to extract features from prostate US images [7,8]. In the past decade, Artificial Neural Networks (ANNs) used to extract features have shown good classification performance with machine learning-based classifiers [9,10,11,12]. An ANN consists of an input layer, multiple hidden layers, and an output layer. Its primary drawback is that the number of weights grows with every interconnected layer, which in turn affects the learning rate. For image processing, a network that arranges several filters on the input layer to activate high-level features is called a Convolutional Neural Network (CNN) [13]. The CNN is currently considered an effective approach for image recognition and classification [14]. CNNs used for large-scale image classification have achieved superior performance on image databases such as ImageNet [15,16]. Girshick et al. [17] applied supervised pre-training on a large dataset followed by tuning in a specific domain. Donahue et al. [18] applied a pre-trained CNN to extract features and used them in various image classification techniques. In transfer learning, pre-trained CNNs reuse previously learned knowledge to achieve high accuracy [19]. In our approach, eight existing CNN models pre-trained on ImageNet are used to extract features from prostate tumors [20].
By applying the transfer learning method, the system can transfer the knowledge learned on a previous task to a new task [21]. ImageNet is the largest image database, in which the images are arranged according to the WordNet hierarchy [22]. Currently, the database consists of around 14 million images, and the subset commonly used for pre-training covers 1000 object categories. Several CNN architectures already trained on this subset of ImageNet can be reused when the available dataset is too small to train a CNN with millions of weights from scratch. When a CNN is used as a feature extractor, the last fully connected layer (the output layer) is removed. The extracted features contain raw values, which are then transformed by a ReLU (Rectified Linear Unit). Both pre-ReLU features (deep features) and post-ReLU features are obtained from the last hidden layer. The extracted deep features are useful for training and classification.
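The difference between pre-ReLU and post-ReLU features can be illustrated with a minimal sketch. The activation values below are made-up numbers for illustration and are not taken from the paper's networks; the helper name `relu` is likewise an assumption.

```python
def relu(values):
    """Apply the Rectified Linear Unit element-wise: max(0, x)."""
    return [max(0.0, v) for v in values]

# Hypothetical raw activations from the last hidden layer (illustrative only).
pre_relu_features = [-1.5, 0.0, 2.25, -0.3, 4.0]

# Post-ReLU features keep positive values and clip negative ones to zero.
post_relu_features = relu(pre_relu_features)

print(post_relu_features)
```

The pre-ReLU vector keeps the sign information of the raw layer output, while the post-ReLU vector is sparse and non-negative.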
The CNN is a kind of multilayer feed-forward neural network whose layers can be classified into three categories. Convolutional layers compute the output for the connected input, max-pooling layers subsample the inputs, and fully connected layers compute the class activations [23,24,25]. In a CNN, an input image is supplied in the format $h \times w \times c$, where $h$ is the height, $w$ is the width, and $c$ is the number of channels (3 for RGB and 1 for a grayscale or binary image). A convolutional layer may contain $k$ convolutional filters of size $n \times n$, where $n < h$. In this article, the pre-trained CNN architectures used to extract features from the prostate image are VGG16, VGG19, ResNet50, InceptionV3, Xception, MobileNet, DenseNet, and ResNetV2.
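As a rough illustration of the notation above, the following sketch computes the output shape and the number of learnable weights of a single convolutional layer with k filters of size n x n applied to an h x w x c input. The function names, the stride and padding parameters, and the example values are assumptions chosen for illustration; they are not specified in the paper.

```python
def conv_output_shape(h, w, n, k, stride=1, padding=0):
    """Output (height, width, channels) of a conv layer with k filters of size n x n."""
    out_h = (h + 2 * padding - n) // stride + 1
    out_w = (w + 2 * padding - n) // stride + 1
    return out_h, out_w, k

def conv_weight_count(n, c, k, bias=True):
    """Learnable parameters: each of the k filters spans n x n x c inputs, plus one bias."""
    return n * n * c * k + (k if bias else 0)

# Example: a 224 x 224 RGB image, 64 filters of size 3 x 3, padding of 1.
print(conv_output_shape(224, 224, n=3, k=64, padding=1))
print(conv_weight_count(n=3, c=3, k=64))
```

With a padding of 1 and a stride of 1, a 3 x 3 filter preserves the spatial size, so only the channel dimension changes from c to k.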
The rest of the paper is organized as follows: Section 2 describes the various transfer learning models for feature extraction. The comparative experimental analysis using various evaluation metrics is presented in Section 3, and the conclusion is drawn in Section 4 along with future directions.

TRANSFER LEARNING
Generally, transfer learning is the process of using a model trained on one problem to solve another, related problem. In this method, a neural network trained on one task is applied to a similar problem.
ResNet50: The Residual Network with 50 layers is named ResNet50. The input size of the image is 224 x 224. First, a convolution with 64 kernels of size 7x7 and a stride of 2 forms one layer. Its output is max-pooled with a stride of 2. In the next convolution, 64 kernels of size 1x1 are followed by 64 kernels of size 3x3.
InceptionV3: The InceptionV3 model was introduced by Szegedy et al. in 2014 [27] with new factorization ideas. The aim of factorization is to reduce the number of parameters without decreasing the efficiency of the network.
Xception: The Xception model was proposed by Francois Chollet in 2017 [28]. The term Xception stands for Extreme Inception. This model has 36 convolutional layers structured into 14 modules, all of which have linear residual connections except for the first and last modules.
MobileNetV2: Andrew G. Howard et al. [29] introduced the MobileNet model in 2017. This model uses depthwise separable convolutions, in which a single filter is applied to each input channel. A 1x1 convolution is then applied to combine the outputs of the depthwise convolution. A standard convolution filters the inputs and combines them into a new set of outputs in a single step, whereas a depthwise separable convolution uses separate layers for filtering and for combining. This factorization reduces the computation time as well as the model size.
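The saving from this factorization can be estimated by counting weights. The sketch below compares a standard convolution with a depthwise separable one; the function names and the example sizes (3x3 kernels, 32 input channels, 64 output channels) are assumptions for illustration, and bias terms are ignored.

```python
def standard_conv_weights(n, c_in, c_out):
    """One n x n x c_in filter for every output channel."""
    return n * n * c_in * c_out

def depthwise_separable_weights(n, c_in, c_out):
    """Depthwise step: one n x n filter per input channel.
    Pointwise step: a 1x1 convolution that combines the c_in channels into c_out."""
    depthwise = n * n * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

standard = standard_conv_weights(3, 32, 64)
separable = depthwise_separable_weights(3, 32, 64)
print(standard, separable, round(standard / separable, 2))
```

For these example sizes the separable form needs roughly one eighth of the weights, which is where the reduction in model size and computation comes from.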
DenseNet: DenseNet is a logical extension of ResNet, introduced by Gao Huang et al. in 2016 [30]. In this model, each layer is connected to every other layer in a feed-forward fashion. Moreover, the layers are very narrow: each one adds only a small set of feature maps to the collective knowledge of the network, while the remaining feature maps are left unchanged. The final classifier makes its decision based on all feature maps.
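The way narrow layers accumulate feature maps can be sketched by tracking channel counts through one dense block. This is a simplified bookkeeping example, not the paper's implementation; the function name and the values (64 initial channels, 32 new feature maps per layer, 6 layers) are assumptions for illustration.

```python
def dense_block_channels(initial_channels, growth_rate, num_layers):
    """Return the number of input channels seen by each layer and the block output size.

    Every layer receives all feature maps produced so far and appends
    growth_rate new ones, leaving the earlier maps unchanged.
    """
    inputs_per_layer = []
    channels = initial_channels
    for _ in range(num_layers):
        inputs_per_layer.append(channels)
        channels += growth_rate
    return inputs_per_layer, channels

inputs, output_channels = dense_block_channels(64, 32, 6)
print(inputs, output_channels)
```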
ResNet152: The Residual Network with 152 layers is named ResNet152. The input size of the image is 224 x 224. In this model, batch normalization and the ReLU activation function are applied before the convolution operation (multiplication by the weight matrix). The result is then passed as input to the next block for further processing. In this work, the modified VGG-19 pre-trained model is implemented to extract the features from the prostate image.

The Basic Architecture of VGG19 Model
The VGG19 model is a successor of the VGG16 model and contains 19 layers. The input image size and the architecture are the same as those of the VGG16 model, with a few exceptions: VGG19 has four convolution layers in each of the third, fourth, and fifth blocks. The output size and activation functions are the same. The architecture of VGG19 is shown in Fig. 1. The increased number of convolution layers can lead to greater accuracy and more efficient performance.
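The layer count can be checked with a short sketch. It assumes the standard VGG19 layout (3x3 convolutions that preserve spatial size, a 2x2 max-pool with stride 2 after each block, and three fully connected layers including the output layer); the constant and function names are hypothetical.

```python
# Convolution layers per block in the standard VGG19 layout (assumed here).
VGG19_CONVS_PER_BLOCK = [2, 2, 4, 4, 4]
# Assumption: two hidden fully connected layers plus one output layer.
VGG19_DENSE_LAYERS = 3

def count_weighted_layers(convs_per_block, dense_layers):
    """Total number of layers that carry weights."""
    return sum(convs_per_block) + dense_layers

def sizes_after_pooling(input_size, num_blocks):
    """Spatial size after the 2x2, stride-2 max-pool that ends each block."""
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        size //= 2
        sizes.append(size)
    return sizes

print(count_weighted_layers(VGG19_CONVS_PER_BLOCK, VGG19_DENSE_LAYERS))
print(sizes_after_pooling(224, len(VGG19_CONVS_PER_BLOCK)))
```

Under these assumptions the 16 convolutional layers and 3 dense layers give the 19 weighted layers, and a 224 x 224 input is reduced to 7 x 7 after the fifth block.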

The Proposed Model
The modified architecture of the VGG-19 model is shown in Fig. 2. The proposed model is similar to the original VGG-19, except that global average pooling replaces max pooling in the final block. The modified VGG-19 model consists of 16 convolutional layers, 4 max-pooling layers, 1 global average pooling layer, 2 fully connected layers, and an output layer. The model uses 512, 128, and 64 neurons in its hidden dense layers. Each dense layer is followed by a dropout layer. Dropout reduces the capacity of the model during training and guards against overfitting. ReLU activation functions are used in the dense layers, and the sigmoid activation function is used for classification. The training procedure uses learning rate decay: the learning rate is multiplied by 0.5 whenever the validation accuracy stays constant for four epochs. Finally, the feature vector is collected from fully connected layer 1.
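Two of the ingredients described above, global average pooling and the learning rate decay rule, can be sketched in plain Python. This is a simplified illustration, not the authors' training code: the function names, the example values, and the reading of "constant validation accuracy" as "no improvement for four consecutive epochs" are assumptions.

```python
def global_average_pool(feature_maps):
    """Reduce each channel (a 2D list of values) to its mean, giving one value per channel."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled

def decay_on_plateau(val_accuracies, initial_lr, factor=0.5, patience=4):
    """Return the learning rate used in each epoch.

    The rate is multiplied by `factor` once the validation accuracy has not
    improved for `patience` consecutive epochs (assumed interpretation).
    """
    lr = initial_lr
    best = float("-inf")
    stale_epochs = 0
    schedule = []
    for accuracy in val_accuracies:
        schedule.append(lr)
        if accuracy > best:
            best = accuracy
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs == patience:
                lr *= factor
                stale_epochs = 0
    return schedule

# Two hypothetical 2x2 feature maps (illustrative values only).
pooled = global_average_pool([[[1, 2], [3, 4]], [[0, 0], [0, 8]]])
# Hypothetical validation accuracies with a four-epoch plateau at 0.92.
schedule = decay_on_plateau([0.90, 0.92, 0.92, 0.92, 0.92, 0.92, 0.93], initial_lr=0.001)
print(pooled)
print(schedule)
```

Global average pooling produces one value per channel regardless of the spatial size, which is why it yields a compact feature vector compared with flattening a max-pooled map.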

EXPERIMENTAL RESULTS
A database of 4500 segmented prostate ultrasound (US) images is used to assess the efficiency of the proposed methods; 70% of the dataset is used for training and the remaining 30% for testing. The transfer learning models VGG16, modified VGG19, ResNet, Inception, Xception, MobileNet, DenseNet, and ResNetV2 are implemented in Python 3.7 to extract features from the prostate image. The performance of each model is evaluated using Support Vector Machine (SVM), Grid Search, and Extreme Learning Machine (ELM).
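A 70/30 split of 4500 images can be sketched as follows. The paper does not state how the split was drawn, so the random shuffle, the fixed seed, and the function name are assumptions made for illustration.

```python
import random

def train_test_split_indices(num_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices reproducibly and split them into train and test parts."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # fixed seed is an assumption, for repeatability
    cut = int(round(num_samples * train_fraction))
    return indices[:cut], indices[cut:]

train_idx, test_idx = train_test_split_indices(4500)
print(len(train_idx), len(test_idx))
```

With 4500 images this gives 3150 training and 1350 test samples, with no index shared between the two sets.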

Evaluation parameters
Precision, Recall, Accuracy, and F1 Score are used as metrics for assessing the performance of the proposed methods. Precision is a measure of quality: it is the fraction of relevant instances among the retrieved instances. Recall is the fraction of all relevant instances that are retrieved. The formulas for these parameters are given in Table 1.
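For reference, the standard definitions of these four metrics can be computed from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The sketch below assumes a binary setting, and the counts in the example are made up; they are not results from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(metrics)
```

The F1 Score is the harmonic mean of Precision and Recall, so it is high only when both are high.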

The overall performance measures of feature extraction using the pre-trained models are given in Table 2. From the table, it is observed that the VGG16, VGG19, and Xception models achieved better performance than the other models. In terms of ELM accuracy, the VGG16 and VGG19 models provide the best result of 99.76%.
Table 3 shows the accuracy obtained by the pre-trained models, and the graphical representation is given in Fig. 3. The VGG16 and VGG19 models obtained an accuracy of 99% using the ELM, SVM, and Grid Search methods. The ResNet50 and InceptionV3 models obtained an accuracy of 98%, which is almost 1% lower than that of the VGG16 and VGG19 models. The accuracy achieved by the DenseNet model using ELM, SVM, and Grid Search is 43%, 67%, and 94% respectively. The ResNetV2 model obtained an accuracy of just 20% with all the methods, which makes it unsuitable for feature extraction.

Fig. 3: Accuracy Results
The quantitative performance of the Precision and pictorial representation is shown in Table 4 and Fig. 4 respectively.

Fig. 4: Precision Results
From Table 4, it is observed that the precision obtained by the modified VGG19 model using SVM, Grid Search, and ELM is 99%, 100%, and 99% respectively, which is about 1% higher than that of the VGG16 and ResNet50 models. The Xception model achieved 83%, 92%, and 59% using SVM, Grid Search, and ELM respectively. However, the ResNetV2 model achieved a very low precision of 24%.
The recall results of the proposed methods are shown in Table 5 and depicted in Fig. 5. From the table, it is noticed that the recall values of the modified VGG19 model obtained through SVM, Grid Search, and ELM are 99%, 100%, and 98% respectively. The VGG16 model achieved a 100% recall rate through Grid Search and 98% through SVM, which is 1% lower than that of VGG19. The ResNet50, Inception_V3, and MobileNet models achieved 99% recall using Grid Search, but their recall rate using SVM is poor. The ResNetV2 model does not seem to be appropriate, since it achieved only a 20% recall value.
The F1-score performance of the proposed methods is given in Table 6, and the comparative performances are depicted in the graph in Fig. 6. From the results in Table 6, it is evident that the modified VGG19 model achieved superior performance to the other models, with an F1 score of 100%, 99%, and 98% using Grid Search, SVM, and ELM respectively. The VGG16 and ResNet50 models achieved 100%, 98%, and 98% using Grid Search, SVM, and ELM respectively. The Inception_V3 and MobileNet models obtained F1 scores almost 4% and 2% lower than that of VGG19 using SVM and Grid Search respectively. The ResNetV2 model performed far worse than the other models.
The comparative performances of the proposed models in terms of Accuracy, Precision, Recall, and F1-Score indicate that the modified VGG19 is the most appropriate model for extraction of features from prostate ultrasound images.

CONCLUSION
Feature extraction methods play a significant role in extracting features from the segmented image. However, choosing a suitable algorithm is not a minor task. Although many researchers have applied various methods to extract features, an appropriate method had not been identified for the prostate dataset. To improve the efficiency of the result, the proposed method uses pre-trained transfer learning models for feature extraction from the prostate image. Initially, the TRUS prostate image is enhanced with ACO and segmented by the ACO-BCRNN method. The informative features from normal, benign, and malignant TRUS prostate images are extracted by various pre-trained models: VGG16, modified VGG19, ResNet50, Inception_V3, Xception, MobileNet, DenseNet, and ResNetV2. Finally, the proposed method is evaluated using SVM, Grid Search, and ELM. The experimental results reveal that feature extraction based on the modified VGG-19 method attained better accuracy than the other methods. The modified VGG19 model attained superior performance over all other models, with an accuracy of 99% and an F1-Score of 100%, and it is effective and appropriate for the extraction of features in a computer-aided diagnosis (CAD) system for clinics. In future work, the proposed algorithm may be applied to signal processing, galaxy image classification, and video processing.
From the trained model, one or more layers are reused in the new model. The benefits of transfer learning are a shorter training time and a lower error rate. The weights of the reused layers serve as the starting point of the training process and are applied to the new problem. Several high-performing models have been developed and used in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) for image classification. The challenge is often referred to simply as ImageNet, and it is the source of the images and the pre-trained weights. It has led to several innovations in the architecture and training of convolutional neural networks. A pre-trained model can be used for classification, for stand-alone or integrated feature extraction, and for weight initialization. Three popular model families are VGG (VGG16, VGG19), GoogLeNet (InceptionV3), and Residual Network (ResNet50). A brief description of the various models follows.
VGG-16: Karen Simonyan and Andrew Zisserman introduced the VGG16 architecture in 2014 [26]. VGG is a successor of AlexNet. The 16 in VGG16 refers to the 16 layers with weights. It is a 16-layer network consisting of convolutional and fully connected layers: 13 convolutional layers, 2 fully connected layers, and 1 Softmax classifier layer. The default size of the image input of this model is 224 x 224.