A Review on Feature Selection Techniques in Digital Mammograms

Breast Cancer (BC) is a deadly disease that affects a large number of women worldwide. Breast cancer is analyzed using imaging modalities such as mammography, magnetic resonance imaging, ultrasound, and thermography. Among these, mammography offers a low radiation dose, low cost, and effective, accurate detection of BC in its early stages. Many Computer-Aided Detection (CAD) systems have been developed for the automatic detection of masses in mammograms, and these systems help radiologists and physicians diagnose the disease. The objective of this paper is to review different CAD systems, focusing mainly on feature selection, because feature selection techniques reduce the complexity of classifiers and can also increase their accuracy. We conclude that suitable optimization techniques should be chosen to increase classifier accuracy, which in turn can help increase patient survival rates.

In one study, the authors designed a model for finding breast masses in which the root mean square (RMS) roughness was the only feature used to describe the degree of irregularity of a one-dimensional signature. No single technique can be singled out as the best feature extractor for characterizing breast tissue (Daniel O. Tambasco Bruno et al., 2016). Local Binary Patterns (LBP) combined with curvelet transform features have been extracted from mammograms to describe breast tissue.
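The Python sketch below makes the LBP descriptor mentioned above concrete. For each interior pixel of a grayscale image (stored as a list of lists), it computes the basic 8-neighbour LBP code. It then accumulates the codes into a 256-bin histogram that can serve as a texture feature vector.

Several choices here are common conventions assumed for illustration, not details taken from the cited work: the clockwise neighbour ordering, the `>=` comparison, and the function names. The curvelet half of the cited descriptor is not shown.

```python
# Basic 3x3 Local Binary Pattern (LBP) sketch in pure Python.
# Assumption: neighbours are read clockwise starting at the top-left pixel,
# and a neighbour contributes a 1 bit when it is >= the centre pixel.

NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, i, j):
    """Return the 8-bit LBP code of interior pixel (i, j)."""
    centre = image[i][j]
    code = 0
    for bit, (di, dj) in enumerate(NEIGHBOUR_OFFSETS):
        if image[i + di][j + dj] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels (borders are skipped)."""
    hist = [0] * 256
    for i in range(1, len(image) - 1):
        for j in range(1, len(image[0]) - 1):
            hist[lbp_code(image, i, j)] += 1
    return hist


# Usage on a tiny synthetic 4x5 region of interest (hypothetical values).
roi = [[10, 12, 11, 9, 8],
       [11, 50, 13, 10, 9],
       [12, 14, 15, 11, 7],
       [9, 10, 11, 12, 13]]
features = lbp_histogram(roi)
```

In practice the histogram is usually normalised and often computed per sub-region before being concatenated with other features.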
An efficient method for mass segmentation and classification was achieved by combining shape, texture, and intensity features (Dong M, Lu X, Ma Y, Guo Y, Ma Y & Wang K, 2015). In another study, the extracted features were mean, standard deviation, smoothness, skewness, uniformity, entropy, kurtosis, pixel value fluctuation, and conspicuity, and random forest achieved good accuracy compared with state-of-the-art methods (BN Jagadesh & L Kanya Kumari, 2021). A CAD system was designed to detect abnormal breasts using the weighted-type fractional Fourier transform, which provides a unified time-frequency spectrum. Good classification accuracy was achieved with SVM (
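Several of the intensity features listed above are first-order histogram statistics of a region's gray levels. The sketch below computes mean, standard deviation, smoothness, skewness, uniformity, entropy, and kurtosis for a flat list of 8-bit pixel values, using the usual textbook definitions.

The following are assumptions, not details from the cited study:

- Smoothness normalises the variance by (L-1)^2.
- Kurtosis is the plain fourth standardised moment.
- Entropy is measured in bits.

Pixel value fluctuation and conspicuity are omitted because their definitions are not given here.

```python
import math
from collections import Counter


def first_order_features(pixels, levels=256):
    """First-order (histogram) statistics of the gray levels in a region.

    `pixels` is a flat list of integer gray levels in [0, levels - 1].
    Assumptions: smoothness uses the variance normalised by (levels - 1)**2,
    and kurtosis is the plain fourth standardised moment (3 for a Gaussian).
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)

    # Normalised histogram: probability of each gray level that occurs.
    probs = [count / n for count in Counter(pixels).values()]

    smoothness = 1 - 1 / (1 + variance / (levels - 1) ** 2)
    uniformity = sum(p * p for p in probs)           # also called energy
    entropy = -sum(p * math.log2(p) for p in probs)  # in bits

    if std > 0:
        skewness = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
        kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4)
    else:  # a flat region has no spread, so the higher moments are set to 0
        skewness = kurtosis = 0.0

    return {"mean": mean, "std": std, "smoothness": smoothness,
            "skewness": skewness, "uniformity": uniformity,
            "entropy": entropy, "kurtosis": kurtosis}


stats = first_order_features([0, 255])  # two equally likely extreme levels
```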

Feature Selection Techniques
Feature selection is an important step in analyzing data to predict or classify the label of an image. Optimization algorithms used for feature selection depend on algorithm-specific parameters (for example, crossover and mutation probabilities in a genetic algorithm) as well as common control parameters such as population size and number of iterations. These parameters play an important role both in selecting features and in the performance of the classifiers. To design and develop an efficient classifier for mammogram classification, feature selection techniques should be used to reduce both time and space complexity.
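Wrapper-based selection often expresses the trade-off between accuracy and complexity as a single fitness to minimise:

fitness = alpha * error + (1 - alpha) * |S| / |N|

Here `error` is the classifier's error rate on the selected subset S, and N is the full feature set. This formulation and the value alpha = 0.99 are widely used conventions assumed for illustration. They are not taken from any specific paper reviewed here. The function name is hypothetical.

```python
def subset_fitness(error_rate, mask, alpha=0.99):
    """Wrapper feature-selection fitness (lower is better).

    error_rate: classifier error on the selected features, in [0, 1].
    mask: list of 0/1 flags, one per extracted feature.
    alpha: weight on the error term (0.99 is an assumed, commonly used value).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n_selected = sum(mask)
    if n_selected == 0:
        return float("inf")  # an empty subset cannot train a classifier
    # Small penalty on subset size favours cheaper (faster, smaller) classifiers.
    return alpha * error_rate + (1 - alpha) * n_selected / len(mask)


score = subset_fitness(0.1, [1, 1, 0, 0])  # 0.99 * 0.1 + 0.01 * 0.5
```

With equal error rates, the smaller subset wins. This is how the goal of reducing time and space complexity enters the search.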
The main goal of this step is to remove unnecessary or irrelevant features from the extracted feature vector (Shankar Thawkar & Ranjana Ingolikar, 2020). Feature selection and classification of breast cancer remain challenging research tasks. In one such study, Biogeography-Based Optimization (BBO) was used to select features from the DDSM dataset, and an ANN and an Adaptive Neuro-Fuzzy Inference System (Sri Hari were used as the fitness function. The sensitivity, specificity, and Area Under the Curve (AUC) were 99.10%, 98.72%, and 0.99, respectively, for the Biogeography-Based Adaptive Neuro-Fuzzy Inference System (BBO-ANFIS). The evaluation used 10-fold cross-validation. The authors concluded that their method can assist radiologists and, through early detection of breast cancer, help increase patient survival.
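Sensitivity, specificity, and AUC recur throughout the studies reviewed here. The sketch below shows how they are computed for binary labels, with 1 denoting the positive (abnormal or malignant) class. AUC is computed as the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counting one half. This is equivalent to the normalised Mann-Whitney U statistic. The function names are illustrative.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (TPR) and specificity (TNR) for 0/1 labels; both classes must occur."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC: chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


sens, spec = sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0])  # (0.5, 1.0)
```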
To classify clusters of microcalcifications in the DDSM dataset, the authors (Khehra, B.S., Pharwaha, A.P.S, 2017) extracted texture, Fourier-domain, shape, and wavelet-domain features, 50 features in total. Optimal features were selected using Genetic Algorithms (GA), Particle Swarm Optimization (PSO), and Biogeography-Based Optimization (BBO), and GA-based SVM, PSO-based SVM, and BBO-based SVM were used as classifiers. Accuracy was measured using random trials and cross-validation. The authors concluded that PSO- and BBO-based feature selection performed better than GA-based feature selection. In another study, PCA was used to reduce the dimensions of the feature vector, and the Forest Optimization Algorithm (FOA) was used to select the optimal features. That methodology achieved good accuracy with SVM, KNN, and C4.5 classifiers when classifying mammograms as benign or malignant and as normal or abnormal.
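The sketch below shows how PSO can search over feature subsets. It uses the classic binary PSO formulation: each particle is a 0/1 mask, and a sigmoid of each velocity gives the probability that the corresponding bit is 1.

Everything here is assumed for illustration, not taken from the cited work: the inertia and acceleration constants, the velocity clamp, and `toy_fitness`. In particular, `toy_fitness` is a hypothetical stand-in that pretends features 0 and 3 are informative. In a real system the fitness would come from a classifier such as the SVM used in the cited study.

```python
import math
import random


def binary_pso(fitness_fn, n_features, n_particles=10, iterations=30,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO: each particle is a 0/1 feature mask; lower fitness is better.

    w, c1, c2 and v_max are assumed example values, not tuned settings.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness_fn(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp the velocity
                # Sigmoid transfer: the velocity sets the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness_fn(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in fitness: pretend features 0 and 3 are the informative ones.
INFORMATIVE = {0, 3}


def toy_fitness(mask, alpha=0.99):
    if sum(mask) == 0:
        return 1.0
    missed = sum(1 for d in INFORMATIVE if not mask[d])
    error = missed / len(INFORMATIVE)  # proxy for a classifier's error rate
    return alpha * error + (1 - alpha) * sum(mask) / len(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_features=6)
```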
A CAD system (Wang S, Rao RV, Chen P, Zhang Y, Liu A, Wei L, 2017) was designed using the Weighted-type Fractional Fourier Transform (WFRFT) to extract features from the mini-MIAS dataset. PCA was used to reduce the number of features, and the Jaya algorithm was used to train a Feed-Forward Neural Network (FFNN) classifier, whose results were compared with state-of-the-art classifiers. The authors concluded that the proposed algorithm was superior to the state-of-the-art methods.
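The Jaya algorithm needs no algorithm-specific parameters beyond population size and iteration count. Each candidate moves towards the current best solution and away from the current worst one, and a move is kept only if it improves the objective.

The sketch below minimises a toy sphere function that stands in for a network's training error. The cited work applied Jaya to FFNN training, which is not reproduced here. The bounds, population size, and the function name `sphere` are assumptions.

```python
import random


def jaya(objective, bounds, pop_size=10, iterations=100, seed=0):
    """Minimise `objective` over a box; `bounds` is a list of (low, high) pairs."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    history = []  # best objective value after each iteration
    for _ in range(iterations):
        best = pop[min(range(pop_size), key=fit.__getitem__)]
        worst = pop[max(range(pop_size), key=fit.__getitem__)]
        for k in range(pop_size):
            candidate = []
            for d, (lo, hi) in enumerate(bounds):
                x = pop[k][d]
                r1, r2 = rng.random(), rng.random()
                # Move towards the best solution and away from the worst one.
                v = x + r1 * (best[d] - abs(x)) - r2 * (worst[d] - abs(x))
                candidate.append(min(hi, max(lo, v)))  # keep inside the bounds
            f = objective(candidate)
            if f < fit[k]:  # greedy acceptance
                pop[k], fit[k] = candidate, f
        history.append(min(fit))
    i = min(range(pop_size), key=fit.__getitem__)
    return pop[i], fit[i], history


def sphere(x):
    """Hypothetical toy objective standing in for a network's training error."""
    return sum(v * v for v in x)


best_x, best_f, history = jaya(sphere, bounds=[(-5.0, 5.0)] * 3)
```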
A CAD system was constructed to classify mammograms from the MIAS and DDSM benchmark datasets based on contourlet features, with optimal features selected by FOA. The classifiers used were SVM, Naïve Bayes (NB), C4.5, and KNN. For classification into normal or abnormal, SVM, KNN, and C4.5 reached 100% accuracy, while NB did not. For classification into benign or malignant, the authors achieved a maximum accuracy of 98.74% with C4.5.
The author (Shankar Thawkar, 2020) selected the optimal features from the Wisconsin Diagnostic Breast Cancer (WDBC) dataset using the Teaching-Learning-Based Optimization (TLBO) technique. Performance was evaluated using Discriminant Analysis, Naïve Bayes, Decision Trees, Support Vector Machines (SVM), and KNN, and SVM gave better results than the other classifiers.
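TLBO alternates two phases:

- Teacher phase: learners move towards the best solution and away from the class mean, scaled by a teaching factor of 1 or 2.
- Learner phase: each learner moves towards a better randomly chosen classmate, or away from a worse one.

The sketch below runs continuous TLBO on a toy objective. The cited study applied TLBO to feature selection, and its encoding is not described here, so the continuous form, the bounds, and the `sphere` objective are assumptions.

```python
import random


def tlbo(objective, bounds, pop_size=10, iterations=50, seed=0):
    """Minimise `objective` with Teaching-Learning-Based Optimization."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(v, d):
        lo, hi = bounds[d]
        return min(hi, max(lo, v))

    def try_replace(i, candidate):
        f = objective(candidate)
        if f < fit[i]:  # greedy acceptance
            pop[i], fit[i] = candidate, f

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    history = []
    for _ in range(iterations):
        # Teacher phase: pull learners towards the best solution (the teacher).
        mean = [sum(x[d] for x in pop) / pop_size for d in range(dim)]
        teacher = pop[min(range(pop_size), key=fit.__getitem__)]
        for i in range(pop_size):
            tf = rng.randint(1, 2)  # teaching factor, either 1 or 2
            try_replace(i, [clip(pop[i][d] + rng.random() * (teacher[d] - tf * mean[d]), d)
                            for d in range(dim)])
        # Learner phase: learn from a random classmate (towards if better, away if worse).
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            sign = 1.0 if fit[i] < fit[j] else -1.0
            try_replace(i, [clip(pop[i][d] + sign * rng.random() * (pop[i][d] - pop[j][d]), d)
                            for d in range(dim)])
        history.append(min(fit))
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best], history


def sphere(x):
    """Hypothetical toy objective."""
    return sum(v * v for v in x)


best_x, best_f, history = tlbo(sphere, bounds=[(-10.0, 10.0)] * 2)
```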
A new model (M. N. Sudha & S. Selvarajan, 2016) was designed to classify mammograms using texture, intensity histogram, radial distance, and shape features. The optimal features were selected using Enhanced Cuckoo Search (ECS). Performance was evaluated with k-fold cross-validation, and the minimum distance classifier and the KNN classifier achieved 98.75% and 99.13% accuracy, respectively.
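A minimum distance classifier assigns a sample to the class whose mean feature vector (centroid) is nearest. k-fold cross-validation trains on k-1 folds and tests on the held-out fold, rotating through all k folds.

The sketch below combines the two on toy data. It deliberately simplifies by using deterministic interleaved folds; real experiments usually shuffle and often stratify. The data and function names are hypothetical, and ECS itself is not shown.

```python
import math


def train_centroids(X, y):
    """Mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Minimum distance rule: the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def k_fold_accuracy(X, y, k=5):
    """Accuracy pooled over k folds; fold i holds samples i, i+k, i+2k, ..."""
    n = len(X)
    correct = 0
    for i in range(k):
        test_idx = set(range(i, n, k))
        train_X = [X[j] for j in range(n) if j not in test_idx]
        train_y = [y[j] for j in range(n) if j not in test_idx]
        centroids = train_centroids(train_X, train_y)
        correct += sum(predict(centroids, X[j]) == y[j] for j in test_idx)
    return correct / n


# Toy, well-separated two-class data (hypothetical feature values).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(k_fold_accuracy(X, y, k=4))  # 1.0
```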
The authors (https://sites.google.com/site/tlborao/) extracted shape-, texture-, and intensity-based features from the DDSM dataset, 25 features in total. Of these, 11 features were selected by applying a genetic ensemble method with the following parameters: 50 iterations, population sizes of 10, 20, and 30, a crossover probability of 0.9, and a mutation probability of 0.1. AdaBoost, random forest (RF), and decision tree classifiers were used to classify the masses in digital mammograms. The authors concluded that AdaBoost gave the best accuracy when the optimal features were selected, whereas RF was better when all the features were used. The misclassification rates of AdaBoost, RF, and the decision tree were 3.85%, 4.92%, and 14.6%, respectively. In another work, GLCM features with Genetical Swarm Optimization (GSO) were used to classify mini-MIAS mammogram images as normal or abnormal, with SVM as the classifier for measuring the performance of the proposed method. Comparing the results with GA-SVM and PSO-SVM, the authors concluded that GSO-SVM performed better than both (Jona J & Nagaveni N, 2012).
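The sketch below shows a plain genetic algorithm for feature selection using the reported settings: 50 generations, a population of 10, crossover probability 0.9, and mutation probability 0.1. It is not the genetic ensemble method itself.

Several components are assumptions not stated in the source:

- binary tournament selection;
- one-point crossover;
- per-gene bit-flip mutation;
- elitism;
- the toy dataset;
- a nearest-centroid misclassification rate standing in for the AdaBoost, RF, and decision tree classifiers.

```python
import math
import random

# Hypothetical toy data: feature 0 separates the classes, features 1-2 are noise.
DATA = [([0.1, 5.0, 3.0], 0), ([0.2, 1.0, 7.0], 0),
        ([0.9, 4.0, 2.0], 1), ([1.0, 2.0, 8.0], 1)]


def misclassification_rate(mask):
    """Resubstitution error of a nearest-centroid classifier on the selected features."""
    cols = [d for d, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0
    proj = [([x[d] for d in cols], label) for x, label in DATA]
    centroids = {}
    for label in {lab for _, lab in proj}:
        members = [v for v, lab in proj if lab == label]
        centroids[label] = [sum(c) / len(members) for c in zip(*members)]
    errors = sum(min(centroids, key=lambda lab: math.dist(v, centroids[lab])) != label
                 for v, label in proj)
    return errors / len(proj)


def fitness(mask, alpha=0.99):
    """Lower is better: mostly error rate, plus a small penalty on subset size."""
    return alpha * misclassification_rate(mask) + (1 - alpha) * sum(mask) / len(mask)


def ga_feature_selection(fitness_fn, n_features, pop_size=10, generations=50,
                         p_crossover=0.9, p_mutation=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []  # best fitness of each generation

    def evaluate(population):
        return sorted(((fitness_fn(ind), ind) for ind in population), key=lambda s: s[0])

    def tournament(scored):
        a, b = rng.sample(scored, 2)  # binary tournament selection
        return a[1] if a[0] <= b[0] else b[1]

    for _ in range(generations):
        scored = evaluate(pop)
        history.append(scored[0][0])
        children = [scored[0][1][:]]  # elitism: carry the best subset over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if n_features > 1 and rng.random() < p_crossover:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to every gene.
            child = [1 - g if rng.random() < p_mutation else g for g in child]
            children.append(child)
        pop = children

    scored = evaluate(pop)
    history.append(scored[0][0])
    return scored[0][1], scored[0][0], history


best_mask, best_fit, history = ga_feature_selection(fitness, n_features=3)
```

Because of elitism, the best fitness can never get worse from one generation to the next.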
The patterns from segmented mammograms were classified using SVM and General Regression Neural Networks (GRNN), which obtained AUCs of 0.98 and 0.9780, respectively (Fu J, Lee S, Wong S, Yeh J, Wang A & Wu H, 2005). The authors used Sequential Forward Search (SFS) to select the features. The authors (Dheeba J, Singh NA & Selvi ST, 2014) evaluated a model for diagnosing breast cancer using Particle Swarm Optimized Wavelet Neural Networks (PSOWNN). Features were extracted using Laws' texture energy measures from mammograms collected from screening centres, and performance was measured using AUC, sensitivity, and specificity. Table 3.1 below summarizes the different papers and the limitations or drawbacks that may have arisen from their methodology.
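Sequential Forward Search, as used by Fu et al., starts from an empty set and greedily adds whichever remaining feature most improves a criterion. It stops when no feature helps. In practice the criterion is a classifier's validation performance.

The sketch below uses a hypothetical additive toy score so that the result is easy to check by hand. The weights, per-feature cost, and names are illustrative assumptions.

```python
def sequential_forward_selection(score_fn, n_features, max_features=None):
    """Greedy SFS: add, one at a time, the feature that most improves score_fn.

    score_fn(subset) returns a value to maximise (e.g. cross-validated accuracy).
    Stops when no remaining feature improves the score or max_features is reached.
    """
    if max_features is None:
        max_features = n_features
    selected, remaining = [], list(range(n_features))
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        feature, score = max(((f, score_fn(selected + [f])) for f in remaining),
                             key=lambda pair: pair[1])
        if score <= best_score:
            break  # adding any further feature does not help
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical criterion: per-feature usefulness minus a fixed cost per feature.
WEIGHTS = [0.05, 0.30, 0.10, 0.40]


def toy_score(subset):
    return sum(WEIGHTS[f] for f in subset) - 0.12 * len(subset)


chosen, value = sequential_forward_selection(toy_score, n_features=len(WEIGHTS))
# chosen == [3, 1]: feature 3 first, then feature 1; a third feature lowers the score.
```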
The authors (B. Bektaş, İ. E. Emre, E. Kartal & S. Gulsecen, 2018) classified the mini-MIAS mammogram database into benign or malignant by applying Gaussian, median, and Wiener filters and enhancing image contrast with Contrast Limited Adaptive Histogram Equalization (CLAHE). Features were extracted using GLCM and Local Binary Patterns (LBP), and the best features were selected using correlation. The classifiers applied were NB, CART, and RF, and the authors concluded that the CLAHE+GLCM+CORR+NB combination gave the best results.
The literature reviewed above describes approaches that can help overcome certain limitations of CAD systems for breast cancer. Most researchers used meta-heuristic techniques to optimize the features and obtain better classification results; across the reviewed studies, BBO, GA, PSO, TLBO, FOA, PCA, and the t-test were applied to extracted shape-, texture-, and intensity-based features. Some authors applied optimization techniques for feature selection specifically to increase classification accuracy. This paper summarizes the existing methods and can help researchers in several respects: choosing an optimal feature selection technique for other imaging modalities, combining different feature selection techniques into a hybrid approach, and selecting an efficient feature selection technique based on the features extracted, for better classification.
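The correlation-based selection used by Bektaş et al. is not specified in detail here. The sketch below assumes one simple filter version: rank each feature by the absolute Pearson correlation between its values and the class label, then keep the top k.

This ignores redundancy between features, which fuller schemes such as correlation-based feature subset selection also penalise. The data and names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)


def rank_by_correlation(X, y, k):
    """Indices of the k features most correlated (in absolute value) with the label."""
    columns = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in columns]
    order = sorted(range(len(columns)), key=lambda d: scores[d], reverse=True)
    return order[:k], scores


# Hypothetical feature matrix (rows = images) and benign(0)/malignant(1) labels.
X = [[1, 5, 0], [2, 3, 0], [3, 4, 0], [4, 1, 0]]
y = [0, 0, 1, 1]
top, scores = rank_by_correlation(X, y, k=2)  # feature 2 is constant, so it ranks last
```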

Conclusion
Early detection of BC can be achieved by finding cancer cells in the breast. To achieve this, CAD systems generally follow four steps: image preprocessing, feature extraction, feature selection, and classification. Different researchers have used different techniques for early detection. This paper reviewed several feature selection algorithms, such as GA, PSO, PCA, FOA, and TLBO, as well as hybrid approaches that some authors applied to achieve good classification accuracy. From these observations, we conclude that classification accuracy depends on choosing a suitable feature selection technique, which helps radiologists and physicians detect tumors so that patient survival rates can be increased. We also extracted features from the MIAS dataset based on texture, shape, and intensity. The literature shows that meta-heuristic techniques play a vital role in feature selection for classifying mammogram images, as they can improve the accuracy of the classifiers.