Computer Aided Diagnosis of ASD based on EEG using RELIEFF and Supervised Learning Algorithm

Article History: Received: 11 November 2020; Accepted: 27 December 2020; Published online: 05 April 2021

Abstract: Autism Spectrum Disorder (ASD) is diagnosed by visual examination of electroencephalography (EEG) signals, a process that is time consuming and prone to bias. Existing approaches to diagnosing autism suffer from low power and are unsuited to processing extensive datasets. An automated diagnosis is therefore an essential aid for medical professionals to eliminate the problems mentioned above. In this article, a novel technique is propounded to diagnose autism using VMD, RELIEFF and supervised learning algorithms. A universal EEG dataset is adopted to explore the proposed method's performance. The technique starts with the extraction of features from EEG signals via VMD; RELIEFF is then employed to recognize the best features. Finally, to distinguish typical and autistic signals, supervised learning methods (KNN, SVM and ANN) are employed. The outcome illustrates that the proposed technique attains high accuracy, indicating a powerful way to diagnose and categorize autism.


Introduction
Autism spectrum disorder (ASD) is a predominant neurocognitive condition specified by communication and social interaction deficits, along with confined, repetitive behavioral patterns, preferences and interests [1]. With a prevalence of ∼1.5% in developed countries, this condition has a particular economic and social effect because of how widespread it is [2]. It has a major impact on daily family life and is related to increased morbidity. Brain-Computer Interfaces (BCI) based on EEG are among the most extensively analyzed communication techniques [3]. They have also been explored as a psychological approach in the neuro-rehabilitation of children with attention deficit hyperactivity disorder, which involves persistent inattentive, impulsive and hyperexcitable behaviors. A computer-aided detection (CADe) system is set up to support a doctor or clinician in diagnosing a specific disease or disorder. A CAD system is not intended to detect by itself, but rather serves as an aiding tool for the clinician to diagnose, saving time, increasing accuracy and providing a second opinion.
A virtual-environment P300-based BCI paradigm with specialized execution is depicted: interactive virtual environments are combined with the attention-related P300 brain waveform to create a cognitive training tool for ASD. The P300 signal is a renowned neurological marker of the observation process for detecting unique objects in a stimulus sequence [4]. The training of joint attention skills is coupled to the P300 signal because the latter is widely utilized in controlled observational investigations and is identified with the integration of data with context and memory [5].

Fig. 1. Placement of Electrodes of an EEG
The electrical signal of the brain has a maximum amplitude of about 100 µV. Its frequency typically lies between 0.44 Hz and 80 Hz. The frequency range and amplitude of each type of wave are shown in Table 1.

Table I. Amplitude and Frequency Ranges of Waves
The remainder of the article is organized as follows. A review of existing work is presented in Section II. Detailed information on the techniques employed (i.e., RELIEFF, VMD, KNN, SVM and ANN) is given in Section III. The performance parameters of the features extracted for each classifier, their confusion plots and the attained experimental results are analyzed in Section IV. Section V finally concludes the propounded method.

II. Related Work
In an approach by Kleih et al. [6], Brain-Computer Interfaces (BCI) based on EEG are examined as extensively analyzed communication techniques.
Mythili and Mohamed Shanavas [7] examined and propounded the optimum features to overcome learning obstacles and to speed up learning capability. Principal Component Analysis operates as the feature extraction stage, and PSO was deployed for feature selection; PSO performs parameter optimization to attain the most effectual and finest features. The resulting best features were supplied to an SVM classifier. The ultimate outcome evinced that the propounded technique acquires higher accuracy.
Bosl, Tierney and Nelson [8] investigated underlying neurobiological markers that consistently differentiate autistic and non-autistic brains. Multiclass SVM, NB and KNN algorithms were implemented to analyze the autistic and typical signals, achieving 80% accuracy.
Ahmadlou et al. [9] examined the fractal dimension (FD) to estimate intricacy and dynamical transitions in the autistic brain. A radial basis function classifier was employed and achieved 90% accuracy.
In work done by Sheikhani et al. [10], a short-time Fourier transform (STFT) method was employed for extracting features from EEG signals, which were then given to a KNN classifier as input, achieving 82.4% accuracy. Their next article enhanced the process, employed more extensive trial data, and achieved up to 96.4% accuracy.
Abdulhay et al. [11] studied frequency 3D mapping and inter-channel stability of EEG signals to analyze their capability for recognizing abnormalities in EEG signals and their relation to ASD. The research found that, for analyzing autism, the order of the frequency content and the inter-channel stability of the pulsation plot across the scalp were good measures.
Ridha Djemal et al. [12] proffered computer-aided detection (CADe) of autism grounded upon EEG signal analysis. The propounded method was based on entropy (En), an artificial neural network (ANN) and the discrete wavelet transform (DWT). DWT was implemented to decompose EEG signals into approximation and detail coefficients to attain EEG sub-bands. The feature vector was built by evaluating Shannon entropy values of every EEG sub-band. The ANN then categorizes the corresponding EEG signal as autistic or normal based on the extracted features. The experiential outcomes evinced the efficacy of the propounded methodology for aiding the diagnosis of autism. The receiver operating characteristic (ROC) curve metric was deployed to gauge the propounded technique's performance.

III. Proposed Work
The article aims to develop an algorithm based on EEG signal processing to detect autism. The proposed method employs RELIEFF, VMD and supervised learning algorithms. A concise description and the fundamental mathematical formulation of the techniques used are given in the following section.

A. Variational Mode Decomposition
Variational mode decomposition (VMD) is a novel adaptive signal decomposition technique that fragments a real-valued signal into variational modes (uk), i.e., band-limited functions. For reconstructing the input signal, all modes are extracted concurrently and exhibit the sparsity property. VMD fragments a real-valued signal into k modes (uk), each compact around its center frequency (ωk). The frequency-shifting property and the Hilbert transform are instrumental in the formulation and optimization of the problem. The constrained variational problem is formulated as [13]

min over {uk},{ωk} of Σk ‖ ∂t [ (δ(t) + j/(πt)) ∗ uk(t) ] e^(−jωk t) ‖₂²  subject to  Σk uk = f   (1)

where f is the input signal, δ is the Dirac distribution and ∗ denotes convolution. In this work, using a four-level variational decomposition, 200 signals were decomposed into 800 signals, of which 70% (560) are used for training and the remaining 30% (240) for testing. Each EEG signal is fragmented into four sub-bands of 1024 samples each; from each fragmented part, 11 features (8 statistical, 3 spectral) are extracted, giving 44 features per EEG signal in total. Inspecting the extracted feature table, normal and autistic signal features show considerable variation, the highest being in the IQR and the standard deviation.
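The per-band feature extraction described above can be sketched as follows. The paper does not list the individual statistical and spectral features, so the particular choice here (mean, standard deviation, IQR, skewness, kurtosis, RMS, peak-to-peak, median; spectral centroid, bandwidth and entropy), the sampling rate, and the random noise standing in for the VMD sub-bands are all assumptions:

```python
import numpy as np
from scipy import stats, signal

def band_features(x, fs=256.0):
    """Extract 11 features (8 statistical + 3 spectral) from one sub-band.
    The exact feature set is not specified in the paper; this is one
    plausible selection."""
    f, pxx = signal.welch(x, fs=fs, nperseg=min(256, len(x)))
    pxx_n = pxx / pxx.sum()                      # normalised power spectrum
    centroid = np.sum(f * pxx_n)                 # spectral centroid (Hz)
    bandwidth = np.sqrt(np.sum(((f - centroid) ** 2) * pxx_n))
    spec_entropy = -np.sum(pxx_n * np.log2(pxx_n + 1e-12))
    return np.array([
        x.mean(), x.std(), stats.iqr(x),         # 8 statistical features
        stats.skew(x), stats.kurtosis(x),
        np.sqrt(np.mean(x ** 2)),                # RMS
        np.ptp(x), np.median(x),
        centroid, bandwidth, spec_entropy,       # 3 spectral features
    ])

# 4 sub-bands x 11 features = 44 features per EEG signal
rng = np.random.default_rng(0)
subbands = rng.standard_normal((4, 1024))        # stand-in for VMD output
fv = np.concatenate([band_features(b) for b in subbands])
print(fv.shape)                                  # (44,)
```

Stacking such 44-element vectors for all 200 signals would yield the feature table from which RELIEFF later selects the best attributes.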

B. K-Nearest Neighbor Classifier (KNN)
KNN is one of the non-parametric approaches used for the classification of electrophysiological signals. The input comprises the K closest training samples (data points), and the output is a class membership. A sample is classified by a majority vote of its neighbours, being allocated to the most common class among its K nearest neighbours. To achieve the classification results, the training and testing datasets of autistic EEG are supplied to the K-nearest-neighbour classifier. The distance function used is the Spearman distance [14].
KNN is the simplest of all machine learning algorithms and relies on instance-based learning. It is a lazy classifier, as its function is approximated locally and all computation is postponed until classification. A drawback occurs with skewed class distributions: the most frequent class can dominate the prediction of a new data point. This drawback can be mitigated by limiting the impact of each of the k nearest neighbours according to its distance from the test point; one way is to assign each vote a weight that is a function of the distance between the known and unknown data points, for example the inverse squared distance, so that closer neighbours contribute more. As an illustration, consider a test data point represented by a circle and training points of two different classes represented by squares and triangles. The solid-line circle is the case K=3, where the test data point is at the center and the circle encloses only three data points in the plane; the test point is assigned to the class of triangles, as there are two triangles and one square. The dashed-line circle is another case (K=5), where the test point is assigned to the class of squares. The decision boundary becomes smoother with increasing K.

Spearman Distance
The Spearman distance between data vectors xs and yt is defined as

d(xs, yt) = 1 − [ Σj (rsj − r̄s)(rtj − r̄t) ] / √[ Σj (rsj − r̄s)² · Σj (rtj − r̄t)² ]   (2)

where rs and rt are the rank vectors of xs and yt and r̄s, r̄t are their means; i.e., one minus the Spearman rank correlation between the two vectors.
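A minimal sketch of KNN classification with the Spearman distance, assuming scikit-learn's `KNeighborsClassifier` with a user-supplied metric callable; the toy data and labels are illustrative placeholders, not the EEG features from the paper:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.neighbors import KNeighborsClassifier

def spearman_distance(xs, yt):
    """d(xs, yt) = 1 - Spearman rank correlation between the two vectors."""
    rho, _ = spearmanr(xs, yt)
    return 1.0 - rho

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 10))
y = (X[:, 0] > 0).astype(int)        # synthetic labels, illustration only

# A callable metric forces the brute-force neighbour search in scikit-learn
knn = KNeighborsClassifier(n_neighbors=3, metric=spearman_distance)
knn.fit(X, y)
pred = knn.predict(X[:5])
print(pred.shape)                    # (5,)
```

Because the Spearman distance operates on ranks, it is insensitive to monotone rescalings of the feature values, which is one reason it suits heterogeneous EEG feature vectors.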

C. Artificial Neural Network (ANN):
ANNs are inspired by biological neural networks, i.e., animal central nervous systems, especially the brain. They are used to approximate or estimate functions that can depend on a large number of unknown inputs. Different connections have different numeric weights, which can be tuned based on experience; this makes ANNs capable of learning and more adaptive to inputs. A set of input neurons is activated by the input data, and the output neuron determines the target class to which the data belongs. For testing the performance, an ANN is used for categorization of the autism EEG dataset [15].
Types of ANNs range from those with a single layer or two layers of one-directional logic to multiple many-input layers with multi-directional feedback loops. The networks use algorithms to control and organize their activation functions, and most use weights to change the parameters of the throughput and of the different connections. ANNs can learn from external inputs or can perform self-learning. The abilities of self-learning and decision making make ANNs suitable for a broad category of problems that may involve large amounts of data.
In a feed-forward neural network, the connections between the network units cannot form a directed cycle (no loops are present). The information moves only in the forward direction, from the input nodes to the output nodes through the hidden nodes.
The inputs of the network pass through processing functions that convert user data into a form suitable for the network. The outputs also possess corresponding processing functions, which are applied to the user-provided target vectors; the network outputs are processed backwards with similar functions to generate output data with characteristics comparable to the original targets. These input and output processing functions are used throughout the artificial neural network. The activation function used is the sigmoid function.

Sigmoid Function
The sigmoid function has the shape of an "S" (sigmoid curve) and is a mathematical function belonging to the special case of the logistic function. It can be expressed as

σ(x) = 1 / (1 + e^(−x))   (3)

A sigmoid function has a positive derivative for all real input values for which it is defined; it is a bounded, differentiable real function.
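The sigmoid activation and its use in a small feed-forward network can be sketched as follows. The paper does not specify the network architecture, so the layer sizes, the synthetic data, and the use of scikit-learn's `MLPClassifier` (with logistic, i.e. sigmoid, activation) as a stand-in for the ANN are assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def sigmoid(x):
    """sigma(x) = 1 / (1 + exp(-x)); bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# The output stays strictly inside (0, 1) for all real inputs
x = np.linspace(-5, 5, 11)
s = sigmoid(x)
assert np.all((s > 0) & (s < 1))

# Illustrative feed-forward network with sigmoid hidden units on
# synthetic 44-dimensional feature vectors (one per EEG signal)
rng = np.random.default_rng(2)
X = rng.standard_normal((60, 44))
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # synthetic labels
net = MLPClassifier(hidden_layer_sizes=(10,), activation='logistic',
                    max_iter=2000, random_state=0).fit(X, y)
print(net.predict(X[:3]).shape)              # (3,)
```

The bounded output of the sigmoid makes it a natural squashing function for a two-class decision at the output layer.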

D. Support Vector Machines (SVM)
The above features are given to the classifier for the categorization of typical and autistic EEG signals. The SVM was developed primarily for two-class classification and can be extended to multi-class optimization problems. The basic approach is to locate a hyperplane that correctly separates the data into two classes. Each object of the training set (the set of known objects) contains a feature vector and its corresponding class value. On the basis of the training data, the algorithm obtains a decision function for classifying unknown data. The classifier reduces the empirical classification error by maximizing the geometric margin; thus it can also be characterized as a maximum margin classifier. The maximum margin classifier can give better results than other traditional classifiers.
A classifier with a linear decision boundary is called a linear classifier. The main intention is to attain a decision boundary that separates the training data. If separation is not possible with a linear hyperplane, the classifier maps the data into a high-dimensional feature space using pre-defined functions (kernels).
The selection of the kernel is an important issue in the support vector machine classifier. Kernels introduce nonlinearities into a classification problem by mapping the data X implicitly into a Hilbert space through a function φ(X). The SVM classifier requires only inner products of the φ(X) features, even though the mapping to Hilbert space, i.e., the explicit computation of the features, may be cumbersome. Kernel functions are applied to map the data nonlinearly into a high-dimensional feature space in which the mapped data become linearly separable. The mapping is performed by replacing the inner product (x, y) with Φ(x)·Φ(y), and the kernel function is K(x, y) = Φ(x)·Φ(y).
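A sketch of an RBF-kernel SVM using scikit-learn, noting that scikit-learn parameterises the Gaussian kernel as exp(−γ‖x − y‖²), so the kernel width σ maps to γ = 1/(2σ²); the data and labels here are synthetic placeholders rather than the extracted EEG features:

```python
import numpy as np
from sklearn.svm import SVC

# Gaussian RBF kernel K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
# scikit-learn uses exp(-gamma * ||x - y||^2), hence gamma = 1 / (2 sigma^2).
sigma = 1.0
clf = SVC(kernel='rbf', gamma=1.0 / (2.0 * sigma ** 2), C=1.0)

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 44))                          # toy feature vectors
y = (np.linalg.norm(X[:, :2], axis=1) > 1.0).astype(int)   # toy labels
clf.fit(X, y)
print(clf.predict(X[:4]).shape)                            # (4,)
```

The regularization strength C here plays the role of the parameter γ in the formulation below, trading margin width against training errors.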
The decision function for the two-class problem is expressed as

g(x) = sign[w^T φ(x) + b]   (4)

The optimization problem is given as

min over w, b, ξ of (1/2)‖w‖² + γ Σi ξi   (5)

subject to yi (w^T φ(xi) + b) ≥ 1 − ξi, ξi ≥ 0, i = 1, …, N   (6)

Here xi is the i-th of the N input feature vectors, yi is the class label of 1 or −1 for xi, γ is the regularization parameter, αi is a Lagrangian multiplier and b is the bias term. The SVM classifier output is derived as

f(x) = sign( Σi αi yi K(xi, x) + b )   (7)

The SVM classifier needs a kernel for training; the Gaussian RBF kernel is an efficient one. The RBF kernel is expressed as

K(x, y) = φ(x)·φ(y) = exp(−‖x − y‖² / 2σ²)   (8)

The parameter σ is the kernel width, which is optimized [16].

E. RELIEFF
The RELIEFF algorithm is widely used to perform filter-based feature selection in a very effectual manner. Its high-end approach handles discrete and numeric features in binary classification problems. The RELIEFF algorithm uses a heuristic rule for inductive learning, whereas inductive machine learning uses greedy search.
The fundamental purpose of RELIEFF is to evaluate attributes according to how well their values distinguish among samples that are near each other. For a prescribed instance, RELIEFF searches for its two nearest neighbours: the one from the same class, denominated the nearest hit (H), and the one from the other class, denominated the nearest miss (M). The actual RELIEFF algorithm [17] randomly chooses n training instances, where n is a user-defined parameter.
The algorithm is as follows.

Step-1: set all weights W[A] := 0.0;
Step-2: for i := 1 to n do begin
Step-3: randomly select an instance R;
Step-4: find the nearest hit H and the nearest miss M;
Step-5: for A := 1 to #all_attributes do
Step-6: W[A] := W[A] − diff (A, R, H)/n
Step-7: + diff (A, R, M)/n;
Step-8: end;
Step-9: end;

The estimate of the quality of the attributes is given by the weights W[A]. The basic idea behind updating the weights is that a useful feature should have the same value for instances of the same class (subtracting the difference diff(A, R, H)) and distinct values for instances of distinct classes (adding the difference diff(A, R, M)).
The difference between the values of an attribute for two instances is calculated by the function diff(Attribute, Instance1, Instance2). For discrete attributes, the difference is either 1 (different values) or 0 (equal values), while for continuous attributes it is the actual difference normalized to the interval [0, 1]. Dividing by n guarantees that all weights W[A] lie in the interval [−1, 1]; however, dividing by n is not essential if W[A] is used only for relative comparison between attributes.
The total distance is simply the sum of the differences over all attributes. The original RELIEFF utilizes the squared difference, which for discrete attributes is equal to diff. In all observations, there is no vital difference between the results using diff or the squared difference. If N is the number of training instances, the complexity of the preceding algorithm is O(n × N × #all_attributes).
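The hit/miss weight-update loop above can be sketched as a basic two-class RELIEF in NumPy. The absolute difference on [0, 1]-scaled features and the sum-of-differences distance follow the description in the text; the toy data, in which only the first feature is informative, are illustrative:

```python
import numpy as np

def relief(X, y, n_iter=100, seed=0):
    """Basic two-class RELIEF (a sketch of the algorithm in [17]):
    for each of n_iter randomly chosen instances R, find the nearest
    hit H (same class) and nearest miss M (other class), then update
    W[A] := W[A] - diff(A,R,H)/n + diff(A,R,M)/n.
    Features are assumed scaled to [0, 1], so diff is the absolute
    difference and the total distance is the sum of diffs."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        R = X[i]
        dists = np.abs(X - R).sum(axis=1)      # sum of per-attribute diffs
        dists[i] = np.inf                      # exclude R itself
        same, other = (y == y[i]), (y != y[i])
        H = X[np.where(same, dists, np.inf).argmin()]   # nearest hit
        M = X[np.where(other, dists, np.inf).argmin()]  # nearest miss
        W -= np.abs(R - H) / n_iter
        W += np.abs(R - M) / n_iter
    return W

rng = np.random.default_rng(4)
X = rng.random((100, 5))
y = (X[:, 0] > 0.5).astype(int)   # class depends only on feature 0
W = relief(X, y)
print(W)
```

Feature 0 should receive a clearly positive weight, while the uninformative features hover near zero; ranking features by W and keeping the top ones is the filter step applied to the 44 VMD features.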

IV. Results and Discussion
In this work, an effective technique based on RELIEFF, VMD and supervised learning algorithms is developed and applied to classify autism. The propounded classification algorithms for autistic signals were enacted and simulated in MATLAB, and this section presents the simulated results. The EEG dataset of autism is adopted from the Kaggle database [18]. The channel distribution in the data patterns is C3, Cz, C4, CPz, P3, Pz, P4, POz.

Table 5 illustrates the simulation results of the classification algorithm for autism with SVM and RELIEFF. In accordance with the attained confusion matrix, the classification algorithm based on the SVM classifier accomplishes an overall sensitivity of 97.50%, overall accuracy of 95.41%, overall specificity of 93.33%, overall G_mean of 95.39%, overall F_measure of 95.51% and overall precision of 93.60%.

V. Conclusion
A computer-aided detection (CADe) system has enormous potential to aid the therapist in the diagnostic procedure, increasing accuracy and avoiding time delay. This paper discussed methods for diagnosing autism from EEG signals. Firstly, the proposed technique focused on VMD for extracting the features and the acquired