An Insight on Machine Learning Algorithms for Predicting Heart Diseases

Heart Disease (HD) is one of the critical diseases that severely affect humankind. Heart disease results in insufficient blood supply to other parts of the body. Hence, diagnosing HD in time helps prevent heart failure. Traditional procedures for HD detection and prediction become unreliable in many circumstances. Recent studies provide evidence that applying Machine Learning (ML) to traditional HD detection and prediction results in superior performance. Further, Computer Aided Diagnosis using one-dimensional and multi-dimensional signals assists in diagnosing HDs at an early stage, thereby saving human lives. The objective of this manuscript is to present an overview of HDs, their symptoms and the role of ML in HD prediction, followed by various state-of-the-art ML algorithms that aid in the identification and prediction of HD at an early stage.


INTRODUCTION
The WHO identified heart diseases or Cardiovascular Diseases (CVDs) as a leading cause of death worldwide, claiming an estimated 17.9 million lives per year (2016) [23]. CVDs are a cluster of ailments of the heart, blood vessels or arteries that affect the normal functioning of the heart. Ailments related to the heart may include [24]:
• peripheral arterial disease, cerebrovascular disease and coronary heart disease - diseases related to the supply of blood;
• rheumatic heart disease - impairment of the heart muscle and heart valves;
• congenital heart disease - abnormality present from birth;
• pulmonary embolism and deep vein thrombosis - blood clots that form in the veins of the leg and can dislodge and travel to the heart and lungs.

Figure 1 Top ten global death causes (Source Courtesy [25])
Broadly, people often confuse heart attack with heart failure and use the two terms for several cardiac related diseases. Figure 1 depicts the top ten global causes of death summarized from [25]. In fact, the two conditions are entirely different. A heart attack is the condition in which the flow of blood is blocked; when left unnoticed, it can lead to heart failure. Thus, heart attack is one of the causes of heart failure. Other causes may include consumption of alcohol or tobacco, smoking, diabetes, cholesterol, high blood pressure, aging and, rarely, genetics.

ANATOMY OF HUMAN HEART: A GLIMPSE
Figure 2 exhibits the anatomy of the heart with its vital elements. Each element shown in the diagram is briefly described below.
• Arteries - carry blood away from the heart [37].
• Aorta - the chief artery, which transmits blood to other parts such as the abdomen, chest and legs [27].
• Ventricle - a hollow lower muscular chamber of the heart that ensures the pumping of the heart.
• Atrium - a hollow upper chamber of the heart that fills the ventricle with blood [28] and [29].
• Pulmonary valve - permits blood to flow into the pulmonary artery and on to the lungs [30].
• Aortic valve - lets oxygen-rich blood flow out to the aorta [31].
• Bicuspid valve - an aortic valve that contains two leaflets.
• Tricuspid valve - consists of three flaps (leaflets) that open and close to let the blood stream through while preventing backward flow of blood [32].

REASONS OF HEART DISEASES
Despite tremendous progress in medical diagnosis and treatment, Cardiovascular Diseases accounted for 31% of all global deaths (as of 2012), and by the year 2030 the number of deaths may grow to 23.6 million. Moreover, as per a 2016 study, the direct medical costs of CVDs are expected to reach $749 billion in 2035 [33]. Figure 3 shows that several behavioral risks, such as consumption of alcohol or tobacco, smoking, diabetes, cholesterol and high blood pressure, end up in CVDs like heart failure, heart attack, arrhythmia, cardiomyopathy and others.

ROLE OF ML IN HEART DISEASE PREDICTION (HDP)
The volume of data in the medical science field is enormous. The foremost use of this data is to predict diseases beforehand and thereby increase the probability of survival. The patient data in a dataset includes several attributes such as age, sex, cholesterol, ECG, blood pressure and blood sugar. With respect to heart disease prediction, the ECG or the heart rate, taken at specific intervals, is regarded as the vital data. With this focal data, the aforesaid attributes combined with a Machine Learning algorithm predict and classify heart diseases. As a fundamental phase, the data is retrieved with the essential attributes from the dataset. The data in the dataset may be huge and yet incomplete. For example, values for some attributes may be missing, or superfluous details may be present. In such cases, cleaning of the data is obligatory. Moreover, the data format must be adapted to match the ML technique being used. Such preparatory tasks ease the manipulation and processing of data with the respective ML techniques like Naïve Bayes (NB), Support Vector Machine (SVM), Fuzzy Neural Network (FNN), K-Means clustering, Random Forest (RF), Decision Tree and others. In some cases, the entire data is partitioned into training and test data to validate the model. Finally, the performance is evaluated to establish the efficiency, and the model is improved iteratively where needed to attain the benchmark.
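The phases described above (cleaning, partitioning, training and evaluation) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the records, the attribute choice and all function names are hypothetical, and a small Gaussian Naïve Bayes classifier written from scratch stands in for whichever ML technique a given study actually uses.

```python
import math
import random

def clean(records):
    """Drop any record that has a missing (None) value."""
    return [r for r in records if all(v is not None for v in r)]

def split(records, test_fraction=0.2, seed=0):
    """Shuffle the records, then partition them into (train, test)."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def fit_gaussian_nb(train):
    """Per class: prior, per-feature mean and variance (label is the last field)."""
    model = {}
    for label in {r[-1] for r in train}:
        rows = [r[:-1] for r in train if r[-1] == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small constant keeps the variance strictly positive.
        variances = [sum((x - m) ** 2 for x in c) / len(c) + 1e-6
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(train), means, variances)
    return model

def predict(model, features):
    """Return the class with the highest Gaussian log-posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(features, means, variances):
            score -= 0.5 * math.log(2 * math.pi * v) + (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)

def accuracy(model, test):
    """Fraction of test records whose predicted label matches the true label."""
    return sum(predict(model, r[:-1]) == r[-1] for r in test) / len(test)

# Hypothetical records: (age, cholesterol, resting blood pressure, label)
RECORDS = [
    (40, 180, 115, 0), (45, 190, 120, 0), (38, 175, 110, 0),
    (50, 200, 125, 0), (42, 185, 118, 0), (47, None, 122, 0),
    (60, 260, 150, 1), (65, 280, 160, 1), (58, 250, 145, 1),
    (70, 290, 165, 1), (62, 270, 155, 1), (None, 265, 152, 1),
]

if __name__ == "__main__":
    train, test = split(clean(RECORDS))
    model = fit_gaussian_nb(train)
    print("test accuracy:", accuracy(model, test))
```

Dropping incomplete records is the simplest cleaning policy; imputing the missing values is an alternative when the dataset is small.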

HDP USING ML: AN EXPLORATION
This section reviews the existing HDP approaches that use ML techniques. Edin, Keeley, Henderson and Nannemann [1] compared and evaluated the anomaly detection performance of machine learning algorithms on datasets of physiological readings in supervised and unsupervised conditions. The machine learning algorithms chosen for the evaluation, Local Outlier Factor (LOF), KNN, RF, Isolation Forest and SVM, support both labeled and unlabeled data and are widely used in outlier detection. MIT-BIH data of a patient, arranged as two datasets containing 0.5% outliers and 2.5% anomalies, were tested. The Correct Rejection Rate (CRR) and the Hit Rate (HR) on the 0.5% and 2.5% outlier datasets were computed for the five ML algorithms. While KNN and SVM showed high CRR and HR on the 2.5% anomaly dataset, both forests showed good values on the 0.5% outlier dataset. Amongst the five algorithms, the unsupervised LOF model detected a cluster of anomalies at both ends of the heart rate range of 60-100 bpm (beats per minute). The LOF model flagged a low number of anomalies in spite of 20% trained test data on both datasets.
N. C. D. Adhikari [2] used Artificial Intelligence in designing a Heart Problem Prediction System (HPPS) based on 9 major risk factors. The validation accuracy and the selection accuracy were computed for the machine learning algorithms through the Confusion Matrix. The selection value reflects a minimal or diminishing False Negative Rate. From the entire dataset, 329 samples were tested on the designed model using the algorithms Decision Tree, SVM-RBF, Logistic Regression (LR), RF, NB and SVM-sigmoid. On the sample data, Random Forest demonstrated an accuracy of 72.34%, and with increased frequency its selection value was 74.1, both of which were low compared with the values of the other algorithms. HPPS outputs the heart disease prediction, thus guiding the doctors in further treatment.
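The Hit Rate, Correct Rejection Rate and False Negative Rate mentioned for [1] and [2] can be read off a confusion matrix. The sketch below assumes the usual signal-detection definitions (hit rate = detected anomalies / all anomalies, correct rejection rate = correctly passed normal readings / all normal readings); the reviewed papers may define them differently, and the labels in the example are made up.

```python
def detection_rates(y_true, y_pred):
    """Compute detection rates for binary labels (1 = anomaly, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hits
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # misses
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct rejections
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

    def ratio(num, den):
        # Undefined when a class is absent from y_true.
        return num / den if den else float("nan")

    return {
        "hit_rate": ratio(tp, tp + fn),
        "correct_rejection_rate": ratio(tn, tn + fp),
        "false_negative_rate": ratio(fn, tp + fn),
    }

if __name__ == "__main__":
    # Made-up labels for eight readings.
    truth = [1, 1, 0, 0, 0, 0, 1, 0]
    flagged = [1, 0, 0, 0, 1, 0, 1, 0]
    print(detection_rates(truth, flagged))
```

A low False Negative Rate matters most in this setting, because a missed anomaly corresponds to an undetected cardiac event.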
Haq, Jian Li, Md Hammad Memon, Nazir and Sun [3] devised a hybrid system employing Machine Learning (ML) algorithms to predict HD. The Cleveland dataset used for learning was preprocessed first. Patient records in the 13-feature dataset were deleted if missing data was identified. The study was conducted with all the features and with reduced features on the machine learning classifiers LR, SVM, Artificial Neural Network (ANN), K-Nearest Neighbour (KNN), Decision Tree (DT) and NB. To prove the efficacy of the model, Feature Selection (FS) approaches such as mRMR, Relief and LASSO retained only those features that had a substantial impact on the performance metrics Accuracy, Specificity, Sensitivity and Matthews' correlation coefficient (MCC), with reduced processing time. The k-fold validation method used on the selected features improved the result. The ROC curves were used to analyze the capacity of the classifiers.
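The k-fold validation mentioned above can be illustrated as follows. This is a minimal sketch, not the procedure of [3]: the folds are contiguous and unshuffled, the function names are hypothetical, and a majority-class baseline stands in for a real classifier so that the example stays self-contained.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, non-overlapping folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_validate(samples, k, train_fn, score_fn):
    """Train on k-1 folds, score on the held-out fold, and average the scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train_idx])
        scores.append(score_fn(model, [samples[i] for i in test_idx]))
    return sum(scores) / len(scores)

def majority_train(rows):
    """Baseline 'model': the most frequent label in the training rows."""
    labels = [y for _, y in rows]
    return max(set(labels), key=labels.count)

def accuracy_score(model, rows):
    """Fraction of rows whose label equals the baseline prediction."""
    return sum(1 for _, y in rows if y == model) / len(rows)

if __name__ == "__main__":
    # Made-up samples: (record id, label)
    samples = list(enumerate([1, 1, 1, 0, 1, 1, 0, 1, 1, 1]))
    print(cross_validate(samples, 5, majority_train, accuracy_score))
```

In practice the records would be shuffled (and often stratified by class) before folding, so that each fold reflects the class balance of the whole dataset.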
Hussain, Awan, Aziz, Saeed, Ali, Zeeshan and Kwak [4] identified Congestive Heart Failure (CHF) by extracting multimodal features from subjects with CHF, Normal Sinus Rhythm (NSR) and Atrial Fibrillation (AF). The sample database involved subjects of various age groups obtained from MIT-BIH and Holter monitor recordings respectively. The Heart Rate Variability (HRV) signals were processed to extract the features. The extracted features were classified by the machine learning algorithms SVM, KNN, DT and ensemble. Performance metrics such as True Positive, True Negative, Negative Predictive Value (NPV), Positive Predictive Value (PPV) and Accuracy, along with the area under the receiver operating characteristic curve, were computed. The result of every method was compared with the other methods. The result of the designed technique was also compared with the CHF algorithms of other authors to demonstrate its superior accuracy of 91.4% (Ensemble Classifier), 81.9% (DT) and 93.1% (SVM) in the detection of HD.
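Several of the metrics reported across these studies (PPV, NPV, Accuracy, MCC and the area under the ROC curve) follow directly from the confusion-matrix counts and the classifier scores. The sketch below uses the standard textbook definitions; the labels and scores are made up for illustration and do not come from any of the reviewed papers.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels (1 = disease, 0 = healthy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def summary_metrics(y_true, y_pred):
    """PPV, NPV, accuracy and Matthews' correlation coefficient."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)

    def ratio(num, den):
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }

def auc(y_true, scores):
    """Area under the ROC curve as the probability that a random positive
    outscores a random negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
    print(summary_metrics(truth, predicted))
    print("AUC:", auc(truth, scores))
```

The pairwise form of the AUC is quadratic in the number of samples, which is acceptable for small clinical datasets; a rank-based computation would be preferable for large ones.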
Golande and Pavan Kumar [5] investigated several machine learning algorithms to predict heart diseases in a given dataset. Several algorithms put forth by various authors were analyzed in detail. Based on this extensive analysis of earlier methods, the authors suggested a method to predict heart disease accurately using the DT, KNN, AdaBoost and K-Means algorithms. The models were trained on 80% of the data, and the remaining 20% test data were classified to identify heart disease. AdaBoost built on a single decision tree boosted the process.
Abdel-Motaleb and Akula [6] applied an artificial intelligence algorithm to Phonocardiogram signals to diagnose three types of heart disease.
Jabbar, Deekshatulu and Chandra [7] experimented and showed that employing ANN with feature subset selection classified HD accurately. The efficiency of the technique was tested by applying it to nine datasets. In addition, the results were compared with other algorithms such as NB and J48 to prove the efficacy. Moreover, the algorithm discards irrelevant data and takes into account only those attributes that affect the classification, which aids classification accuracy.
Bansal, Kumar, Bajpai, Tiwari, Nayak, Venkatesan and Narayanan [8] designed a remote health tracking system to identify cardiac abnormalities. The system was designed to receive input data, i.e. ECG signals, from a specific sensor connected to the patient's smartphone through Bluetooth. The transmission of data to a server with a Clinical Decision Support System (CDSS) facilitated preliminary diagnosis to identify abnormal cardiac function. The point-of-care device, connected remotely to the physician, analyzed the outcome of the CDSS so that appropriate action could be taken. Experiments on the PTB database alone exhibited a Sensitivity of 93.1% and a Specificity of 79.5%, while PTB combined with the SM-RR-QRS dataset exhibited 91.8% Sensitivity and 91.9% Specificity.
Prasad and Parthasarathy [9] used a Fast Fourier Transform (FFT) based Multi Objective Genetic Algorithm (MO-GA) to detect and classify cardiac (arrhythmia) abnormalities from the 12-lead ECG signals of 11,000 samples belonging to the MIT-BIH arrhythmia database. Features were extracted from the median-filtered ECG signals by the FFT algorithm. The MO-GA, based on the patient's age, physical condition and other variables, identified the abnormalities. The results were compared with other techniques to prove its superior accuracy of 98.7%, in terms of both Signal-to-Noise Ratio (SNR) and Mean Square Error (MSE). The comparative study also exhibited a 20% increase in detecting the abnormalities over other conventional techniques.
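The two signal-processing steps named above, median filtering followed by a Fourier transform, can be sketched as below. This is an illustration only: a direct O(n²) discrete Fourier transform stands in for the FFT (it yields the same spectrum), the window length is an arbitrary choice, and the test signal is a synthetic sine with one artificial spike rather than real ECG data.

```python
import cmath
import math
import statistics

def median_filter(signal, window=3):
    """Replace each sample by the median of its neighbourhood.
    Edges are handled by repeating the first and last samples."""
    n = len(signal)
    half = window // 2
    out = []
    for i in range(n):
        neighbourhood = [signal[min(max(j, 0), n - 1)]
                         for j in range(i - half, i + half + 1)]
        out.append(statistics.median(neighbourhood))
    return out

def dft_magnitudes(signal):
    """Magnitude spectrum for frequency bins 0 .. n//2 (direct DFT)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]

if __name__ == "__main__":
    # Synthetic signal: 4 cycles over 32 samples, plus one impulsive spike.
    noisy = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
    noisy[10] += 5.0
    filtered = median_filter(noisy, window=3)
    spectrum = dft_magnitudes(filtered)
    print("dominant frequency bin:", spectrum.index(max(spectrum)))
```

The median filter removes the isolated spike while leaving the underlying rhythm in place, so the dominant spectral bin still corresponds to the frequency of the sine.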
Dutta, Dutta, Sikdar, Dutta, Sharma and Sharma [10] designed a Cardiac Health Monitoring System (CHMS) to identify cardiac abnormalities at low computational cost, targeting low- and middle-income groups. The input for the CHMS was an audio signal obtained through a digital stethoscope. Signal processing steps such as denoising and analog-to-digital conversion enabled easy transmission of data to the mobile device. The mobile application processed the data and stored it date-wise. Further, as the application was automatic, the results were presented graphically. Moreover, the analysis and report generation indicated whether the patient needed medical guidance. The prototype of the CHMS identified the abnormalities efficiently.
Ruban, Vivek and Krithi [11] demonstrated forecasting of heart disease using machine learning approaches such as KNN, SVM, DT, LR and RF on a Kaggle dataset. On computing the accuracy of the aforesaid classifiers, the authors found that Logistic Regression, KNN and SVM produced accuracy values of 0.95592, 0.956194 and 0.9561945 respectively, surpassing the other algorithms in HD prediction.
Serhani, El Kassabi, Ismail and Nujum Navaz [12] discussed the design and key challenges of ECG monitoring systems that draw on contemporary technologies in Big Data analytics, Deep Learning and Artificial Intelligence, with the aim of providing economical connected monitoring systems. The architecture of the ECG monitoring system and the types of ECG monitoring systems (traditional and real-time, service-based and performance-based) were elucidated in detail through a taxonomy. Moreover, the discussion of futuristic ECG monitoring systems gave a detailed insight into how ECG provides vital information for implants, AI, Radar Cardiography and others.
Martin-Isla, Campello, Izquierdo, Raisi-Estabragh, Baebler, Petersen and Lekadir [13] addressed the diagnosis of cardiovascular outliers from heart resonance images using machine learning (ML) algorithms. The authors classified the datasets as population-based and clinical-based after a detailed survey of several datasets and the respective software. They provided a detailed overview of how cardiac outliers, namely Myocardial Infarction, Cardiomyopathies, Abnormal Wall Motion, Heart Failure, Valvular Heart Disease, Atherosclerosis and Coronary Artery Disease, can be diagnosed using ML techniques such as SVM, RF, DT, ANN, CL, LR and others, in terms of accuracy. Moreover, the performance of the algorithms relies upon the speed and reproducibility of the images.
Alfaras, Soriano and Ortín [14] identified cardiac arrhythmia from the ECG signals of patients by applying an Echo State Network (ESN) classifier. For the study, the ECG signals of the MIT-BIH AR and AHA datasets were pre-processed through re-sampling, filtering, heartbeat detection, RR calculation, heartbeat segmentation and normalization. Finally, the ESN classifier classified each processed heartbeat as normal or abnormal, with abnormal beats labeled VEB+ (ventricular ectopic) or SVEB+ (supraventricular ectopic). Moreover, the evaluation metrics Sensitivity, PPV, Specificity and Accuracy were compared with other state-of-the-art algorithms. The ESN classifier surpassed other classifiers with respect to time consumption (<0.2 s).
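Three of the pre-processing steps listed above (RR calculation, heartbeat segmentation and normalization) are simple enough to sketch. The code below is not the pipeline of [14]: it assumes the R-peak positions have already been found by a separate heartbeat detector, uses min-max normalization as one possible choice, and takes a 360 Hz sampling rate (the rate of the MIT-BIH arrhythmia recordings) with made-up peak positions.

```python
def rr_intervals(r_peaks, fs):
    """RR intervals in seconds from R-peak sample indices and sampling rate fs (Hz)."""
    return [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]

def heart_rate_bpm(rr):
    """Mean heart rate in beats per minute from RR intervals in seconds."""
    return 60.0 / (sum(rr) / len(rr))

def segment_beats(signal, r_peaks, before, after):
    """Cut a fixed window around each R peak; skip peaks too close to the edges."""
    return [signal[p - before:p + after]
            for p in r_peaks
            if p - before >= 0 and p + after <= len(signal)]

def normalize(segment):
    """Min-max scale a segment to [0, 1]; a flat segment maps to all zeros."""
    lo, hi = min(segment), max(segment)
    if hi == lo:
        return [0.0 for _ in segment]
    return [(x - lo) / (hi - lo) for x in segment]

if __name__ == "__main__":
    # Hypothetical R-peak sample positions in a 360 Hz recording.
    peaks = [100, 460, 820, 1180]
    rr = rr_intervals(peaks, fs=360)
    print("RR intervals (s):", rr)
    print("heart rate (bpm):", heart_rate_bpm(rr))
```

Each normalized segment would then be passed to the classifier, typically together with RR-based features such as the preceding and following interval.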
Nashif, Md. Raihan, Md. Islam and Md. Imam [15] recognized cardiac outliers using an SVM classifier. Additionally, the authors designed a real-time health monitoring system to treat cardiac abnormalities. Analysis of the SVM classifier using WEKA yielded promising performance metrics of 97.53% accuracy, 97.5% sensitivity and 94.94% specificity. The health monitoring system was designed using Arduino, which sensed parameters such as temperature, pressure, heartbeat and humidity; these were recorded and updated on the server once every 10 seconds. The sensors enable doctors to visualize the data and take remedial measures instantaneously during an emergency by means of GSM.
Ahmed and Verma [16] combined a Back Propagation Neural Network (BPNN) and a Genetic Algorithm (GA) to predict heart diseases before they occur. A dataset from the UCI repository with 14 attributes was considered for the study. Out of 303 samples, 200 were used as the training dataset. To overcome the drawbacks of the BP algorithm, the GA first performs crossover and mutation on the pre-processed data, thus playing the role of optimizer. The BP algorithm, based on the weights, predicted the severity of heart disease with an accuracy of 94.17%.
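The role of a GA as an optimizer can be illustrated with a minimal sketch of crossover and mutation over candidate weight vectors. This is not the implementation of [16]: the population size, mutation rate, elitist selection scheme and the toy fitness function (negative squared error of a single linear unit on made-up data) are all assumptions chosen to keep the example short. In a GA plus BP hybrid, the best vector found this way would typically seed the weights that back-propagation then refines.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(individual, rng, rate=0.1, scale=0.5):
    """Perturb each gene with probability `rate` using Gaussian noise."""
    return [w + rng.gauss(0, scale) if rng.random() < rate else w
            for w in individual]

def evolve(fitness, n_weights, pop_size=20, generations=30, seed=0):
    """Elitist GA: keep the better half, refill with mutated offspring."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=fitness)

# Made-up data generated by the linear rule y = 0.5*x0 - 0.3*x1 + 0.1
DATA = [((1.0, 2.0), 0.0), ((2.0, 1.0), 0.8), ((0.0, 1.0), -0.2), ((3.0, 3.0), 0.7)]

def toy_fitness(weights):
    """Negative squared error of a linear unit (higher is better)."""
    w0, w1, bias = weights
    return -sum((w0 * x0 + w1 * x1 + bias - y) ** 2 for (x0, x1), y in DATA)

if __name__ == "__main__":
    best = evolve(toy_fitness, n_weights=3)
    print("best weights:", best, "fitness:", toy_fitness(best))
```

Because the better half of the population is carried over unchanged, the best fitness never decreases from one generation to the next.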
Lakkamraju, Anumukonda and Chowdhury [17] demonstrated a prognostic diagnosis model based on a safety-related 2oo2 Cardiac Health Monitoring system architecture. The fuzzy entropy technique computed the pulse rate precisely. Moreover, the safety segregation system identified the peaks to determine the pulse rate accurately. Further, processing the signals from phonocardiogram (PCG), electrocardiogram (ECG) and photoplethysmogram (PPG) sensors detected artifacts and thus boosted the performance of prognostic health diagnostics.
Kagiyama, Shrestha, Farjo and Sengupta [18] examined several Machine Learning algorithms in treating cardiac diseases. Besides analyzing ML algorithms such as Logistic Regression, Decision Tree, Neural Network, KNN and SVM, the authors cited several papers with different inputs, such as ECG signals, MRI images, heart sound signals and electrocardiographic images, and discussed their contribution to the field of medicine by evaluating model performance through Recall, Precision, F-Measure, Mean Square Error (MSE) and RMSE values.
Sevakula, Au-Yeung, Singh, Kevin Heist, Isselbacher and Armoundas [19] studied Artificial Intelligence, its sub-field Machine Learning and, further, its sub-field Deep Learning in the field of medicine relevant to the cardiovascular system. The authors addressed five major applications, namely 1) Imaging, 2) Electrocardiography, 3) In-hospital monitoring, 4) Mobile and Wearable technology and 5) Precision medicine. In each application, the promising contribution of ML classifiers, such as SVM in Imaging and CNN in Electrocardiography with more than 92% accuracy, was discussed. Further, the success rate of ML algorithms in in-hospital monitoring and the failures that arise when alarms are utilized were outlined. Under Mobile and Wearable technology, the authors explored the contribution of the internet and sensors to the treatment of cardiac diseases, including systems that analyze the data and even warn physicians about the critical illness of patients. Finally, Precision medicine, an evolving and progressive method in which cardiac outliers are identified even before they occur, and its future in cardiac monitoring systems were addressed.
Uma Maheswari and Jasmine [20] predicted heart disease among patients with a hybrid approach combining Logistic Regression and a Neural Network. Initially, the 297 samples obtained from the Cleveland Heart Disease Database with 14 attributes were processed to compute the p-value and investigate the risk factors involved. Then, the training and test datasets were processed by the Artificial Neural Network system, giving an Accuracy of 84%, a Specificity of 77.5% and a Sensitivity of 91.4% in identifying cardiac diseases. The advantage of logistic regression in interpreting the parameters was combined with the benefit of Neural Networks in detecting complex relationships between dependent and independent variables.
Sunil, Bansal, Tiwari, Nayak and Narayanan [21] developed a remote health monitoring system based on the subjects' ECG signals that detected fatal cardiac events beforehand. The mobile gateway of the subject acts as the primary means to acquire and transmit signals. The clinical decision support system on the remote server processed the signals in the annotation and processing engine with ANN and SVM classifiers and passed the results to the Point of Care Device (PCOD). In the second phase, the results were analyzed meticulously, plotted if required (for series of data at regular intervals) and communicated visually to the doctor's portable device, with the vital parameters prioritized by criticality. In addition, the PCOD forwarded an alert message to the registered number of the subject. The remote system predicted the golden hour of a subject with a cardiac outlier, thus extending the survival period.
Rubin, Abreu, Ganguli, Nelaturi, Matei and Sricharan [22] combined time-frequency heat maps and a Deep Convolutional Neural Network (DCNN) to classify normal and abnormal heart sounds. The single CNN architecture was trained with a modified loss function balancing specificity and sensitivity. The CNN achieved a sensitivity of 73% and a specificity of 95%, with an overall score of 84%. Compared with other top algorithms, the single deep CNN performed better by a score of 0.02.
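The reported figures are consistent with an overall score taken as the mean of sensitivity and specificity, since (0.73 + 0.95) / 2 = 0.84. The sketch below shows that arithmetic, together with one common way of trading sensitivity against specificity during training, a class-weighted binary cross-entropy. Both are assumptions made for illustration: the exact scoring rule and the modified loss used in [22] are not given in this review.

```python
import math

def overall_score(sensitivity, specificity):
    """Assumed scoring rule: unweighted mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2

def weighted_bce(y_true, p_pred, w_pos=1.0, w_neg=1.0):
    """Class-weighted binary cross-entropy (hypothetical stand-in loss).
    Raising w_pos penalizes missed abnormal sounds (favours sensitivity);
    raising w_neg penalizes false alarms (favours specificity)."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p)
    return total / len(y_true)

if __name__ == "__main__":
    print("overall score:", round(overall_score(0.73, 0.95), 2))
    print("loss, equal weights:", weighted_bce([1, 0], [0.6, 0.2]))
    print("loss, w_pos = 2:", weighted_bce([1, 0], [0.6, 0.2], w_pos=2.0))
```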
Venkata Hari Prasad and Rajesh Kumar [38] diagnosed heart diseases by auscultation using Phonocardiographic (PCG) signals. Features were extracted from the PCG signals using the Discrete Wavelet Transform (DWT), while the prominent features were identified and selected through an Evolutionary Algorithm. The selected features were classified through Naïve Bayes, C4.5 and KNN. The wavelet coefficient computation was followed by feature selection techniques. Statistical scores such as Information Gain (IG), term frequency and mutual information, along with evolutionary algorithms, were measured and compared. The experimental results revealed that the Evolutionary Algorithm combined with C4.5 performed better than the other classifiers.
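Information Gain, one of the feature-selection scores mentioned above, measures how much knowing a feature reduces the entropy of the class labels. The sketch below uses the standard definition for discrete features; continuous values such as wavelet coefficients would first need to be binned, and the feature names and values in the example are invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    the samples by the value of one discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

if __name__ == "__main__":
    labels = [1, 1, 0, 0]                   # 1 = murmur, 0 = normal (made up)
    band_energy = ["hi", "hi", "lo", "lo"]  # separates the classes perfectly
    recording_site = ["x", "y", "x", "y"]   # carries no class information
    print("IG(band_energy):", information_gain(band_energy, labels))
    print("IG(recording_site):", information_gain(recording_site, labels))
```

Ranking features by this score and keeping the top few is the simplest filter-style selection; evolutionary search instead evaluates whole feature subsets together.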
Venkata Hari Prasad and Rajesh Kumar [39] proposed a Genetic Algorithm to enhance heart murmur classification. The process began with feature extraction, followed by feature selection and classification.
The various machine learning algorithms used in the medical literature for diagnosing diseases are summarized from [36]. Table 1 summarizes the ML techniques discussed in Section 5. Figure 6 illustrates the contribution of ML algorithms in diagnosing HDP.

CONCLUSION
This paper presented an insight into ML algorithms for HDP. Depending on the requirement, a multi-stage or multi-dimensional approach makes the system more robust for the detection and prevention of HDs at the outset. Moreover, security challenges need to be addressed to ensure the robustness of these applications. The paper gave an overview of HDs along with their symptoms and the role of ML in HDP, followed by the existing ML techniques adopted in diagnosing HD at an early stage. This article will be an eye-opener for novice researchers and provides a platform to explore the existing ML algorithms adopted in HDP along with their performance on various datasets. Summarizing their performance may help identify the research potential in the domain concerned.

Figure 2 Human Heart: Anatomy (Source Courtesy [26])

Figure 4 exhibits the general architecture of HDP using any of the ML algorithms.

Figure 3 Heart Disease types and its risk factors (Source Courtesy [34])
In the Genetic Algorithm study [39], the result of Genetic Algorithm based feature selection was compared with classifier based feature selection techniques such as Naïve Bayes, KNN, C4.5 and SVM. Performance parameters such as Classification Accuracy (CA), Precision, Recall and F-Measure of Information Gain (IG) based classifiers and Genetic Algorithm based classifiers were evaluated and compared. The comparative study showed a Classification Accuracy of 88.9% when feature selection was carried out by combining the Genetic Algorithm and SVM.

Figure 5 ML algorithms adopted for HDP (Source Courtesy [36])
Table 1 Summary of ML techniques adopted for HDP
Abdel-Motaleb and Akula [6] used an artificial intelligence algorithm to diagnose three types of heart disease from Phonocardiogram (PCG) signals. The dataset of 94 patients included 32 with mitral regurgitation, 31 with coarctation of the aorta and 31 with mitral stenosis. The dataset was further divided into 66, 5 and 23 patients for training, validation and testing, respectively. Four distinct features derived with the Power Spectral Density plot, namely Mobility, Complexity, Activity and the graded spectral peaks, served as the major inputs in this study. The filtered and extracted signals were validated for further classification. The classical Radial Basis Function (RBF) network and Back Propagation Network (BPN) classified normal and abnormal heart function, and were assessed through Receiver Operating Characteristic curves. The experimental results showed the superiority of RBF over BPN, with an accuracy of 98%.
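Activity, Mobility and Complexity are the names of the three Hjorth parameters commonly used to describe a signal. Assuming that is what the features in [6] refer to (this review does not say so explicitly), they can be computed in the time domain from the variance of the signal and of its successive differences, as sketched below on a synthetic sine wave.

```python
import math

def variance(values):
    """Population variance of a sequence."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def first_difference(values):
    """Discrete approximation of the derivative."""
    return [b - a for a, b in zip(values, values[1:])]

def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) of a non-constant signal."""
    d1 = first_difference(signal)
    d2 = first_difference(d1)
    activity = variance(signal)                       # signal power
    mobility = math.sqrt(variance(d1) / activity)     # proxy for mean frequency
    mobility_d1 = math.sqrt(variance(d2) / variance(d1))
    complexity = mobility_d1 / mobility               # close to 1 for a pure sine
    return activity, mobility, complexity

if __name__ == "__main__":
    # Synthetic test signal: 5 full sine cycles over 200 samples.
    sine = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
    print(hjorth_parameters(sine))
```

A pure sine has a complexity close to 1; signals with richer frequency content, such as heart sounds with murmurs, give larger values.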