Identification of Cardio Diseases in Modern Healthcare using Machine Learning Classification

The healthcare sector produces a massive quantity of data. This data is often underutilized. Using this large quantity of data, a disease can be detected, predicted, or even cured. Diseases such as heart disease, cancer, tumours, and Alzheimer's disease pose a major threat to humankind. Using machine learning techniques, heart disease can be predicted. Clinical data such as blood pressure, hypertension, diabetes, the number of cigarettes smoked per day, and so on are used as input, and these features are modeled for prediction. The resulting model can then be used to make predictions on future clinical data. Algorithms such as Decision Tree, k-Nearest Neighbor, and Support Vector Machine are used. The accuracy of the model is calculated for each algorithm, and the one with the best accuracy is taken as the model for predicting heart disease.


Introduction
In their daily lives, people follow a hectic, routine schedule that leads to stress and anxiety. This results in illnesses such as heart disease, cancer, and many others. The challenge with these diseases is how to predict them. Everyone has a different pulse rate and blood pressure. However, it has been clinically established that the pulse should be between 60 and 100 beats per minute and the blood pressure should be between 120/80 and 140/90. Heart disease is one of the major causes of death in the world. The term "cardio" refers to the heart; consequently, any heart disease is classified as a cardiovascular disease. Coronary heart disease, or coronary artery disease, is the narrowing of the coronary arteries [24]. The coronary arteries supply the heart with oxygen and blood. The condition causes illness or death in a large number of people and is one of the most common types of heart disease. High blood glucose from diabetes can damage the blood vessels and the nerves that control the heart and blood vessels. If a person has had diabetes for an extended period of time, it is likely that they will develop heart disease in the future. Other risk factors include age, gender, stress, and an unhealthy diet. The chance of having heart disease increases as a person grows older, and men are at higher risk of heart disease. This paper therefore uses these factors to predict the risk of heart disease.

2. Technical background:
Technologies such as Machine Learning, Deep Learning, and Artificial Intelligence are helpful in predicting heart disease.

Machine Learning:
Machine Learning can be defined as the ability of a system to learn automatically without being explicitly programmed, so that the ML model learns from experience much as a human does.

Supervised learning:
Learning with guidance from labeled data. In this form of learning the data is labeled, so the input and the exact output that is expected are both clearly defined. Supervised learning has two subtypes: regression and classification. [3]

Unsupervised learning:
Learning from data without labels. The input is explicitly given, but there is no indication of the output that is expected. Clustering and association are two subtypes of unsupervised learning. [30]

Semi supervised learning:
Another form of learning is semi-supervised learning, which is a mixture of the previous two learning types. [30]

Reinforcement Learning:
A reward-based type of learning, in which the model learns from the rewards it receives; it is a form of feedback-based learning. Here, the model learns without any labeled guidance.

Important Machine Learning Algorithms:
The majority of the algorithms in this work are classification algorithms [24], since classification algorithms work with labeled data. For prediction, we therefore use classification algorithms [5], which fall under supervised learning. A variety of classification algorithms are available; a few of them are described below.

Decision Tree:
An algorithm mainly used for classification problems, although it can also be used to solve regression problems. It takes the shape of a tree, with decision nodes and leaf nodes. It can be used on data in which the inputs and outputs are clearly separated for training. It mimics human decision making [13], [27], [28].
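As an illustration of how a single decision node can be chosen, the sketch below scores candidate thresholds on one numeric feature with the Gini impurity. This is only one common splitting criterion and is an assumption here, since the text does not say which criterion the surveyed models use; the function names and the toy blood pressure values are likewise hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' test."""
    best = (None, gini(labels))  # no split at all is the baseline
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: resting blood pressure and a 0/1 disease label.
pressure = [110, 120, 125, 140, 150, 160]
disease = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(pressure, disease)
```

A full tree would apply the same search recursively to the records on each side of the chosen threshold until the leaf nodes are pure or a depth limit is reached.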

K-Nearest Neighbor:
It is also a classification method that belongs to supervised learning. It has the potential to produce highly competitive results. The key drawback of the KNN algorithm is that it becomes slower as the size of the data grows.
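A minimal sketch of the KNN idea follows, assuming Euclidean distance and a simple majority vote among the k nearest training records; the function name and the toy records are hypothetical. Every prediction scans all training points, which is the reason the method slows down as the dataset grows.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy records: (age, resting blood pressure) -> 0/1 disease label.
points = [(35, 115), (40, 120), (45, 118), (60, 150), (65, 160), (70, 155)]
labels = [0, 0, 0, 1, 1, 1]
prediction_young = knn_predict(points, labels, (42, 119))
prediction_old = knn_predict(points, labels, (66, 158))
```

In practice the features would usually be scaled first, because a feature with a large numeric range otherwise dominates the distance.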

Naive Bayes Classifier:
Based on Bayes' theorem. The Naive Bayes classifier can be used to create fast Machine Learning models, and it is used in a number of prediction models. The formulation makes a "naive" assumption, namely that the features are independent of one another given the class. It performs well when used for classification. [26]
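For reference, the standard form of the Naive Bayes decision rule can be written as follows. The symbols are assumptions introduced here for illustration: y is the class label (for example, disease or no disease) and x_1, ..., x_n are the feature values of one patient record.

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The product over the individual features is the "naive" part: it treats the features as conditionally independent given the class, which keeps training and prediction fast.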

Support Vector Machine:
It can classify, predict, and even detect outliers. It works by separating two groups with a straight line (more generally, a hyperplane).
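In its linear form, the separating line can be expressed as a decision function. The notation below is an assumption for illustration: w is the weight vector, b the offset, and x the feature vector of one record.

```latex
f(x) = \mathrm{sign}\left( w^{\top} x + b \right),
\qquad
\text{margin} = \frac{2}{\lVert w \rVert}
```

Training looks for the w and b that separate the two classes with the widest possible margin; records for which f(x) is positive are assigned to one class and the rest to the other.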

Random Forest Classifier:
It is also known as random decision forests. It is a form of ensemble learning [31]: several decision trees are created during the training phase and their results are then combined. Random Forest is one of the most widely used models in modern ML. It is built from Decision Trees, which were described above. [29]

Logistic Regression:
It is used to estimate a dependent variable from the independent variables. The output of logistic regression is a probability between 0 and 1, which is then thresholded to obtain a discrete class label.
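The mapping from features to a value between 0 and 1 can be sketched as below. The weights and bias are hypothetical, hand-picked numbers rather than fitted values, and the 0.5 threshold is the conventional default, not something specified in the text.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Probability of the positive class for one record."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a discrete 0/1 label."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0


# Hypothetical coefficients for (age, resting blood pressure).
weights = [0.04, 0.03]
bias = -6.0
p_high_risk = predict_probability([60, 150], weights, bias)
p_low_risk = predict_probability([35, 110], weights, bias)
```

In a real model the weights and bias would be learned from the training data, typically by maximizing the likelihood of the observed labels.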

K-fold cross validation:
It is a technique for validating or evaluating machine learning models with a limited amount of data. The letter k denotes the number of groups (folds) into which the data sample is divided.
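The splitting step can be sketched as follows: each of the k folds serves once as the test set while the remaining folds form the training set. The function name is hypothetical, and the sketch assumes the records are already in random order, since it does not shuffle.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra record.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 5))
```

The model is trained and scored once per fold, and the k scores are averaged to give a more stable estimate than a single train/test split.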

Artificial Neural Networks:
Artificial Neural Networks (ANNs), or simply neural networks, are computing models inspired by biological neural networks. An ANN is made up of a collection of neurons, which are small units also known as nodes. [13][50]

Feature Selection Algorithms:
It is the process of reducing the number of input variables of the prediction model being built. In other words, it can be defined as selecting the appropriate features. There are a number of feature selection techniques to choose from; a few of them are described below. [42]

Relief Feature Selection Algorithm:
The RFSA algorithm is a feature selection algorithm designed for binary classification problems. It can be extremely beneficial in improving the performance of classification algorithms. [32][33]

Minimum Redundancy Maximum Relevance:
mRMR is commonly used with classification algorithms. As the name implies, the main objective is to minimise redundancy while maximising relevance among the selected features. [32] [34]

Least Absolute Shrinkage and Selection Operator:
It performs variable selection and regularization in order to improve prediction accuracy in statistics and in ML. It was originally formulated for linear regression models and was later applied to classification models as well. [35]
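For the linear regression case, the LASSO estimate is commonly written as the minimizer below. The notation is an assumption for illustration: y_i is the response of record i, x_i its feature vector, beta the coefficient vector, n the number of records, p the number of features, and lambda the regularization strength.

```latex
\hat{\beta} = \arg\min_{\beta} \;
\frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i^{\top} \beta \right)^2
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The absolute-value penalty drives some coefficients exactly to zero, and the features with zero coefficients are the ones dropped, which is how the method performs variable selection.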

Local Learning Based Feature Selection Algorithms:
By assigning weights to the features, LLBFS reduces a complex non-linear problem to a set of linear ones. Features with large weights are selected and features with low weights are rejected. [36]

Fast Conditional Mutual Information:
Another efficient feature selection method is FCMIM. Its design is based on Conditional Mutual Information (CMI).

Data Science:
Data Science is a discipline in which data is processed to obtain information and insight, which is then applied to a number of domains.

Artificial Intelligence:
Artificial Intelligence is the process of programming a computer to imitate human behavior. In this case, the computer would be intelligent in the same way as a person is. [25]

2.7. Data Mining:
Data mining means extracting information and insight from data and applying them to useful purposes. Current applications include, for example, HD diagnosis with data mining. It is combined with a variety of hybrid intelligence and hybrid machine learning approaches to provide high precision in heart disease prediction.
The use of machine learning models in conjunction with data mining improves the performance of prediction and recommendation-based systems. Figure 1 shows the flow of steps in a Machine Learning model.

• Choosing the dataset
It is necessary to choose a dataset that is appropriate for the model. Datasets are freely available from a variety of websites on the internet, such as Kaggle and data.world. The dataset is used for training and testing the model.

• Preprocessing the dataset
Following the selection of a dataset, the next step is to process it so that it is more appropriate for our model. Cleansing data, deleting unused data, and filling null values are all examples of data preprocessing. Since Machine Learning works with numerical data, this step also involves converting text (categorical) values to numerical values.

• Dataset split for test and train
After it has been preprocessed, the data is split for training and testing. About 70% of the data is used for training, while the rest is used for testing.

• Fitting Model
After splitting the data, the next step is to pick the algorithm that is best suited to our problem. The accuracy will improve if the right model is chosen, so we must choose the appropriate model for our framework. [2]

• Calculating accuracy and tuning
After that, we must train and test our model. The model's accuracy is verified after it has been trained and tested. We need a best-fit model because, in Machine Learning, the model should be neither underfit nor overfit. As a result, we need to fine-tune our model to improve its accuracy.

• Change the model
If the model we have selected is not accurate enough after fine-tuning, we need to move to a different model that is better suited to our framework.

Mostly used algorithms in the previously studied models:
➢ Naive Bayes
➢ Support Vector Machine
➢ Decision Trees
➢ Random Forest Classifiers

4. Dataset Selection:
The term "dataset" refers to a collection of related data. Datasets are available on a variety of websites, including Kaggle, Data World, and others. The dataset used in many of the proposed works is the Cleveland HD dataset. Its 14 main features, such as the age of the patient, their gender, and their sugar level, are used in the different proposed models. The feature name, feature code, and feature description of the widely used Cleveland heart disease dataset are displayed in Table 1.

Literature Study:
Various studies are reviewed in this section, with a focus on the most important contributions to heart disease prediction, since they help in predicting heart disease earlier. Every year, a large number of new papers are published that attempt to predict heart disease using different technologies. Figure 2 shows the architecture studied from the existing models.

Algorithm Usage:
Various algorithms are used in HD identification. The model proposed by Kahramanli et al. [6] uses Artificial Neural Networks and fuzzy logic to predict diabetes and heart disease. The model proposed by R. Das, I. Turkoglu et al. [7] uses ensemble learning techniques for the prediction of HD. The model proposed by Palaniappan et al. [8] uses Naïve Bayes, DT, and ANN together with data mining techniques for the effective diagnosis of heart disease. The model proposed by E. O. Olaniyi et al. [9] uses a three-phase ANN technique for heart disease prediction using arbitration. Another model proposed by E. O. Olaniyi et al. [10] forms an integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction. The model proposed by X. Liu et al. [11] forms a hybrid classification system for heart disease diagnosis based on the RFRS method for feature selection. Mohan et al. [12] perform effective heart disease prediction using hybrid machine learning techniques (see also [39]). The model proposed by Praveen Kumar Reddy M et al. [14] uses SVM or Decision Tree for predicting heart disease. These two algorithms were taken from the scikit-learn library of Python. The Pandas library is used for data manipulation, and the Matplotlib library and the WEKA tool are used for visualization. From that model, it can be seen that Decision Tree is better suited to categorical data. These are some of the models and algorithms used in the prediction of HD.
In the model proposed by Nidhi Bhatla et al. [18], Naive Bayes gave a good accuracy of 90%, while the decision tree gave an accuracy of 99.62%. Similarly, a model proposed by Kiyong Noh et al. [19] used Naïve Bayes and Decision Tree and also showed a good accuracy of 99%. E. Loukis et al. [23] proposed a model that also uses decision trees for accurate prediction. They also used SVM, but SVM gave a poor result of 55%, whereas decision trees gave a result of 100%.
The model proposed in [21] uses clustering and classification algorithms.
A. S. Thamuja Nishandi [15] proposed a model for predicting heart disease using Logistic Regression in JupyterLab, but it does not provide good accuracy. Another model used an L1-norm SVM [51] [52], with validation done by K-fold cross validation and feature selection done by several specific algorithms. Bertsimas, J. Dunn et al. [20] proposed a model built using structured and unstructured data from the hospital. The SSD (Single Shot MultiBox Detector) algorithm is used for analyzing disease in humans with the help of image processing. A review by Motilal Tayade [22] on image processing technology in the medical field explains the use of this technology in the health care industry in more detail.

Pseudo Code of the Existing models:
1. Begin
2. Preprocess the collected dataset
3. Train the classifiers
4. Validate the model using the testing dataset
5. Compute the performance
6. End
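The steps above can be sketched end to end as follows. This is an illustrative outline only, not the implementation of any surveyed model: the two simple classifiers stand in for the Decision Tree, KNN, and SVM models discussed earlier, the records are hypothetical toy values, and all function and class names are assumptions. For brevity the missing values are filled before the split; a stricter setup would compute the column means on the training part only.

```python
import math
import random
from collections import Counter


def fill_missing_with_mean(rows):
    """Step 2: replace None entries with the mean of their column."""
    means = []
    for column in zip(*rows):
        present = [v for v in column if v is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def split_train_test(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing parts."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = round(len(order) * train_fraction)
    train, test = order[:cut], order[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


class MajorityClassifier:
    """Baseline: always predicts the most frequent training label."""

    def fit(self, rows, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, row):
        return self.label


class NearestNeighborClassifier:
    """1-nearest-neighbor classifier using Euclidean distance."""

    def fit(self, rows, labels):
        self.rows, self.labels = rows, labels
        return self

    def predict(self, row):
        nearest = min(range(len(self.rows)), key=lambda i: math.dist(self.rows[i], row))
        return self.labels[nearest]


def accuracy(model, rows, labels):
    """Step 5: fraction of test records classified correctly."""
    correct = sum(model.predict(row) == label for row, label in zip(rows, labels))
    return correct / len(labels)


# Hypothetical toy records: [age, resting blood pressure]; None marks a missing value.
records = [[35, 115], [40, None], [45, 118], [38, 112], [50, 125],
           [60, 150], [65, 160], [70, None], [62, 148], [68, 158]]
targets = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

clean = fill_missing_with_mean(records)                               # step 2
x_train, y_train, x_test, y_test = split_train_test(clean, targets)
models = {"majority": MajorityClassifier(), "1-nn": NearestNeighborClassifier()}
scores = {name: accuracy(model.fit(x_train, y_train), x_test, y_test)  # steps 3 to 5
          for name, model in models.items()}
best_model = max(scores, key=scores.get)
```

Keeping every classifier behind the same `fit` and `predict` interface makes it straightforward to add further models and select the one with the highest test accuracy.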

Advantages:
Every model has its own set of benefits and drawbacks. The following are some of the major benefits that have been found in some of the most significant models that have already been completed.
The model proposed by Kahramanli et al. [6] has the advantage of high accuracy because it uses Artificial Neural Network techniques along with fuzzy logic (see also [49]). The model by R. Das et al. [7] is similar to the previous model in that it also has the advantage of high accuracy. The model proposed by Palaniappan et al. [8] also achieves high accuracy. E. O. Olaniyi et al. [9] proposed a high-accuracy model, and they also proposed another model with even higher accuracy [10]. X. Liu et al. [11] proposed a model with a high accuracy of nearly 92%, which is achieved through appropriate feature selection. The hybrid ML model proposed by Mohan et al. [12] has the advantage of low computation time. The use of Decision Tree in [14] showed good results compared with SVM. In the model of [17], better feature selection is performed, so the accuracy measured for that model is nearly 99%, which is higher than that of all the other models; this suggests that better feature selection leads to better accuracy.
The model proposed in [20] helps patients check a doctor's availability and, at the same time, consult the doctor through a service named WEB PROCTOR. This is a great advantage for heart patients, because it reduces the time spent going to the hospital for check-ups.

Limitations:
The model proposed by Kahramanli et al. [6] has the limitation that it needs more execution time to predict the result. The model proposed by R. Das et al. [7] has the limitation that it is computationally complex. In the model proposed by Palaniappan et al. [8], the performance of NB and DT is lower. The model of O. Olaniyi et al. [9] has a high computation time, and the other model proposed by them [10] is computationally complex. The computation time is also high in the model proposed by X. Liu et al. [11]. Mohan et al. [12] proposed a hybrid ML model, but its accuracy is very low compared with the other models.
From the above models, we can see that models with good feature selection and neural-network-based techniques give high accuracy. A model with well-chosen features therefore has high accuracy and low computation time. A selective model for prediction is required for better performance. SVM does not work well for categorical data in [14]. Table 2 compares the various models in terms of their algorithms, advantages, and limitations.

Comparison of models:
6. Pseudo Code of the Proposed model:
1. Begin