UTILIZING MACHINE LEARNING TO DETERMINE THE COST OF MEDICAL INSURANCE

Abstract: By spreading the financial risk of unforeseen medical expenses across a large number of people, health insurance reduces the amount of money at risk. Over the past 20 years, global public health spending has nearly doubled, and in 2023 it is predicted to reach $8.5 trillion, or 9.8% of global GDP, once inflation is taken into account. Multinational private-sector providers deliver 60% of all medical procedures and 70% of outpatient care, sometimes at exorbitant cost. Because of growing healthcare expenditures, longer life expectancies, and an increase in non-communicable diseases, health insurance has become a necessity. Insurance data has also become more widely available, allowing insurance companies to use predictive modeling to improve their business operations and customer service. Machine learning (ML) algorithms analyze historical insurance data to estimate future values from consumer behavior patterns and insurance policies, supporting data-driven decision-making and the development of new schemes. ML has shown considerable potential in the insurance industry, which motivated the development of the ML Health Insurance Prediction System. Using this cost prediction system to estimate premium values more promptly and effectively can help reduce medical expenditures. The system compares three regression models: Random Forest Regressor, Support Vector Regression, and Linear Regression. The models were trained on a dataset, and their effectiveness was verified by comparing their predictions to actual data.


INTRODUCTION
General insurance plays a vital role in protecting individuals and their valuable assets, such as homes, vehicles, and real estate, from unforeseen events and accidents. It offers coverage against a range of risks, including fire accidents, earthquakes, floods, thefts, storms, travel accidents, and legal liabilities. Among these, health insurance holds particular importance because it safeguards against unexpected medical expenses that can disrupt financial stability and long-term goals [5]. Given the complexity of modern health challenges, planning for healthcare has become a necessity, leading to the availability of insurance plans for individuals and families.
In India, a significant proportion (around 75%) of the population currently pays its medical expenses out of pocket. However, health insurance coverage has been increasing steadily, with approximately 514 million people covered during fiscal year 2021. According to the NITI Aayog Health Index 2021, Kerala is ranked as the healthiest state in India, with a composite score of 82.90. The insurance industry in India comprises 57 firms, including 33 non-life insurers and 24 life insurers, with seven public sector companies playing a prominent role. Strong competitors have also emerged among private insurers, such as ICICI, HDFC, SBI, and Star Health [7].
Previous studies have shown that individuals enrolled in Medicare tend to rate their insurance more favorably than those with commercial plans. Various studies have compared Medicaid and commercial insurance, but the findings have been conflicting and limited to specific populations or types of service utilization. Recent data explicitly comparing the experiences of individuals with public and private health insurance is lacking [3].
The objective of this paper is to provide accurate estimates of health insurance costs for different providers and individuals. While predictions may not always follow a consistent pattern, they can assist in making informed decisions about the selection of appropriate health insurance coverage [8]. Early cost calculations can help individuals evaluate their options more carefully and choose the most suitable coverage. Furthermore, the research may offer insights into maximizing the benefits of health insurance.

LITERATURE SURVEY
India's market for general insurance is growing significantly in the post-liberalization environment. The opening of the Indian insurance market to foreign companies, Third Party Administrators, low insurance premiums, quick settlement of insurance claims, innovative general insurance policies, discounts on insurance products, growing public awareness, more distribution channels, and other factors have all contributed to this market's spectacular growth. The following research papers and articles address different aspects of health insurance.
[1] "Operational Efficiency of Selected General Insurance Companies in India" - This paper explores the operational efficiency of general insurance companies in India, particularly in the context of competition between public and private insurers.
[2] "An Empirical Evaluation On Proclivity Of Customers Towards Health Insurance During Pandemic" - The research studies the awareness and inclination of the public towards health insurance during a pandemic, using SPSS software for analysis.
[3] "Health Insurance in India - An Overview" - This article provides an overview of the health insurance industry in India, including the growth and development of standalone health insurers and government-sponsored health insurance providers.
[4] "A Conceptual Review Paper on Health Insurance in India" - The paper reviews existing literature on health insurance in India to understand the growth and potential benefits of health insurance for the population.
[5] "Need-based and Optimized Health Insurance Package Using Clustering Algorithm" - This research proposes the use of clustering algorithms to design health insurance packages based on the specific needs of employees, aiming to provide optimized coverage.
[6] "Health Insurance Amount Prediction" - The authors analyze personal health data to predict insurance amounts for individuals using regression models. Multiple Linear Regression and Gradient Boosting Decision Tree Regression are compared for their performance.
[7] "Predicting the Risk of Disease Using Machine Learning Algorithm" - The study aims to predict the risk of chronic kidney disease (CKD) using machine learning algorithms, specifically by building a regression model to predict creatinine values and combining them with other health-related features.
[8] "Piecewise-linear Approach for Medical Insurance Costs Prediction Using SGTM Neural-like Structure" - This article proposes a method for predicting medical insurance costs using a piecewise-linear approach and the SGTM neural-like structure, comparing it with other methods such as the multilayer perceptron.
[9] "Predicting Health Care Costs Using Evidence Regression" - The research investigates an interpretable regression method based on the Dempster-Shafer theory, called Evidence Regression, for predicting health care costs. It outperforms Artificial Neural Network and Gradient Boosting methods in terms of accuracy.
[10] "Health Insurance Sector in India: An Analysis of Its Performance" - This study analyzes the performance of the health insurance sector in India, specifically examining the relationship between premium earnings and underwriting loss using regression analysis.
[11] "Knowledge and Understanding of Health Insurance" - The research focuses on health insurance literacy and disparities in knowledge among different socioeconomic groups in Israel, emphasizing the need for tailored communication strategies and simplified plan information.
[12] "The Effects of Health Insurance on Health-Seeking Behaviour: Evidence from the Kingdom of Saudi Arabia" - The study explores the impact of health insurance on health-seeking behavior in Saudi Arabia and suggests the introduction of national health insurance coverage as an effective measure to improve access to healthcare.

PROPOSED MEDICAL HEALTH INSURANCE COST PREDICTION SYSTEM
The dataset used here contains information on health insurance costs and the factors that influence them. It has 7 columns and 1338 rows. For the prediction task, the important columns (features) in the dataset are:
1. Age: Represents the age of the insured individual.
To predict the cost of health insurance, the dataset must be cleaned and prepared before regression algorithms are applied. The analysis suggests that age and smoking status have the most significant impact on insurance costs, with smoking having the greatest effect. Other factors such as number of children, BMI, sex, and region also play a role in determining insurance costs.
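The cleaning and preparation step mentioned above can be sketched as follows. This is a minimal illustration, not the preprocessing actually used in this work: the column keys (`age`, `sex`, `bmi`, `children`, `smoker`, `region`, `charges`), the category labels (`yes`, `male`, `east`, `west`), and the function name `encode_rows` are all assumptions, and the sample records are invented.

```python
def encode_rows(rows):
    """Drop incomplete records and turn the rest into numeric vectors."""
    # Cleaning: keep only records in which every field has a value.
    clean = [r for r in rows if all(v not in ("", None) for v in r.values())]
    # The one-hot layout for region is derived from the data, in sorted order.
    regions = sorted({r["region"] for r in clean})
    features, targets = [], []
    for r in clean:
        vec = [
            float(r["age"]),
            float(r["bmi"]),
            float(r["children"]),
            1.0 if r["smoker"] == "yes" else 0.0,  # binary flag (assumed label)
            1.0 if r["sex"] == "male" else 0.0,    # binary flag (assumed label)
        ]
        # One indicator column per region seen in the cleaned data.
        vec += [1.0 if r["region"] == name else 0.0 for name in regions]
        features.append(vec)
        targets.append(float(r["charges"]))
    return features, targets, regions


# Made-up records for illustration only; they are not taken from the dataset.
sample = [
    {"age": "25", "sex": "female", "bmi": "22.5", "children": "0",
     "smoker": "yes", "region": "west", "charges": "15000.0"},
    {"age": "40", "sex": "male", "bmi": "30.0", "children": "2",
     "smoker": "no", "region": "east", "charges": "7000.0"},
    {"age": "33", "sex": "male", "bmi": "", "children": "1",
     "smoker": "no", "region": "east", "charges": "5000.0"},  # missing BMI
]
X, y, regions = encode_rows(sample)
```

The record with the missing BMI is dropped, and each remaining record becomes a purely numeric vector that a regression algorithm can consume.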

TECHNOLOGY USED:
A. Machine Learning: Machine learning is a branch of artificial intelligence that focuses on algorithms and models enabling computers to learn from data, make predictions, or make decisions without requiring explicit programming. It involves training models on historical data and using them to make predictions or classify new, unseen data based on patterns and relationships learned during training.
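The distinction between training data and unseen data is usually enforced with a hold-out split. The sketch below shows one simple way to do this with the standard library; the function name `train_test_split`, the 20% test fraction, and the seed are illustrative assumptions, not settings reported in this work.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and split it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    # The first n_test shuffled records are held out; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(10))  # stand-in for 10 dataset rows
train, test = train_test_split(records, test_fraction=0.2, seed=42)
```

A model is fitted on `train` only, and its predictions on `test` indicate how well it generalizes to data it has not seen.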

B. SVM (Support Vector Machines):
SVM is a supervised machine learning algorithm used for both classification and regression tasks. It works by finding an optimal hyperplane that separates different classes in a high-dimensional feature space. SVM aims to maximize the margin (distance) between the decision boundary and the data points of different classes, allowing for better generalization and improved performance on unseen data. It can handle linear and non-linear classification problems using different kernel functions, such as linear, polynomial, or radial basis function (RBF).
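The three kernel functions named above can be written out directly. The sketch below uses their common textbook forms; the parameter names and default values (`degree`, `coef0`, `gamma`) are illustrative assumptions and are not the settings used in this work.

```python
import math


def linear_kernel(x, z):
    """Plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """(x . z + coef0) ** degree"""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """exp(-gamma * squared Euclidean distance between x and z)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


x, z = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(x, z)
k_poly = polynomial_kernel(x, z)
k_rbf_same = rbf_kernel(x, x)
```

Each kernel measures similarity between two points; the polynomial and RBF kernels let the algorithm fit non-linear relationships without constructing the higher-dimensional features explicitly.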

C. Random Forest:
Random Forest is an ensemble learning method that combines multiple decision trees to make predictions. It is a supervised learning algorithm commonly employed for both classification and regression tasks. Random Forest builds an ensemble of decision trees by training each tree on a randomly selected subset of features and data samples.
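The random selection of samples and features can be sketched as below. The sketch follows the per-tree description given here; many implementations instead draw a fresh feature subset at every split inside a tree. The function name `draw_tree_inputs` and all sizes are hypothetical.

```python
import random


def draw_tree_inputs(n_samples, n_features, max_features, rng):
    """Pick the row and column indices one tree would be trained on."""
    # Bootstrap sample: draw row indices with replacement.
    row_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Random feature subset: draw column indices without replacement.
    col_idx = sorted(rng.sample(range(n_features), max_features))
    return row_idx, col_idx


rng = random.Random(0)  # seeded so the draw is reproducible
plans = [
    draw_tree_inputs(n_samples=8, n_features=6, max_features=3, rng=rng)
    for _ in range(5)  # one plan per tree in a 5-tree forest
]
```

Because every tree sees a different slice of the data, the trees make partly independent errors, which is what makes combining them worthwhile.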
During prediction, each tree in the forest makes an independent prediction, and the final prediction is determined by majority vote (for classification) or by averaging (for regression) the individual tree predictions. Random Forest is known for its ability to handle high-dimensional data, provide feature importance estimates, and capture non-linear relationships between features and the target variable.

D. Linear Regression:
Linear regression is a supervised machine learning algorithm used for regression tasks. It models the relationship between a dependent variable (target) and one or more independent variables (features) using a linear equation. The objective of linear regression is to find the line of best fit that minimizes the difference between the predicted values and the actual values. It assumes a linear relationship between the input features and the target variable. Linear regression can be extended to handle multiple variables (multiple linear regression) or non-linear relationships by using polynomial or other non-linear transformations of the input features.
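For a single input feature, the line of best fit has a closed-form least-squares solution, sketched below. The function name `fit_line` and the data are invented for illustration (charges that grow exactly linearly with age); they are not drawn from the dataset used in this work.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


ages = [20.0, 30.0, 40.0, 50.0]
charges = [3000.0, 4000.0, 5000.0, 6000.0]  # exactly 1000 + 100 * age
slope, intercept = fit_line(ages, charges)
predicted = slope * 35.0 + intercept  # prediction for an unseen age
```

With several features, the same idea generalizes to multiple linear regression, where one coefficient is fitted per feature.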

RESULTS
The dataset was tested with three machine learning algorithms: Random Forest, Linear Regression, and Support Vector Regressor. The accuracy of each algorithm was measured, with the following results:
1. Random Forest: 84% accuracy
2. Linear Regression: 74% accuracy
3. Support Vector Regressor: 83% accuracy
These percentages indicate how well each algorithm predicted the target variable on the given dataset. Random Forest achieved the highest accuracy (84%), followed by Support Vector Regressor (83%) and Linear Regression (74%), as shown in Fig 3.
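The metric behind these accuracy figures is not named here. For regression models, one common choice is the coefficient of determination (R squared), and the sketch below assumes that interpretation purely for illustration. The function name `r2_score` and the numbers are invented and do not reproduce the reported results.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3200.0, 3800.0]
score = r2_score(actual, predicted)
```

A score of 1.0 means the predictions match the actual charges exactly, while a score near 0 means the model does no better than predicting the average charge.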

CONCLUSION
This paper summarizes the findings and potential applications of regression models built on health insurance policy data. Of the three models examined, the Random Forest regression model performed best. Age and smoking status were found to be the most important factors influencing insurance rates across all algorithms. To improve accuracy, irrelevant attributes were removed from the features and different combinations of attributes were explored. This process likely improved the models' predictive ability.

FUTURE SCOPE
The randomized nature of the Random Forest algorithm may contribute to its higher prediction accuracy compared with the other algorithms. To demonstrate the system's scalability, future work should use a dataset with at least one million records. Processing data at that scale requires distributed frameworks such as Spark and Hadoop. By processing and

2. Smoking Status: Indicates whether the insured individual is a smoker or a non-smoker.
3. BMI: Represents the Body Mass Index, a measure of body fat based on height and weight.
4. No. of Children: Gives the number of children of the insured individual.
5. Sex: Indicates the gender of the insured individual.
6. Region: Represents the geographical region of the insured individual.
7. Charges: Represents the medical insurance charges or costs.

Fig 3. Performance Graph of Proposed System with Three ML Algorithms