Machine Learning of the Reverse Migration Models for Population Prediction: A Review

Rural-to-urban migration has historically been prominent in the urbanisation process, which is associated with the economic development that leads to city growth. However, the dwindling supply of natural resources and pressure from the pandemic have threatened economic growth and changed the direction of human migration, from urban to rural. This anecdotal evidence of reverse migration needs to be examined and predicted in relation to the challenges and expansion of sustainable development. The prediction of human migration, as it relates to population size and growth, is important for policy on strategy, planning and industry. Moreover, predicting population mobility makes it possible to sense the pattern of migratory flow in advance and to take effective preventive measures, for example in crowd evacuation and epidemic disease control. However, migration predictions are notorious for being error-prone, time-consuming, complex and challenging. Therefore, in line with IR 4.0, this study uses a machine learning approach, which can learn predictions from a broad dataset, to minimize prediction errors. This paper investigates the significant machine learning models for developing reverse migration prediction. The aim of this study is to identify machine learning models for reverse migration through systematic literature review (SLR) screening. As SLR is recognised as a reliable review method, this paper reviews literature from both Scopus and Google Scholar to determine the signature algorithms for the models. The findings highlight decision tree, random forest and linear regression as the proposed algorithms for developing machine learning models for reverse migration in Malaysia.


Introduction
Globalization has brought together the prosperity of industrialization and urbanization, fuelled by the natural resources of oil and gas. This rapid growth in urban areas drove massive population mobility from rural to urban areas in the 1970s 1. Within 30 years of this tiger economic growth in Malaysia, the depletion of natural resources decreased industrial demand in FDI and resulted in deindustrialization. Without industry, the city lost a vital component of its economic growth and faced the threat of de-urbanization. Thus, there is emerging evidence of population moving in reverse, from urban to rural areas, in search of better living and well-being. These changes in population mobility affect law, planning strategy, health, education strategy and many other areas. As population has major effects on national growth, it is crucial to equip the government and policymakers with information on these changes, their consequences and predictions for future strategy. Population management therefore needs adequate information on the processes of migration, their magnitude and structure. The underlying theories are far from clear: migration involves relocating across an international boundary for some time, but in practice the way migration is conceptualized varies between countries in the contemporary world 2. Numerous traditional models of human mobility have been established, notably the Radiation model 3 and the Gravity model 4. The Gravity model was introduced by Zipf 5 in 1946 to describe the movement of people between different places. The Radiation model, meanwhile, is seen to capture long-range trips better than gravity-based models and is described as a 'universal model' of population mobility and migration trends.
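For readers unfamiliar with the two traditional models named above, the following minimal sketch shows their commonly cited forms. It is an illustration under stated assumptions, not a reproduction of any reviewed study: the function names, the constant `k` and the exponent `beta` are hypothetical, the gravity form follows Zipf's P1*P2/D hypothesis, and the radiation form follows the usual formulation in which `pop_between` is the population living closer to the origin than the destination.

```python
def gravity_flow(pop_i, pop_j, distance, k=1.0, beta=1.0):
    """Gravity estimate: flow proportional to P_i * P_j / d**beta.

    k and beta are illustrative defaults; in practice they are fitted to data.
    """
    return k * pop_i * pop_j / distance ** beta


def radiation_flow(out_i, pop_i, pop_j, pop_between):
    """Radiation estimate: T_i * m*n / ((m + s) * (m + n + s)).

    out_i is the total number of people leaving the origin, m and n are the
    origin and destination populations, and s is the population in between.
    """
    m, n, s = pop_i, pop_j, pop_between
    return out_i * (m * n) / ((m + s) * (m + n + s))


# Illustrative numbers only (not taken from any dataset).
print(gravity_flow(1000, 2000, 10))
print(radiation_flow(100, 1000, 1000, 0))
```

Unlike the gravity form, the radiation form has no free parameters to fit, which is one reason it is described as a 'universal model'.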
Given these circumstances, machine learning has become a very promising approach to resolving these issues, and it has been shown to be effective in different kinds of forecasting and categorization problems 6. In addition, the demand for machine learning functionality is increasing rapidly across more fields of science, technology and society. Therefore, this paper reviews various algorithms and techniques from established references in order to develop significant machine learning models for reverse migration. The concern is to initiate a comprehensive dataset of the changes in population mobility to equip national demographic prediction for future expansion. This paper adopted systematic literature review (SLR) techniques to screen for reliable models and to exclude those with challenges and limitations. With the aid of the Scopus and Google Scholar search engines, this paper shortlisted n=7 reviews out of the n=50 studies discovered through the selection process. The findings indicate three algorithms, namely decision tree, random forest and linear regression, as the significant models for the purposes of the study. The development of machine learning models of reverse migration can thus assist strategic demographic planning and benefit society's needs towards a sustainable future.

Literature Review
2.1 Machine Learning
Many of the issues that industries face under Industrial Revolution 4.0 can be resolved with artificial intelligence methods, including machine learning 6. Data science has played a significant role in many fields such as engineering, education, science, medicine, business, finance, accounting, marketing, economics, the stock market and law, among others. Owing to data availability, the variety of open-source machine learning tools and powerful computing, machine learning has attracted a lot of attention from the research and business communities 6.
Therefore, determining the major factors is extremely important in adapting a machine learning model to reverse migration. This requires pre-processing and exploration of the collected datasets. The accuracy of the results produced by a machine learning model is highly dependent on the pattern of the dataset, parameter tuning and feature selection. The main purpose of this paper is to explore the machine learning algorithms that are typically implemented in reverse migration studies.

Machine Learning Task and Tools
Machine learning is about designing algorithms that allow a computer to learn. Learning is a process of finding statistical regularities or other patterns in data 7. Machine learning is categorized into three broad categories: supervised learning, unsupervised learning and reinforcement learning 9. Figure 1 shows the diagram of ML tools.
Based on Figure 1, supervised learning algorithms are those for which the classes are predetermined. The classes form a finite set defined by a human, which in practice means that a certain segment of the data is labelled with these classes 7. The algorithms learn from the training dataset and apply the learned patterns to the test dataset for prediction or classification 11. The machine learning algorithm's task is then to identify patterns and build mathematical models. These models are evaluated on their predictive capacity in relation to measures of difference in the data itself 7.
Unsupervised learning is more complicated, since the computer needs to learn to perform specified tasks without being shown how to perform them 9. It requires the identification of patterns without the participation of a target variable 12. Unsupervised learning refers to the process of grouping data that have not been categorized or classified into clusters using automated methods or algorithms 13. This type of machine learning is designed to extract structure from data samples. Clustering means dividing the available data instances into sub-groups based on the similarities between the instances in a group. These sub-groups are referred to as clusters, hence the name clustering 9.
In reinforcement learning, the algorithm learns a policy of how to act given an observation of the world. Each action has an effect on the environment, and the environment delivers feedback that guides the learning algorithm 7. Reinforcement learning is a form of learning in which choices are made about which steps to take to make the result more positive. The learner has no knowledge of what actions to take before a scenario is presented 14. In general, reinforcement learning gives low performance since the functions are randomly chosen without predicting the potential effects 9.
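To make the idea of clustering concrete, the sketch below groups one-dimensional values with a basic k-means procedure. It is a minimal illustration only: k-means is one common clustering technique chosen here for brevity, not a method named in the reviewed literature, and the function name, starting centroids and sample values are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Assign each value to its nearest centroid, then move each centroid
    to the mean of its cluster; repeat a fixed number of times."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Illustrative unlabelled data: no target variable is supplied.
centres, groups = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[0.0, 5.0])
print(centres, groups)
```

No labels are passed to the function, which is what distinguishes this from the supervised setting described above.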

Machine Learning Models in Reverse Migration Prediction
According to 15, migration is the movement of individuals from one ecological area to another, on either a temporary or a permanent basis. The community of individuals who migrate varies with the circumstances that led to the decision to migrate, and the reasons differ from one person to another 15. Industrialization and rapid urbanization have brought both opportunities and challenges. Despite technologically well-equipped cities and lifestyles, people are becoming more dependent on jobs, industrial goods, vehicles and machinery 16.
Machine learning is applicable in various disciplines such as business, computer engineering, industrial engineering, bioinformatics, medicine, pharmaceuticals, physics and statistics, to gather knowledge and predict future events. Human migration models are important to governments because they provide broader forecasts of how the demographics of an area may change in the coming years 17, how labour markets will be affected 17, how infectious diseases spread 18 and how the global economy will change 19.
Previous research shows a lack of studies on the appropriate approaches for predicting reverse migration trends. The motivations behind migration flows, and the emergence of new types of migration that transcend short-term and long-term mobility, have made forecasting complicated and difficult to conceptualize in the contemporary world 20. In previous studies, machine learning methods have shown better performance than conventional models, especially when the data are complicated.
Thus, this study explores five machine learning models for forecasting human migration: extreme gradient boosting regression (XGBoost), Random Forest, Linear Regression, Decision Tree and the Artificial Neural Network (ANN) model.

Method using the Systematic Literature Review
A systematic literature review was used to obtain related literature on machine learning models for reverse migration. The researchers reviewed the literature on reverse migration modelling and machine learning to identify factual studies published to date. This section outlines the systematic literature review methodology used to suggest machine learning models for reverse migration. Figure 2 shows the four phases of the systematic literature review method used in this study.

Phase 1: Identification
In this phase, related published papers were obtained from the two search engine databases. Search terms and keywords were specified for each database, including "reverse migration", "artificial intelligence", "machine learning model for migration", "population migration in Malaysia" and "de-urbanization". The search string is summarised in Table 1.

Phase 2: Screening of the Identified Literature
In this phase, the 50 identified studies were screened for relevance to machine learning models in reverse migration modelling, and 16 studies suited this topic. The screening also excluded duplicates (similar authors, similar research titles).

Phase 3: Eligibility and Exclusion
Next is the eligibility and exclusion phase. In this phase, the remaining 16 studies were reviewed thoroughly and 9 full-text articles were excluded for reasons such as a lack of information on machine learning models in reverse migration. At this stage, 7 studies remained relevant and were used for the systematic literature review. Figure 3 shows the flow diagram for the systematic literature review. The data in Table 2 were analysed and abstracted. The systematic review was tabulated with a checklist of items: author(s), title of publication, year of publication, country or region of study and machine learning model. The machine learning tools checklist was categorised into three (3) items, namely issues, frequency of algorithm use and findings. The first item covers the issues related to human migration prediction with machine learning models. The second is the frequency of use of five machine learning models: Random Forest, Gradient Boosting Regression, Artificial Neural Network, Linear Regression and Decision Tree. The last item, findings, shows the best algorithm and machine learning model for reverse migration. The findings in Table 2 show that the most frequently used algorithms are Random Forest, Linear Regression and Decision Tree.
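The counts reported for the screening and eligibility phases can be tallied with a few lines of arithmetic. The sketch below is only a bookkeeping aid with hypothetical function and key names; the three input numbers are the ones stated in the text, and the resulting total of excluded studies agrees with the n=43 reported in the conclusion.

```python
def screening_tally(identified, kept_after_screening, excluded_full_text):
    """Return the number of studies removed at each stage and the final count."""
    removed_in_screening = identified - kept_after_screening
    included = kept_after_screening - excluded_full_text
    return {
        "removed_in_screening": removed_in_screening,
        "excluded_full_text": excluded_full_text,
        "total_excluded": removed_in_screening + excluded_full_text,
        "included": included,
    }


# 50 studies identified, 16 kept after screening, 9 full-text articles excluded.
tally = screening_tally(identified=50, kept_after_screening=16, excluded_full_text=9)
print(tally)
```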

3.4 Phase 4: Items Abstraction
The final phase is abstraction. In the abstraction phase, the frequently used models were reduced from five (5) to three (3). The reduction was based on two factors: first, the suitability of the machine learning model for reverse migration, and second, the frequency with which the model was suggested in the studies. This abstraction formed the suggested machine learning models, categorised into three (3) algorithms deemed suitable for reverse migration: Random Forest, Linear Regression and Decision Tree. The abstracted data are discussed in the section below.

The Suggested Machine Learning Tools for Use in Reverse Migration
After examining the studies by other researchers, the reviewers gathered all the data to analyse and identify the algorithms most frequently used to build machine learning models for predicting reverse migration. Based on the findings in Table 2, the reviewers recommend suitable machine learning algorithms, describe each model and show that the most frequently used algorithms are Random Forest, Linear Regression and Decision Tree. In the reviewed studies, these machine learning models outperform human and traditional approaches to forecasting reverse migration. The following subsections explain in detail each of the suggested machine learning algorithms that is suitable for adaptation to reverse migration modelling.

Random Forest
According to Izenman (2006), as cited in Mohd 6, Random Forest is one of the advanced tree structures derived from the Decision Tree. Bootstrap aggregation, or bagging, is a type of ensemble machine learning method. Random forests are an effective tool in prediction 24. The bootstrap is an efficient statistical method for estimating a quantity, such as the mean, from a data sample. The Random Forest model takes many bootstrap samples of the data, calculates the mean of each, and then combines all the mean values to provide a better approximation of the true mean value. Each time a split in a tree is considered, a random subset of predictors is chosen from the full set, and the number of predictors in that subset is equal to the square root of the total number of predictors 21. The number of trees built on bootstrapped training samples must be large enough for the error rate to settle down 26. Random Forests have four parameters: the sample size, the number of trees, the number of features chosen for each split, and the depth of the tree 21. Better performance is achieved with a large number of trees in the forest, and the optimal number is reached when the error can no longer be reduced by adding more trees. The depth of the tree and the size of the subsample have a similar effect on the performance of a Random Forest, so it is sufficient to tune either of these 21.
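The two sources of randomness described above, bootstrap resampling and the square-root rule for choosing predictors at a split, can be sketched in a few lines. This is a simplified illustration and not a full Random Forest: no trees are grown, the function names, seed and resample count are hypothetical, and rounding the square root down is an assumption.

```python
import math
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def bagged_mean(data, n_resamples=200, seed=0):
    """Average the means of many bootstrap resamples of the data."""
    rng = random.Random(seed)
    resample_means = [mean(bootstrap_sample(data, rng)) for _ in range(n_resamples)]
    return mean(resample_means)


def split_candidates(n_predictors, rng):
    """Pick a random subset of about sqrt(n_predictors) predictor indices
    to consider at one split (square root rounded down, at least one)."""
    k = max(1, math.isqrt(n_predictors))
    return sorted(rng.sample(range(n_predictors), k))


# Illustrative values only.
print(bagged_mean([float(x) for x in range(1, 11)]))
print(split_candidates(16, random.Random(1)))
```

In a full Random Forest, each tree would be trained on one bootstrap sample, each split would search only the predictors returned by a step like `split_candidates`, and the trees' predictions would be averaged (regression) or voted on (classification).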

Linear Regression Model
Linear regression is one of the best understood and best known algorithms in machine learning and statistics. It is a forecasting model that focuses on minimizing the error so as to make the most accurate and reasonable forecast in explaining the dataset 6. A standard linear regression is used to explain the relation between a set of predictors and a response variable. It is a very simple method for numerical prediction and serves as an elementary unit for more complex models.
A Linear Regression model has also been used to predict and analyze internal migration patterns in the USA 23.
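As an illustration of how linear regression minimizes error, the sketch below fits a straight line to one predictor by ordinary least squares, using the closed-form slope and intercept. It assumes a single predictor for simplicity; the function name and the sample values are hypothetical and do not come from any migration dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    slope = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    intercept = y_bar - slope * x_bar
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Illustrative points lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

With several predictors the same least-squares idea applies, but the fit is usually computed with matrix methods rather than these scalar formulas.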

Decision Tree Model
Another common model used to solve regression and classification problems is the Decision Tree 11. The algorithm generates a tree structure consisting of a root node and branches. Each internal node, called a decision node, represents a test on an attribute; each branch denotes an outcome of that test; and each leaf node, called a terminal node, contains a class label. A Decision Tree model was used to predict people's movements within Russian regions 9. The decision tree method gave the most accurate predictions of people's movement. An example of a decision tree is shown in Figure 4.
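The tree structure described above can be sketched as nested dictionaries, with prediction following one branch per test until a leaf is reached. This is a hand-built toy for illustration only: the attribute names, branch values and class labels are hypothetical, and a real model would learn the tree from data rather than have it written by hand.

```python
# Hypothetical toy tree; attributes and labels are illustrative, not from any study.
TREE = {
    "attribute": "urban_job_lost",          # root node: test on one attribute
    "branches": {
        True: {
            "attribute": "owns_rural_land",  # internal (decision) node
            "branches": {True: "migrates", False: "stays"},
        },
        False: "stays",                      # leaf (terminal) node: class label
    },
}


def predict(node, record):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node


print(predict(TREE, {"urban_job_lost": True, "owns_rural_land": True}))
```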

Figure 4 Decision Tree 27

5 Conclusion and Recommendation
This paper presents a systematic literature review on machine learning of reverse migration models for population prediction. It reviews established references to determine suitable tests and best practice for machine learning performance before developing a significant ML model for reverse migration. In the reviewed literature, machine learning models provide better performance in forecasting reverse migration than traditional approaches. Moreover, as high-resolution socio-economic data are increasingly available in countries that record human migration flows, it is possible to use machine learning models of human migration, rather than conventional gravity or radiation models, to support demographic databases in future. This paper adopted systematic literature review (SLR) techniques to assist the selection of significant models suitable for reverse migration. Screening related literature from the Scopus and Google Scholar search engines, this paper identified n=50 studies, excluded n=43 (for reasons including duplication, similar authors and similar research titles) and shortlisted n=7 reviews through the selection process of identification, screening, eligibility and abstraction. The final result identified three algorithms, namely decision tree, random forest and linear regression, as significant models for this study. Thus, this paper acknowledges SLR as a reliable reviewing mechanism to assist other researchers, especially those who want to implement a machine learning model in a case study related to reverse migration modelling. As a recommendation for future study, the researchers aim to apply machine learning (ML) models to reverse migration prediction. In the era of Industrial Revolution 4.0, many important issues in industry can be addressed effectively by ML models, since they are designed to predict and classify.