Prediction Tourist Visits With Multiple Linear Regressions in Artificial Neural Networks

Article History: Received: 10 November 2020; Revised: 12 January 2021; Accepted: 27 January 2021; Published online: 05 April 2021

Abstract: Tourist visits have been studied extensively by previous researchers as a prediction problem. Many prediction models have been proposed, using a range of methods to produce the information that tour managers need. However, previous studies have focused on producing output without testing the correlation of the variables used as predictors. This study addresses how to predict the number of tourist visits using Multiple Linear Regression (MLR) to test the correlation of the predictor variables and an Artificial Neural Network (ANN) as the calculating machine for the predictions. The two methods are well suited to prediction: the MLR test shows that the predictor variables have xxx correlation, and the prediction process produces output with an accuracy of xx%, an MSE of xx% and an RMSE of xx. This research is therefore useful for managers in the tourism sector; its goal is to help the tourism office see how many visits will occur in the next period.


Introduction
In the tourism sector, Indonesia has many attractions that serve as destinations for both local and foreign tourists in their spare time. One of the provinces that attracts tourists is West Sumatra, whose natural beauty offers many attractions. The improvement and development of the tourism sector are of particular concern to local governments and tourism object managers, who aim to increase the number of tourist visits. This can be seen in the growth in the number of tourists visiting West Sumatra Province, for example in the city of Padang. Based on an analysis of visit data from several tourist destinations, the growth rate is around 2.6% per year, which indicates that tourism in West Sumatra Province is in great demand [1].
To follow the growth in the number of tourist visits, a prediction process is needed for future periods. Such a process can be carried out with several models based on visit data from previous periods, and many models and methods have been used for this purpose in previous studies. A prediction of tourism industry visits from historical data using Soft Computing-Based Gray-Markov Models shows that the soft computing-based model gives better results than the Gray-Markov model [2]. Another prediction model presents an approach to forecasting a location using a data model based on a pattern technique [3]. A supervised learning model can be generalized to new data, given a sufficient number of samples, to improve model performance and evaluation [4]. Other studies predict the number of tourist arrivals. Their results show that big data sources can significantly improve prediction when tourist arrival variables from previous periods are used [5]. Another study, which predicts the number of tourist visits from internet search data, finds that the search data have a causal relationship with tourists' decisions to travel and visit [6].
Many methods can be used for prediction, one of which is the Artificial Neural Network (ANN). This method learns a network model from a database. ANN can provide more accurate results when estimating tourist arrivals and is able to improve predictions of tourist arrivals [7]. ANN can also be combined with Genetic Algorithms (GA) to formulate a prediction model for aircraft ticket sales revenue; that study obtained relatively low errors and therefore good predictive results [8]. ANN is a prediction tool that can be used in various fields [9].
The studies described above focus on the prediction process itself. In other words, they discuss only the prediction results obtained from the model and method used. This study also carries out a prediction process, but it first tests the correlation of the predictor variables used. The correlation of these variables is tested with the Multiple Linear Regression (MLR) method.
This method is able to test the relationship between several variables (x) and a result (y). A previous study used the MLR method to verify the variability and correlation between morphological and agronomic characteristics in a population, for later use in indirect selection in a Synthetic corn population [10]. Correlation is needed especially in studies that require an understanding of certainty and of the extent to which the variables show a reciprocal relationship [11]. Regression-based analysis of variable correlations shows that tourism demand forecasting can support decision making by using related variables rather than relying only on a model or method [12]. Other research on predicting tourism demand in Turkey showed that the MLR and SVR methods performed better and gave the most accurate demand predictions [13].
Both methods are used together in this study to produce better prediction results. Previous studies have combined neural networks with multiple linear regression to forecast daily water discharge using hydrogeological and climate variables as predictor variables [14]. Similar research on predicting bank performance using a neural network with Multiple Linear Regression explains that MLR is used to find effective predictor variables for the neural network. Prediction models that combine regression and ANN are able to produce minimal errors and a high degree of accuracy [15]. In a study of bank performance, MLR testing found two predictor variables that account for about 60.9% of the total variation in the data, with an error (MSE) value of 0.330, and concluded that the neural network is a strong method for predicting bank performance [16].
This study aims to predict tourist visits to West Sumatra Province using the MLR and ANN methods, in order to obtain prediction results with the best level of accuracy. The predictions should also help the West Sumatra provincial government develop the tourism sector based on the number of visits expected in the next period.

Multiple Linear Regression (MLR)
MLR is one of the techniques most widely used to analyze multivariate data [17]. Theoretical models of multiple linear regression analysis fall into two types: models of the relationship between independent variables and dependent variables, and models of the relationship among the independent variables themselves [18]. Regression analysis is known as a complex, mathematically based method that takes time to apply. With the improvements made in current software, applying regression analysis has been simplified, although not completely [19]. Multiple linear regression (MLR), also known as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. It is a powerful technique for predicting the value of an unknown variable from the values of two or more known variables, also called predictors [20]. The MLR method in this study applies a linear function to the variables used in the calculation. The linear function is shown in Eq. 1:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon \tag{1}$$

where Y is the response variable, X_i (i = 1, 2, 3, ..., k) are the explanatory variables, β_i (i = 1, 2, 3, ..., k) are the regression coefficients, β_0 is the intercept and ε is the residual error [21]. To develop the multiple linear regression equation, the parameters are estimated from the training data, and the variables are selected from the dataset using correlations. The quantity r, called the linear correlation coefficient, measures the strength and direction of the relationship between two variables. In its standard form, r is given by Eq. 2 [21]:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \tag{2}$$

where n is the number of paired observations of x and y.
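As a minimal sketch, the linear correlation coefficient r described above can be computed from paired observations with the standard Pearson sum formula. The function name `pearson_r` and the use of plain Python are assumptions made for illustration; the study itself computed correlations in SPSS.

```python
import math


def pearson_r(x, y):
    """Linear correlation coefficient r between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    # Numerator: n * sum(xy) - sum(x) * sum(y)
    numerator = n * sum_xy - sum_x * sum_y
    # Denominator: sqrt([n*sum(x^2) - sum(x)^2] * [n*sum(y^2) - sum(y)^2])
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator
```

For example, `pearson_r([1, 2, 3], [2, 4, 6])` returns 1.0 (a perfect positive linear relationship), while reversing the second list gives -1.0. Values near 0 indicate little or no linear relationship.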

A. Artificial Neural Network (ANN)
ANN is a mathematical model designed on the basis of human biological neural networks. The biological nervous system consists of groups of interconnected neurons, and the model processes information using a computational approach [22]. An ANN can have a single layer or multiple layers and consists of processing units (nodes or neurons) [23]. These units are interconnected by a set of adjustable weights that allow signals to travel through the network in parallel and in sequence [24]. An ANN model can be used to analyze updated asset conditions according to the status of the variables used, with the aim of making predictions and improving the ability to anticipate existing and potential problems (errors, failures, production losses) [25]. ANNs have provided many benefits to studies in producing findings [26]. ANN is a technique used to map the relationships among variables that are related to one another [27].

B. Backpropagation Algorithm
The backpropagation algorithm is a method used to model neurons for error functions [28]. It is able to solve problems with artificial neural networks by learning from historical data [29]. The backpropagation algorithm can be seen as a function that performs calculation, testing and training of the network using historical data in order to reach a conclusion [30]. It carries out training and testing using parameters such as the learning rate and the minimum error value. Network training is built using the activation function, after which the backpropagation calculation is carried out to produce results with maximum accuracy [31]. Backpropagation is the most optimal training algorithm and can serve as a basis for finding solutions to complex problems [32]. Backpropagation is a supervised training algorithm that applies patterns to obtain the minimum error value [33].
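The training loop described above (forward pass, error calculation, weight updates governed by a learning rate, and stopping at a minimum error value) can be sketched as follows. This is an illustrative sketch only, not the authors' Matlab implementation: the sigmoid activation, per-sample updates, function name `train_backprop` and all default hyperparameter values are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_backprop(samples, targets, n_hidden=3, learning_rate=0.5,
                   min_error=1e-4, max_epochs=5000, seed=0):
    """Train a one-hidden-layer network with gradient-descent backpropagation.

    Inputs and targets are assumed to be scaled to the range (0, 1).
    Returns ((w_hidden, w_out), mse_history).
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    # Each hidden neuron: n_in input weights followed by one bias.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    # Output neuron: n_hidden weights followed by one bias.
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(max_epochs):
        squared_error = 0.0
        for x, t in zip(samples, targets):
            # Forward pass through the hidden layer and the output neuron.
            h = [sigmoid(sum(w * v for w, v in zip(wj[:-1], x)) + wj[-1])
                 for wj in w_hidden]
            y = sigmoid(sum(w * v for w, v in zip(w_out[:-1], h)) + w_out[-1])
            error = t - y
            squared_error += error ** 2
            # Backward pass: deltas are computed before any weight changes.
            delta_out = error * y * (1.0 - y)
            delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j])
                            for j in range(n_hidden)]
            # Update hidden-to-output weights and bias.
            for j in range(n_hidden):
                w_out[j] += learning_rate * delta_out * h[j]
            w_out[-1] += learning_rate * delta_out
            # Update input-to-hidden weights and biases.
            for j in range(n_hidden):
                for i in range(n_in):
                    w_hidden[j][i] += learning_rate * delta_hidden[j] * x[i]
                w_hidden[j][-1] += learning_rate * delta_hidden[j]
        mse = squared_error / len(samples)
        history.append(mse)
        if mse <= min_error:  # stop once the minimum error value is reached
            break
    return (w_hidden, w_out), history
```

The returned `mse_history` holds one mean squared error per epoch, so it can be plotted to inspect convergence in the same way as an iteration graph.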

Methods
A research framework is used to show the stages of the work carried out in this study. The research framework is shown in Figure 1:

C. Preliminary Research
The preliminary research identifies the issue addressed in this study, namely predicting the number of tourist visits using ANN. Before the prediction is carried out, the predictor variables are tested with MLR so that the prediction process gives the best results.

D. Literature Review
The literature used in this study consists of previous research on the ANN method for prediction and on MLR for testing the correlation of variables.

E. Data Analysis
The data used in this study are the numbers of tourist visits in previous periods, obtained from the West Sumatra Province Tourism Office. Once the visit data are obtained, the next step is to identify the factors that influence increases and decreases in the number of visits, such as inflation rates and Rupiah exchange rates. The form of the data is shown in Table 1:

F. Prediction Model Design
The prediction model uses data and variables to forecast the number of tourist visits to West Sumatra Province. The variables are shown in Table 2:

Table 2 shows that 4 variables are used as predictor variables. The target (y) is the estimated value derived from the visit data of the previous year. Once the predictor variables are determined, the correlations among the variables (x) and between the variables and the target (y) are tested to see how strongly the variables are related, so that the variables used in the prediction process give a good degree of accuracy. After the correlation test, the prediction is carried out with the ANN method. The process is shown as a model in Figure 2:

Figure 2. Design Prediction Model
Figure 2 shows the prediction model built with the MLR and ANN methods. MLR is used to test the correlation of the predictor variables so that the right variables are selected for prediction. Once the predictor variables are obtained from the correlation test, the prediction is carried out with the ANN method in order to obtain results with maximum accuracy. In this research, SPSS software is used to test the correlation of the predictor variables with the MLR method, and Matlab software is used for the prediction process with the ANN method.

G. Correlation Test Process of Predictors with MLR
Following the prediction model described above, the analysis begins by testing the correlation of the predictor variables with MLR. Correlation analysis is performed with SPSS to check whether the variables are appropriate for use in prediction. The correlation test results are shown in Table 3:

The correlation test is a bivariate Pearson analysis carried out with SPSS (Statistical Product and Service Solutions). The table shows the relationships between the independent variables. The correlation between the visit variable (X1) and inflation (X2) is -0.082, which means there is no strong correlation between them. The correlation between the visit variable (X1) and the exchange rate (X3) is 0.101, which means there is no correlation between them. The correlation between inflation (X2) and the exchange rate (X3) is -0.655, which means there is a fairly strong negative correlation between them.
The test then continues with MLR, better known as multiple regression analysis. This test determines how much the independent variables contribute to the dependent variable and whether the effect of each independent variable on the dependent variable is significant. The test results are shown in Table 4 and Table 5 below.

Table 4. Coefficient of Determination Test
The table above shows that the Adjusted R Square value is 0.648, or 64.8%. This means that the independent variables account for 64.8% of the variation in the dependent variable; the rest is influenced by other variables outside this study. The MLR test shows that the visit variable (X1) has a significant positive effect on the tourist target (Y), as indicated by a sig value of 0.000 < 0.05. The inflation variable (X2) has a significant positive effect on the tourist target (Y), as indicated by a sig value of 0.031 < 0.05. The exchange rate variable (X3) has a significant positive effect on the tourist target (Y), as indicated by a sig value of 0.00 < 0.05. Overall, all the predictor variables tested have a significant effect on the prediction target.
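To make the Adjusted R Square statistic concrete, the sketch below fits an MLR model by ordinary least squares (via the normal equations) and computes R² and adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The function names and the pure-Python solver are assumptions made for illustration; the study obtained its figures from SPSS, and no data from the study are reproduced here.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk; returns [b0, ..., bk]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, beta):
    """Coefficient of determination of the fitted model on (rows, y)."""
    predictions = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, predictions))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjust R^2 for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
```

With made-up rows such as `[(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]` and targets generated exactly by y = 1 + 2·x1 + 3·x2, `fit_mlr` recovers coefficients close to [1, 2, 3]. The adjustment matters most when n is small relative to k, because it penalizes adding predictors that do not improve the fit.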

H. Designing Predicted Network Architecture
After the predictor variables are obtained from the correlation test, the next stage is to create the prediction network architecture for the number of tourist visits to West Sumatra Province. The network architecture is shown in Figure 3:

Figure 3. Predicted Network Architecture
Figure 3 shows that the prediction network architecture is a 4-3-1 network pattern consisting of an input layer, a hidden layer and an output layer. Once the network architecture is obtained, the network pattern is trained and tested in order to get the best prediction results.
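A network pattern such as 4-3-1 lists the number of neurons in the input, hidden and output layers. As a small sketch, the number of trainable parameters implied by a pattern can be counted as follows, assuming a fully connected feedforward network with one bias per hidden and output neuron (an assumption, since the paper does not state how biases are handled). The function name `count_parameters` is hypothetical.

```python
def count_parameters(pattern):
    """Return (weights, biases) for a fully connected network such as "4-3-1"."""
    sizes = [int(part) for part in pattern.split("-")]
    if len(sizes) < 2 or any(s <= 0 for s in sizes):
        raise ValueError("pattern must list at least two positive layer sizes")
    # One weight per connection between consecutive layers.
    weights = sum(a * b for a, b in zip(sizes, sizes[1:]))
    # One bias per neuron in every layer except the input layer.
    biases = sum(sizes[1:])
    return weights, biases
```

Under these assumptions, `count_parameters("4-3-1")` returns `(15, 4)`: 4×3 + 3×1 connection weights and 3 + 1 biases. Comparing counts across candidate patterns gives a rough sense of how much each architecture must learn from the available data.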

I. Network Training and Testing Process
This stage trains and tests the network formed from the network architecture. In essence, the ANN method carries out this process before making predictions. The algorithm used for training and testing is backpropagation, which trains and tests the network in order to find the best network pattern for later use in prediction. Training and testing are carried out with tools in Matlab software. The results are shown in Table 6 below:

Table 6 shows that the pattern with the 3-45-1 network architecture is the best pattern for the prediction process. This pattern was selected on the basis of a Mean Absolute Percentage Error (MAPE) of 0.0071, a Mean Squared Error (MSE) of 0.0001, a Percentage Error of 0.0089% and an Accuracy Value of 99.9911% in the training and testing processes.
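The paper does not give formulas for these error measures, so the sketch below uses their standard definitions as an assumption. The accuracy function reflects the relationship visible in the reported figures (100% minus the Percentage Error of 0.0089% gives 99.9911%). Function names are hypothetical, and whether the study computed the measures on scaled or raw values is not stated.

```python
import math


def mse(targets, predictions):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def rmse(targets, predictions):
    """Root mean squared error (square root of the MSE)."""
    return math.sqrt(mse(targets, predictions))


def mape(targets, predictions):
    """Mean absolute percentage error, in percent; targets must be non-zero."""
    total = sum(abs((t - p) / t) for t, p in zip(targets, predictions))
    return 100.0 * total / len(targets)


def accuracy_from_percentage_error(percentage_error):
    """Accuracy in percent, taken as 100 minus the percentage error."""
    return 100.0 - percentage_error
```

For example, targets `[100, 200]` with predictions `[110, 180]` are each off by 10%, so `mape` returns approximately 10.0. MSE and RMSE depend on the scale of the data, whereas MAPE is scale-free, which is why the measures should be compared only across networks evaluated on the same data.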

J. Prediction of the Number of Tourist Visits
After training and testing produce the best network architecture pattern, the prediction process is carried out to obtain the number of tourist visits in the next period. The prediction results from testing are shown in Table 7 below:

Table 7. Prediction result

Target     Prediction Result
403099     161150
46951      187763
46075      39774
496125     640619
18182      315059
781647     636479
80238      192494
902312     738642
792302     661613
892322     767472
889230     788614
992320     675955

Table 7 shows that the ANN method with the backpropagation algorithm is able to produce predictive results. These results can be used to anticipate increases and decreases in the number of visits to West Sumatra Province. They were obtained with Matlab software using the network pattern found earlier. The iteration graph of the prediction process is shown in Figure 4 below:

Figure 4 shows that the prediction process reached its result at iteration 405, at which point the goal set as the initial target for predicting the number of tourist visits was met.

Conclusions
This study concludes that the Multiple Linear Regression method is able to test the validity of the correlation between the predictor variables (X) and the prediction target (Y). The results show that the correlation between the visit variable (X1) and inflation (X2) is -0.082. The correlation between the visit variable (X1) and the exchange rate (X3) is 0.101, which indicates no correlation between the exchange rate and the number of visits. The correlation between inflation (X2) and the exchange rate (X3) is -0.655, which also indicates that there is no strong correlation between them.
Although the variables (X1), (X2) and (X3) do not show a strong correlation with one another, together they have a strong relationship with the target (Y), at 64.8%. In other words, the variables are not strongly correlated with each other but do affect the target (Y). This is because an increase in inflation, for example, does not affect the interest of tourists in visiting.
After the MLR method provides the correlation results, the ANN method is used to make predictions with the variables that have been tested. The prediction is carried out with the backpropagation algorithm. The results show that the ANN method is able to forecast the number of tourist visits with a Percentage Error of 0.0089% and an Accuracy Value of 99.9911%. This study thus provides initial information for the West Sumatra provincial government about the number of tourist visits expected in the next period.

Acknowledgments
The authors thank Mr. H. Herman Nawas and Mrs. Dr. Hj. Zerni Melmusi, SE, MM, Ak, CA, of the Computer College Foundation (YPTK) Padang, for their assistance and motivation in completing this research. We also thank the Rector of Putra Indonesia University YPTK Padang, Prof. Dr. H. Sarjon Devit, S.Kom, M.Kom, who guided us and gave us the opportunity and time to conduct this research.