A Swarm based Bi-directional LSTM-Enhanced Elman Recurrent Neural Network Algorithm for Better Crop Yield in Precision Agriculture

Abstract: Agriculture plays a significant role in providing food for a country. It is a major industry in terms of revenue and contributes to a country's economic development. Global warming and sudden changes in climatic conditions have hampered the agricultural industry, creating multiple challenges in crop cultivation and reducing crop productivity. In spite of recent changes in agricultural practices, these challenges persist. Current technological advances can help overcome them and improve productivity. PF (Precision Farming) is a technological concept that can help traditional farming practices become more productive. Traditional methods are useful for crop yield prediction, but unknown environmental factors cause these methods to achieve lower yields. PF can forecast or suggest the right time for cultivation based on previously known data. DCNNs (Deep Convolution Neural Networks) are one MLT (Machine Learning Technique) that can effectively predict crop growth. Hence, this work contributes to this area by presenting a short-term crop yield prediction model called RDA-Bi-LSTM-EERNN, based on the Bi-directional LSTM-Enhanced Elman Recurrent Neural Network algorithm with the Red Deer Algorithm (RDA). The proposed RDA-Bi-LSTM-EERNN algorithm is an altered version of Bi-LSTM-EERNN with RDA-based optimization. This work's hybrid method was compared with traditional approaches for its predictive performance using a crop dataset. The experimental results were found to be satisfactory, suggesting that the proposed scheme can help farmers make valuable decisions.


Introduction
Agriculture, the breadwinner for many in India, is facing challenges mainly due to a lack of knowledge about changing climates. Crop cultivation depends on suitable climates, and this lack of knowledge can be overcome with PF. The addition of technology to farming in PF helps in meeting surplus food demands while maintaining higher productivity and yields.
Conditions in India demand sustainability in agriculture for its exploding population (Mandal & Maity, 2013). Though losses in crop productivity have been reduced, the disadvantages of traditional farming methods remain. Thus, the alternative to traditional farming lies in PF, which can help farmers overcome a range of environmental issues.
Agriculturists face two major issues, namely the right selection of crops and varying climatic conditions, which can be addressed using monitoring and prediction for optimal crop solutions (Mulla & Khosla, 2016). Problems found in current farming systems and technology-based solutions include inadequacy of nutrients, limited effectiveness of algorithms, improper analysis, and poor selection of parameters, all of which affect crop yields.
These drawbacks have been taken into account in this study's proposal, which aims to increase crop yields, analyze crops in real time, select efficient attributes, and help make smarter decisions for higher yields. These aims imply the need for efficient crop prediction algorithms (Medar & Rajpurohit, 2014).
DNN (Deep Neural Network) based models have been found to be effective in crop predictions/suggestions, and from a technological standpoint DNNs can identify the right choice of agricultural factors for suggesting the right crop to farmers (Khaki & Wang, 2019). The basic objective of crop suggestion lies in identifying crops capable of high yields and minimizing crop losses.
Prior crop suggestions were specific to regions, soil characteristics, and other factors. Moreover, the accuracy of crop predictions varied with the chosen algorithm, making it imperative to choose a suitable algorithm or features based on favourable conditions that maximize the accuracy of crop suggestions. RNNs (Recurrent Neural Networks) have also been found to be effective in predicting crop yields. Hence, the main motivation of this work lies in presenting accurate crop yield predictions/recommendations. The contributions of this work are detailed below:
• Compilation of historical data on crop production and climate for pre-processing.
• Proposing a prediction model, RDA-Bi-LSTM-EERNN, for crop recommendations, whose weights and biases are optimized using RDA.
Following this introductory section, section two presents a detailed review of literature related to this study. Section three details the proposed hybrid deep learning model for crop yield predictions, and section four presents its results. Section five concludes the paper with future work.

Related Work
BNNs (Bayesian Neural Networks) were used in (Ma et al., 2021) to predict the corn yields of a country. The study used multiple publicly available data sources, including time-series satellite information, soil properties, climatic observations, and countrywide corn yield history. The proposed scheme was a robust framework that predicted crop yields for a season while highlighting the need to account more thoroughly for environmental stress in agricultural productivity and crop yield estimation. The study in (Zhong et al., 2019) used DLTs (Deep Learning Techniques) to classify remotely sensed time-series crop data. Experiments were conducted on crops in Yolo County, California, where diverse irrigation forms exist, and prioritized economic crops. The study's classification procedure for summer crops used EVI (Enhanced Vegetation Index) time series and two DLTs, namely LSTM (Long Short-Term Memory) and Conv1D (one-dimensional convolution layers).
CNNs (Convolution Neural Networks), a DLT used in image classification tasks, were used in (Nevavuori et al., 2019) to develop a crop yield prediction model based on NDVI and RGB data acquired by UAVs. The study selected the training parameters, network depth, regularization strategy, and hyper-parameter tuning of the CNNs for efficient predictions. The study in (Yang et al., 2019) also used CNNs to learn the important features of rice yields from images sensed at low altitude. Crop varieties with high potential are identified and evaluated by plant scientists and breeders based on historical location-wise performance. The study in (Moghimi et al., 2020) facilitated the selection of advanced varieties using an automated framework.
Country-wise data on crops are prepared in-house and are based on region-wise crop model implementations. DLTs can extract important features for estimation from input data while reducing dependency on the type of inputs. DLTs were applied in (Kuwata & Shibasaki, 2015) to estimate Illinois corn yields, as accurate yield estimation is essential for ensuring food security. MLPNNs (Multi-Layer Perceptron Neural Networks) were used in (Bhojani & Bhatt, 2020) to forecast district-level wheat crop yields. The study altered the MLPNN by proposing a new activation function and revising the random weight/bias values, with crop yield estimates derived from weather datasets.
The study in (Murali et al., 2020) aimed at forecasting sugarcane yields from non-linear time series data using a hybrid prediction model. RNNs, which hold values in memory for a long time, made it possible to forecast with fewer parameters. The study optimized the weights and thresholds of the network using WOA (Whale Optimization Algorithm) for improved outputs, better accuracy, and efficient forecasts. The study in (Elavarasan & Vincent, 2021) used a hybrid regression-based algorithm, RRFs (Reinforcement Random Forests), which improved performance when compared with other MLTs such as RFs (Random Forests), DTs (Decision Trees), gradient boosting, ANNs (Artificial Neural Networks), and deep Q-learning.
Citrus fruit yields were estimated by counting fruits in images in (Apolo-Apolo et al., 2020). The study developed an automated image processing methodology in which the fruits on individual trees were counted along with their sizes for estimation using DLTs. The study showed that DLT-based discrimination could be used for estimation prior to harvesting the fruits. The study trained LSTMs for per-tree yield estimation. DLTs were also used in (Chu & Yu, 2020) with the BBI-model, in which BPNNs (Back-Propagation Neural Networks) predicted yields in combination with IndRNNs (Independently Recurrent Neural Networks).
The study found that CNNs, LSTMs, RNNs, and DNNs were the most preferred DLTs. It suggested using other algorithms, such as an LSTM/RNN combination, for developing crop yield prediction models.

Proposed Methodology
The architecture of the proposed Bi-LSTM-EERNN with RDA scheme is depicted in figure 1. This work assembles historical crop production and climate data, which are pre-processed before being used by the proposed RDA-Bi-LSTM-EERNN scheme for crop recommendations. RDA is used to determine the optimal architecture of Bi-LSTM-EERNN, in which the cell structures are refined. Climate data was obtained from https://www.timeanddate.com/weather/india/new-delhi/historic, while crop production details were fetched from https://data.world/thatzprem/agricultureindia. The obtained data comprised 6 years of time series data with many measured parameters, including ones irrelevant to this study. Hence, in the preprocessing stage, less relevant features were ignored and only relevant ones were retained. The historical information from the two sources was preprocessed and combined. The unified data was then split 60/40: 60% was used for training, while the remaining 40% was used for testing the proposed model's accuracy.
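The 60/40 split and the normalization mentioned for the two subsets can be sketched as follows. This is a minimal illustration under stated assumptions: the split keeps the rows in their original (chronological) order, min-max scaling is used as the normalization, and the scaling statistics come from the training part only. None of these choices is confirmed by the text, and the yield values are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_60_40(rows: Sequence, train_fraction: float = 0.6) -> Tuple[list, list]:
    """Split rows in their given order: first part for training, rest for testing."""
    cut = int(round(len(rows) * train_fraction))
    return list(rows[:cut]), list(rows[cut:])


def min_max_fit(column: Sequence[float]) -> Tuple[float, float]:
    """Return (min, max) computed on the training column only."""
    return min(column), max(column)


def min_max_apply(column: Sequence[float], lo: float, hi: float) -> List[float]:
    """Scale values with the training statistics; constant columns map to 0."""
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


# Hypothetical yield values, used only to illustrate the call pattern.
yields = [2.1, 2.4, 2.2, 2.9, 3.1, 3.0, 3.4, 3.3, 3.8, 4.0]
train, test = split_60_40(yields)
lo, hi = min_max_fit(train)
train_scaled = min_max_apply(train, lo, hi)
test_scaled = min_max_apply(test, lo, hi)
```

Because the scaling bounds come from the training part, test values can fall outside [0, 1]; this is expected and avoids leaking test information into training.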

Proposed Classification of Data for Crop Yield Predictions/Recommendations
The proposed RDA-Bi-LSTM-EERNN scheme, which aimed at achieving 90% accuracy, used the unified dataset for crop yield predictions. Initially, the Bi-LSTM-EERNN model is trained to identify useful features from the dataset and to capture temporal information from subsequent data. The model is then optimized in terms of its weights/biases. This is followed by evaluation of the trained model on a predefined test dataset. Assume the given dataset $D = \{(x_i, y_i) \mid i = 1, 2, \ldots, N\}$, where $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}^m$, and $N$ is the count of data samples. $D$ is divided into a training subset $D_1 = \{(x_i, y_i) \mid i = 1, 2, \ldots, M\}$ and a testing subset $T_1 = \{(x_i, y_i) \mid i = M + 1, M + 2, \ldots, N\}$, both of which are normalized. The Bi-LSTM-EERNN model predicts crop yields. The Bi-LSTM-EERNN architecture is depicted in figure 2.
Training is repeated until a defined optimal loss value is reached or before overfitting sets in. In Equation (1), $h_t$ is the hidden vector sequence, $W$ denotes the weight matrices ($W_{xh}$ being the matrix of weights connecting the input layer to the hidden layer), $b$ is the bias, and $\mathcal{H}$ is the hidden layer's activation function. Equation (1) depicts connections between previous and current hidden states, implying that EERNNs use prior values/environments. Each output of the hidden layer's neurons at time (t-1) is saved (context neurons) and used at time (t) along with the input to the hidden layer. Thus, context neurons are used in parameter updates at time (t) during propagation through the recurrent connections, and the network summarizes prior inputs.
EERNNs, however, fail to summarize historical data effectively due to the issue of vanishing gradients (Vorontsov et al., 2017). Overcoming this issue requires operating in both directions, as in Bi-LSTMs (Wang et al., 2015), where past and future features of the data are used in propagation. The network has two distinct hidden layers: the first computes the forward hidden sequence $\overrightarrow{h}_t$, the second computes the backward hidden sequence $\overleftarrow{h}_t$, and the two are combined to generate the final outputs $y_t$. Assuming the LSTM block's hidden state is $h$, the Bi-LSTM output at each step is obtained from both hidden sequences.
This work uses an RDA variant to optimize the parameters of the proposed Bi-LSTM-EERNN for classifying crops with better classification accuracy.
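The role of the context neurons and of the two processing directions can be illustrated with a small sketch. It is not the model used in this work: it uses a plain Elman cell rather than an LSTM block, tanh is an assumed activation, the same weights are reused for both directions for brevity, and the names `w_xh`, `w_hh` and `b_h` are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def elman_forward(xs: Sequence[Vector], w_xh: Matrix, w_hh: Matrix,
                  b_h: Vector) -> List[Vector]:
    """Return hidden states h_1..h_T; the context is the previous hidden state."""
    context = [0.0] * len(b_h)  # context neurons start at zero
    states = []
    for x in xs:
        pre = [a + c + b for a, c, b in
               zip(matvec(w_xh, x), matvec(w_hh, context), b_h)]
        context = [math.tanh(p) for p in pre]  # saved for the next time step
        states.append(context)
    return states


def bidirectional(xs: Sequence[Vector], w_xh: Matrix, w_hh: Matrix,
                  b_h: Vector) -> List[Vector]:
    """Concatenate forward states with states computed on the reversed sequence."""
    fwd = elman_forward(xs, w_xh, w_hh, b_h)
    bwd = elman_forward(list(reversed(xs)), w_xh, w_hh, b_h)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]
```

Each combined state at step t then carries information from inputs before t (forward part) and after t (backward part), which is the property the bidirectional design relies on.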

Objective Function (OF):
This work optimizes the network's weights and biases to reduce error rates and enhance the accuracy of crop yield predictions. The weight and bias values are optimized at each iteration while training the network. The MSE (Mean Square Error) is computed as $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(D_i - P_i)^2$, where $D_i$ is the desired value, $P_i$ is the predicted value, and $n$ is the feature count. RDA takes MSEs as inputs and outputs weights and biases.
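The objective function can be expressed directly in code. The sketch below implements the standard mean square error over paired desired and predicted values; the function and argument names are illustrative.

```python
from typing import Sequence


def mse(desired: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of squared differences between desired and predicted values."""
    if len(desired) != len(predicted) or not desired:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((d - p) ** 2 for d, p in zip(desired, predicted)) / len(desired)
```

In an RDA loop, a function of this kind would be evaluated for each candidate weight/bias vector after running the network on the training data, with lower values indicating fitter candidates.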

RDA-Bi-LSTM-EERNN:
This work's RDA method is randomly initialized with a population of RDs (Red Deer). The best RDs in the population are the "male RDs", while the remaining deer are "hinds". Male RDs roar, and based on their roaring capacity each male becomes either a Commander or a Stag. Commanders build harems, i.e. the number of hinds in a harem is based on the commander's roaring ability and capacity to fight. The commander mates with hinds in its harem, while a few stags also mate with nearby hinds (Fathollahi-Fard et al., 2020). The flow chart of RDA is depicted in Figure 3. The proposed approach can be defined as an optimization of continuous variables without constraints. Mathematically, RDA can be used to resolve minimization problems. Three main parameters control exploration and exploitation in this work: Alpha ($\alpha$) and Beta ($\beta$) handle diversification, while Gamma ($\gamma$) helps balance intensification. All these parameters lie in the interval [0,1].

Figure 3. RDA Flow for Optimizing Hyper-Parameters of Bi-LSTM-EERNN

Initial RD Generation:
The main objective of optimization is identifying a near-optimal solution over the decision variables. GAs (Genetic Algorithms) use chromosomes; their counterparts in RDA are RDs, each of which represents a possible solution in the solution space. Assuming a solution's dimensionality is $N_{var}$, an RD for the optimization of weights and biases is a $1 \times N_{var}$ array, represented as Equation (4): $RD = [X_1, X_2, \ldots, X_{N_{var}}]$. The functional value of each RD is estimated as Equation (5): $Value = f(RD) = f(X_1, X_2, \ldots, X_{N_{var}})$. An initial population of size $N_{pop}$ is generated, from which the best RDs are chosen as the $N_{male}$ males, while the remaining RDs are the $N_{hind}$ hinds ($N_{hind} = N_{pop} - N_{male}$). Hence, $N_{male}$ reflects the elitist criterion and governs intensification, while $N_{hind}$ governs diversification.
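Population initialization and the male/hind split can be sketched as below. The sketch assumes uniform random initialization inside common lower and upper bounds and uses a simple sphere function as a stand-in for the network's MSE; both are illustrative assumptions rather than details given in the text.

```python
import random
from typing import Callable, List, Tuple

Solution = List[float]


def init_population(n_pop: int, n_var: int, lb: float, ub: float,
                    rng: random.Random) -> List[Solution]:
    """Draw n_pop candidate weight/bias vectors uniformly inside [lb, ub]."""
    return [[rng.uniform(lb, ub) for _ in range(n_var)] for _ in range(n_pop)]


def split_males_hinds(population: List[Solution],
                      fitness: Callable[[Solution], float],
                      n_male: int) -> Tuple[List[Solution], List[Solution]]:
    """Sort by fitness (lower is better); the best n_male are males, the rest hinds."""
    ranked = sorted(population, key=fitness)
    return ranked[:n_male], ranked[n_male:]


def sphere(solution: Solution) -> float:
    """Stand-in objective; in this work the fitness would be the network's MSE."""
    return sum(v * v for v in solution)
```

With `n_pop = 10` and `n_male = 4`, the split yields 4 males and 6 hinds, matching the relation between the population, male, and hind counts stated above.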

Roar of Male RDs:
Male RDs try to enhance their fitness by roaring, which may at times be ineffective. Since RDs are solutions in the solution space, each male explores its neighbourhood by altering its location using Equation (6), where $male_{old}$ is the male's present location, $male_{new}$ is its updated location, and $a_1$, $a_2$ and $a_3$ are random values drawn uniformly from the interval [0, 1].
Male RDs roar to extend their territory, but with random movements. The roaring process is illustrated by two cases, M and N, both of which occur commonly. For M, the new location is accepted because its objective value is better than that of the previous solution, while for N, the new solution is not accepted. The y-axis depicts the objective function, while male RD locations are on the x-axis.
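The roaring step with its accept/reject behaviour (cases M and N) can be sketched as follows. The update rule in the code is the form usually given for RDA in the literature and is an assumption with respect to Equation (6); drawing one set of random values per male and clamping to the bounds are further illustrative choices.

```python
import random
from typing import Callable, List

Solution = List[float]


def clamp(value: float, lb: float, ub: float) -> float:
    """Keep a coordinate inside the search bounds."""
    return max(lb, min(ub, value))


def roar(male: Solution, fitness: Callable[[Solution], float],
         lb: float, ub: float, rng: random.Random) -> Solution:
    """Propose a random move around `male`; keep it only if fitness improves.

    Assumed update: new = old +/- a1 * ((ub - lb) * a2 + lb), sign chosen by a3.
    """
    a1, a2, a3 = rng.random(), rng.random(), rng.random()
    step = a1 * ((ub - lb) * a2 + lb)
    sign = 1.0 if a3 >= 0.5 else -1.0
    candidate = [clamp(x + sign * step, lb, ub) for x in male]
    # Case M: the candidate is better and is accepted; case N: the old position is kept.
    return candidate if fitness(candidate) < fitness(male) else male
```

Because a move is accepted only when it lowers the objective, a male's fitness never gets worse through roaring.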
Selection of $\gamma$ % of Male RDs as Commanders: Male RDs differ from one another: a few of them are more attractive, energetic, or effective in expanding their territory. RDs can thus be classified as commanders or stags. The number of commanders is determined using Equation (7), where $N_{male}$ is the males count and $\gamma$ is an initial parameter of the approach in the range (0, 1). The number of stags is found using Equation (8), where $N_{stag}$ is the stags count within the male population. The RD population is the sum of the commander, stag, and hind counts. Solutions represented by male RDs are constrained by UBs (Upper Boundaries) and LBs (Lower Boundaries) in the search space.
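The commander/stag split can be sketched in a few lines. The sketch assumes that the commander count is the rounded product of $\gamma$ and the male count and that the stags are the remaining males, which is the usual RDA formulation; the list of males is assumed to be sorted best first.

```python
from typing import List, Sequence, Tuple


def split_commanders_stags(males: Sequence, gamma: float) -> Tuple[List, List]:
    """Take the best round(gamma * N_male) males as commanders; the rest are stags."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    n_com = round(gamma * len(males))  # Python rounds exact halves to the even integer
    return list(males[:n_com]), list(males[n_com:])
```

For example, 8 males with `gamma = 0.5` give 4 commanders and 4 stags.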

Male Commander and Stag Fights:
Commanders fight stags chosen at random. Each fight yields two new solutions, and the best of the four solutions (the two new solutions, the commander, and the stag) replaces the commander. The fights are described by Equations (9) and (10), where $New_1$ and $New_2$ are the novel solutions resulting from a fight, $Comd$ is the commander's solution, and $Stag$ is the stag's solution. The novel solutions are constrained by the UBs and LBs of the search space. The randomness of fights is modelled by $y_1$ and $y_2$, drawn from a uniform distribution over the interval [0,1]. The best solution amongst the four is identified by the objective function. Every fight has a winner (high energy) and a loser (low energy), and the solution with the best objective value becomes the new commander.
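A fight between a commander and a stag can be sketched as below. The two candidate solutions are built around the midpoint of the pair with a random offset, which is the form usually given for RDA and is an assumption with respect to Equations (9) and (10); clamping to the bounds is an illustrative choice.

```python
import random
from typing import Callable, List

Solution = List[float]


def fight(commander: Solution, stag: Solution,
          fitness: Callable[[Solution], float],
          lb: float, ub: float, rng: random.Random) -> Solution:
    """Return the best of {commander, stag, new1, new2} (lower fitness wins).

    Assumed form: new1/new2 = (commander + stag) / 2 +/- y1 * ((ub - lb) * y2 + lb).
    """
    y1, y2 = rng.random(), rng.random()
    offset = y1 * ((ub - lb) * y2 + lb)
    midpoint = [(c + s) / 2.0 for c, s in zip(commander, stag)]
    new1 = [min(ub, max(lb, m + offset)) for m in midpoint]
    new2 = [min(ub, max(lb, m - offset)) for m in midpoint]
    return min([commander, stag, new1, new2], key=fitness)
```

The returned solution takes the commander's place, so a commander is never replaced by a worse solution.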

Formation of Harems:
The number of hinds in a harem depends on the energy of its male commander, whose effectiveness is determined by the objective function. The hinds are divided amongst commanders to form harems as per Equation (11), where $v_n$ is the n-th commander's energy and $V_n$ is its normalized value; the normalized energy of each commander is computed using Equation (12).
A male commander's normalized energy defines the share of hinds that the commander can occupy. A harem's hinds count is evaluated using Equation (13), where $N.harem_n$ is the number of hinds in the n-th harem and $N_{hind}$ is the hinds count. For each harem, $N.harem_n$ hinds are selected randomly from the hind population. Thus, commanders with better fitness acquire larger numbers of hinds.
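The allocation of hinds to harems can be sketched as follows. The normalization used here (shift each commander's objective value by the worst value, then take its share of the total) is the form usually given for RDA and is an assumption with respect to Equations (11) to (13). The handling of equal commanders and of rounding drift is a policy added for this sketch so that the sizes always sum to the hind count.

```python
from typing import List, Sequence


def harem_sizes(commander_fitness: Sequence[float], n_hind: int) -> List[int]:
    """Allocate hinds to commanders in proportion to normalized energy.

    Assumed form: V_n = v_n - max(v); P_n = |V_n / sum(V)|; size_n = round(P_n * n_hind).
    """
    worst = max(commander_fitness)
    shifted = [v - worst for v in commander_fitness]  # all values <= 0
    total = sum(shifted)
    if total == 0:  # all commanders equal: share hinds evenly
        shares = [1.0 / len(shifted)] * len(shifted)
    else:
        shares = [abs(v / total) for v in shifted]
    sizes = [round(p * n_hind) for p in shares]
    # Repair rounding drift: trim the worst non-empty harem, then top up the best.
    while sum(sizes) > n_hind:
        idx = max((i for i, s in enumerate(sizes) if s > 0),
                  key=lambda i: commander_fitness[i])
        sizes[idx] -= 1
    best = min(range(len(sizes)), key=lambda i: commander_fitness[i])
    sizes[best] += n_hind - sum(sizes)
    return sizes
```

With commander objective values 1.0, 2.0 and 4.0 and 10 hinds, the shares are 0.6, 0.4 and 0.0, so the harems receive 6, 4 and 0 hinds; the weakest commander gets none under this normalization.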
Commander Mating with $\alpha$ Percent of Hinds in a Harem: All species in the world undergo mating as a natural process for generating new offspring. Mating is performed by a commander RD with $\alpha$ percent of the hinds in its harem, as defined in Equation (14).
In Equation (14), $N.harem_n^{mate}$ is the number of hinds of the n-th harem which mate with the commander. These $N.harem_n^{mate}$ hinds are selected randomly from the $N.harem_n$ hinds of the harem.

Mate Commander of a Harem with 𝜷 Percent of Hinds in Another Harem:
A harem is selected at random, and the male commander mates with $\beta$ percent of its hinds. Thus, the commander attacks another harem to seize the opponent's territory and expand its own. Here, $\beta$ is an initial parameter of the approach. The count of hinds in the selected harem that mate with the commander is determined using Equation (16), where $N.harem_k^{mate}$ denotes the number of hinds in the k-th harem that mate with the commander.
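Both mating steps, within the commander's own harem ($\alpha$) and with another harem ($\beta$), follow the same pattern and can be sketched with one helper. The offspring rule in the code, the midpoint of the two parents plus a random fraction of the search range, is the form usually given for RDA and is an assumption with respect to Equation (15); clamping to the bounds is an illustrative choice.

```python
import random
from typing import List

Solution = List[float]


def mate(parent_a: Solution, parent_b: Solution,
         lb: float, ub: float, rng: random.Random) -> Solution:
    """Assumed offspring rule: (a + b) / 2 + (ub - lb) * c, with c uniform in [0, 1]."""
    c = rng.random()
    return [min(ub, max(lb, (a + b) / 2.0 + (ub - lb) * c))
            for a, b in zip(parent_a, parent_b)]


def harem_offspring(commander: Solution, harem: List[Solution], fraction: float,
                    lb: float, ub: float, rng: random.Random) -> List[Solution]:
    """Mate the commander with round(fraction * harem size) randomly chosen hinds."""
    n_mate = round(fraction * len(harem))
    chosen = rng.sample(harem, n_mate)
    return [mate(commander, hind, lb, ub, rng) for hind in chosen]
```

Calling `harem_offspring` with `fraction = alpha` on the commander's own harem and with `fraction = beta` on a randomly chosen harem k covers both cases described above.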

Stags Mating Process with Nearest Hinds:
Stags choose the nearest hinds for mating. During the breeding season, male RDs seek to mate with their favourite hinds regardless of harem territories. The distance between a stag and each hind in the $J$-dimensional space is formulated as Equation (17), where $d_i$ is the distance between the i-th hind and the stag. The hind with the lowest distance in the resulting matrix is selected, after which mating occurs as per Equation (15), with the stag taking the place of the commander.
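The nearest-hind selection can be sketched with the Euclidean distance, which is assumed here to be the distance meant by Equation (17); the function name is illustrative.

```python
import math
from typing import List, Sequence


def nearest_hind(stag: Sequence[float], hinds: List[Sequence[float]]) -> int:
    """Return the index of the hind with the smallest Euclidean distance to the stag."""
    if not hinds:
        raise ValueError("at least one hind is required")
    return min(range(len(hinds)), key=lambda i: math.dist(stag, hinds[i]))
```

For a stag at the origin and hinds at (3, 4), (1, 1) and (-2, 0), the distances are 5, about 1.41 and 2, so the second hind (index 1) is selected.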

Next Generation Selections:
The selection of the next generation is based on two principles. First, all male RDs (commanders and stags) are retained. Second, the remaining members are chosen from the hinds and the offspring produced by mating, based on fitness values. As these approaches are familiar, the related arithmetical formulation is not needed.
Stopping Criteria: Since this work updates weights and biases iteratively, optimal solutions can be identified within a specific period of time. The parameter and objective spaces of RDA are depicted in figure 4.
while the stopping criterion is not met
    Sort males and form stags/commanders as per (7) and (8)
    for each male commander
        Fight between commanders and stags as per (9) and (10)
        Update male commander/stag positions
    end for
    Form harems as per (11), (12) and (13)
    for each male commander
        Mate the male commander with randomly selected hinds of his harem as per (15)
        Randomly select a harem named k as per (16)
        Mate the male commander with selected hinds of harem k as per (15)
    end for
    for each stag
        Compute the distances between the stag and the hinds and select the nearest hind
        Mate the stag with the selected hind
    end for
    Select the next generation
    Update $X^*$ if there is a better solution
    $T_2 = clock$; $t = T_1 - T_2$
end while
return $X^*$ as the best value of weights and biases
When the evolutionary generation value reaches its maximum, the process stops and the latest weight and threshold values are extracted. Otherwise, the steps are repeated.
The obtained weights and thresholds are applied to Bi-LSTM-EERNN, which after training is able to reach the desired accuracy or condition.

Experimental Results and Discussion
This section evaluates the performance of the proposed methodology. The proposed RDA-Bi-LSTM-EERNN method for crop recommendation is compared with existing techniques, namely DTs, KNNs, RFs, NNs, PSO-MDNN, and ACO-IDCNN-LSTM. Performance is compared and verified using the metrics accuracy, precision, recall, and f-measure. These measures are based on four counts: positive samples correctly classified as positive are TPs (True Positives); positive samples classified as negative are FNs (False Negatives); negative samples classified as negative are TNs (True Negatives); and negative samples classified as positive are FPs (False Positives).
Precision: Proportion of correctly classified positive crops to the total positively predicted crops, given by: $Precision = \frac{TP}{TP + FP}$ (18)
Recall: Proportion of correctly classified positive crops to the positive sample count, given by: $Recall = \frac{TP}{TP + FN}$ (19)
F-measure: Also called the $F_1$-score, it is the harmonic mean of precision and recall, given by: $F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$
Accuracy: A common measure of classification performance, the ratio of correctly classified crops to the total number of crops: $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$
The proposed work's optimal solution selection, based on RD fitness, yields minimum errors and thus improves its recall value.
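The four measures can be computed from the TP, FP, FN and TN counts as in the sketch below. The function name and the returned dictionary keys are illustrative, and the error rate is taken as one minus accuracy, which is an assumption about how the error rate reported later is defined.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute precision, recall, F-measure, accuracy and error rate from counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one sample is required")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / total
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }
```

Under this definition the reported error rate of 2.3996% and accuracy of 97.6004% for RDA-Bi-LSTM-EERNN are consistent with each other, since they sum to 100%.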

Accuracy comparison
Figure 8. Result of Accuracy
Figure 8 compares the accuracy of the methods against the number of datasets. The methods executed are Dec-Tree, KNN, R-Forest, Neu-Net, PSO-MDNN, ACO-IDCNN-LSTM, and RDA-Bi-LSTM-EERNN. The x-axis shows the number of datasets and the y-axis shows the accuracy value.

Error Rate
Figure 9. Result of Error Rate
Figure 9 compares the error rate of the methods against the number of datasets. The methods executed are Dec-Tree, KNN, R-Forest, Neu-Net, PSO-MDNN, ACO-IDCNN-LSTM, and RDA-Bi-LSTM-EERNN. As the number of datasets increases, the error value decreases correspondingly. The graph shows that the proposed RDA-Bi-LSTM-EERNN yields a lower error rate of 2.3996%, whereas the previous methods Dec-Tree, KNN, R-Forest, Neu-Net, PSO-MDNN, and ACO-IDCNN-LSTM produce 9.4257%, 11.8596%, 8.2820%, 7.0488%, 5.0158%, and 4.3333% respectively. Thus, the proposed algorithm is superior to the existing algorithms in terms of crop recommendation prediction results.

Figure 10. Result of Time
Figure 10 compares the time taken by the methods against the number of datasets. The methods executed are Dec-Tree, KNN, R-Forest, Neu-Net, PSO-MDNN, ACO-IDCNN-LSTM, and RDA-Bi-LSTM-EERNN. As the number of datasets increases, the time increases correspondingly. The graph shows that the proposed RDA-Bi-LSTM-EERNN requires less time, 12m, whereas the previous methods Dec-Tree, KNN, R-Forest, Neu-Net, PSO-MDNN, and ACO-IDCNN-LSTM require 21m, 24m, 23m, 20m, 17m, and 15m respectively. Thus, the proposed algorithm is superior to the existing algorithms in terms of the time taken for crop recommendation prediction.

Figure 11. Consolidated Results for Class Balanced Datasets
Figure 11 shows the consolidated results of accuracy, precision, recall, f-measure, and error rate. The results show that RDA-Bi-LSTM-EERNN is more efficient than Dec-Tree, KNN, R-Forest, Neu-Net, PSO-MDNN, and ACO-IDCNN-LSTM, because its accuracy, precision, recall, and F-measure are higher than those of the existing methods. Finally, in all datasets pruned stacking attains high results because it can harness the capabilities of a range of well-performing models on a classification task and make predictions that perform better than any of the existing methods, producing better crop recommendations as shown in figure 12.

Figure 12. The Output of Crop Recommendation using RDA-Bi-LSTM-EERNN (panel labels: Recommended Crop; Non-Recommended Crop, with Potato listed)

Conclusion and Future Work
In this work, the proposed RDA for optimizing the weights and biases of the Bi-LSTM-EERNN model was used to build a crop recommendation method for classifying crops. This change had a positive impact, as the search agents' positions were revised with an additional best solution. The aim of using meta-heuristic methods with a neural network is to optimise the NN's output in general. The results showed that the proposed adaptation significantly improved the efficiency of crop yield prediction. Based on the obtained results, RDA-Bi-LSTM-EERNN was compared with several existing models, including DTs, KNNs, RFs, Neu-Net, PSO-MDNN, and ACO-IDCNN-LSTM; it provided an accuracy of 97.6004 percent and outperformed the other algorithms, which also implies that the RDA-Bi-LSTM-EERNN classification results are statistically significant. This degree of accuracy shows that RDA-Bi-LSTM-EERNN is more robust with respect to overfitting and local minima problems. In the future, more network architectures will be tested and the algorithms evaluated on larger datasets to demonstrate their robustness. Other deep learning models, such as Deep Reinforcement Learning models, are also available to researchers.

Figure 2. Bi-Directional LSTM-EERNN Architecture
As per Figure 2, Bi-Directional LSTM networks step through input sequences in both directions. The altered ERNN model incorporates time delays on the signal input ($x(t - \tau)$), where the network's weights are the classical ERNN weights. Assuming the input features are represented as $\{x_1, \ldots, x_T\}$, the Enhanced ERNN computes the output vectors ($y_t$) from the input vectors ($x_t$) by repeating Equation (1) for $t = 1$ to $T$.

Figure 4. Parameter and Objective Spaces of RDA
Algorithm 1: Red Deer Algorithm based Optimal Network Structure Design of Bi-LSTM-EERNN
Input: Set initial values of Bi-LSTM-EERNN parameters, MSE of the RD population
Output: Selection of the network's optimal weights/biases
Compute the fitness values, sort them and form the hinds ($N_{hind}$) and male RDs ($N_{male}$)
$X^*$ = the best solution
$T_1 = clock$;

Figure 7. Result of F-Measure Rate