Household Load Forecasting Using Deep Learning Neural Networks

Advancements in electrical metering and computing technologies, which aid the collection and sensing of various parameters of the electrical power system, have made vast amounts of electrical data available. With the help of such technology and data, statistical load prediction can be made smarter and more accurate, which can help curb excess electricity production. With deep learning techniques such as the long short-term memory network (LSTM), it is possible to build time-series models that capture non-linear relationships over long memory sequences. Short-term demand forecasting has witnessed increasing recognition and is now considered important in the field of power system control; when proper historical data is available, prediction accuracy can be high. Here, we employ a long short-term memory network to forecast the load of a sample household.


Introduction
For effective load forecasting of a given region, previous data is necessary to understand the historical pattern. This estimation is done by employing time-series, longitudinal, or cross-sectional methods. To employ methods that handle prior information, we use neural networks that can draw on historical data to formulate better models. One such network is the recurrent neural network (RNN). RNNs can remember past data, which can be used to make a decision by analysing previously given information. Prediction of a region's electrical load demand over a horizon ranging from a few hours to a few weeks is referred to as short-term forecasting. In the past, there have been very few prediction models built for electrical load forecasting.
Most load forecasting models have been built to predict a short-term output of around 2-3 weeks using neuro-fuzzy logic, feed-forward neural networks, and basic support vector machines. An older methodology used algebraic modelling for comprehensive forecasting based on regression analysis, implementing an essential survey of standard regression models, original algorithms, and results of combined classes of exponential-polynomial regression models (Mahendran, 2019; Aroulanandam, V. V., 2019). A few models using ARMA and hybrid neural models have also been employed.
For a long time, a few models using ARMA, hybrid neural models, and fuzzy logic have been employed. However, the majority of the mentioned works use a yearly resolution of maximum energy demand (MED) or total energy utilization (TEU) for load prediction ranging up to a decade. Owing to this lack of observations, such estimations are insufficient for informed devising, planning, and investing by utility companies. The genetic algorithm (GA), a heuristic search and optimization technique, imitates a calculation procedure whose purpose is to minimize or maximize some function.

Figure 1: A Recurrent Neural Network
Minimization or maximization of such an objective function can be used to find a suitable number of lags (Kanaga, 2015; Shankar, 2020) for a time-series model. Conversely, medium- and short-term forecasting was not effectively addressed, although these are basic inputs for power system scheduling and funds distribution.
In this paper, a long short-term memory neural network model is implemented for forecasting the power demand of a household for a week. To train this model, real power statistics are provided by the archives of UC Irvine. The forecasts have a daily resolution, which makes them highly advantageous for use by metering companies and consumers. Meticulous and precise forecasts of the maximum and daily load demand for the forthcoming weeks can also be produced on a monthly basis. Frequently used recurrent neural networks (RNNs), as shown in Figure 1, perform time-series predictions (Hyndman, 2010; Sekaran, 2020). However, back-propagation through a plain RNN suffers from a vanishing gradient. To build long-term dependencies between training samples and surmount this issue, an RNN-LSTM is applied, which extensively improves the fidelity of the defined model (Ranjeeth, 2019). The prototype produced is reported to be reasonably exact, with an overall Mean Absolute Percentage Error (MAPE) of approximately 110. A forecast over such a time interval (a week) requires offline training only weekly, which is why the reported training time is acceptable in this situation; although an arduous and taxing process for a low-end machine, it is worthwhile.

Recurrent Neural Network
The past is memorized by the recurrent neural network, and its judgments are influenced by what it has experienced and learnt from previous inputs. Basic feed-forward networks "memorize" things too, but only what they learnt during training. Remembering elements carried over from previous inputs while generating outputs is what the RNN does. Every kind of neural network has its own way of updating data. RNNs usually take a single or many input vectors and produce output vectors. The output is affected not only by the weights: a "hidden" state vector is also formed, based on the prior input(s) or output(s). So the same input can generate a different output depending on the previous inputs in the sequence. Figure 1 depicts an RNN network (Sampath Kumar, 2020; Latchoumi, 2017).
Here h_t is the current state, h_{t-1} is the previous state, and x_t is the input at time t.

The activation function is applied as:

    h_t = tanh(w_hh * h_{t-1} + w_xh * x_t)

where w_hh is the weight at the recurrent neuron and w_xh is the weight at the input neuron.

The output is calculated as:

    y_t = w_hy * h_t

where y_t is the output and w_hy is the weight at the output layer.
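The equations above can be sketched as a single recurrent step in NumPy. This is a minimal illustration with arbitrary dimensions and random weights (all names and sizes are assumptions for demonstration, not the paper's code):

```python
import numpy as np

# Illustrative sizes: 3 inputs, 4 hidden units, 1 output.
n_in, n_hid, n_out = 3, 4, 1
rng = np.random.default_rng(0)

w_xh = rng.normal(size=(n_hid, n_in))   # input-to-hidden weights
w_hh = rng.normal(size=(n_hid, n_hid))  # recurrent (hidden-to-hidden) weights
w_hy = rng.normal(size=(n_out, n_hid))  # hidden-to-output weights

def rnn_step(x_t, h_prev):
    """h_t = tanh(w_hh * h_{t-1} + w_xh * x_t);  y_t = w_hy * h_t"""
    h_t = np.tanh(w_hh @ h_prev + w_xh @ x_t)
    y_t = w_hy @ h_t
    return h_t, y_t

# Unroll over a short sequence; the hidden state carries memory forward,
# so the same input vector can yield different outputs at different steps.
h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h, y = rnn_step(x, h)
```

The hidden state h is the only thing passed between steps, which is exactly why long sequences are hard for a vanilla RNN: the gradient flowing back through the repeated tanh shrinks at every step.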

Long Short-Term Memory Network
In a vanilla recurrent neural network, a single input passes through one activation layer; even the hidden state is passed through a tanh function. Long short-term memory networks build on this by adding a cell state and gates to the model. This provides an automatic, fundamental solution to the problem of saving or resetting context across long flows of data, without depending on hand-crafted context resets. Long short-term memory systems, usually just called "LSTMs", are an extraordinary type of recurrent neural network trained to capture long-term dependencies. Their refinement has been studied and applied in various fields, and their applications exceed those of usual convolutional neural networks owing to their architecture. LSTM networks were made to deal with lingering problems in fields where long-term dependencies need to be handled; retaining information over long periods of time is what LSTM networks can easily do. Every recurrent neural network is a repeating chain of individual smaller networks, or modules, and this repeating unit can be as simple as a single activation layer.
Each line conveys a whole vector, from the output of one node to the inputs of the respective following nodes. The yellow boxes are learned neural network layers, while the pink circles represent pointwise operations, such as vector addition (Vujošević, 1987). Merging lines indicate concatenation, while a forking line indicates its content being duplicated and sent to different places.
The primary phase in an LSTM is to decide what data is going to be discarded from the cell state (Tarassov, 2019). A sigmoid layer, also called the "forget gate layer", does this task for us. It looks at h_{t-1} and x_t and produces a number in the range of zero to one for each element of the cell state C_{t-1}: a '1' expresses the decision to "totally keep this" while a '0' expresses "totally dispose of this". The cell state may incorporate the orientation of the present subject, so that the right data can be utilized; when a new subject appears, one has to overlook the older one. The subsequent phase is to choose what fresh data will be stored in the cell state. This has two sections. Initially, a sigmoid layer called the "input gate layer" chooses what data should be reset or saved. Then, a tanh layer develops new candidate data in the form of a vector, C̃_t, that could be added to the state (Singh, 2020). Moving forward, we join these two to make an update to the memory: in the model we consider, this is where the orientation of the new subject would be added to the cell state, so that the forgotten element can be replaced. We now replace the old cell state, C_{t-1}, with the new cell state C_t; the past steps have already chosen what to do. We multiply the previous state by f_t, then add i_t * C̃_t and let go of the old state. The new candidate values are scaled by how much we chose to refresh each state value. In the model, this is the place one would really drop the data about the old subject's orientation and include the new data, as decided in the previous steps.
    C_t = f_t * C_{t-1} + i_t * C̃_t

    o_t = σ(W_o * [h_{t-1}, x_t] + b_o)
    h_t = o_t * tanh(C_t)

Finally, we decide what the output (o_t) is going to be; the output is based on the cell state, in a filtered form. First, a sigmoid layer runs and chooses which segments of the cell state we are going to yield. Then the cell state is passed through tanh (to push the values to lie between −1 and 1) and multiplied by the output of the sigmoid gate, so that only the required parts are yielded. In the model, since the network has just observed a subject, it should yield data applicable to the desired output, if that is what is coming straightaway.
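The gate equations above can be sketched as one LSTM cell step in NumPy. The parameter layout (stacked weights in a dict keyed f/i/g/o) and all sizes are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following the gate equations in the text."""
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])   # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])   # input gate
    g_t = np.tanh(W["g"] @ z + b["g"])   # candidate values C̃_t
    c_t = f_t * c_prev + i_t * g_t       # C_t = f_t*C_{t-1} + i_t*C̃_t
    o_t = sigmoid(W["o"] @ z + b["o"])   # output gate
    h_t = o_t * np.tanh(c_t)             # h_t = o_t * tanh(C_t)
    return h_t, c_t

# Illustrative sizes and random parameters.
n_in, n_hid = 3, 4
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) for k in "figo"}
b = {k: np.zeros(n_hid) for k in "figo"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
```

Note that the cell state c is updated only by pointwise multiplication and addition, which is what lets gradients flow across many steps without vanishing.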

Model Building
Technically, this framing of the problem is referred to as a multi-step time-series forecasting problem, given the multiple forecast steps. A model that utilizes various input variables may be alluded to as a multivariate multi-step time-series forecasting prototype. Building such a prototype could be useful within the household unit for arranging consumption; it may likewise be useful on the supply side for planning power demand for a household. This framing of the dataset additionally suggests that it is helpful to compress the per-minute observations of power utilization into daily consumption totals. This is not required, but makes sense, as we want the total load for each day.
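Compressing per-minute readings into daily totals can be sketched with pandas. The column name and the synthetic index here are assumptions for illustration (they are not necessarily the exact schema of the UCI file):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for per-minute active-power readings (3 days of data).
minutes = pd.date_range("2010-01-01", periods=3 * 24 * 60, freq="min")
power = pd.DataFrame(
    {"Global_active_power": np.random.default_rng(2).uniform(0.2, 5.0, len(minutes))},
    index=minutes,
)

# Resample per-minute observations into one total per day.
daily = power.resample("D").sum()
```

With the real dataset the same `resample("D").sum()` call collapses roughly 2 million minute rows into about 1,440 daily rows.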
A forecast of this sort will comprise seven values, one for each of the 7 days ahead. In multi-step prediction, each time step needs to be assessed independently. This is useful because it remarks on expertise at a specific lead time (for example +1 day versus +3 days), and it allows prototypes to be differentiated based on their aptitude at different lead times (for example, prototypes working well at +1 day versus the same working well at +5 days). Kilowatts is the unit of the total power, so it is helpful to utilise an error metric that is in the same units. Root Mean Squared Error and Mean Absolute Error (RMSE and MAE) both fit here, though RMSE is more widely utilized and is embraced right now. Dissimilar to mean absolute error, root mean squared error is more punishing of forecast mistakes. The performance metric for this problem will be the RMS error for each lead time from the first to the seventh day. As a shortcut, it might be helpful to summarize the performance of a model using a single grade to assist in model choice; a reasonable grade is the RMSE over all forecast days.
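The per-lead-time scoring described above can be sketched in a few lines of NumPy. The function name and array layout are assumptions for illustration:

```python
import numpy as np

def evaluate_forecasts(actual, predicted):
    """Per-lead-time RMSE for week-ahead forecasts.

    actual, predicted: arrays of shape (n_weeks, 7), daily totals in kW.
    Returns one RMSE per forecast day (+1 day ... +7 days) plus a single
    overall RMSE grade across all forecast days.
    """
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    err = actual - predicted
    per_day = np.sqrt((err ** 2).mean(axis=0))      # RMSE at each lead time
    overall = float(np.sqrt((err ** 2).mean()))     # summary grade
    return per_day, overall

# Toy usage: two forecast weeks, each day over-predicted by exactly 1 kW.
a = np.array([[10, 12, 11, 9, 8, 14, 13],
              [11, 13, 10, 9, 9, 15, 12]], dtype=float)
p = a + 1.0
per_day, overall = evaluate_forecasts(a, p)
```

Because errors are squared before averaging, a single badly missed day raises the score more than it would under MAE, which is the punishing behaviour mentioned above.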
The initial 36 months of data are utilized for preparing prediction prototypes, and the final year is used to assess them. The data in each dataset is divided into weeks, the first day being Sunday and the last being Saturday. This is a promising and helpful way of utilizing the chosen framing of the prototype, as the power utilization for the week ahead can be anticipated. It is additionally useful for modelling, where such prototypes are utilized to foresee a single day (for example Monday) or otherwise the entire sequence. The test dataset is worked out in reverse. The data spans Dec 2006 to Nov 2010 and comes from Sceaux, France (Georges, 2016). The first Monday of 2010 was January 4th, and the nearest final Monday in the data is November 22nd; the data finishes in the second week of November 2010. This gives roughly 11 months of test data (Goldberg, 1987). The prototype is planned to make a 7-day prediction; then the real data for those 7 days is made accessible to the prototype, which makes it usable for assessment on the ensuing week. This is sensible, as deployed prototypes would be permitted to utilize the most recent accessible data. Neural systems, all things considered, are commonly slow to train but fast to evaluate. This implies the favoured use of these prototypes is to build them once on historical data and utilize them to forecast each progression of walk-forward validation. These prototypes are static (i.e., not refreshed) during assessment. This is distinct from other prototypes that are quicker to train, where a prototype is re-fitted or refreshed after each progression of walk-forward validation as new data is made accessible.
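The walk-forward validation loop described above can be sketched as follows. The function names and the naive persistence forecaster are illustrative assumptions; the paper's actual forecaster is the trained LSTM:

```python
import numpy as np

def walk_forward(train, test, forecast_fn):
    """Walk-forward validation over whole weeks.

    train, test: arrays of shape (n_weeks, 7) of daily totals.
    forecast_fn(history) -> next-week prediction of shape (7,).
    The model stays static; after each predicted week, the true week
    is appended to the history before the next prediction is made.
    """
    history = [w for w in train]
    predictions = []
    for week in test:
        predictions.append(forecast_fn(np.array(history)))
        history.append(week)  # reveal the real week before the next step
    return np.array(predictions)

# Stand-in forecaster: persistence (next week = last observed week).
naive = lambda history: history[-1]

train = np.arange(21, dtype=float).reshape(3, 7)   # 3 training weeks
test = np.arange(21, 35, dtype=float).reshape(2, 7)  # 2 test weeks
preds = walk_forward(train, test, naive)
```

Note that `forecast_fn` is called on a growing history but is never re-fitted inside the loop, matching the static-model evaluation described above.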
After preparing the dataset by dividing it into a train set and a test set, the train set is fed to the developed LSTM module to train the neural network. For evaluating the input, the code is written so that the input is loaded as one week's data, or two, three, or more weeks, in the form of the number of days in those weeks. The graphical representation of the input is shown below. A line plot of RMSE per day for the univariate LSTM with vector output is shown in the figure below; the specific outputs may deviate given the stochastic nature of the algorithm, so we ran the example 7 times. The input graph shows how power was consumed by the household, and its variations in different regions of the graph show the seasonal change in consumption. This was implemented using an LSTM, and output was produced for the next 7 days from a given date.
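Framing the daily series as supervised samples for a vector-output LSTM can be sketched as below. The function name `to_supervised` and the window sizes are illustrative assumptions, not the paper's exact code:

```python
import numpy as np

def to_supervised(daily, n_input=7, n_out=7):
    """Frame daily totals as (input window -> next week) samples.

    daily: 1-D array of daily consumption totals.
    Returns X of shape (n_samples, n_input, 1) and y of shape
    (n_samples, n_out): the 3-D input and 7-value vector output
    a vector-output LSTM expects.
    """
    X, y = [], []
    for start in range(len(daily) - n_input - n_out + 1):
        X.append(daily[start:start + n_input])
        y.append(daily[start + n_input:start + n_input + n_out])
    X = np.array(X)[..., np.newaxis]  # add a feature axis for the LSTM
    return X, np.array(y)

# 4 weeks of synthetic daily totals -> sliding windows of one week in,
# one week out.
X, y = to_supervised(np.arange(28, dtype=float))
```

Changing `n_input` to 14 or 21 is how the "two, three or more weeks" of input mentioned above would be loaded; the target always remains the next 7 daily values.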

Conclusion
We employed an LSTM as an example to show how long-term data can be handled by a neural network where a plain RNN could not. The output, though it only shows the prediction for the next 7 days, can be very useful for consumers to handle their electricity bills, lower their consumption, and plan the usage of their electrical equipment accordingly. Users will not only be able to predict their future consumption, but will also be able to analyse the areas where they need to alter their routine in order to achieve a safer, more efficient utilisation of their machinery and electrical appliances.
If temperature, sag, and weather conditions were also recorded in a similar manner as the load, and for a longer duration than just 4 years, the accuracy of the output could be increased to a practically applicable level. Load forecasting of substations would also be possible if proper data were available. Handling data with better accuracy and precision, at shorter time intervals, needs to be practiced habitually and ethically in power stations for the application of automation using LSTMs and CNNs in the field of power system analysis, operation, and control.
Most of the wastage of power is due to the mishandling and mismanagement of electrical loads. Even though household loads are not as big a problem as industrial wastage, if every household were able to analyse how it could reduce its electricity consumption and monitor how its future consumption could change, the Earth would be able to preserve far more of its energy sources, allowing a greener, less polluted environment and a healthier place to be a part of.