Task Failure Prediction in Cloud Data Centers Using Deep Learning


Dr. Rafath Samrin Abdul Kareem Abid, Majed Mohd Taher, Syed Abdul Rahman Ahmed


A large-scale cloud data center must keep its failure rate low in order to provide high service dependability and availability. However, modern large-scale cloud data centers continue to experience high failure rates because software and hardware faults frequently cause task and job failures. Such failures can severely reduce the dependability of cloud services and consume substantial resources during recovery. Task and job failures therefore need to be predicted accurately before they occur in order to avoid unexpected waste of resources. Published approaches apply machine learning and deep learning to failure prediction by analyzing past system message logs and identifying the relationship between the logged data and the failures. To improve on the accuracy of these machine learning and deep learning-based methods, we present a failure prediction algorithm for cloud tasks and jobs based on multi-layer Bidirectional Long Short-Term Memory (Bi-LSTM). The Bi-LSTM algorithm predicts whether tasks and jobs will fail or complete successfully. In trace-driven experiments, our method outperforms current state-of-the-art prediction algorithms, reaching 93% accuracy for task failures and 87% for job failures.
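
To make the idea of a multi-layer Bi-LSTM classifier concrete, the following is a minimal sketch of the forward pass only, written in plain Python. It is not the authors' implementation. The feature layout, layer sizes, random untrained weights, the linear output head, and every function name (`init_lstm`, `lstm_run`, `bilstm_layer`, `build_model`, `predict_failure_probability`) are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size, hidden_size, rng):
    # One weight row per gate unit; each row spans [x_t, h_prev, bias].
    cols = input_size + hidden_size + 1
    rows = 4 * hidden_size  # input, forget, candidate, output gates
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"hidden_size": hidden_size, "W": weights}


def lstm_run(params, sequence):
    """Run one LSTM direction over a sequence; return the hidden state at each step."""
    n = params["hidden_size"]
    h, c = [0.0] * n, [0.0] * n
    outputs = []
    for x in sequence:
        z = list(x) + h + [1.0]  # trailing 1.0 multiplies the bias column
        pre = [sum(w * v for w, v in zip(row, z)) for row in params["W"]]
        i = [sigmoid(p) for p in pre[0:n]]
        f = [sigmoid(p) for p in pre[n:2 * n]]
        g = [math.tanh(p) for p in pre[2 * n:3 * n]]
        o = [sigmoid(p) for p in pre[3 * n:4 * n]]
        c = [f[k] * c[k] + i[k] * g[k] for k in range(n)]
        h = [o[k] * math.tanh(c[k]) for k in range(n)]
        outputs.append(h)
    return outputs


def bilstm_layer(fwd, bwd, sequence):
    """Concatenate a left-to-right pass with a right-to-left pass at every step."""
    forward = lstm_run(fwd, sequence)
    backward = lstm_run(bwd, sequence[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def build_model(input_size, hidden_size, num_layers, seed=0):
    # Untrained random weights: this sketch illustrates structure, not accuracy.
    rng = random.Random(seed)
    layers, size = [], input_size
    for _ in range(num_layers):
        layers.append((init_lstm(size, hidden_size, rng),
                       init_lstm(size, hidden_size, rng)))
        size = 2 * hidden_size  # the next layer consumes both directions
    head = [rng.uniform(-0.1, 0.1) for _ in range(size + 1)]
    return {"layers": layers, "head": head}


def predict_failure_probability(model, sequence):
    """Map a sequence of per-step feature vectors to a value in (0, 1)."""
    for fwd, bwd in model["layers"]:
        sequence = bilstm_layer(fwd, bwd, sequence)
    n = len(sequence[0]) // 2
    # Final forward state is at the last step; final backward state is at the first.
    features = sequence[-1][:n] + sequence[0][n:] + [1.0]
    return sigmoid(sum(w * v for w, v in zip(model["head"], features)))


# Hypothetical input: 5 time steps of 4 made-up resource-usage features for one task.
example = [[0.1 * t, 0.2, 0.0, 1.0] for t in range(5)]
model = build_model(input_size=4, hidden_size=3, num_layers=2)
probability = predict_failure_probability(model, example)
```

In this sketch each layer reads the sequence in both directions, so the representation of an early time step also reflects later measurements, which is the property that distinguishes a Bi-LSTM from a unidirectional LSTM. A usable predictor would additionally require training on labeled trace data, which is outside the scope of this sketch.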
