Human beings readily understand text-based communication, since language plays a vital role in their lives. For computers to explore documents more broadly, they must understand such human-readable language and convert it into a machine-readable form; machine learning and natural language processing (NLP) algorithms can be used for this purpose. Data replication is one of the most challenging problems in document exploration, because a document may contain many repetitive words. The proposed system uses a Long Short-Term Memory (LSTM) classifier to remove words that appear frequently throughout the document. First, the text is transformed using NLP techniques; next, features are extracted; finally, a bidirectional LSTM (Bi-LSTM) classifies the text data. The LSTM focuses mainly on reducing repetitive words across the entire document while maintaining the document's integrity. A Bi-LSTM connects two independent recurrent neural networks (RNNs): one reads the sequence from front to back and the other from back to front, so the model follows each statement in both directions. Processing the input in both directions aids prediction, because joining the two independent hidden states preserves information from both past and future context. A major advantage of the LSTM method is that, although the word count is reduced, the content of the document is maintained. The experimental results demonstrate the efficacy of the system.
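The first two stages of the pipeline (NLP text transformation, then feature extraction) can be sketched in plain Python as shown here. This is a minimal illustration under stated assumptions, not the paper's method: the paper uses an LSTM to decide which repetitive words to remove, whereas this sketch swaps in a simple frequency-count filter. The helper names `tokenize`, `drop_frequent`, `build_vocab`, and `encode`, and the threshold parameter `max_count`, are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of ASCII letters only (a simplifying assumption).
    return re.findall(r"[a-z]+", text.lower())


def drop_frequent(tokens: list[str], max_count: int) -> list[str]:
    # Hypothetical rule (not the paper's LSTM): drop every occurrence of any
    # word that appears more than max_count times in the whole document.
    counts = Counter(tokens)
    return [t for t in tokens if counts[t] <= max_count]


def build_vocab(tokens: list[str]) -> dict[str, int]:
    # Feature extraction step: map each distinct word to an integer id.
    # Id 0 is left free for padding; ids follow order of first appearance.
    vocab: dict[str, int] = {}
    for t in tokens:
        if t not in vocab:
            vocab[t] = len(vocab) + 1
    return vocab


def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    # Turn the filtered tokens into the integer sequence an LSTM would consume.
    return [vocab[t] for t in tokens]


if __name__ == "__main__":
    toks = tokenize("The model reads the text and the model keeps the meaning.")
    kept = drop_frequent(toks, max_count=2)  # "the" occurs 4 times, so it is dropped
    print(kept)  # ['model', 'reads', 'text', 'and', 'model', 'keeps', 'meaning']
    print(encode(kept, build_vocab(kept)))  # [1, 2, 3, 4, 1, 5, 6]
```

A fixed threshold keeps the example transparent; a learned model, as the abstract describes, would instead judge which repetitions carry no new content.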
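The bidirectional structure described in the abstract (two independent recurrent networks, one per direction, whose hidden states are joined) can be sketched as an untrained forward pass in pure Python. This assumes the standard LSTM gate equations, small randomly initialised weights, and a linear softmax layer over the concatenated final forward and backward states; the names `LSTMCell`, `bilstm`, and `classify`, the dimensions, and the initialisation are hypothetical and are not taken from the paper. A practical system would use a deep-learning framework and train these weights.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Standard LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    GATES = ("i", "f", "o", "g")

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # hypothetical untrained initialisation

        def rand(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.W = {k: rand(hidden_dim, input_dim) for k in self.GATES}
        self.U = {k: rand(hidden_dim, hidden_dim) for k in self.GATES}
        self.b = {k: [0.0] * hidden_dim for k in self.GATES}

    def step(self, x: Vector, h: Vector, c: Vector) -> tuple[Vector, Vector]:
        # Pre-activation for each gate: W x + U h + b.
        pre = {
            k: [wx + uh + bk for wx, uh, bk in zip(matvec(self.W[k], x), matvec(self.U[k], h), self.b[k])]
            for k in self.GATES
        }
        i = [sigmoid(z) for z in pre["i"]]
        f = [sigmoid(z) for z in pre["f"]]
        o = [sigmoid(z) for z in pre["o"]]
        g = [math.tanh(z) for z in pre["g"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs: list[Vector]) -> list[Vector]:
        # Process the sequence in the given order, returning h at every step.
        h, c = [0.0] * self.hidden_dim, [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bilstm(xs: list[Vector], fwd: LSTMCell, bwd: LSTMCell) -> list[Vector]:
    # Forward pass reads front to back; backward pass reads back to front and
    # is re-reversed so position t sees both past (fwd) and future (bwd) context.
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


def classify(xs: list[Vector], fwd: LSTMCell, bwd: LSTMCell, W_out: Matrix, b_out: Vector) -> Vector:
    # Join the final forward state and the final backward state, then apply a
    # hypothetical linear layer and a numerically stable softmax.
    states = bilstm(xs, fwd, bwd)
    h = fwd.hidden_dim
    summary = states[-1][:h] + states[0][h:]
    logits = [z + bk for z, bk in zip(matvec(W_out, summary), b_out)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    xs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 2-dimensional word features
    fwd, bwd = LSTMCell(2, 3, seed=1), LSTMCell(2, 3, seed=2)
    print(len(bilstm(xs, fwd, bwd)[0]))  # 6: three forward plus three backward units
    print(classify(xs, fwd, bwd, [[0.1] * 6, [-0.1] * 6], [0.0, 0.0]))
```

Keeping separate parameters for `fwd` and `bwd` reflects the point that the two directions are independent networks; they interact only when their hidden states are concatenated.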