LeDoCl: A Semantic Model for Legal Documents Classification using Ensemble Methods


R. Priyadarshini et al.


Natural Language Processing (NLP) is one of the major application areas of machine learning. Information retrieval is a broad research domain within NLP, and topic modeling is one of its subcomponents. A recurring weakness of topic models is instability: repeated runs on the same corpus can produce different results. This problem has been widely studied in the context of clustering algorithms such as K-means, which tend to converge to one of several local optima depending on the choice of initialization method. To overcome the instabilities and assumptions of existing approaches such as the Vector Space Model (VSM) and Singular Value Decomposition (SVD), a semantic-based topic model (SLDA) and an ensemble model with generation and integration stages are proposed. In topic modeling, instability shows up in two distinct ways. First, when topic descriptors are compared across multiple runs, the term rankings can change considerably, and some terms may appear or disappear entirely. Second, the degree to which topics are associated with documents can vary across runs. In the proposed system, the ensemble combines a Kernel Support Vector Machine (KSVM) and a Random Forest classifier to overcome this instability. The first issue, terms appearing and disappearing between runs, is addressed by Gibbs Sampling-based Semantic LDA (GSLDA). The second issue, the alignment of topics with documents, is addressed by the ensemble SLDA (ESLDA). ESLDA achieves higher retrieval accuracy and a shorter execution time than conventional models: its accuracy reaches up to 98%, compared with 82% for SLDA and 78% for term-frequency methods.
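The abstract attributes run-to-run instability of topic descriptors to the sampling process and credits Gibbs sampling with addressing it, but gives no algorithmic detail. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it implements collapsed Gibbs sampling for plain LDA (not the paper's semantic GSLDA, whose details are not given here), takes each topic's top-ranked terms as its descriptor, and measures how much descriptors agree between runs started from different random seeds using an average best-match Jaccard similarity. The function names, the hyperparameters (`alpha`, `beta`, `iters`), the greedy matching, and the toy corpus are all hypothetical choices.

```python
import random
from itertools import combinations


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (hypothetical sketch, not GSLDA).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns the topic-word count matrix n_kw as a list of lists.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]                # doc-topic counts
    n_kw = [[0] * vocab_size for _ in range(n_topics)]   # topic-word counts
    n_k = [0] * n_topics                                 # tokens per topic
    z = []                                               # topic of each token
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | z_-i, w), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_kw


def top_terms(n_kw, top_n):
    """Topic descriptors: the top_n word ids of each topic, ranked by count."""
    return [sorted(range(len(row)), key=lambda w: -row[w])[:top_n] for row in n_kw]


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def descriptor_agreement(run_a, run_b):
    """Average best-match Jaccard between the topic descriptors of two runs.

    1.0 means every topic in run_a has an identical descriptor in run_b.
    Greedy best match is a simplification (assumption), not a 1:1 matching.
    """
    return sum(max(jaccard(ta, tb) for tb in run_b) for ta in run_a) / len(run_a)


if __name__ == "__main__":
    # Hypothetical toy corpus: word ids 0-4 form one theme, 5-9 another.
    corpus = [[0, 1, 2, 3, 4, 0, 1] for _ in range(5)] + \
             [[5, 6, 7, 8, 9, 5, 6] for _ in range(5)]
    runs = [top_terms(gibbs_lda(corpus, 2, 10, seed=s), 3) for s in range(4)]
    scores = [descriptor_agreement(a, b) for a, b in combinations(runs, 2)]
    print(f"mean pairwise descriptor agreement: {sum(scores) / len(scores):.2f}")
```

Scores near 1.0 indicate stable descriptors across seeds; lower scores reflect the term-ranking changes the abstract describes. A stricter variant would match topics one-to-one between runs (for example with the Hungarian algorithm) instead of taking each topic's best match independently.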
