Optimizing Text Categorization for Indonesian Text Using Clustering Label Technique

Syopiansyah Jaya Putra et al.

Abstract

Text categorization plays an important role in clustering the rapidly growing, yet unstructured, body of Indonesian text in digital format. It has become even more important as access to digital text has grown more necessary and widespread. Many clustering algorithms are used for text categorization. Unfortunately, these algorithms cannot easily cluster Indonesian texts because stemming and stopword removal for the Indonesian language are imperfect. This paper presents an intelligent system that categorizes Indonesian text documents into meaningful cluster labels. The Label Induction Grouping Algorithm (LINGO) and Bisecting K-means are applied through five phases: pre-processing, frequent phrase extraction, cluster label induction, content discovery, and final cluster formation. The experimental results showed that the system could categorize Indonesian text, reaching 93%. Furthermore, the clustering quality evaluation indicates that text categorization using LINGO achieves high Precision and Recall, with values of 0.85 and 1, respectively, compared with Bisecting K-means, which achieves 0.78 and 0.99. These results show that LINGO is suitable for categorizing Indonesian text. The main contribution of this study is to optimize the clustering results by applying and maximizing text processing with an Indonesian stemmer and stopword list.
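
The abstract reports Precision and Recall for each clustering method but does not say how they were computed. As a minimal sketch, assuming the common pair-counting definition (a pair of documents counts as a true positive when both the clustering and the reference categories place them together), the two scores could be obtained as follows. The function name `pairwise_precision_recall` and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_precision_recall(predicted, gold):
    """Pair-counting precision and recall of a clustering.

    predicted[i] is the cluster assigned to document i and gold[i] is its
    reference category. This is one possible definition, assumed here for
    illustration; the paper may use a different one.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = predicted[i] == predicted[j]
        same_gold = gold[i] == gold[j]
        if same_pred and same_gold:
            tp += 1  # pair grouped together, correctly
        elif same_pred:
            fp += 1  # pair grouped together, but categories differ
        elif same_gold:
            fn += 1  # pair shares a category, but was split apart

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example (made-up labels, not data from the study).
gold = ["olahraga", "olahraga", "olahraga", "politik", "politik"]
predicted = [0, 0, 0, 0, 1]
precision, recall = pairwise_precision_recall(predicted, gold)
print(precision, recall)
```

Under this definition, a recall close to 1 means documents of the same category are almost never split across clusters, while a lower precision means some clusters mix categories.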

Article Details

How to Cite
Putra, S. J., et al. (2021). Optimizing Text Categorization for Indonesian Text Using Clustering Label Technique. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 12(3), 1483–1491. Retrieved from https://turcomat.org/index.php/turkbilmat/article/view/947
Section
Research Articles