Optimizing Text Categorization for Indonesian Text Using Clustering Label Technique
Abstract
Text categorization plays an important role in clustering the rapidly growing, yet unstructured, Indonesian text in digital format, and its importance increases as access to digital text becomes more necessary and widespread. Many clustering algorithms are used for text categorization. However, these algorithms cannot easily cluster Indonesian texts because stemming and stopword removal for the Indonesian language are imperfect. This paper presents an intelligent system that categorizes Indonesian text documents into meaningful cluster labels. The Label Induction Grouping Algorithm (LINGO) and Bisecting K-means are applied in five phases: pre-processing, frequent phrase extraction, cluster label induction, content discovery, and final cluster formation. The experimental results show that the system can categorize Indonesian text, reaching 93%. Furthermore, clustering quality evaluation indicates that text categorization using LINGO achieves higher Precision and Recall, 0.85 and 1 respectively, than Bisecting K-means, which achieves 0.78 and 0.99. The results therefore suggest that LINGO is suitable for categorizing Indonesian text. The main contribution of this study is to optimize the clustering results by applying and maximizing text processing with an Indonesian stemmer and stopword list.
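The abstract does not state how Precision and Recall were computed for the clusters. As an illustration only, the minimal sketch below assumes the common set-based definition, in which a cluster is compared against one reference category: precision is the share of the cluster that belongs to the category, and recall is the share of the category captured by the cluster. The function name `precision_recall` and the document ids are hypothetical and are not taken from the paper.

```python
def precision_recall(cluster, category):
    """Return (precision, recall) of one cluster against one reference category.

    Assumed set-based definition (not confirmed by the paper):
      precision = |cluster ∩ category| / |cluster|
      recall    = |cluster ∩ category| / |category|
    """
    cluster, category = set(cluster), set(category)
    if not cluster or not category:
        raise ValueError("cluster and category must be non-empty")
    overlap = len(cluster & category)
    return overlap / len(cluster), overlap / len(category)


# Hypothetical document ids, for illustration only.
category = {"d1", "d2", "d3", "d4"}        # documents that truly share a topic
cluster = {"d1", "d2", "d3", "d4", "d5"}   # documents the algorithm grouped together

precision, recall = precision_recall(cluster, category)
print(precision, recall)  # 0.8 1.0
```

In this toy case the cluster contains every document of the category (recall of 1) plus one unrelated document, which lowers precision. A recall close to 1 combined with a lower precision, as reported in the abstract, would be consistent with this kind of pattern under the assumed definition.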
Licensing
TURCOMAT publishes articles under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This licensing allows for any use of the work, provided the original author(s) and source are credited, thereby facilitating the free exchange and use of research for the advancement of knowledge.
Detailed Licensing Terms
Attribution (BY): Users must give appropriate credit, provide a link to the license, and indicate if changes were made. Users may do so in any reasonable manner, but not in any way that suggests the licensor endorses them or their use.
No Additional Restrictions: Users may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.