Identification of Languages from The Text Document Using Natural Language Processing System

Manjula S, et al.


Language detection from textual data is one of the fundamental tasks of data interpretation, and language identification research is becoming increasingly relevant in everyday life. The present work aims to detect the 22 distinct languages in a multilingual document using the Hybrid Isomap technique. The language identification tasks are performed using the "European Parliament Proceedings Parallel Corpus 1996-2011." The corpus is a large, systematic collection of machine-readable texts produced in a natural communicative setting. It is derived from the proceedings of the European Parliament and covers 21 European languages. The Natural Language Processing approach facilitates identifying the languages present in a text document.
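
To make the language identification task concrete, the following is a minimal sketch of a character-trigram baseline. It is not the Hybrid Isomap technique named above; it swaps in a plain nearest-profile classifier with cosine similarity purely for illustration. The three training sentences, the language codes, and the function names (`trigram_profile`, `cosine_similarity`, `detect_language`) are hypothetical and are not taken from the corpus or from the authors' system.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy training data; a real system would train on a corpus
# such as the European Parliament proceedings instead.
TRAINING_TEXT = {
    "en": "the european parliament adopted the resolution on the budget "
          "and the members voted in favour of the proposal",
    "fr": "le parlement européen a adopté la résolution sur le budget "
          "et les députés ont voté en faveur de la proposition",
    "de": "das europäische parlament hat die entschließung über den haushalt "
          "angenommen und die abgeordneten stimmten für den vorschlag",
}


def trigram_profile(text):
    """Count overlapping character trigrams of the lowercased, padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# One trigram profile per language, built once from the toy training data.
PROFILES = {lang: trigram_profile(text) for lang, text in TRAINING_TEXT.items()}


def detect_language(text, profiles=PROFILES):
    """Return the language code whose profile is most similar to the text."""
    query = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine_similarity(query, profiles[lang]))


if __name__ == "__main__":
    for sentence in (
        "the members of the committee voted on the report",
        "les députés ont voté sur le rapport",
        "die abgeordneten stimmten für den haushalt",
    ):
        print(detect_language(sentence), "<-", sentence)
```

Character n-grams are a common feature choice for this task because they need no tokenizer or dictionary. A manifold-learning method such as Isomap would typically be applied to feature vectors of this kind before classification, although the exact pipeline used in this work is not specified here.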

