Naïve Bayes Twitter Sentiment Analysis In Visualizing The Reputation Of Communication Service Providers: During Covid-19 Pandemic

Abstract: We present real-world public sentiment expressed on Twitter, using the proposed conceptual model (CM) to visualize the reputation of communication service providers (CSP) during the Covid-19 pandemic in Malaysia from March 18 until August 18, 2020. The CM is a guideline that covers public tweets directly or indirectly mentioning the three biggest CSP in Malaysia: Celcom, Maxis, and Digi. A text classifier model optimized for short snippets like tweets was developed to enable bilingual sentiment analysis. The two languages explored are Bahasa Malaysia and English, since they are the two most spoken languages in Malaysia. The classifier model is trained and tested on a large multidomain dataset pre-labeled with "0" and "1", representing "positive" and "negative", respectively. We used the Naïve Bayes (NB) technique as the core of the classifier model. Functionality testing was carried out to ensure that no significant error would render the application useless, and the accuracy testing score of 89% is considered quite impressive. We visualized the results through word clouds and obtained Net Brand Reputation scores of -56%, -42%, and -43% for Celcom, Maxis, and Digi, respectively.


Introduction
Currently, social media has become extraordinarily popular among people of all ages. Millions of social media users use social networking sites to express their emotions and opinions and disclose their daily lives [1]. Twitter is a social media or micro-blogging platform, available as a website and mobile application, that lets its registered users share short messages called tweets anytime, from their smartphone, tablet, or computer [2]. According to the Malaysian Communications and Multimedia Commission (MCMC)'s Internet Users Survey 2018 Statistical Brief Number Twenty-Three, in 2018 there were an estimated 24.6 million social networking users, of whom 23.8% owned a Twitter account. By February 2019, Twitter averaged over 320 million monthly active users making an average of 500 million tweets daily, which is around 6 thousand tweets per second [3]. The channel also provides businesses with a way to run sales campaigns for products and services and to engage with their customers for advertising [4]. Online communities, like the ones on Twitter, create an interactive medium where consumers inform and influence others, and consumers depend largely on user-generated content on the internet for decision making. Positive feedback from previous users can influence a consumer's decision to purchase a particular product, generate brand awareness, and increase sales.
During the Covid-19 pandemic, social media has been used actively as a communication platform. From March 18, 2020, until August 18, 2020, Malaysia announced the Movement Control Order (MCO) to prevent the virus's spread. As many people had to stay home during the MCO and most activities were conducted online, internet usage increased. Although the bandwidth is sufficient, the increase in internet data consumption makes it essential for telco companies to be aware of their performance based on comments or feedback so that they can make improvements. This ocean of opinionated tweets covers various topics, making it the right spot for researchers to do data mining and gain research information. One of the studies related to using Twitter data is Twitter sentiment analysis. Sentiment analysis is a computer science field that uses language processing and machine learning to study and analyze one's attitude, opinion, and evaluation towards entities like topics, services, products, and more. The objective of sentiment analysis or opinion mining is to determine the author's attitude and emotions from a piece of writing or text [5]. For many different purposes, the sentiment value found within written language such as comments, feedback, or critiques provides useful indicators for specific organizations. According to [6], there are two categories of sentiment values: a binary scale consisting of either positive or negative, and an n-point Likert scale attitude measurement. The sentiment analysis technique automatically extracts and summarizes the sentiments in the large amount of social media data that cannot be handled by the average human reader [7].
Recently, Twitter has attracted researchers to analyze Twitter data for various types of sentiment analysis research, such as making predictions [8], detecting users' sentiment towards different issues, and detecting users' emotions [9]. Furthermore, Twitter contains more relevant data than traditional blogging sites, as each tweet is limited to only 280 characters, so an opinion is compressed into a short text [10]. Twitter caught researchers' attention with the number of tweets posted per day reaching 500 million, making it the right spot for data mining [11]. However, the biggest challenges of Twitter sentiment analysis are implicit sentiments, synonyms, and sarcasm.
Data visualization is a powerful technique for exploring and communicating information, as it represents quantitative attributes with visual properties such as position, length, area, and colour in an organized form [12]. Referring to [13], data visualization and visual analytics enable nontechnical organizations to perform data discovery in a self-directed style to enhance decision-making results and daily business operations. The innovation of various visualization tools helps users improve their understanding and skills in generating various charts using different visualization techniques. Due to its ability to provide a quick and clear understanding of information, this field has grown rapidly, resulting in an increased number of chart types and types of data analysis [14]. A significant amount of data was visualized using various charts, such as pie charts, bar charts, and word clouds, to find the data's hidden information.
Therefore, this paper presents real-world public sentiment using the proposed conceptual model (CM) to visualize CSP reputation. The CM was adapted from two earlier CMs, the Simulation in Modeling CM (2008) and the Integrated Framework for CM (2016) [15], to visualize reputation during the MCO period. This study involves full-stack web development, which means there are two components: the back-end and the front-end. For the back-end, data collection, data pre-processing, the development of the model with the Naïve Bayes (NB) algorithm, and the accuracy testing of the model are discussed. For the front-end, we explain the flow of the system and the designed interface.

Back-End Development
The back-end is the server-side of the web application. Data manipulation and model development are part of back-end development. The back-end component of a web application also makes sure everything on the front-end works accordingly. In this section, we perform data collection, data pre-processing, and the implementation of the NB algorithm to develop the model, followed by accuracy testing. These processes are explained extensively in the following subsections.

Data Collection
We extracted the training and testing sets for this study from huseinzol05's GitHub repository named Malaya-Dataset. The public can access this repository at https://github.com/huseinzol05/Malaya-Dataset. In its readme file, the repository claims to gather and store Bahasa Malaysia corpora. The method used to gather these data is also described in the same file: the data were mostly collected using crawlers, and they are semi-supervised by paid linguists. We extracted data from two repository folders, Sentiment Twitter and Sentiment Multidomain. These data are all in .json format, the total number of records is 1,231,396, and all the data are pre-labeled. The number of negative-tagged sentences is 693,249, and the number of positive-tagged sentences is 538,147.
For real-world implementation, we extracted the data from Twitter profiles without using Twitter's API, through Twint. Twint is an advanced Twitter scraping tool written in Python that utilizes Twitter's search operators to allow scraping from specific users and of tweets relating to certain topics, hashtags, and trends. In this research, we scraped tweets that contain the keywords 'Celcom', 'Digi', and 'Maxis', dated from March 18, 2020, until August 18, 2020. By searching for those keywords, tweets directly and indirectly mentioning the three CSP can be extracted. The scraped data are stored in .csv files. The total number of tweets scraped for Celcom, Digi, and Maxis is 101,768, 45,783, and 36,582, respectively. There are 34 columns; some examples are 'timezone', 'user_id', 'username', 'name', and 'tweet'.

Data Pre-Processing
We performed data pre-processing to discard unnecessary qualities in the data that would make the trained model a poor generalizer. For the real-world data, we removed the columns that are insignificant for this study; the final dataset comprises three columns: 'date', 'username', and 'tweet'. The .csv files are then imported into Jupyter Notebook to run the cleaning process. The cleaning process involves removing HTML entities, converting @username to AT_USER, removing tickers, shifting all the characters to lowercase, and removing hyperlinks, hashtags, punctuation, words with two or fewer letters, whitespace, and characters beyond the Basic Multilingual Plane (BMP) of Unicode. The cleaned data are then stored. However, to draw up a word cloud, a separate dataset needs to be created, because the data must first be tokenized. For it, we also remove punctuation, stop words, and a few more 'special' words.
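The cleaning steps above can be sketched as a chain of regular-expression substitutions. This is an illustrative reconstruction, not the authors' actual code; the exact patterns and their order are assumptions.

```python
import re

def clean_tweet(text: str) -> str:
    """Apply the cleaning steps described above, in the order listed."""
    text = re.sub(r"&\w+;", "", text)             # remove HTML entities such as &amp;
    text = re.sub(r"@\w+", "AT_USER", text)       # convert @username to AT_USER
    text = re.sub(r"\$\w+", "", text)             # remove tickers ($XYZ)
    text = text.lower()                           # shift all characters to lowercase
    text = re.sub(r"https?://\S+", "", text)      # remove hyperlinks
    text = re.sub(r"#\w+", "", text)              # remove hashtags
    text = re.sub(r"[^\w\s]", "", text)           # remove punctuation
    text = re.sub(r"\b\w{1,2}\b", "", text)       # remove words with two or fewer letters
    text = re.sub(r"[^\u0000-\uffff]", "", text)  # remove characters beyond the BMP
    return re.sub(r"\s+", " ", text).strip()      # collapse extra whitespace

print(clean_tweet("@Celcom my line is so slow!! https://t.co/abc #fail"))
# → "at_user line slow"
```

Note that because lowercasing happens after the @username substitution, the AT_USER marker itself ends up lowercased; reordering those two steps would preserve its case.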
The purpose of the Net Brand Reputation (NBR) index is to simplify the process of gauging consumers' loyalty [16]. The index helps in focusing on creating more positive remarks and decreasing negative feedback. NBR scores do not reflect the scores obtained using the Net Promoter Score (NPS). We chose NBR as the reputation index for this study because it suits the nature of this study better and addresses the issues faced by the NPS [17].
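Although the exact formula is not reproduced in this excerpt (see [16]), NBR is commonly computed as the difference between positive and negative mentions over total mentions, expressed as a percentage. The sketch below uses that assumed formula with the tweet counts reported in the Results section, and it reproduces the paper's scores.

```python
def net_brand_reputation(positive: int, negative: int) -> int:
    """NBR as (positive - negative) / total mentions, rounded to a whole percentage.

    The formula is an assumption inferred from the scores reported in this study;
    consult [16] for the authoritative definition.
    """
    total = positive + negative
    return round(100 * (positive - negative) / total)

# Tweet counts reported in the Results section for each CSP
print(net_brand_reputation(22_235, 79_533))  # Celcom → -56
print(net_brand_reputation(10_571, 26_011))  # Maxis  → -42
print(net_brand_reputation(13_107, 32_676))  # Digi   → -43
```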

Naïve Bayes Classifier
The sentiment analyzer is built on top of a Naïve Bayes (NB) Classifier Model. The model learns the correct labels from the training set and performs a binary classification. The model assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. The NB theorem calculates the probability of a specific event happening based on the probabilistic joint distributions of certain other events [18]. Overall, NB is a popular classification technique due to its simple structure and the satisfactory performance it obtains across different tasks. It shows excellent accuracy and a minimal error rate compared to other classifiers [19]. In this study, we fed the model the training set containing pre-labeled tweets, and it teaches itself the characteristics of the features of positive- and negative-tagged sentences.
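To make the classification step concrete, the sketch below implements a minimal multinomial Naïve Bayes classifier with add-one (Laplace) smoothing from scratch. It is a toy illustration of the technique, not the model used in this study; the training sentences are invented, with labels "0" for positive and "1" for negative, following the paper's labeling.

```python
import math
from collections import Counter

class TinyNaiveBayes:
    """Minimal multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        # Log prior: how common each class is in the training set
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc.split())
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}

    def predict(self, doc):
        def log_posterior(c):
            score = self.priors[c]
            for word in doc.split():
                # Add-one smoothing avoids zero probabilities for unseen words
                score += math.log((self.counts[c][word] + 1) /
                                  (self.totals[c] + len(self.vocab)))
            return score
        return max(self.classes, key=log_posterior)

nb = TinyNaiveBayes()
nb.fit(["great service fast line", "love the coverage",
        "line down again slow", "worst network ever"],
       ["0", "0", "1", "1"])
print(nb.predict("great coverage"))  # → 0 (positive)
print(nb.predict("slow line"))       # → 1 (negative)
```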
As both the training and testing sets are already partly pre-processed, the pre-processing of the earlier stage is deemed unnecessary. However, vectorization still needs to be carried out on the two sets. Each message, represented as a list of tokens, is converted into a vector that a machine learning model, like the NB Classifier Model, can process. The Bag of Words model is used, and it involves three simple steps: counting the number of times a word appears in each message, weighing the counts (which lowers the weight of frequent tokens), and normalizing the vectors to unit length to abstract from the original text length. Each vector has as many dimensions as there are unique words in the Twitter corpus. The first two steps are also commonly known as term frequency and inverse document frequency. These two are combined to form Term Frequency-Inverse Document Frequency, or TF-IDF, a weight commonly used in information retrieval and text mining. This weight is a statistical measure used to evaluate the level of significance of a word to a particular document in a collection or corpus. The level of significance increases proportionally with the number of times a word appears in the document. Nevertheless, it is offset by the frequency of the word in the corpus. Figure 1 shows the idea of the conceptual model developed in the previous research to visualize the reputation of CSP through Twitter sentiment analysis.
Term Frequency (TF) is a measure of how frequently a term appears in a particular document. Since every document varies in length, a term may occur many more times in longer documents than in shorter ones. Thus, the term frequency is usually divided by the document length as a way of normalization; the formula to calculate TF is shown in Eq. (1).
Inverse Document Frequency (IDF), on the other hand, measures the level of significance of a term. While computing TF, all terms are considered equally significant. However, specific terms such as "is", "of", and "that" tend to appear more frequently while adding little to no significance. Thus, we weigh down the frequent terms and scale up the rare ones at the same time. The formula to generate IDF is shown in Eq. (2).
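Eqs. (1) and (2) are not reproduced in this excerpt, but the standard definitions they describe can be sketched directly: TF as the term count normalized by document length, and IDF as the natural log of the total number of documents over the number of documents containing the term. The toy corpus below is invented for illustration.

```python
import math

def tf(term, doc):
    # Eq. (1): raw count of the term, normalized by document length
    return doc.count(term) / len(doc)

def idf(term, corpus):
    # Eq. (2): log of total documents over documents containing the term
    docs_with_term = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / docs_with_term)

corpus = [
    "celcom line slow today".split(),
    "maxis line down".split(),
    "digi coverage good today".split(),
]
# "line" appears in 2 of 3 documents, so its IDF (and thus its TF-IDF weight)
# is modest; a term unique to a single document would score higher.
print(round(tf("line", corpus[0]) * idf("line", corpus), 4))  # → 0.1014
```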
The recommended method for training a good model is, first and foremost, to cross-validate using a portion of the training set to check whether the model is overfitting the data. Different hyperparameter configurations were tested by splitting the data into random parts to evaluate whether the model generalizes well or overfits. For this particular study, there were 4+2+2 parameter combinations to test and 10-fold cross-validation; hence, the model was trained and evaluated 80 times. The data were split into training and testing sets beforehand with a ratio of 80:20, giving 985,117 tweets in the training set and 246,279 tweets in the testing set.
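The split-and-validate procedure can be illustrated with standard-library tools alone. The sketch below is not the authors' code: it shows a seeded 80:20 shuffle split, reproduces the reported set sizes from the total of 1,231,396 tweets, and checks that 8 (4+2+2) hyperparameter combinations evaluated over 10 folds yield 80 training runs.

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle, then split off the last test_ratio share as the testing set."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

# An 80:20 split of the full dataset reproduces the reported set sizes
n_total = 1_231_396
n_train = round(n_total * 0.8)
print(n_train, n_total - n_train)  # → 985117 246279

# 4+2+2 = 8 hyperparameter combinations, each cross-validated on 10 folds
print(8 * 10)  # → 80 training runs
```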
The model is then stored and can be retrieved in the future without having to retrain it. Model evaluation is then performed on the trained model by predicting the unseen test data, allowing the performance metrics to be graded and retrieved. Two metrics are retrieved: the classification report and the confusion matrix. Figure 2 shows how we interpreted a confusion matrix.

Real-World Data Visualization
To run the sentiment predictions on the collected data, the data and the trained model are first loaded. Sentiment analysis based on the trained NB Classifier Model is then performed on the data to generate new data with the texts tagged with either a "positive" or a "negative" label, represented by "0" and "1", respectively. These data are sorted according to date and saved.
The data are visualized using Plotly, an open-source, interactive graphing library for Python. We imported the data into Pandas data frames, and from the data frames, the data are plugged into the charts to draw up the graphs. The charts generated include bar charts, line charts, and word clouds. Apart from the word clouds, all the charts generated are interactive, and hovering over some elements on the charts triggers a pop-up containing extra details.

Front-End Development
Front-end development, also known as client-side development, is the practice of writing HTML, CSS, and JavaScript for a web application so that users can see and interact with the application directly. The product of the development is commonly served to the users through a web browser. Front-end development also involves design aspects like the visual aesthetics and usability of the web application. Extensive explanations of the front-end development are provided, including design elements such as a use case diagram and a flowchart. Figure 3 shows the overall use case for the visualization system, which involves user interaction with the system. It is crucial to show the sequence of actions and the interactions involved to achieve the objectives. Each use case has its own description of the involved activities, so the developer can easily comprehend the system's requirements. Figure 4 shows the whole system's flow, which includes the sentiment analyzer and other system features. As soon as the system is running, the user can see the landing page. If the user clicks on the 'Go to Dashboard' button, the system directs the user to the overview page, where the user can browse the overview of the data and the analysis performed on it. If the user clicks the 'Celcom' button, the system directs the user to the Celcom page with extensive details, including the NBR and data visualization. The user is directed to either the Maxis or the Digi page, depending on the button selected, 'Maxis' or 'Digi'; the contents are similar to the Celcom page but for the respective CSP. For the 'Sentiment Analyzer' button, the user can enter any input in the text field. After clicking the 'Submit' button, the system performs the sentiment analysis based on the NB Classifier Model developed in the back-end and displays the result.
Lastly, for the 'Twitter Updates' button, timelines of Celcom, Maxis, and Digi's official Twitter accounts are streamed and displayed on the 'Twitter Updates' page.

Design User Interface Diagram
A design must be drawn up before developing the system's prototype interface to ensure that the interface's flow is not compromised. Besides, a design provides a better and more precise view of how the actual system will function. The UI covers the functional and non-functional requirements of the system.

Result and Discussion
In this subsection, we discuss the final result of the web-based CSP visualization system, starting from the interface, functionality testing, accuracy testing, real-world data analysis, and the word clouds of the three CSP. Figure 5 shows the final design of the system's user interface, starting with the Landing Page, where the user has to click on the 'GO TO DASHBOARD' button to enter the application. Figure 6 displays the 'Overview' page, where the user can see the NBR for all three CSP and a summary of the analysis through data visualization. Figure 7 illustrates the Celcom page, which displays the result of the analysis for Celcom through data visualization.

Functionality Testing
The purpose of conducting functionality testing is to locate any anomaly, error, or odd behaviour in the system. It is vital to ensure that every function of the system works smoothly and accordingly. If an error can be detected, this is an indicator of a poorly developed system.

Accuracy Testing of the Naïve Bayes Classifier Model
We automated the accuracy testing of the NB Classifier Model by writing a simple Python code. Figure 8 shows the snapshot result of accuracy testing.

Figure 8 Snapshot Result of Accuracy Testing
As observed, the accuracy score is 89% after conversion to a percentage. This score means that the model predicted the correct label 89% of the time. In simpler words, out of 10 attempts, the model got approximately nine correct results based on the data fed from the testing set. This score provides a decent picture of how well the model is performing. From the confusion matrix, the model predicted 129,727 labels correctly as "negative" and 89,948 labels correctly as "positive". However, 8,598 "negative" labels and 18,007 "positive" labels were mispredicted.
Lastly, from the classification report, extra details on the model's performance can be extracted. The precision is 88% for label "0" and 91% for label "1". These numbers indicate the proportion of labels predicted correctly out of the total number of predictions for that class. Next, label "0" got a score of 94%, and label "1" managed a score of 83%, for recall. Recall translates to the number of correct predictions out of the actual labels for that class. The F1-score, on the other hand, is the weighted average of precision and recall for that class. It typically provides a bigger and more precise picture of how well the model performs for that label, and a higher number is a good indicator of a better-performing model. Label "0" scored 91%, and label "1" scored 87%, for the F1-score.
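As a sanity check, these figures can be recomputed from the confusion-matrix counts reported above. Which count pairs with which label is not spelled out in the text, so the pairing below is an assumption chosen because it reproduces the reported classification report.

```python
def per_class_scores(tp, fp, fn):
    """Precision, recall, and F1 for one class, rounded to two decimals."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f1, 2)

# Counts reported from the confusion matrix
correct_a, correct_b = 129_727, 89_948  # correctly predicted labels
errors_a, errors_b = 8_598, 18_007      # mispredicted labels

accuracy = (correct_a + correct_b) / (correct_a + correct_b + errors_a + errors_b)
print(round(accuracy, 2))                               # → 0.89
print(per_class_scores(correct_a, errors_b, errors_a))  # → (0.88, 0.94, 0.91)
print(per_class_scores(correct_b, errors_a, errors_b))  # → (0.91, 0.83, 0.87)
```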
To provide a perspective on how well the model performs, note that agreement scores in social computing studies average at around 0.60, or 60% [20]. Therefore, a nearly 90% accurate program can be considered quite impressive. The model can also be benchmarked against similar work by [21], whose final model, based on a Support Vector Machine (SVM), achieved an accuracy of 79.08%.

Analysis Result of Real-World Data
We analyze the real-world data results for the three CSP in this subsection. A. Celcom CSP: Tweets directly or indirectly mentioning Celcom amount to a total of 101,768 tweets. The model was run on these tweets; the number of "positive"-tagged tweets is 22,235, and the number of "negative"-tagged tweets is 79,533. Hence, the NBR for Celcom is -56%. Figure 9 shows the line chart generated to illustrate the sentiment for each month from March 18, 2020, until August 18, 2020. The variance of negative statements tweeted is much higher than that of positive statements. Twitter users were tweeting marginally fewer negative statements at the beginning of the period than towards the end.
B. Maxis CSP: On the other hand, tweets directly or indirectly mentioning Maxis amount to a total of 36,582 tweets. The number of "positive"-tagged tweets is 10,571, and the number of "negative"-tagged tweets is 26,011. Hence, the NBR for Maxis is -42%. Figure 10 shows the line chart generated to illustrate the trend of sentiment over the six months. The variance of negative statements tweeted about Maxis is also higher than that of positive statements. Twitter users were tweeting marginally fewer negative statements targeted at Maxis at the beginning of the period than towards the end, similar to the pattern identified in Celcom's tweets. C. Digi CSP: Tweets directly or indirectly mentioning Digi amount to a total of 45,783 tweets. The number of "positive"-tagged tweets is 13,107, and the number of "negative"-tagged tweets is 32,676. Hence, the NBR for Digi is -43%. Figure 11 shows the line chart generated to illustrate the trend of sentiment within the six months. The variance of negative statements tweeted about Digi is higher than that of positive statements; however, the variance is lower than Maxis' and Celcom's. The number of negative statements tweeted about Digi is similar at the beginning and the end of the period, showing consistency in Digi's reputation throughout the period.
D. Word Cloud Visualization: Figure 12, Figure 13, and Figure 14 show the word clouds generated from the Celcom, Maxis, and Digi data. From these word clouds, we conclude that the biggest issue that gets the subscribers of these CSP tweeting is cell reception. They all have one thing in common: the most frequent term or word used in the tweets is "line". In the context of Malaysia and Malaysians, this translates to connectivity and cell reception. Table 1 and Figure 15 show the comparison of the results for the three CSP. In conclusion, during the first MCO, from March 18 until August 18, 2020, the NBR of Celcom was the worst compared to its competitors, and the NBR of Maxis was the best, with Digi's being a close second best by a minimal margin. The figure shows that Celcom received the highest percentage of negative tweets and the lowest percentage of positive tweets; these two combined make up the Net Brand Reputation. However, all three CSP have a negative NBR.

Conclusions
This web-based real-world Twitter sentiment analysis of Malaysian Communication Service Providers (CSP) serves as a medium to visualize the results of the sentiment analysis conducted on tweets directly or indirectly mentioning Celcom, Maxis, and Digi during the first MCO. The Naïve Bayes Classifier Model developed for this research is also embedded in the application, allowing the user to run the model on any textual data. The information extracted from the application can facilitate decision-making and provide a rough estimate of how well a particular CSP is doing. The application's visuals are all interactive, making it easier for the user to gain better and more precise insights. An extra feature included in the application is streaming the tweets from the official Twitter accounts of the three CSP. This feature updates the users with the latest news and announcements regarding these CSP's services and products. For future work, we can improve the corpus to include different slangs of Bahasa Malaysia and commonly used short forms, and add an extra class to represent texts that do not belong to either "positive" or "negative".