Fakeheader: A Tool to Detect Deceptive Online News Based on Misleading News Headlines and Contents

Online news has become the primary source of news content for readers. Unfortunately, several studies have found that readers tend to judge specific events by the news headlines rather than by the contents. With the advancement of mobile and web technologies, news is easier to share with others, and this habit can harm the individuals, organizations, or nations targeted by the news. In the proposed work, a tool to detect deceptive news based on misleading headlines or content is developed. The tool incorporates a data veracity framework for online news with a Support Vector Machine and a proposed combination of features. The experimental results show that the proposed tool achieves high performance, with precision and recall above 90%.


Introduction
The Internet has become the place where users search for almost everything in their daily lives. They use it to find things they want to buy, to travel, socialize, bank and manage finances, and to find reading materials (Malviya,  ). News is among the most accessible reading materials that people read and discuss daily, but researchers have found that the trustworthiness and truth of news content can be lacking (Osgood, 1971; Knapp, Hart, & Dennis, 1974; Dirsehan & Çelik, 2011; Al-Kinani et al., 2020; Jamil et al., 2015; Kerby & Marland, 2015). Online news shares the same problem, but on a greater scale than traditional paper-based news, because online news can be shared rapidly through computers and mobile devices regardless of its trustworthiness. The literature also highlights the impact of deceptive writing. For example, hiding information adds cognitive load for the deceiver and changes human behaviour (Frank et al., 2008). Jung (2009) reported that false details conveyed through the media can influence receivers. In news reporting, the headline is one of the most critical parts of a news report. It conveys the fundamental idea of the story, and because it summarizes the content, it lets readers choose among a large number of news items. However, a 'catchy headline' approach is sometimes used to get the reader's attention while the content is totally or partially different from its headline. In addition, the media often use the title as an attention grabber to increase their news ratings (Dor, 2003; Ecker, Lewandowsky, Chang, & Pillai, 2014; Wang, 2016). It is therefore important to have a tool that detects misleading and fake news, which is the objective of this research.

Materials and Methods
Some initial success in deception detection has created a new wave of applying intelligent technologies to support deception detection on fake news. However, most previous approaches did not focus on misleading headlines, did not account for the structure of news data (headline, content, and other metadata), or had low detection accuracy. Since headlines are a critical part of the news, a deception detection approach must account for the news data structure in order to detect misleading headlines.
Previous studies have used a number of feature combinations. These features are essential for training deception detection classifiers. The features highlighted in previous research include Absurdity and Humour, Punctuation, Grammar, Body-independent features, Body-dependent features, N-grams, Cosine Similarity, and Deception Detection measurement. In the proposed approach, a new combination of features is proposed: Bigram and Lemmatization features are combined with the Base (TF-IDF), Syntactic, Bigram (N-gram), and Punctuation features to produce a prediction technique with high accuracy (precision, recall, and F-score). Figure 1 shows the proposed framework for detecting deceptive news based on misleading headlines or contents using the proposed combination of features.
In this study, the Fake News Dataset (FN) (McIntire, 2018) was selected to validate the proposed approach. The dataset contains 6,335 articles, of which 3,171 are labelled as real news and 3,164 as fake news. The ratio of real to fake articles is therefore around 1:1, and the title, content, and veracity label are provided for each article. The dataset is grouped into three types: news headlines, news content (without headline), and combined (headline + content).
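The three dataset types and the Base, Bigram, and Punctuation features can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names, the `PUNCT_` prefix, the tokenizer, and the exact TF-IDF weighting are assumptions made for illustration, and lemmatization and syntactic features are omitted because they would need an NLP library.

```python
import math
import re
import string
from collections import Counter


def build_views(title, body):
    """Return the three text views: headline, content, and combined."""
    return {
        "headline": title,
        "content": body,
        "combined": f"{title} {body}",
    }


def tokenize(text):
    """Lowercase word tokens; a fuller pipeline would lemmatize here (omitted)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_counts(text):
    """Count unigrams, bigrams, and punctuation marks in one document."""
    tokens = tokenize(text)
    counts = Counter(tokens)  # base terms
    counts.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))  # bigrams
    counts.update(f"PUNCT_{ch}" for ch in text if ch in string.punctuation)
    return counts


def tfidf_matrix(documents):
    """Weight relative term frequencies by a smoothed inverse document frequency."""
    counted = [term_counts(doc) for doc in documents]
    n_docs = len(counted)
    doc_freq = Counter(term for counts in counted for term in counts)
    weighted = []
    for counts in counted:
        total = sum(counts.values()) or 1
        weighted.append({
            term: (freq / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, freq in counts.items()
        })
    return weighted


# Hypothetical article used only to exercise the functions.
views = build_views("You won't BELIEVE this!",
                    "Officials confirmed the report on Monday.")
features = tfidf_matrix(list(views.values()))
```

Each view yields one sparse feature dictionary, so a classifier can be trained separately on headlines, contents, or the combined text.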

Results and Discussion
A number of experiments were conducted using 80% of the dataset for training and 20% for testing, with five-fold cross-validation. The classification technique was then tested with five base classifiers applied at the training stage: Support Vector Machine (SVM), Stochastic Gradient Descent (SGD), Logistic Regression (LR), K-Nearest Neighbour (kNN), and Artificial Neural Network (ANN). The SVM used a linear kernel with probability set to TRUE and C=5. SGD used the 'hinge' loss, an l2 penalty, alpha, a tolerance of 1e−3, and a maximum of 1000 iterations. LR used the 'lbfgs' solver and a maximum of 1000 iterations.
Based on the training results shown in Tables 1, 2 and 3, SVM emerged as the best classifier, with the highest accuracy for the majority of the feature sets and more than 90% accuracy on all three dataset types. On the headline dataset, SVM had the highest accuracy for every feature set. On the content dataset, SVM had the highest accuracy for every feature set except Base+Syntactic, where ANN recorded the highest accuracy. On the combined dataset (headline + content), SVM recorded the highest accuracy on all features except Base+Lemmatization, Base+Syntactic, and All features. ANN recorded the highest accuracy for Base+Lemmatization, Base+Syntactic, Base+Lemma+Syntactic and All features. Based on these training experiments, SVM gave the highest accuracy for most feature sets on every dataset type, so it was chosen for the implementation of the deception detection tool for online news based on misleading headlines.
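The evaluation protocol can be sketched as follows, using only the standard library. Several points are assumptions rather than details from the paper: the helper names, the fixed seed, drawing the five folds from the 80% training portion, and the hyperparameter key names, which follow scikit-learn's constructor arguments even though the text does not name the library. The SGD alpha value is not recoverable from the text and is left out.

```python
import random


def train_test_split_indices(n_items, test_fraction=0.2, seed=0):
    """Shuffle item indices and hold out `test_fraction` of them for testing."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = round(n_items * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hyperparameters as stated in the text; key names are assumed to match
# scikit-learn's arguments. The SGD alpha value is missing from the source.
CLASSIFIER_PARAMS = {
    "SVM": {"kernel": "linear", "probability": True, "C": 5},
    "SGD": {"loss": "hinge", "penalty": "l2", "tol": 1e-3, "max_iter": 1000},
    "LR": {"solver": "lbfgs", "max_iter": 1000},
}

# 6,335 articles: roughly 80% for training, 20% held out for testing.
train_idx, test_idx = train_test_split_indices(6335)
folds = list(k_fold_indices(train_idx, k=5))
```

With 6,335 articles this holds out 1,267 for testing and rotates the remaining 5,068 through five validation folds.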

Performance Measures
To evaluate our approach, experiments were carried out to detect misleading online news using the Fake News Dataset. Tables 4, 5 and 6 present the precision, recall, and F-score, with the associated five-fold cross-validation results, for our deception detection model. Each table represents a different data type. Based on the results, the proposed approach recorded high precision for the Base+Bigram, Base+Lemma+Bigram and Base+Lemma+Syntactic features (98% and 99% precision). This shows that the proposed approach with these features produces high-precision predictions on the headline dataset, which generally consists of short texts. On the Content dataset (without headline), similar features recorded among the highest precision as well as recall and F-score. The Content dataset has longer text; therefore, recall and F-score were also high. This shows that the proposed approach, with the proposed combination of features, is able to predict deceptive news with high accuracy. On the combined dataset (Headline+Content), similar features recorded high accuracy, recall, and F-score (except for the Base+Lemma+Syntactic feature). In general, the proposed approach with combinations of features that mainly consist of Bigram and Lemmatization recorded high precision. As the texts grow in size (Content and Combined datasets), the approach recorded high precision as well as high recall and F-score.
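For reference, precision, recall, and F-score for a binary fake/real task can be computed as in the sketch below. The label strings "FAKE" and "REAL", the choice of "FAKE" as the positive class, and the sample predictions are assumptions for illustration, not values taken from Tables 4 to 6.

```python
def precision_recall_f1(y_true, y_pred, positive="FAKE"):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = ["FAKE", "FAKE", "FAKE", "REAL", "REAL"]
y_pred = ["FAKE", "FAKE", "REAL", "FAKE", "REAL"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Precision penalizes real articles flagged as fake, recall penalizes fake articles that are missed, and the F-score is their harmonic mean.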

FakeHeader development
FakeHeader is developed as a prototype tool that can be used to detect fake news using news headlines, contents, or a combination of both. The tool is developed using Python version 3.7.3 on 64-bit Windows 10 Professional. Figure 2 shows the Graphical User Interface (GUI) created for the proposed tool. To detect deceptive news, users can enter the title of the news, the content of the news, or the whole news article into the text area to get the prediction and the accuracy of the fake news detection result.
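The step from text input to prediction might look like the sketch below. It assumes the trained model exposes a `predict_proba` method in the scikit-learn style, and it reads the reported "accuracy" as the probability of the predicted class; both are assumptions. `describe_prediction` and `KeywordStub` are hypothetical names, and the stub only stands in for a trained model so that the example runs.

```python
def describe_prediction(text, model, labels=("REAL", "FAKE")):
    """Return the predicted label and its probability for one piece of text."""
    if not text.strip():
        raise ValueError("Enter a headline, the content, or the whole article.")
    probabilities = model.predict_proba([text])[0]
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]


class KeywordStub:
    """Hypothetical stand-in for a trained pipeline; not the FakeHeader model."""

    def predict_proba(self, texts):
        # Toy rule: an exclamation mark makes the text look "fake".
        return [[0.2, 0.8] if "!" in text else [0.7, 0.3] for text in texts]


label, probability = describe_prediction("Shocking discovery!", KeywordStub())
```

A GUI text area would pass its contents to a function of this kind and display the returned label and probability.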

Conclusion
In conclusion, this paper proposes a simple-to-use tool called FakeHeader to detect deceptive online news based on misleading news headlines, contents, or both. The proposed tool incorporates a specialized data veracity framework for online news based on news headlines and uses a Support Vector Machine classifier. Based on the experiments, the proposed tool with the proposed combination of features scored high accuracy for detecting deceptive news based on headlines, contents, and combined news data. In future work, the proposed tool will be validated through further experiments on larger datasets, both to support larger-scale detection and to improve accuracy.