Phishing Websites Detection Model based on Decision Tree Algorithm and Best Feature Selection Method
Main Article Content
Abstract
The continuing progress of network technology has driven its broad adoption in many facets of daily life in recent years. Phishing websites have emerged as a major cybersecurity concern. Phishing websites are counterfeit web pages created by hackers to imitate the pages of legitimate websites in order to deceive users and steal personal information such as usernames and passwords. Although several techniques for identifying phishing websites have been proposed, phishers' strategies have evolved to circumvent detection. Machine learning is one of the most efficient approaches for identifying this harmful behaviour, because most phishing attacks exhibit features that machine learning algorithms can recognize. Accurately identifying phishing websites remains difficult because it depends on various dynamic factors. This study proposes a Decision Tree (DT) classifier with optimal feature selection for phishing website detection, with the goal of improving the classification of websites as phishing or legitimate. The experiments were carried out using the publicly available phishing website dataset from the UCI Machine Learning Repository, which comprises 4898 phishing websites and 6157 legitimate websites. We extracted 30 features from this dataset. In addition, we selected the 20 most significant features using feature selection methods such as wrapper-based and correlation-based feature selection. Ten-fold cross-validation was used for training, testing, and validation. The best experimental result was obtained by using 20 of the 30 features and passing them to the classification algorithm. The study obtained 98.80% accuracy with the wrapper-based feature selection strategy, which outperformed the DT classifier combined with the other feature selection method.
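The pipeline described in the abstract (a decision tree scored by ten-fold cross-validation, with a wrapper method choosing the feature subset) can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' implementation. The abstract does not state which wrapper search strategy or tree variant was used, so greedy forward selection and a small Gini-based tree with multiway splits are assumptions made here. The data is a synthetic toy set, not the UCI dataset, and the names make_toy_data, build_tree, predict, cv_accuracy and forward_select are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, features, depth=0, max_depth=4):
    """Grow a small tree with multiway splits on categorical features.
    A leaf is a plain label; an inner node is (feature, children, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1 or not features:
        return majority
    best_score, best_f = None, None
    for f in features:
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if best_score is None or score < best_score:
            best_score, best_f = score, f
    parts = {}
    for row, y in zip(rows, labels):
        part_rows, part_labels = parts.setdefault(row[best_f], ([], []))
        part_rows.append(row)
        part_labels.append(y)
    if len(parts) == 1:  # the chosen feature does not split the data
        return majority
    rest = [g for g in features if g != best_f]
    children = {v: build_tree(r, l, rest, depth + 1, max_depth)
                for v, (r, l) in parts.items()}
    return (best_f, children, majority)

def predict(tree, row):
    """Walk the tree; fall back to the node majority for unseen values."""
    while isinstance(tree, tuple):
        f, children, majority = tree
        tree = children.get(row[f], majority)
    return tree

def cv_accuracy(rows, labels, features, k=10):
    """k-fold cross-validated accuracy using only the given feature indices.
    Folds are interleaved by index (no shuffling or stratification)."""
    n = len(rows)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train = [i for i in range(n) if i not in test_idx]
        tree = build_tree([rows[i] for i in train],
                          [labels[i] for i in train], features)
        correct += sum(predict(tree, rows[i]) == labels[i] for i in test_idx)
    return correct / n

def forward_select(rows, labels, n_features, n_select, k=10):
    """Wrapper-style greedy forward selection: at each step add the feature
    whose inclusion gives the best cross-validated accuracy of the tree."""
    selected, remaining = [], list(range(n_features))
    while len(selected) < n_select:
        scored = [(cv_accuracy(rows, labels, selected + [f], k), -f)
                  for f in remaining]
        _, neg_f = max(scored)  # ties go to the lowest feature index
        selected.append(-neg_f)
        remaining.remove(-neg_f)
    return selected

def make_toy_data(n=600, n_features=8, seed=0):
    """Synthetic stand-in data (an assumption, NOT the UCI dataset):
    features take values in {-1, 0, 1}; only features 0 and 1 set the label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = [rng.choice([-1, 0, 1]) for _ in range(n_features)]
        rows.append(row)
        labels.append(1 if row[0] + row[1] > 0 else -1)
    return rows, labels

rows, labels = make_toy_data()
selected = forward_select(rows, labels, n_features=8, n_select=2)
accuracy = cv_accuracy(rows, labels, selected)
print(sorted(selected), round(accuracy, 3))
```

On the toy data the search should recover the two informative features, because every other feature is noise. Applying the same idea to the study's setting would mean loading the 30 UCI features and selecting 20 of them, which makes the search considerably more expensive since every candidate subset is re-evaluated with a full ten-fold cross-validation.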
Article Details
Licensing
TURCOMAT publishes articles under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This licensing allows for any use of the work, provided the original author(s) and source are credited, thereby facilitating the free exchange and use of research for the advancement of knowledge.
Detailed Licensing Terms
Attribution (BY): Users must give appropriate credit, provide a link to the license, and indicate if changes were made. Users may do so in any reasonable manner, but not in any way that suggests the licensor endorses them or their use.
No Additional Restrictions: Users may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.