Examining Login URLs to Identify Phishing Threats
Abstract
Phishing is a social engineering attack in which criminals trick users into revealing their credentials through a deceptive login form that submits the information to a malicious server. In this project, we compare machine learning techniques and propose a method for detecting phishing websites through URL analysis. Most current state-of-the-art phishing detectors use homepages without login forms as the legitimate class. We differ by including login-page URLs in both classes. We believe this better reflects real-world scenarios, and we demonstrate that existing techniques yield a high false-positive rate when tested on URLs from legitimate login pages. Furthermore, we use datasets from different years to show how model accuracy declines over time: we train a base model on outdated datasets and evaluate it on recent URLs. Additionally, we conduct a frequency analysis of current phishing domains to identify the techniques phishers employ in their campaigns. To support our claims, we introduce a new dataset, Phishing Index Login URL (PILU-90K), which consists of 60,000 legitimate URLs, comprising index and login pages, and 30,000 phishing URLs. Lastly, we present a Logistic Regression model that, combined with Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction, achieves an accuracy of 96.50% on the login URL dataset.
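To make the TF-IDF plus Logistic Regression idea concrete, here is a minimal standard-library sketch. It is not the authors' implementation. The abstract does not specify the tokenizer, the IDF formula, or the optimizer, so everything below is an assumption: splitting on non-alphanumeric characters, the smoothed IDF variant, plain stochastic gradient descent, and the six made-up URLs on reserved domains (none come from PILU-90K). All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy URLs on reserved domains; NOT drawn from PILU-90K.
LEGIT = [
    "https://www.example.com/login",
    "https://accounts.example.org/signin",
    "https://shop.example.net/account/login",
]
PHISH = [
    "http://example-login.secure-update.test/verify/account",
    "http://signin.example.com.confirm-id.test/login.php",
    "http://secure-update.billing.test/webscr/login",
]

def tokenize(url):
    """Lowercase the URL and split it on every non-alphanumeric character."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]

def fit_idf(token_lists):
    """Smoothed inverse document frequency for each token seen in training."""
    n_docs = len(token_lists)
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}

def tfidf(tokens, idf):
    """Sparse, L2-normalised TF-IDF vector; tokens unseen in training are ignored."""
    counts = Counter(tok for tok in tokens if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}

def score(vec, weights, bias):
    """Sigmoid of the linear score: estimated probability of the phishing class."""
    z = bias + sum(weights.get(tok, 0.0) * v for tok, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(vectors, labels, epochs=200, lr=0.5):
    """Fit logistic regression by stochastic gradient descent on the log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = score(vec, weights, bias) - y  # gradient of log loss w.r.t. z
            bias -= lr * err
            for tok, v in vec.items():
                weights[tok] = weights.get(tok, 0.0) - lr * err * v
    return weights, bias

def predict_proba(url, idf, weights, bias):
    """Tokenize, vectorise, and score a single URL."""
    return score(tfidf(tokenize(url), idf), weights, bias)

# Label 0 = legitimate (login pages included), label 1 = phishing.
urls = LEGIT + PHISH
labels = [0] * len(LEGIT) + [1] * len(PHISH)
token_lists = [tokenize(u) for u in urls]
idf = fit_idf(token_lists)
weights, bias = train([tfidf(t, idf) for t in token_lists], labels)

for u in urls:
    print(f"{predict_proba(u, idf, weights, bias):.2f}  {u}")
```

Note that the legitimate class here contains login URLs, so the token "login" appears in both classes and cannot separate them on its own; the model has to rely on other tokens. On a real dataset one would more likely use an established library (for example scikit-learn's `TfidfVectorizer` and `LogisticRegression`) and report accuracy and false-positive rate on held-out URLs rather than on the training set.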
Article Details
You are free to:
- Share — copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
- The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.