BotChase: Integrated Unsupervised Learning with Decision Tree Classifier for Graph-Based Bot Detection

V. Krishna Sahithi, R. Jyothika, S. Preethi, Dr. A. R. Siva Kumaran

doi:10.17762/turcomat.v14i2.13695

Published: May 7, 2023

Keywords:

Bot detection, machine learning, DoS attack, K-means clustering

Abstract

Bot detection using machine learning (ML) with network flow-level features has been studied extensively in the literature. However, existing flow-based approaches typically incur high computational overhead and do not fully capture network communication patterns, which can expose additional aspects of malicious hosts. Recently, bot detection systems that apply ML to communication graph analysis have gained attention as a way to overcome these limitations. A graph-based approach is intuitive, because graphs are a faithful representation of network communications. To address the issues in existing models, this project combines unsupervised and supervised algorithms: the algorithms are trained to produce a model, and the model is then applied to new request data to identify whether a request is normal or an attack. First, the unsupervised K-means algorithm separates the dataset into Bot (attack) and BENIGN (normal) records. K-means groups similar records into one cluster; records with a small number of requests are filtered out, and records with a high number of requests are treated as BOT (attack). After the records are separated, a graph-based feature extraction technique extracts features from the dataset. The dataset is converted into a graph in which each IP address is a vertex and each source is connected to its destination by an edge. Edges are weighted according to their incoming and outgoing link connections. To obtain the edge weights, we calculate betweenness centrality, incoming edge weight, outgoing edge weight, and alpha centrality weight. From these calculations we extract in_degree, out_degree, in_degree_weight, out_degree_weight, between_ness, clustering, and alpha_centrality as features. Any record with a high number of connections is labeled 1 (BOT); otherwise it is labeled 0 (normal). After feature extraction from the graph, the features are normalized to obtain the mean value of each feature. The normalized features are used to train a decision tree classifier, and the resulting model can be used to predict the type of future requests.
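The graph construction and labeling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the flow records, the helper names graph_features and two_means, and the choice to cluster on outgoing request volume alone are assumptions made for illustration. It uses only the Python standard library, covers four of the listed features (in_degree, out_degree, in_degree_weight, out_degree_weight), and replaces full K-means with a one-dimensional, two-cluster variant. Betweenness, clustering, alpha centrality, normalization, and decision tree training are not shown.

```python
from collections import defaultdict

# Hypothetical flow records: (source IP, destination IP, number of requests).
flows = [
    ("10.0.0.9", "10.0.0.1", 40),
    ("10.0.0.9", "10.0.0.2", 35),
    ("10.0.0.9", "10.0.0.3", 50),
    ("10.0.0.1", "10.0.0.2", 2),
    ("10.0.0.2", "10.0.0.3", 1),
    ("10.0.0.3", "10.0.0.1", 3),
]


def graph_features(flows):
    """Treat each IP as a vertex and each flow as a weighted directed edge."""
    feats = defaultdict(lambda: {
        "in_degree": 0, "out_degree": 0,
        "in_degree_weight": 0, "out_degree_weight": 0,
    })
    seen_edges = set()
    for src, dst, requests in flows:
        if (src, dst) not in seen_edges:  # count each distinct edge once
            seen_edges.add((src, dst))
            feats[src]["out_degree"] += 1
            feats[dst]["in_degree"] += 1
        feats[src]["out_degree_weight"] += requests
        feats[dst]["in_degree_weight"] += requests
    return dict(feats)


def two_means(values, max_iters=20):
    """One-dimensional K-means with k=2, seeded at the min and max values.

    Returns the (low, high) cluster centers.
    """
    lo, hi = min(values), max(values)
    for _ in range(max_iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):  # centers stopped moving
            break
        lo, hi = new_lo, new_hi
    return lo, hi


features = graph_features(flows)

# Assumption: cluster hosts by total outgoing requests only.
volumes = [f["out_degree_weight"] for f in features.values()]
low_center, high_center = two_means(volumes)

# Label 1 (BOT) if the host is closer to the high-volume center, else 0.
labels = {
    ip: int(abs(f["out_degree_weight"] - low_center)
            > abs(f["out_degree_weight"] - high_center))
    for ip, f in features.items()
}

for ip, f in features.items():
    print(ip, f, "label =", labels[ip])
```

In this toy data the host 10.0.0.9 sends far more requests than the others, so it falls into the high-volume cluster. In a fuller pipeline, a graph library would typically supply the centrality features, and the resulting feature rows and labels would be passed to a decision tree classifier.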

How to Cite

V. Krishna Sahithi, R. Jyothika, S. Preethi, Dr. A. R. Siva Kumaran. (2023). BotChase: Integrated Unsupervised Learning with Decision Tree Classifier for Graph-Based Bot Detection. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 14(2), 628–642. https://doi.org/10.17762/turcomat.v14i2.13695

Issue

Vol. 14 No. 2 (2023)

Section

Research Articles

You are free to:

Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
Adapt: remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

Notices:

You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.

No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.
