A Comparative Analysis on Hybrid SVM for Network Intrusion Detection System

Rapid growth in technology not only makes daily life smoother but also exposes many security issues. Constantly changing attack types trouble not only organizations and companies but also the people who use network services for their daily needs. Intrusion Detection Systems (IDS) have been developed to avoid financial losses caused by network attacks. KDD CUP 99, NSL-KDD, Kyoto 2006+ and CIDDS-001 are some of the intrusion datasets available to researchers for developing and testing their IDS models. In this paper, an attempt is made to compare the effect of various SVM kernel based models and hybrid kernel based models on the CIDDS-001 dataset, and the results are reported.


Introduction
The growth in technology has led to widespread use of the Internet. The Internet is one of the major sources of services in sectors such as business, medicine, education and banking, and it benefits customers as well as vendors. These online services face major security issues that threaten network users' valuable data, money etc. A Network Intrusion Detection System (NIDS) is an effective mechanism that provides security for network users.
Several machine learning techniques help in developing intrusion detection systems. In most cases, authors have applied supervised learning techniques because class labels are available in the intrusion datasets. Classification models such as Nearest Neighbor classifiers, Support Vector Machines (SVM), Convolutional Neural Networks and Decision Tree Induction can be applied to the intrusion datasets and can achieve good prediction rates for attacks.
Numerous benchmark datasets are readily available on the web in various formats; they can be easily downloaded and used with any NIDS model. KDD CUP 99, NSL-KDD, Kyoto 2006+ and CIDDS-001 [7][15] are a few of them. All these datasets are collected from network traffic and contain several attributes such as Source IP Address, Destination IP Address, Source Port, Destination Port and Duration. Through these features, the pattern of a specific network activity is observed and classified as either an attack or benign.
Some of these datasets are generated by constructing testbeds and some of them are collected from real-time network traffic.
The objective of this paper is to study the behavior of the Hybrid Kernel based SVM algorithm on the CIDDS-001 dataset and to compare it with the results on other datasets.

Data Set Description
CIDDS-001 (Coburg Intrusion Detection Data Sets) is a labeled, unidirectional, flow-based dataset generated by emulating a small business environment in the cloud for the evaluation of Network Intrusion Detection Systems (NIDS). It consists of real traffic data from an internal server in an OpenStack environment (web and e-mail servers etc.) and an external server (file synchronization, web server). Python scripts emulate normal user behavior on the clients.
The dataset contains 14 attributes: attributes 1 to 11 are default NetFlow attributes, whereas attributes 12 to 14 are additional attributes that describe the attacks (Table I).
The remainder of this paper is organized as follows. Section II discusses researchers' work on intrusion detection systems. Section III gives a detailed description of the Hybrid Kernel based SVM (HKSVM) and its feature selection approaches. Section IV provides the methodology adopted for testing the dataset, Section V discusses the results, and Section VI concludes the paper.

Related work
Idhammad et al. [4] suggested a detection system for DDoS attacks in a cloud environment based on information-theoretic entropy and a random forest classifier. A time-based sliding window algorithm is used to estimate the entropy of the network header features of incoming traffic. When the estimated entropy exceeds its normal range, the incoming traffic is preprocessed and the random forest classifier is applied. An accuracy improvement of 2.5% is reported over the random forest tested directly on CIDDS-001, whose accuracy is 97%.
Raneel Kumar, Lal and Sharma (2015) proposed an Intrusion Detection System (IDS) to detect DoS attacks emanating from one or more virtual machines against another in a multi-tenant cloud environment with multiple VMs. The IDS is composed of a packet sniffer, a feature extractor and a one-class Support Vector Machine classifier. The proposed IDS showed promising results in detecting seven different types of DoS attacks.
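As an illustration of the entropy trigger described for Idhammad et al. [4], the following is a minimal Python sketch, not the authors' implementation. It keeps the values of one header field (for example, the source IP address) observed during the last `window_seconds` seconds and reports the Shannon entropy of that window. The class name `SlidingEntropyMonitor`, the thresholds `low` and `high`, and the choice to flag any entropy outside that band are assumptions made for the example.

```python
import math
from collections import Counter, deque


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    probabilities = [c / total for c in counts.values()]
    return sum(-p * math.log2(p) for p in probabilities)


class SlidingEntropyMonitor:
    """Tracks the entropy of one header field over a time-based window.

    `low` and `high` are hypothetical bounds of the "normal" entropy range;
    in practice they would be estimated from attack-free traffic.
    """

    def __init__(self, window_seconds, low, high):
        self.window_seconds = window_seconds
        self.low = low
        self.high = high
        self.events = deque()  # (timestamp, field_value), oldest first

    def observe(self, timestamp, field_value):
        """Record one packet or flow and return (entropy, suspicious)."""
        self.events.append((timestamp, field_value))
        # Discard observations that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        entropy = shannon_entropy(value for _, value in self.events)
        suspicious = not (self.low <= entropy <= self.high)
        return entropy, suspicious


if __name__ == "__main__":
    monitor = SlidingEntropyMonitor(window_seconds=10, low=1.0, high=3.0)
    # Four different sources within the window: entropy is inside the band.
    for t, src in enumerate(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]):
        print(monitor.observe(t, src))
    # Later, a single source dominates the window: entropy collapses.
    for t in range(20, 24):
        print(monitor.observe(t, "10.0.0.9"))
```

In a setup like this, only the windows flagged as suspicious would be handed to preprocessing and the classifier. A window holding very few observations also has low entropy, so some warm-up handling would probably be needed in practice.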
Ertam et al. (2014) proposed a method to determine whether data captured on the Internet was normal or malicious. The classifiers in the proposed work were analyzed with recall, precision, F-measure, false rate and accuracy rate values. Ye and Yu (2015) combined the binary Extreme Learning Machine (ELM) classifiers of each class into an ensemble classifier using a one-versus-all strategy to classify network intrusions and evaluated the accuracy of the different approaches. ELM requires minimal human intervention. The experiment was performed on the NSL-KDD dataset. Wang et al. (2017) proposed a support vector machine (SVM) based intrusion detection framework with feature augmentation. The feature augmentation technique provides high-quality, concise data for training the SVM classifiers. The proposed system improved detection and reduced training time. The model was evaluated on the NSL-KDD dataset and was found to be superior in terms of false alarm rate, accuracy and detection rate.
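The evaluation measures named in these studies (accuracy, precision, recall or detection rate, F-measure and false alarm rate) all follow from the four counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the convention of treating "attack" as the positive class are assumptions made for illustration, not details taken from the cited papers.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common IDS evaluation measures from confusion-matrix counts.

    tp: attacks correctly flagged      fp: benign flows flagged as attacks
    tn: benign flows correctly passed  fn: attacks that were missed
    Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called detection rate
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "false_alarm_rate": false_alarm_rate,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the arithmetic.
    for name, value in binary_metrics(tp=90, fp=10, tn=890, fn=10).items():
        print(f"{name}: {value:.4f}")
```

Because intrusion datasets are often dominated by benign traffic, accuracy alone can look high even when many attacks are missed, which is one reason the cited works report detection rate and false alarm rate alongside it.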

Methodology
This section discusses the proposed methodology for applying a Hybrid Kernel based SVM (HKSVM) [1] and an Ensemble Hybrid Kernel based SVM (EHK-SVM), a feature selection approach [2], to network intrusion detection datasets. For the experiments, two benchmark datasets, Kyoto 2006+ and CIDDS-001, are used. Figure 1 presents the methodology for the implementation of the proposed model. The dataset is first preprocessed by applying transformations and normalization. The resulting normalized dataset is then classified using SVMs with different kernels (RBF kernel, polynomial kernel, Gaussian kernel and the hybrid kernel of HKSVM [1]), and the corresponding accuracies are observed. In addition, the Relief feature selection approach is applied to the CIDDS-001 dataset, and the 6 highest-ranked features are selected in order to obtain good accuracy. Table 2 presents the list of features selected by the Relief method.
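To make the Relief ranking step concrete, the following is a minimal Python sketch of the basic two-class Relief procedure for numeric features. It is an illustration under stated assumptions, not the implementation used in this study: the function names `relief_weights` and `top_k_features` are hypothetical, distances use the range-scaled Manhattan metric for simplicity, and the toy data at the end is invented. For each sampled instance, a feature's weight rises when it differs on the nearest instance of the other class (the nearest miss) and falls when it differs on the nearest instance of the same class (the nearest hit).

```python
import random


def relief_weights(X, y, n_iterations=None, seed=0):
    """Relief feature weights for numeric features and class labels y."""
    n, d = len(X), len(X[0])
    # Range of every feature, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(d))

    rng = random.Random(seed)
    m = n_iterations or n
    weights = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no usable neighbour for this sample
        hit = min(hits, key=lambda k: distance(X[i], X[k]))
        miss = min(misses, key=lambda k: distance(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / m
    return weights


def top_k_features(weights, k):
    """Indices of the k highest-weighted features, best first."""
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
         [0.9, 0.2], [1.0, 0.8], [0.8, 0.4]]
    y = [0, 0, 0, 1, 1, 1]
    w = relief_weights(X, y)
    print(w, top_k_features(w, 1))
```

Under these assumptions, selecting the 6 highest-ranked CIDDS-001 features would correspond to calling `top_k_features(weights, 6)` on weights computed from the preprocessed data.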

Results
The experimental study is implemented with JDK 1.7 in a Windows 7 environment on an Intel Core i5 processor. Table 3 and Table 4 contain the accuracies of the various kernel based SVM methods on the Kyoto 2006+ and CIDDS-001 datasets, respectively.
Figure 2 presents a bar graph of the accuracies of the models on both datasets.
Table 3: Accuracies of the SVM methods on CIDDS-001 dataset. [Table body not recovered; surviving entry: (11) 99.08%]
From Tables 2 and 3 and Figures 2 and 3, it is observed that HKSVM and EHK-SVM give the highest accuracies on both datasets.

Conclusion
In this paper, an attempt is made to measure the accuracies of various SVM models on the CIDDS-001 and Kyoto 2006+ datasets and to compare the results. It is concluded that the Hybrid Kernel based SVM achieves good accuracy compared to the other kernel based approaches. It is also observed that the ensemble hybrid kernel based SVM (EHK-SVM) yields good accuracy with 11 features from the Kyoto 2006+ dataset and with 6 features from the CIDDS-001 dataset. As future work, the proposed model can be tested on real-time network traffic.