Traffic Sign Classification Using Convolutional Neural Networks and Computer Vision

The world is advancing rapidly towards technologies that make everyday life easier [22]. People are looking for more interactive and advanced ways to improve their learning. One long-standing goal is to make a machine think like a human, which led to innovations such as AI and deep learning [25]. Progress in AI, deep learning, robotics and machine learning is accelerating, and this knowledge and technology can now be applied to a wide range of problems [36]. Within deep learning, the introduction of Convolutional Neural Networks (CNNs) made the field particularly strong in image classification and detection [1]. Our research uses a Convolutional Neural Network, built with TensorFlow and Keras, to classify traffic signs.


Introduction
Today's more advanced technologies further our goals and enable automation in every field, reducing the need for human involvement: a human is prone to making mistakes, whereas a machine in the same role can be more efficient in both speed and accuracy [34]. Technologies such as deep learning and machine learning have evolved greatly in this time [2].
This technology teaches machines to learn on their own instead of requiring every single action and possibility to be programmed [3]. This research therefore uses techniques such as convolutional neural networks, implemented with Keras and TensorFlow, to help self-driving cars perceive traffic signs and react to the input received. We built a deep neural network model using Convolutional Neural Networks that classifies the traffic sign present in an image according to its class. With the model we developed, we were able to detect as well as classify traffic signs, which is crucial for self-driving cars because a misread sign can lead to fatal accidents.

Objective
The main objective of this research is to develop a product that helps people learn about one of the most underrated, yet very important, parts of daily life: the traffic sign. The model was built with the deep learning library TensorFlow and its high-level API, Keras. The aim is to reach an accuracy high enough that an individual can use our product without hesitation [37].

Motivation
In both the past and recent times, many road accidents have been caused mainly by inadequate knowledge of road and traffic signs [30]. Although speeding is one of the key causes of such accidents, a survey found that the second most frequently cited reason was that an individual did not know what a particular traffic sign meant [23]. We strongly believe that our research would help individuals learn these signs intuitively, especially adolescents of the 21st century, who live surrounded by technology that is growing faster than ever [4]. Our research focuses on detecting the traffic sign in a supplied image using deep learning and image processing with OpenCV, and a convenient UI has been developed in Python using the Tkinter library.

Dataset:
The dataset we chose for our research is the GTSRB, the German Traffic Sign Recognition Benchmark. It is a well-known traffic sign dataset, available on websites such as Kaggle. It has more than 40 classes and more than 50,000 images for training, validation and testing purposes [14]. We divided the dataset into training, validation and testing sets, which helped us understand how well our architecture was working. The dataset is also very diverse. Figure 2.1 gives a basic depiction of how diverse the dataset is.

Fig 1-Images in the dataset
The figure above shows that the dataset was prepared in a robust way, so that a model developed on it can serve as a basis for future research.
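
The three-way split mentioned above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the 70/15/15 fractions, the seed and the function name `split_indices` are assumptions, since the text does not state the split ratios.

```python
import random

def split_indices(n_samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle sample indices and cut them into train/validation/test lists.

    The fractions and the seed are illustrative assumptions, not values
    reported in the text.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = round(n_samples * test_frac)
    n_val = round(n_samples * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(50000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting index lists instead of the images themselves keeps the sketch independent of how the images are stored; the same indices can then be used to select rows from the image and label arrays.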

Design Approach:
For the design, we created our own architecture after studying existing architectures such as AlexNet, VGG16 and VGG19. The type of network used in our research is the well-known CNN [5]. Studying these architectures and network structures gave us proper insight into how to design our own [15]. In particular, it showed us how to place the convolution and max pooling layers and how to choose dropout values so as to reduce the computational power needed while increasing accuracy [36]. The basic functionality of a CNN is given in Figure 3.1. The biggest constraint during this project was the computational power needed to train and test on the dataset. With more than 50,000 images, training the network reasonably quickly requires considerable computational power, and such power is not easily accessible [17]. The second constraint was transferring the images from the downloaded dataset into training and testing sets, which also requires a good amount of computational power [20].

Environment of development:
In this paper, both training and testing were performed on a workstation with an Intel Core i7-8750H processor, 16 GB of RAM and a 512 GB solid-state drive. Our research process was comparatively swift because we had access to an Nvidia GTX 1080 GPU, which provided more computational power for training our deep learning model.

Preprocessing of images:
Because we used a prominent dataset in this domain, the images were of good quality. The main preprocessing step was resizing the images to a smaller size for ease of computation [19]. We did not change the color of the images, as we wanted them to retain as many of their properties as possible for the model to learn [6]. Once the images had been resized to the same size, we converted each image to a NumPy array and appended that array to a list.
Similarly, the outputs were converted to integers and appended to another list, giving the output label of each training image.
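
The preprocessing steps above can be sketched as follows. This is only an illustration under assumptions: the 30x30 target size, the helper names `resize_image` and `build_arrays`, and the use of `cv2.resize` for this step are not specified in the text (OpenCV is mentioned for image processing, but not for resizing in particular).

```python
import numpy as np

IMG_SIZE = (30, 30)  # (width, height); assumed, the text gives no target size

def resize_image(image, size=IMG_SIZE):
    """Resize one colour image with OpenCV, leaving its colour channels unchanged."""
    import cv2  # imported lazily so the rest of the sketch runs without OpenCV
    return cv2.resize(image, size)

def build_arrays(samples, resize=resize_image):
    """Turn (image, label) pairs into the image array and the label array."""
    data, labels = [], []
    for image, label in samples:
        data.append(np.asarray(resize(image)))  # same-size NumPy array per image
        labels.append(int(label))               # class label stored as an integer
    return np.array(data), np.array(labels)
```

In practice each `image` would come from a call such as `cv2.imread(path)`, and each label from wherever the dataset records the class of that file; both are assumptions about how the downloaded data is laid out.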

Structural Representation:
When an image is passed to the model, it goes through two convolution layers followed by a max pooling layer with pool size (2,2). Max pooling is used to reduce the dimensions while still retaining the details of the image [13]. This set of layers is repeated twice, after which the output is flattened and passed to a fully connected dense layer. The activation functions used here are rectified linear units (ReLU). A final fully connected layer with a softmax activation then predicts the class to which the traffic sign belongs. The neural network architecture used in our project is presented in Figure 4.1.

Result:

Fig-3 Developed CNN structure
The layer-by-layer architecture of the network shown in the figure above is given in Table 4.1.
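
The layer stack described under Structural Representation can be sketched in Keras roughly as follows. This is not the authors' exact network: the filter counts, kernel sizes, dropout rates, dense-layer width, 30x30x3 input, optimizer and loss are all assumptions chosen for illustration, and 43 is the number of classes in GTSRB. The small helper `flattened_size` is a hypothetical pure-Python check of the feature-map arithmetic.

```python
# (filters, kernel size) for each convolution; two blocks of two layers.
# All of these values are assumed, not taken from the text.
BLOCKS = [[(32, 5), (32, 5)], [(64, 3), (64, 3)]]
INPUT_SHAPE = (30, 30, 3)   # assumed resized colour image
NUM_CLASSES = 43            # number of classes in GTSRB

def flattened_size(input_shape=INPUT_SHAPE, blocks=BLOCKS):
    """Number of features reaching the dense layer after the conv/pool blocks."""
    h, w, c = input_shape
    for block in blocks:
        for filters, k in block:
            h, w, c = h - k + 1, w - k + 1, filters  # unpadded conv, stride 1
        h, w = h // 2, w // 2                        # (2, 2) max pooling
    return h * w * c

def build_model(input_shape=INPUT_SHAPE, blocks=BLOCKS, num_classes=NUM_CLASSES):
    """Build the sketched CNN; TensorFlow is imported only when this is called."""
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([keras.Input(shape=input_shape)])
    for block in blocks:
        for filters, k in block:
            model.add(layers.Conv2D(filters, k, activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=(2, 2)))
        model.add(layers.Dropout(0.25))              # assumed rate
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))  # assumed width
    model.add(layers.Dropout(0.5))                   # assumed rate
    model.add(layers.Dense(num_classes, activation="softmax"))
    # Integer labels pair with the sparse categorical cross-entropy loss.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

print(flattened_size())  # 30 -> 26 -> 22 -> 11 -> 9 -> 7 -> 3, so 3 * 3 * 64
```

Under these assumptions the spatial size shrinks from 30 to 3 across the two blocks, which is why a small dense layer after flattening is enough.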
Our research produced an efficient network architecture that attained an accuracy of 98.8% on the validation set and 96% on the test set. The model was saved as an h5 file, whose location is passed to the file containing our GUI so that the trained model can be reused. We also developed this GUI, in which a user can upload an image and receive a message stating which traffic sign it shows. The accuracy graph of the model is presented in Figure 5.2.
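
The hand-off from the saved model to the GUI can be sketched as below. The file name, the helper names and the partial class-name mapping are hypothetical, and the Tkinter wiring is omitted; the sketch relies only on `keras.models.load_model`, the model's `predict` method and NumPy.

```python
import numpy as np

# Partial, illustrative mapping from class id to sign name; a real application
# would load every name from the dataset's own label file.
CLASS_NAMES = {13: "Yield", 14: "Stop", 17: "No entry"}

def load_trained_model(path="traffic_classifier.h5"):  # hypothetical file name
    """Load the saved h5 model; TensorFlow is imported only when this is called."""
    from tensorflow import keras
    return keras.models.load_model(path)

def classify(model, image, class_names=CLASS_NAMES):
    """Return (sign name, probability) for one already-resized image array."""
    batch = np.expand_dims(np.asarray(image, dtype="float32"), axis=0)
    probs = np.asarray(model.predict(batch))[0]   # softmax output for the image
    class_id = int(np.argmax(probs))              # most probable class
    name = class_names.get(class_id, f"class {class_id}")
    return name, float(probs[class_id])
```

A GUI callback would call `classify` on the uploaded image after applying the same resizing used during training, then display the returned name as the message to the user.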

Comparison with state-of-the-art models:
For the classification of traffic signs, and of images in general, there have been multiple previous implementations [43]. Even so, our CNN model achieved a higher accuracy (96%) than most other models, such as HOG-LDA, HOG-Random Forest and CNN-SVM. For instance, the HOG-LDA model attained an accuracy of only 90.36% [7], HOG-Random Forest achieved 92.43% [8], and the CNN-SVM model attained 95% [9], the closest among these models. The chart in Figure 5.2 makes this comparison easier to see. This research has given us insight into how well deep learning can be used to create intelligent systems [12]. As future work, we plan to integrate our model with a real-time camera, which would further improve its functionality and range of application.
The model could then be included in industrial-level products such as driverless cars, provided we integrate our research work into a real-time system [11].
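
The accuracy figures quoted in this comparison can be tabulated with a few lines of Python. The sketch only restates the reported numbers and computes the gap in percentage points; the dictionary keys and the function name `margins_over` are illustrative.

```python
# Accuracies (%) exactly as reported in the comparison above.
reported = {
    "HOG-LDA": 90.36,
    "HOG-Random Forest": 92.43,
    "CNN-SVM": 95.0,
    "Proposed CNN": 96.0,
}

def margins_over(accuracies, ours="Proposed CNN"):
    """Percentage-point gap between our model and every other entry."""
    return {name: round(accuracies[ours] - acc, 2)
            for name, acc in accuracies.items() if name != ours}

# List the other models from the closest to the furthest behind.
for name, gap in sorted(margins_over(reported).items(), key=lambda kv: kv[1]):
    print(f"{name}: {gap} percentage points below the proposed CNN")
```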

Summary:
In this research, using TensorFlow, a CNN and OpenCV, we successfully developed a traffic sign classifier that attained an accuracy of 96%, performing better than many models developed in other research.
We also developed an interactive and intuitive Python GUI that takes an image as input and presents the predicted traffic sign to the user.