Convolutional Neural Network Based Advertisement Classification Models for Online English Newspapers

Image processing for knowledge management and effective information extraction is a key element in steering towards Society 5.0. There has been substantial research and progress in image recognition and classification in recent years, but there is a lack of significant work on classifying advertisement images from online English newspapers. This research paper analyses and compares various popular image classification techniques to find the most suitable technique for the advertisement image classification problem. Automatic feature extraction without any prior knowledge of features makes Convolutional Neural Networks (CNN) the most suitable technique for advertisement image classification. This paper further explores and implements three different CNN-based image classification models that classify advertisement images from online English newspapers into four pre-defined categories: Admission-notices, Job-advertisements, Sales and Promotional advertisements, and Tenders. These models are trained and tested on an advertisement image dataset collected from four different online English newspapers over a time frame of 15 months. A ResNet50 model fine-tuned using 'Transfer-learning' is found to be the most suitable model for this advertisement image classification task, with results exhibiting around 74% accuracy. This CNN-based automated classification of advertisement images will help newspaper readers perform an exhaustive advertisement search in a category of their own interest, saving the time and effort of a sequential manual search across multiple newspapers. The proposed research will also help in performing advertisement analysis and studies.


Introduction
Online newspapers are increasingly popular because of the convenience of accessing information on laptops, smartphones, tablets and desktops anywhere and at any time. Moreover, lockdown situations during unprecedented circumstances such as pandemics, which restrict access to printed newspapers, have boosted this trend manyfold. With the younger population increasingly embracing technology, the trend of online newspaper reading is likely to last for many years to come. Along with the news articles, advertisements in newspapers are of great interest. Government departments, recruitment agencies, educational institutes, private companies etc. use newspaper advertisements as a primary means of advertising tenders, jobs, admission notices, sales and promotions, and people eagerly wait for these advertisements to appear in the newspaper. Students may be interested in admission-notices, whereas job aspirants may look for job-advertisements. A contractor may be interested in a relevant tender-notice, and a shopping enthusiast may be looking for sales and promotional advertisements. However, online newspapers do not offer this type of category-wise personalised search. Also, no search engine, including Google, has the primary purpose of searching advertisements from online newspapers. As a result, a search for an advertisement in the newspapers through search portals may return hundreds of images, of which only a few are relevant. Hence the reader is left with no option but to go through all the newspapers sequentially and manually search for the relevant advertisements. This sequential manual search is time-consuming and tedious, especially when the reader is searching for a particular advertisement across a range of newspapers.
An advertisement image classification model which can classify each input advertisement into various pre-defined advertisement categories can be very helpful in performing this type of personalised advertisement search. When combined with OCR (Optical Character Recognition) techniques and user-friendly search interface, this advertisement image classification model can help a reader in performing category-wise advertisement search across a range of newspapers saving the time and effort of sequential manual search.
Advertisement image classification is typically a supervised machine learning problem that involves two phases. The first phase is the learning or training phase, in which a classification model is created using a classification technique (learning algorithm) and trained on the advertisement dataset. The second phase is the recognition or classification phase, in which the trained model is used to classify new advertisement images into different pre-defined categories. Many image classification techniques are available to choose from, and each technique has its own advantages and disadvantages. SVM is one of the most popular image classification techniques and is used in several applications including facial expression classification, bioinformatics (classification of genes, protein remote homology detection, cancer classification etc.), Generalized Predictive Control (GPC), text and hypertext classification and many more. CNN is also used in many real-world applications such as face recognition (in social media applications), image analysis (in health care), OCR and object detection (for driverless cars), and has become the first choice for image recognition and classification tasks among many researchers. The proposed research explores various popular image classification techniques including K-NN, Decision trees, Naïve Bayes, SVM and CNN, and chooses the most suitable technique for advertisement image classification from online English newspapers. The chosen technique is employed in three different models, which are evaluated on the advertisement dataset to ascertain their validity.
The major contributions of the proposed research are: (1) Choosing the most relevant image classification technique for advertisement image classification from English newspapers, (2) Creation of advertisement dataset from online English newspapers, (3) Designing, training & testing three different advertisement image classification models using the advertisement dataset created.
The rest of the paper is organized as follows. Popular image classification techniques are introduced first, a comparative analysis is drawn, and the most suitable classification technique is chosen for advertisement image classification from online English newspapers. Related work in the field of advertisement image classification is presented in the next section. The following section presents the proposed CNN-based advertisement image classification model and the advertisement dataset used, followed by the 'Results and Discussion' section, which elaborates on the implemented CNN models and their performance analysis. Finally, the conclusion section concludes the paper.

Popular Image Classification Techniques
Popular image classification techniques for supervised learning are presented as follows:

K-NN (k-nearest neighbor)
K-NN (Cover & Hart, 1967) is one of the simplest supervised machine learning algorithms for image classification. The training phase is very simple and involves storing the feature vectors and labels of the training image set. In the recognition phase, K-NN first locates the k nearest data samples to the query data sample and assigns the class label that is most frequent among them, i.e. it classifies the unknown data point on the basis of its closest neighbours whose classes are already known. The K-NN algorithm treats all features as equal in the similarity computation, which leads to misclassification when only a small subset of the features is useful for classification (Kim et al., 2012).
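The recognition phase described above can be sketched in a few lines of Python. This is only an illustrative sketch: the two-dimensional feature vectors, the Euclidean distance and the function name `knn_predict` are assumptions made for the example and are not taken from the paper.

```python
import math
from collections import Counter

def knn_predict(train_vectors, train_labels, query, k=3):
    """Return the most frequent label among the k nearest training samples."""
    # "Training" is just storage: distances are computed at query time.
    distances = [
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    ]
    # Keep the k closest samples and take a majority vote over their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy feature vectors (hypothetical values, not from the paper's dataset).
train_vectors = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = ["Tenders", "Tenders", "Tenders", "Job_Ads", "Job_Ads", "Job_Ads"]

prediction = knn_predict(train_vectors, train_labels, query=(0.15, 0.15), k=3)
```

Every feature contributes equally to `math.dist`, which is the behaviour the paragraph above identifies as a source of misclassification when only a few features are informative.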

Decision Trees
Decision trees (Murthy, 1998) classify query samples based on their sorted feature values. In a decision tree, each node represents a feature that a data sample may possess and each branch represents a value that the feature can assume. Leaves represent the final decisions (the classes into which the instances are finally classified). The classification process starts at the root node. Branches are followed according to the feature values present in the query sample until a leaf node is reached, which is the result of the classification algorithm. Well-known algorithms for building decision trees include ID3, C4.5, EC4.5, Rainforest and PUBLIC (Kotsiantis, 2007).
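The root-to-leaf traversal described above can be illustrated with a small hand-built tree. The features `mentions_salary` and `mentions_discount` and the tree itself are hypothetical and serve only to show the mechanics; they are not features used in the paper.

```python
def classify(node, sample):
    """Walk from the root to a leaf, following the branch that matches each feature value."""
    # Internal nodes are dicts; a leaf is a plain string holding the class label.
    while isinstance(node, dict):
        value = sample[node["feature"]]
        node = node["branches"][value]
    return node

# Hypothetical tree: each internal node tests one feature, each branch is one value.
tree = {
    "feature": "mentions_salary",
    "branches": {
        True: "Job_Ads",
        False: {
            "feature": "mentions_discount",
            "branches": {True: "Sales_and_Promotion", False: "Tenders"},
        },
    },
}

label = classify(tree, {"mentions_salary": False, "mentions_discount": True})
```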

Naïve Bayes
Based on Bayes' theorem, Naïve Bayes (Rish, 2001; Lewis, 1998) is a statistical learning technique that predicts membership probabilities (the probability of a data sample or record belonging to a specific class) for each class based on the feature vector. The class with the highest probability is considered the most likely class. The technique is called naïve because it assumes that the features that go into the model are independent of each other. This assumption of feature independence rarely holds true in the real world, and hence Naïve Bayes classifiers are generally less accurate than other, more complex learning algorithms (Kotsiantis, 2007).
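Written out, the independence assumption lets the class posterior factor into per-feature terms. The notation below is the standard textbook form rather than notation from the paper: c denotes a class and x_1, ..., x_n the features of a sample.

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The product over individual features is exactly where the "naïve" assumption enters: dependencies between features are ignored.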

SVM
Support Vector Machine (SVM) (Cortes & Vapnik, 1995) represents the data samples as points in space and finds the optimal hyperplane that separates the data samples belonging to two different classes with maximum margin. The test or query data sample is mapped onto the same space and its class is predicted based on which side of the hyperplane it falls on. Training and testing of SVMs are inherently slow. Also, the selection of the kernel function parameters for SVM is not straightforward (Kim et al., 2012).

CNN
A Convolutional Neural Network (CNN) is a deep learning architecture that contains many hidden layers, including convolutional layers, pooling layers, flatten layers and fully connected layers. Convolutional layers detect various low-level and high-level features by applying different filters to the sample image. A nonlinear activation function (generally Rectified Linear Units (ReLU)) is applied to the features obtained after convolution. After each convolutional layer, a pooling layer is added. These convolution + pooling blocks can be repeated many times. The last pooling layer is followed by a flattening layer, which in turn is followed by one or more fully connected layers. Finally, classification is performed using a classification function (e.g. Softmax) and the classification output is obtained. Three main properties of CNN, namely 'Sparse connectivity', 'Shared weights' and 'Pooling', work towards dimensionality reduction of the network and reduce the training time to a great extent (Hasan et al., 2019). Training time can be decreased further using Graphics Processing Units (GPUs).
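The sequence of convolution, ReLU, pooling, flattening, fully connected layer and Softmax can be traced on a toy example in plain Python. This is a minimal sketch for illustration only: the 5x5 input, the hand-written edge filter and the fully connected weights are assumptions, whereas a real CNN learns its filters and weights during training.

```python
import math

def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]

def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [max(feature_map[r + i][c + j] for i in range(size) for j in range(size)) for c in cols]
        for r in rows
    ]

def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 5x5 "image" with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # hand-written vertical-edge filter (illustrative)

activated = relu(conv2d(image, kernel))          # 4x4 feature map
pooled = max_pool(activated, size=2)             # 2x2 after pooling
flat = [v for row in pooled for v in row]        # flatten to a vector of length 4
weights = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]  # illustrative, 2 output units
probabilities = softmax(dense(flat, weights, biases=[0.0, 0.0]))
```

The same small kernel is reused at every image position, and pooling halves each spatial dimension; these correspond to the 'Shared weights' and 'Pooling' properties mentioned above.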

Comparative Analysis of Popular Image Classification Techniques
The important features and limitations of different classification techniques are summarized in Table 1.

Classification Technique for Advertisement Images
As discussed in the previous section, many image classification techniques are available to choose from. Being simple and easy to understand, K-NN is a basic classification approach that gives acceptable results, but for advertisement image classification the number of dimensions of the feature space is too large to compute pixel-by-pixel feature space distances, which makes K-NN unsuitable for advertisement image classification. The information to be extracted from online newspapers / web contents can be structured, unstructured or semi-structured (Dhuria et al., 2016). Decision Trees are the most comprehensible image classification technique, but creating a decision tree that includes all the features required to classify the advertisement images into different categories would be computationally very intensive and hence not feasible. The strong 'feature independence assumption' makes the Naïve Bayes classification technique unfit for advertisement image classification. SVM is a commonly used technique for image classification, but its major limitation is that it does not support automatic feature extraction, so other techniques are required to extract useful features from the entire set of features prior to the classification process. This generally needs prior knowledge of the features, involving domain experts, and the datasets need to be prepared according to the feature set identified. Also, SVM is basically a binary classification approach. For multi-class classification problems like advertisement image classification, the classification task needs to be converted into a set of multiple binary classification problems, adding computational complexity as the number of classes increases. On the other hand, CNN is a multi-class classification technique that just needs an image dataset for training and automatically learns the features from the dataset.
This automatic feature extraction property makes CNN the most suitable technique for advertisement image classification from online English newspapers where only images are available with the class labels to learn from, without any prior knowledge of the features.
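To make the cost of decomposing a multi-class task into binary ones concrete, the sketch below enumerates the binary problems for the four advertisement categories under two common schemes, one-vs-rest and one-vs-one. The paper does not state which decomposition it has in mind, so both schemes are shown here as assumptions.

```python
from itertools import combinations

CLASSES = ["Admission_Notices", "Job_Ads", "Sales_and_Promotion", "Tenders"]

def one_vs_rest_tasks(classes):
    """One binary problem per class: that class against all the others."""
    return [(c, [other for other in classes if other != c]) for c in classes]

def one_vs_one_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))

ovr = one_vs_rest_tasks(CLASSES)   # k problems for k classes
ovo = one_vs_one_tasks(CLASSES)    # k * (k - 1) / 2 problems for k classes
```

For four classes this gives 4 one-vs-rest or 6 one-vs-one binary classifiers, and the one-vs-one count grows quadratically as more categories are added.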

Related Work in Advertisement Image Classification
There have been tremendous research efforts in the field of object image recognition and classification (Mikolajczyk et

Proposed CNN Model for Classification of Advertisement Images
Since the 'Imagenet-classification' results of 2012 (Krizhevsky et al., 2017), CNN has been considered the state of the art for image recognition and classification. The proposed research explores various CNN-based image classification models for classifying the advertisement images collected across a range of newspapers.

CNN-Hyperparameters
Many CNN parameters need to be tuned to achieve the desired accuracy, including:

Batch-size:
In CNN training, batch size is the number of training samples shown to the network before each weight update. CNNs are sensitive to batch size.

No. of Epochs:
The number of epochs is the number of times the entire training set is presented to the network during training.

Number of neurons in each hidden layer:
It is an important parameter to tune. It should ideally be optimized together with the batch size and the number of epochs. Keras is a high-level API (Application Programming Interface) that works as a wrapper around the low-level libraries of TensorFlow.
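How batch size and the number of epochs together determine the number of weight updates can be shown with a short calculation. The figure of 2816 training images comes from the dataset split reported in the paper; the batch size of 32 and the 10 epochs are assumptions chosen for illustration, since the values actually used are not stated here.

```python
import math

def weight_updates(num_samples, batch_size, epochs):
    """Return (updates per epoch, total updates) for mini-batch training."""
    # One update per batch; the last batch of an epoch may be smaller.
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

# 2816 training images (from the paper); batch size and epochs are assumed.
steps, total = weight_updates(num_samples=2816, batch_size=32, epochs=10)
```

Doubling the batch size halves the number of updates per epoch, which is one reason batch size and number of epochs are best tuned together.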

Dataset
There is a lack of a standard dataset of advertisements from English newspapers, and hence the authors have created their own dataset of advertisements from various online English newspapers that were free to download at the time of data collection (May 2019 to September 2020), including 'Times of India', 'Hindustan Times', 'Indian Express' and 'The Tribune'. A balanced dataset of 4400 images is created, with 1100 images in each of the four categories, namely: (1) Admission_Notices (including admission-notices, advertisements of educational institutes, coaching classes, scholarships etc.) (2) Job_Ads (3) Sales_and_Promotion (4) Tenders (including tenders, bids, auctions, requests for proposals etc.)

Figure 1. Division of dataset in Training, Validation and Test sets
As shown in Figure 1, the advertisement dataset (4400 images) is divided into a training dataset (80%, i.e. 3520 images) and a test dataset (20%, i.e. 880 images). The training dataset is further divided into data for training use (80% of the training data, i.e. 2816 images) and data for validation use (20% of the training data, i.e. 704 images).
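The split sizes quoted above can be reproduced with integer arithmetic. The function name `split_counts` and its parameters are illustrative, and the sketch only computes the counts; it does not reproduce how the authors actually shuffled or sampled the images.

```python
def split_counts(total, test_pct=20, val_pct=20):
    """Split a dataset size into train / validation / test counts."""
    # Hold out the test set first, then carve validation out of the remainder.
    test = total * test_pct // 100
    train_full = total - test
    validation = train_full * val_pct // 100
    train = train_full - validation
    return {"train": train, "validation": validation, "test": test}

counts = split_counts(4400)
```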

Performance Measures
To compare the results of various models, different performance measures are used including:

Results and Discussion
Three CNN models with different hyperparameters are trained on the advertisement dataset and evaluated to ascertain the validity of each model.

Adv
CNN requires huge datasets in order to achieve high accuracy. Since our dataset is limited to 4400 images, the achieved accuracy (65%) is low, as expected. It is also observed that when more layers are added to this simple network, its performance degrades further, resulting in lower accuracy.

Model 3 (Fine-tuned ResNet50)
Model 3 also uses 'Transfer learning' with the pretrained ResNet50 model, but this time a few more layers at the end of the model architecture (Global_Average_Pooling2D + Fully connected layer (fc-1) with 512 output units + 'ReLU' + Dropout (0.5 or 50%) + Fully connected layer (fc-2) with 256 output units + 'ReLU' + Dropout (0.5 or 50%) + Fully connected layer (fc-3) with 4 output units + 'Softmax' classification layer) are trained using the advertisement dataset along with the classifier, as shown in Figure 6. Table 4 shows the evaluation results of Model 3 on the test set, Figure 7a shows the 'Confusion matrix' and Figure 7b shows the training and validation accuracy curves. The confusion matrix for Model 3 (Figure 7a) shows improved recognition of all the advertisement categories, 'Sales_And_Promotion' (191/228), 'Tenders' (167/213), 'Job_Ads' (147/216) and 'Admission_Notice' (142/223), as compared to Model 2, and hence the overall accuracy is also significantly improved (74%), as reflected in the validation accuracy curves (Figure 7b).
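To give a sense of the size of the layers added on top of ResNet50, the sketch below counts the trainable parameters of the three fully connected layers listed above. It assumes the standard ResNet50 backbone, whose global average pooling output has 2048 features; that width is not stated in the paper. Pooling, ReLU, Dropout and Softmax contribute no trainable parameters.

```python
# Assumption: standard ResNet50, whose final convolutional stage has 2048 channels,
# so global average pooling yields a 2048-dimensional feature vector.
RESNET50_FEATURES = 2048

# (name, output units) of the fully connected layers listed in the text.
HEAD_LAYERS = [("fc-1", 512), ("fc-2", 256), ("fc-3", 4)]

def head_parameter_counts(input_units, layers):
    """Count weights plus biases for a stack of fully connected layers."""
    counts = {}
    for name, units in layers:
        counts[name] = input_units * units + units  # weight matrix + bias vector
        input_units = units  # this layer's output feeds the next layer
    return counts

counts = head_parameter_counts(RESNET50_FEATURES, HEAD_LAYERS)
total_head_parameters = sum(counts.values())
```

Under this assumption the head holds roughly 1.18 million parameters, most of them in fc-1.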

Comparison of the Results obtained
The results of all three CNN-based models trained and tested on the advertisement dataset from online English newspapers are summarized in Table 5. The fine-tuned ResNet50, in which the last few layers are trained, gives the best accuracy of 74% among the three evaluated models and is the most suitable model for advertisement image classification from online English newspapers.

Table 5. Comparison of the results obtained
1. Simple CNN: 65%
2. ResNet50 + Classifier: 68%
3. Fine-tuned ResNet50: 74%
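The 74% figure for the fine-tuned ResNet50 can be cross-checked against the per-class counts reported for Model 3. The sketch assumes that each reported pair is (correctly classified images, total test images of that class); the fact that the totals add up to the 880 test images supports this reading.

```python
# (correct, total) per class as reported for Model 3; interpretation is an assumption.
MODEL3_COUNTS = {
    "Sales_And_Promotion": (191, 228),
    "Tenders": (167, 213),
    "Job_Ads": (147, 216),
    "Admission_Notice": (142, 223),
}

def per_class_recall(counts):
    """Fraction of each class's test images that were classified correctly."""
    return {name: correct / total for name, (correct, total) in counts.items()}

def overall_accuracy(counts):
    """Correct predictions over all test images, across every class."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return correct / total

recall = per_class_recall(MODEL3_COUNTS)
accuracy = overall_accuracy(MODEL3_COUNTS)  # 647 / 880
```

The result, 647 correct out of 880, rounds to the reported 74%, and the per-class values show that 'Admission_Notice' and 'Job_Ads' remain the hardest categories.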

Conclusion
The choice of image classification technique (learning algorithm) always depends on the task at hand. CNN just needs an image dataset for training and automatically learns the features from the dataset, and this property makes it the most suitable technique for advertisement image classification from online English newspapers. There is a lack of a standard dataset of advertisements from English language newspapers, so the authors have created their own dataset of advertisements from four different online English newspapers. The proposed research designed, implemented and trained three CNN-based advertisement image classification models on the advertisement dataset. These models can classify input advertisements from different English newspapers into four pre-defined advertisement categories. The simple CNN model with two convolutional layers gives a lower accuracy of 65%, as CNN requires a huge dataset for training but the advertisement dataset is small (4400 images). Using ResNet50 for feature extraction and training the classifier on the advertisement dataset improves the accuracy (68%). The fine-tuned ResNet50 model (in which the last few layers are trained using the advertisement dataset) is found to be the most suitable model, exhibiting around 74% accuracy. Future work includes increasing the accuracy of the model by increasing the size of the advertisement dataset so that more image samples are available to learn from. Future scope also includes filtering out non-advertisements before advertisement images are classified into different categories. More advertisement categories, such as Public Notices, Matrimonial advertisements, Remembrance messages and Political advertisements, can be added in a future extension.
The proposed research, when combined with OCR techniques, can also provide keyword-based advertisement search within different advertisement categories in online English newspapers, greatly enhancing the reader's experience of searching for advertisements online.