M-Cuckoo and SVM Classification Algorithm Based Opinion Mining

Opinion mining, or sentiment analysis, is a natural language processing task that identifies customers' sentiment about a specific product or subject. It involves developing a framework that gathers and reviews the opinions posted about products on online shopping sites. Opinion mining is a sub-field of web content mining, which is itself a branch of data mining. Opinions are statements that reflect the sentiment of individuals about objects or events. For any person or organization, reviewing consumer opinions is important for making the right purchasing decision. Cuckoo Search (CS) is a search algorithm inspired by the breeding behavior of cuckoos. This paper gives a short overview of the applications of this nature-inspired algorithm. The CS algorithm is used in various fields, such as business, image processing, wireless sensor networks, flood forecasting, document clustering, speaker recognition, shortest-path computation in distributed systems, the health sector, and job scheduling. In terms of efficiency and processing time, the cuckoo algorithm outperforms various other nature-inspired algorithms. Therefore, this research paper proposes a hybrid feature selection technique that combines cuckoo search with the mRMR (Minimum Redundancy Maximum Relevance) algorithm. Due to the subjective nature of social media reviews, the hybrid feature selection technique outperforms the traditional technique. Performance measures such as f-measure, recall, precision, and accuracy are tested on an Amazon dataset using a Support Vector Machine (SVM) classifier.


INTRODUCTION
E-commerce websites are becoming increasingly popular all over the world. Because of their ease, convenience, reliability, and speed, consumers are shifting towards online transactions instead of going to the markets. A range of online shopping websites is available on the internet, such as Amazon, Flipkart, and Zovi. These websites allow users to purchase items quickly and at a lower price [3]. Many attractive and everyday items, such as books, electronic products, home appliances, clothes, and footwear, are sold on these sites. The websites also give consumers an opportunity to write reviews of the products they purchase. This feedback is beneficial to consumers, product suppliers, and website developers [4]. Users who find it difficult to decide on a purchase can read the feedback about a particular product on these websites, so that they have a view of the product before buying it and know which product is ranked first. Metaheuristic algorithms are now commonly used to solve optimization problems in engineering design, data mining, machine learning, and image processing. Typically, these algorithms are simple, versatile, easy to implement, and efficient in practice. Despite this progress, some fields remain less explored, so there are still research opportunities in many areas. Because there are so many metaheuristic algorithms, it is not possible to address all of them here. Instead, we focus on the applications and opportunities concerning cuckoo search (CS) and its variants.

LITERATURE REVIEW
Online social media is a form of online discourse in which people create content, share it, bookmark it, and network at an impressive rate. In [1], Abd. Samad Hasan Basaria, Burairah Hussina et al. use Twitter messages to review movies by means of opinion mining, or sentiment analysis. Opinion mining here refers to the application of natural language processing, computational linguistics, and text mining to classify whether a movie is good or not based on the opinions in the messages. Support Vector Machine (SVM) is a supervised learning method that analyzes data and recognizes patterns for classification. That research concerns binary classification with two classes, positive and negative. The positive class indicates a good opinion of a movie, and the negative class indicates a bad opinion. The evaluation is based on the accuracy of the SVM, validated with 10-fold cross-validation and a confusion matrix. A hybrid with Particle Swarm Optimization (PSO) is used to improve the selection of the best parameters for solving the dual optimization problem. The results show an improvement in accuracy from 71.87% to 77%.
In [6], Harshit Sanwal and Sanjana Kukreja presented opinion mining and summarization of hotel reviews on the web. For opinion classification of hotel reviews, they used SVM with Particle Swarm Optimization (PSO). Intentions are expressed in different ways, with different vocabulary, short forms, and jargon, which makes the data massive and disorganized. The proposed approach, termed sentiment polarity, automatically prepares a sentiment dataset for training and testing in order to extract unbiased opinions of hotel services from reviews. A comparative analysis against Complement Naïve Bayes and Composite Hypercubes on Iterated Random Projections was carried out to find a suitable SVM with PSO for the classification component of the proposed approach.
Features are an important input to the classification task: the better the features are optimized, the more accurate the results. Therefore, in [3], Dipti Sharma and Munish Sabharwal proposed a hybrid feature selection technique that combines PSO and cuckoo search. Due to the subjective nature of social media reviews, the hybrid feature selection technique outperforms the traditional technique. Performance measures such as f-measure, recall, precision, and accuracy were tested on a Twitter dataset using a Support Vector Machine (SVM) classifier and compared with a convolutional neural network. The experimental results, across different parameters, show that the proposed work outperforms the existing work.
Cuckoo Search is an optimization algorithm developed by Yang and Deb in 2009. It was inspired by cuckoo species that lay their eggs in the nests of other host birds. The egg-laying and breeding behavior of the cuckoo is the basic motivation for this optimization algorithm, which improves efficiency, accuracy, and convergence rate. In [9], Venkata Vijaya Geeta Pentapalli and Ravi Kiran Varma P reviewed the cuckoo search algorithm, together with optimization and its problems. Different categories of cuckoo search and several of its applications are reviewed.
Privacy-preserving data mining (PPDM) is an approach that has emerged to take care of privacy issues. The intention of PPDM is to build data mining techniques without raising the risk of mishandling the data used to generate them. In [5], G.K. Shailaja and C.V. Guru Rao developed a PPDM technique with two phases, namely data sanitization and data restoration. The association rules are first extracted from the database before the two phases begin. In both the sanitization and restoration processes, key extraction plays a major role, and the key is selected optimally using the Opposition Intensity-based Cuckoo Search Algorithm, a modified form of the Cuckoo Search Algorithm. Four measures, namely hiding failure rate, information preservation rate, false rule generation, and degree of modification, are minimized by the adopted sanitization and restoration processes.
Cuckoo search (CS) is an efficient swarm-intelligence-based algorithm, and significant developments have been made since its introduction in 2009. CS has many advantages due to its simplicity and its efficiency in solving highly non-linear optimization problems with real-world engineering applications. In [7], Iztok Fister Jr, Dusan Fister et al. provide a timely review of the state-of-the-art developments of the last five years, including discussions of the theoretical background and research directions for the future development of this algorithm.
Optimization techniques play a key role in real-world problems. They are used in many situations where decisions are taken on the basis of random search, but choosing a suitable optimization algorithm is a major challenge for the user. In [8], Manar Abdulkareem Al-Abaji reviewed the Cuckoo Search Algorithm, which can replace many traditionally used techniques. Cuckoo search uses a Lévy flight strategy based on the egg-laying radius to derive a solution specific to the problem. The CS optimization algorithm improves efficiency, accuracy, and convergence rate. Different categories of cuckoo search and several of its applications are reviewed.
In [4], Edison Marrese Taylor, Juan D. Velasquez et al. extended Bing Liu's aspect-based opinion mining technique to apply it to the tourism domain. Using this extension, they also offer an approach for discovering consumer preferences about tourism products, particularly hotels and restaurants, from opinions available on the Web as reviews. An experiment using hotel and restaurant reviews obtained from Trip Advisor was conducted to evaluate the proposals. The results showed that tourism product reviews available on web sites contain valuable information about customer preferences that can be extracted using an aspect-based opinion mining approach. The proposed approach proved to be very effective in determining the sentiment orientation of opinions, achieving a precision and recall of 90%. However, on average, the algorithms were only capable of extracting 35% of the explicit aspect expressions.
One important problem in sentiment analysis of product reviews is to produce a summary of opinions based on product features. In [2], Arti Buche and Dr. M. B. Chandak surveyed and analyzed various techniques that have been developed for the key tasks of opinion mining. On the basis of their survey and analysis, they provide an overall picture of what is involved in developing a software system for opinion mining.
Most online marketplaces in Indonesia provide a review or feedback feature in order to enhance customer satisfaction. However, there is a large number of unstructured opinions, and every opinion can discuss one or more aspects. In [10], Zulva Fachrina and Dwi H. Widyantoro proposed a combination of rule-based and machine learning approaches to classify the aspect and sentiment of online marketplace opinions. They use Support Vector Machine and Naïve Bayes classifiers for classifying opinions. The evaluation uses 2960 reviews from various categories collected from an Indonesian online marketplace site. The best method for the quality, accuracy, service, communication, and delivery aspects is the machine learning SVM with the rule-based output as one of the features, while the best method for the packaging and price aspects is the rule-based approach alone. The average f-measures for all aspects range from 78.9% to 92%.

PROPOSED METHOD
An opinion is a person's view, belief, or conclusion about a matter of interest, and it is generally considered subjective in nature. Studies indicate that the views of stakeholders, rather than evidence, have a huge effect on decision-making by individuals as well as by groups such as governments and organizations. Opinion mining and sentiment analysis, terms that are used interchangeably these days, form a field of text data mining that involves extracting opinions from evaluative texts and classifying the polarity of each opinion as positive or negative, based on the orientation of the text after computational treatment of the opinions expressed towards the main features. Because opinions are expressed in human language, Natural Language Processing (NLP) techniques are often used in conjunction with knowledge discovery in databases (KDD) methods at different stages of opinion mining, such as opinion statement detection, feature recognition, opinion extraction, polarity determination, and opinion summarization. Among lexicon-based approaches and machine learning methods, supervised machine learning techniques built on algorithms such as Support Vector Machine (SVM), Naïve Bayes (NB), K Nearest Neighbor (KNN), and Maximum Entropy are widely used to determine polarity, using a large amount of labeled training data. In this research work, the mRMR algorithm is combined with the Cuckoo Search algorithm to form a new algorithm called the M-Cuckoo Search algorithm. The opinion mining process has three steps. The first step is pre-processing, in which the data is cleaned. The second step is feature extraction, which is very important for obtaining good results. The third step is classification, whose result is an opinion label such as negative, neutral, or positive. In this research work, the Support Vector Machine algorithm is used for classification.

DATA PRE-PROCESSING
Pre-processing translates textual information into simple components and removes inconsistencies so that the text can be interpreted later. In this research work, Text Tokenization, Stop Words Removal, Letter Replacement, and Punctuation Removal are used for data pre-processing.
Text Tokenization - Tokenization is the process of breaking a string into pieces called tokens, such as words, keywords, phrases, symbols, and other components. Tokens may be individual words, phrases, or even whole sentences. Some characters, such as punctuation marks, are discarded during tokenization. The tokens become the input for further processing such as parsing and text mining.
Stop Words Removal - An important step in NLP text processing is the elimination of stop words. It involves filtering out high-frequency words that add little or no semantic meaning to an expression, for example which, to, at, for, and is. For example, in "Hello, I'm having trouble logging in with my new password", it may be useful to remove stop words like "hello", "I", "am", "with", and "my", so that we are left with the words that help to identify the topic of the ticket: "trouble", "logging in", "new", "password".
Letter Replacement - In languages other than English, certain letters have multiple forms. Previous research has therefore replaced the multiple forms of each such letter with its default form.
Punctuation Removal - Most punctuation marks, such as commas and full stops, are not helpful for polarity detection.
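The four pre-processing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the experiments: the stop-word list is a small hypothetical sample, letter replacement is approximated by Unicode normalization that strips accents, and the names `STOP_WORDS`, `replace_letters`, and `preprocess` are introduced here only for the example.

```python
import re
import unicodedata

# Illustrative stop-word list (an assumption; the paper does not publish its list).
STOP_WORDS = {"hello", "i", "i'm", "am", "having", "with", "my",
              "which", "to", "at", "for", "is"}

def replace_letters(text):
    """Map accented letter variants to their base form (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def preprocess(text):
    """Return the tokens left after the four pre-processing steps."""
    text = replace_letters(text.lower())
    # Tokenization and punctuation removal in one pass: keep runs of letters,
    # allowing one internal apostrophe so that "i'm" stays a single token.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    # Stop-word removal.
    return [tok for tok in tokens if tok not in STOP_WORDS]
```

Applied to the support-ticket sentence from the stop-word example, `preprocess` keeps only the topical words "trouble", "logging", "in", "new", and "password".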

FEATURE EXTRACTION
Feature selection in sentiment analysis tackles a number of issues, including a large feature space, redundancy, noisy features, context sensitivity, domain dependence, and limited work on lexico-structural features, among others [3]. The primary objective of feature selection is to improve the classifier's output by selecting only useful and relevant features and eliminating redundant, irrelevant, and noisy features, thereby reducing the feature vector. In addition, extracting specific and distinct features is imperative where classification algorithms are unable to scale up to the size of the feature set in terms of time and space. Without a proper feature selection technique, the classifier consumes more resources and more processing time. The first and foremost challenge in feature extraction is to select the minimal feature subset without any loss of classification accuracy. In a generic sentiment classification task, a variety of terms are considered as candidate features, but only a few of them actually convey sentiment. The additional features slow the classification process down and tend to decrease the classifier's accuracy, so they have to be pruned. Feature selection therefore requires looking for optimal subsets of features using certain search strategies. The search may be exhaustive or approximate. Exhaustive search offers an optimal solution, but it is not feasible for large datasets, and social media data typically have enormous dimensionality. Finding the optimal subset of features is an NP-hard problem: for N features, the number of possible solutions grows exponentially, as 2^N. The focus of researchers has therefore shifted to meta-heuristic algorithms, which are regarded as a subclass of approximate methods.
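The growth of the search space can be made concrete with a short sketch. The helper names below are hypothetical and serve only to illustrate why enumerating every subset stops being practical: N features give 2^N subsets, of which 2^N - 1 are non-empty.

```python
from itertools import combinations

def count_feature_subsets(n_features):
    """Number of non-empty feature subsets an exhaustive search must score."""
    return 2 ** n_features - 1

def all_feature_subsets(features):
    """Yield every non-empty subset; feasible only for very small N."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)
```

Three candidate terms give 7 subsets to evaluate, while 20 terms already give 1,048,575, and a realistic review vocabulary has thousands of terms.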
In this research work, the mRMR algorithm is combined with the Cuckoo Search algorithm to form a new algorithm called the M-Cuckoo Search algorithm. The M-Cuckoo algorithm is applied to the pre-processed Amazon dataset to select features for a better result.

mRMR (MINIMUM REDUNDANCY AND MAXIMUM RELEVANCE)
The minimum redundancy and maximum relevance (mRMR) feature selection algorithm iteratively selects features that are maximally relevant to the prediction task and minimally redundant with the set of features already selected. This distinguishes it from univariate feature selection methods, which return a subset of features without accounting for redundancy between the selected features. Algorithm 1 shows the mRMR (Minimum Redundancy Maximum Relevance) algorithm.
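The greedy selection described above can be sketched as follows. This is an illustrative version that assumes discrete features (for example, term presence), measures relevance and redundancy with mutual information, and scores each candidate by relevance minus mean redundancy. It is one common mRMR criterion and is not necessarily identical to Algorithm 1; the function names are chosen for the example.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr(columns, labels, k):
    """Greedily pick k feature names from `columns` (name -> list of values)."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in columns.items()}
    candidates = list(columns)   # insertion order breaks ties deterministically
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        # Mean mutual information with the features already chosen.
        redundancy = sum(mutual_information(columns[name], columns[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a feature that duplicates an already selected one, the redundancy term cancels its relevance, so a weaker but complementary feature is preferred, which is the behavior that separates mRMR from univariate ranking.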

MODIFIED CUCKOO ALGORITHM
Cuckoo search is a meta-heuristic algorithm inspired by the cuckoo, a bird that is a brood parasite. The cuckoo is the best-known brood parasite: it never builds a nest of its own and lays its eggs in the nest of another host bird. The intruding cuckoo may come into direct conflict with certain host birds. If a host bird detects eggs that are not its own, it either throws the eggs out of the nest or simply abandons the nest and builds a new one. Each egg in a nest represents a solution, and a cuckoo egg represents a new and potentially better solution. The new solution is derived from the current one by altering some of its components. In the simplest form, each nest has one egg; the algorithm can be extended to the case in which each nest holds several eggs representing a set of solutions. CS has been used successfully to address scheduling problems and, in structural engineering, to solve design optimization problems. Cuckoo search idealizes this breeding behavior and can be applied to various optimization problems in many applications, such as reorganization of expression, task preparation, and global optimization. The idealization rests on three rules:
1. Each cuckoo lays one egg at a time and dumps it in a nest chosen at random.
2. The best nests, with high-quality eggs, carry over to the next generations.
3. The number of available host nests is fixed, and the host bird discovers the cuckoo egg with probability pa in [0, 1]. In that case the host bird can either throw the egg away or abandon the nest and build a new one.
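The three rules can be sketched as a small continuous optimizer. This is an illustration of the standard cuckoo search scheme with Lévy-flight steps drawn by Mantegna's algorithm, not the M-Cuckoo variant proposed in this work. The parameter values (`n_nests`, `pa`, `step_scale`, `beta`) are assumptions chosen for the example, and the objective is minimized.

```python
import math
import random

def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length using Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def cuckoo_search(objective, dim, n_nests=15, pa=0.25, iterations=100,
                  bounds=(-5.0, 5.0), step_scale=0.1, seed=0):
    """Minimize `objective`; returns (best_nest, best_value, history)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_nest():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    nests = [random_nest() for _ in range(n_nests)]
    fitness = [objective(nest) for nest in nests]
    history = []
    for _ in range(iterations):
        # Rule 1: a cuckoo lays one egg (a Lévy-flight move from a random
        # nest) and dumps it in another randomly chosen nest.
        i = rng.randrange(n_nests)
        egg = [min(hi, max(lo, x + step_scale * levy_step(rng)))
               for x in nests[i]]
        egg_fit = objective(egg)
        j = rng.randrange(n_nests)
        if egg_fit < fitness[j]:
            nests[j], fitness[j] = egg, egg_fit
        # Rule 2: the best nest always carries over to the next generation.
        best = fitness.index(min(fitness))
        # Rule 3: every other nest is discovered with probability pa and
        # replaced by a freshly built random nest.
        for k in range(n_nests):
            if k != best and rng.random() < pa:
                nests[k] = random_nest()
                fitness[k] = objective(nests[k])
        history.append(min(fitness))
    best = fitness.index(min(fitness))
    return nests[best], fitness[best], history
```

Because the best nest is never abandoned and an egg only replaces a nest it improves on, the best objective value recorded in `history` never gets worse from one generation to the next. For feature selection, the continuous position would additionally have to be mapped to a binary mask over features, which this sketch does not attempt.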

OPINION MINING
Optimization is the process of adjusting a system so that certain features function more effectively, or of finding an alternative output under given constraints, by maximizing the desired parameters and minimizing the undesired parameters involved in the problem as far as possible. Opinion mining, or sentiment analysis, is a technique used to identify and extract subjective information from text documents. It is a form of text analysis that uses computational linguistics and natural language processing to automatically classify and extract feelings or opinions from text (positive, negative, neutral, etc.). Opinion mining is the identification of user opinions from feedback on a specific topic. The Support Vector Machine (SVM) is a relatively new prediction method for both classification and regression. It is a collection of supervised learning techniques for classification and regression analysis that analyze data and recognize patterns. Vladimir Vapnik developed the initial SVM algorithm. Algorithm 3 shows the SVM (Support Vector Machine) algorithm. This research work uses the SVM algorithm for classification after feature set extraction.
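To make the classification step concrete, a linear SVM can be sketched as sub-gradient descent on the regularized hinge loss. This is a simplified binary (positive versus negative) illustration and not necessarily the formulation of Algorithm 3: a production system would typically use a library solver, and the hyperparameters `lam`, `lr`, and `epochs` as well as the function names are assumptions made for the example.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit weights w and bias b for labels in {+1, -1} with batch sub-gradients."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Gradient of the L2 regularizer (lam / 2) * ||w||^2.
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin:
                # add the hinge-loss sub-gradient, averaged over n samples.
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 (positive opinion) or -1 (negative opinion)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In the opinion mining setting, each row of `X` would be the feature vector of one review after feature selection, for example counts of the selected terms, and `y` its polarity label.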

RESULT AND ANALYSIS
In this experiment, the Amazon appliances dataset is used. The experiment is done in two stages. In the first stage, the dataset is pre-processed using Dialect Replacement, Text Tokenization, Stop Words Removal, Letter Replacement, and Punctuation Removal. After pre-processing, the feature set is extracted using the Cuckoo Search algorithm, and the result is classified using the Support Vector Machine algorithm. In the second stage, the pre-processed dataset is passed to the proposed M-Cuckoo Search algorithm, and the result is again produced using the Support Vector Machine algorithm. Finally, the results produced by the two approaches (Cuckoo with SVM and M-Cuckoo with SVM) are compared. The comparison shows that M-Cuckoo Search based SVM opinion mining achieves higher accuracy, precision, f-measure, and recall than Cuckoo based SVM opinion mining.
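The four reported measures follow directly from the confusion-matrix counts. The helper below is a hypothetical illustration for the binary case, with the positive class taken as the class of interest; it contains no results from the experiment.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and f-measure for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against empty denominators by reporting 0.0 in that case.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

Running the same function on the predictions of both pipelines gives a like-for-like comparison of Cuckoo with SVM against M-Cuckoo with SVM.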

CONCLUSION
Opinion mining has emerged as an active domain in the research community because a huge amount of heterogeneous user data is growing every day on the World Wide Web, for example on e-commerce websites, social networks, discussion forums, and blogs. Opinion mining is a text classification problem in which an evaluative text document is categorized as a positive or a negative opinion. In this research work, a new algorithm, M-Cuckoo, is proposed by combining mRMR with the Cuckoo algorithm. The proposed M-Cuckoo algorithm was applied to reviewers' opinions in the Amazon dataset. The proposed M-Cuckoo algorithm with the SVM classifier provides better results.