Brainstorm optimization for multi-document summarization

Document summarization is one way to mine relevant information from a huge number of documents. In this study, a brainstorm optimization (BSO) based multi-document summarizer (MDSBSO) is proposed to solve the multi-document summarization problem. The proposed MDSBSO is compared with multi-document summarization algorithms based on two other optimization methods: particle swarm optimization (PSO) and bacterial foraging optimization (BFO). To evaluate the performance of the proposed summarizer, two well-known benchmark datasets from the Document Understanding Conference (DUC) are used, and the compared algorithms are evaluated using ROUGE metrics. The experimental analysis shows that the proposed MDSBSO summarization algorithm achieves significant improvements over the other summarization algorithms.


Introduction
Document summarization is the process of producing a shorter version of the original text while retaining its essential content. The summary helps the reader decide whether the documents are significant or not [1]. Summarization is done in two ways: extractive and abstractive. An extractive summary extracts significant parts of the text, such as paragraphs or sentences, whereas an abstractive summary uses linguistic analysis to generate the summary [2].
Based on the number of input documents, document summarization is classified into two types: single-document summarization and multi-document summarization. Single-document summarization compresses a single document into a shorter version, whereas multi-document summarization extracts information from multiple source documents. Multi-document summarization is more challenging than single-document summarization because of the larger search space across multiple documents. It is therefore treated as an optimization problem whose main aim is to generate the most informative summary possible of the original documents.
In this research work, the BSO algorithm is proposed for solving the multi-document summarization problem (MDSBSO). The performance of the proposed multi-document summarization algorithm is compared with PSO and BFO summarization algorithms. To the best of the authors' knowledge, this is the first research work that solves multi-document summarization using the BSO algorithm. The objectives of the research work are as follows:
• To produce an optimal summary of the documents using the proposed summarization algorithm.
• To analyze the strength of the summarization algorithms on two DUC datasets.
• To compare their performance using ROUGE scores.
The rest of this paper is organized as follows. Section 2 discusses related works. Section 3 describes the conventional BSO. Section 4 presents the proposed MDSBSO. Section 5 gives the experimental results and discussion. Finally, Section 6 concludes this research work.

Related works
Document summarization has attracted many researchers and developers who aim to build efficient summarization models that meet the requirements of end users. Nature-inspired optimization algorithms play a major role in solving the document summarization problem. Hence, this section discusses some of the methods in the field of document summarization. Nandhini et al. (2014) designed an improved differential evolution (IDE) algorithm for the document summarization problem [27]. Ouyang et al. (2011) presented a regression model for query-focused multi-document summarization, in which a support vector regression (SVR) model predicts the significance of each sentence in the given documents [5]. Fattah et al. (2009) designed a content selection approach for automatic text summarization with two major phases. First, features are trained using genetic algorithm (GA) and mathematical regression (MR) models to obtain an appropriate combination of feature weights. Then, the selected features are used as inputs to a Gaussian mixture model (GMM) to build an optimal text summary [3]. Nandhini et al. (2016) developed an interactive GA-based individualized summarization method to improve the readability of significant sentences [28]. Mirshojaee et al. (2020) developed a multi-agent meta-heuristic optimization algorithm (MAMHOA) for extractive text summarization [29], which combines multi-agent systems with the biogeography-based optimization (BBO) algorithm. Rautray et al. (2019) developed a cuckoo search-based multi-document summary extractor (CSMDSE) [30].
Yuan et al. (2020) designed an abstractive summarization method that extends a standard sequence-to-sequence (seq2seq) model by combining word attenuation with multilayer convolutional neural networks (CNNs) [31]. Patel et al. (2019) developed a multi-document summarization algorithm that increases content coverage while preserving information diversity [32]. It uses a statistical feature-based technique with fuzzy logic to handle the uncertainty and imprecision of feature weights, and it uses cosine similarity to remove redundant information from the given documents. Rautray et al. (2015) presented a comparative study of population-based stochastic optimization methods for the document summarization problem. Their approach identifies the relationships between sentences based on similarity and reduces the weight of each sentence to extract summary sentences at different compression levels. A comparison of the two optimization methods based on the fallout value of the extracted sentences shows that PSO performs better than DE on five English corpora [9].

Brainstorm optimization algorithm (BSO)
The BSO algorithm is a well-known population-based swarm intelligence algorithm inspired by the human brainstorming process [18]. Brainstorming helps ordinary people come up with diverse ideas, and good ideas are then picked from this diverse set. The BSO algorithm has four major phases: initialization, clustering, generation, and selection. The conventional BSO algorithm is described in Algorithm 1.

Initialization phase
In the initialization phase, a population of N ideas is randomly generated, where each idea is $X_i = [x_{i1}, x_{i2}, \ldots, x_{iD}]$ with $1 \le i \le N$, N is the population size, and D is the dimension of the search space. The necessary parameters are also initialized at this stage. The remaining phases are then repeated until a termination condition is met (Step 4.3 of Algorithm 1): if the termination condition is not satisfied, the process goes back to Step 3; otherwise, it terminates.
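A minimal Python sketch of the initialization phase follows. It assumes each idea is a real-valued vector drawn uniformly from hypothetical bounds `lower` and `upper` (the bounds are not stated in this section), and `initialize_population` is an illustrative name, not the authors' code.

```python
import random


def initialize_population(n, dim, lower=0.0, upper=1.0, seed=None):
    """Randomly generate N ideas X_i = [x_i1, ..., x_iD] inside [lower, upper]^D.

    The default bounds are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    return [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n)]


population = initialize_population(n=5, dim=3, seed=42)
print(len(population), len(population[0]))  # 5 3
```

Seeding a private `random.Random` instance keeps runs reproducible without touching the global random state.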

Clustering phase
The clustering phase groups the ideas so that diversity is preserved and the search is accelerated. In BSO, the solutions are divided into several clusters, which helps to pick out good ideas and find an optimal solution. The k-means clustering algorithm is used to partition the ideas, and the cluster center of each cluster is taken as the optimum idea of that group. In each cluster, the best idea is recorded as the cluster center based on the given threshold values. The probability $P_{replace}$ controls how likely it is that a cluster center is replaced by a randomly generated solution.
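The clustering phase can be sketched as follows, assuming a minimization problem (lower fitness is better), a plain Euclidean k-means, and that the "cluster center" recorded by BSO is the best member idea of each cluster rather than the k-means centroid. The function names and the `fitness`, `lower`, and `upper` arguments are hypothetical.

```python
import math
import random


def kmeans(ideas, m, rng, iters=20):
    """Plain k-means on idea vectors; returns one cluster label per idea."""
    centroids = [list(c) for c in rng.sample(ideas, m)]
    labels = [0] * len(ideas)
    for _ in range(iters):
        # Assignment step: each idea joins its nearest centroid (Euclidean distance).
        labels = [min(range(m), key=lambda c: math.dist(x, centroids[c])) for x in ideas]
        # Update step: move each centroid to the mean of its members (unchanged if empty).
        for c in range(m):
            members = [x for x, lab in zip(ideas, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clustering_phase(ideas, fitness, m, p_replace, lower, upper, rng):
    """Cluster ideas, record the best idea of each cluster as its center, and with
    probability p_replace overwrite one randomly chosen center with a random idea."""
    labels = kmeans(ideas, m, rng)
    centers = {}
    for i, lab in enumerate(labels):
        # Minimization is assumed: a lower fitness value means a better idea.
        if lab not in centers or fitness(ideas[i]) < fitness(ideas[centers[lab]]):
            centers[lab] = i
    if rng.random() < p_replace:
        c = centers[rng.choice(list(centers))]
        ideas[c] = [rng.uniform(lower, upper) for _ in ideas[c]]
    return labels, centers
```

`centers` maps each non-empty cluster label to the index of its best idea, so the replacement step edits the population in place.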

Generation phase
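The generation phase creates new individual ideas in order to move toward the global minimum. In idea generation by piggybacking, a new idea is generated from an old individual:

$$x_{new}^{i} = x_{old}^{i} + \xi \times n(\mu, \sigma)$$

where $x_{old}^{i}$ is the value of the weighted summation of the $i$th dimension of the selected idea(s), $x_{new}^{i}$ is the $i$th dimension of the new idea, and $n(\mu, \sigma)$ is a Gaussian random value with mean $\mu$ and standard deviation $\sigma$. The step size $\xi$ is

$$\xi = \mathrm{logsig}\left(\frac{0.5 \times MaxIter - CurrentIter}{k}\right) \times rand()$$

where $\mathrm{logsig}()$ is a logarithmic sigmoid transfer function, $CurrentIter$ is the current iteration, $MaxIter$ is the maximum number of iterations, $k$ changes the slope of $\mathrm{logsig}()$, and $rand()$ is a uniform random value in (0, 1).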
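The two generation formulas can be sketched in Python as below. The helper `piggyback`, which forms $x_{old}$ either from one idea or from a random weighted sum of two ideas, and the default `k=20.0`, `mu=0.0`, `sigma=1.0` are assumptions for illustration.

```python
import math
import random


def logsig(a):
    """Logarithmic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-a))


def piggyback(x_a, x_b=None, rng=random):
    """Return x_old: a copy of one selected idea, or a random weighted sum of two ideas."""
    if x_b is None:
        return list(x_a)
    w = rng.random()
    return [w * a + (1.0 - w) * b for a, b in zip(x_a, x_b)]


def generate_idea(x_old, current_iter, max_iter, k=20.0, mu=0.0, sigma=1.0, rng=random):
    """x_new^i = x_old^i + xi * n(mu, sigma), with
    xi = logsig((0.5 * MaxIter - CurrentIter) / k) * rand()."""
    xi = logsig((0.5 * max_iter - current_iter) / k) * rng.random()
    return [x + xi * rng.gauss(mu, sigma) for x in x_old]


rng = random.Random(1)
print(generate_idea(piggyback([0.2, 0.8], [0.6, 0.4], rng), current_iter=10, max_iter=100, rng=rng))
```

Because `logsig` falls from near 1 to near 0 as `current_iter` passes `0.5 * max_iter`, the sketch takes large exploratory steps early and small refining steps late; `k` controls how sharp that transition is.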

Selection phase
Selecting the better ideas is the most important task in preparing the next iteration. In this phase, a cluster center is randomly chosen as the optimal value. This step is not performed in every iteration; it is performed only with a small probability.
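Steps 4.1 and 4.2 of Algorithm 1 (keep the better of old and new ideas, then check whether anything changed) can be sketched as a greedy one-to-one selection. Minimization and the function name `select` are assumptions.

```python
def select(old_ideas, old_fit, new_ideas, fitness):
    """Greedy selection for minimization: a new idea replaces the old one
    only if it has a lower fitness. Returns (ideas, fitness values, improved)."""
    ideas, fit, improved = [], [], False
    for old, f_old, new in zip(old_ideas, old_fit, new_ideas):
        f_new = fitness(new)
        if f_new < f_old:
            ideas.append(new)
            fit.append(f_new)
            improved = True
        else:
            ideas.append(old)
            fit.append(f_old)
    return ideas, fit, improved


sphere = lambda x: sum(v * v for v in x)
print(select([[1.0], [0.0]], [1.0, 0.0], [[0.5], [2.0]], sphere))
```

The `improved` flag plays the role of the "if new ideas are generated" test in Step 4.2.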

Proposed multi-document summarization using BSO (MDSBSO)
The BSO algorithm is applied to the multi-document summarization problem, and an overview of the proposed system is shown in Figure-1. The proposed MDSBSO consists of four phases: the pre-processing phase, the input representation phase, the summary representation phase, and the summary selection phase.

Pre-processing phase
• Sentence segmentation: Each individual document D is segmented into sentences as $D = \{s_1, s_2, \ldots, s_N\}$, where N is the number of sentences in the document.
• Tokenization: Each sentence is tokenized as $s_i = \{t_1, t_2, \ldots, t_m\}$, where m is the number of tokens (terms).
• Stop-word removal: Words of low significance are removed from the document. For instance, 'a', 'an', and 'the' are low-significance words in English.
• Stemming: Word endings are stripped to reduce words to a common base form.
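The pre-processing steps above can be sketched with the Python standard library. The stop-word list is a tiny illustrative sample and the stemmer is a crude suffix stripper, not the Porter stemmer; both, and all function names, are assumptions rather than the authors' tooling.

```python
import re

# Tiny illustrative stop-word list (an assumption; real systems use much longer lists).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}
# Crude suffix stripping for illustration only; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def segment(document):
    """Sentence segmentation: split D into s_1..s_N after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s.strip()]


def tokenize(sentence):
    """Tokenization: lower-case alphanumeric tokens t_1..t_m."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(document):
    """Segment, tokenize, remove stop words, and stem every sentence."""
    return [[stem(t) for t in tokenize(s) if t not in STOP_WORDS] for s in segment(document)]


print(preprocess("The cats are sleeping. A dog barked!"))  # [['cat', 'sleep'], ['dog', 'bark']]
```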

Input representation phase
The pre-processed words are used to compute a weight for each sentence, called the sentence informative score. The score is computed from the term weights $w_{ij}$ and $w_{iq}$, which represent the title input text weight and the weight of each word in the document, respectively. The similarity matrix compares sentences based on their keywords and essential words.
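The exact scoring formula is not given in this section, so the sketch below is only one plausible reading. It assumes raw term-frequency weights, scores each sentence by its cosine similarity to the title, and builds the similarity matrix from pairwise sentence cosines. `informative_scores` and `similarity_matrix` are hypothetical names.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def informative_scores(sentences, title):
    """Hypothetical sentence informative score: similarity of each sentence to the title."""
    title_vec = Counter(title)
    return [cosine(Counter(s), title_vec) for s in sentences]


def similarity_matrix(sentences):
    """Pairwise sentence similarities, usable for detecting redundant sentences."""
    vecs = [Counter(s) for s in sentences]
    return [[cosine(u, v) for v in vecs] for u in vecs]


sentences = [["cat", "sleep"], ["dog", "bark"], ["cat", "bark"]]
print(informative_scores(sentences, ["cat"]))
```

Sentences here are lists of already pre-processed tokens; a TF-IDF weighting could be swapped in by changing how the Counters are built.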

Summary representation phase
The summary representation phase extracts a small set of useful information from the given documents. The BSO algorithm performs the optimal sentence selection using the sentence informative scores and a threshold value. Algorithm 2 shows the proposed MDSBSO.
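This section does not spell out the objective that BSO optimizes. Purely as an illustration, the sketch below assumes each idea holds one value per sentence, selects sentences whose value exceeds a threshold, and scores the selection as total informative score minus pairwise redundancy, under a length limit. The negated value suits a minimizing BSO. `decode`, `summary_fitness`, `max_len`, and `redundancy_weight` are hypothetical.

```python
def decode(idea, threshold=0.5):
    """Map a continuous idea to the indices of selected sentences (value > threshold)."""
    return [i for i, v in enumerate(idea) if v > threshold]


def summary_fitness(idea, scores, sim, lengths, max_len, threshold=0.5, redundancy_weight=1.0):
    """Hypothetical objective: coverage minus redundancy, negated for minimization."""
    chosen = decode(idea, threshold)
    if not chosen or sum(lengths[i] for i in chosen) > max_len:
        return float("inf")  # empty or over-length summaries are treated as infeasible
    coverage = sum(scores[i] for i in chosen)
    # Sum of similarities over every unordered pair of selected sentences.
    redundancy = sum(sim[i][j] for a, i in enumerate(chosen) for j in chosen[a + 1:])
    return -(coverage - redundancy_weight * redundancy)


sim = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]]
print(summary_fitness([0.9, 0.1, 0.8], [0.5, 0.4, 0.3], sim, [10, 10, 10], max_len=25))
```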

Summary selection phase
In this phase, the optimal sentences are selected based on the given threshold value.
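A minimal sketch of threshold-based summary extraction follows, assuming the best BSO idea holds one value per sentence. The optional `max_words` budget, the preference for higher-valued sentences under that budget, and the function name are assumptions. Selected sentences are emitted in their original document order.

```python
def extract_summary(sentences, idea, threshold=0.5, max_words=None):
    """Join the sentences whose idea value exceeds the threshold, in document order."""
    chosen = [i for i, v in enumerate(idea) if v > threshold]
    if max_words is not None:
        # Prefer higher-valued sentences until the (hypothetical) word budget is used up.
        kept, used = [], 0
        for i in sorted(chosen, key=lambda i: idea[i], reverse=True):
            n_words = len(sentences[i].split())
            if used + n_words <= max_words:
                kept.append(i)
                used += n_words
        chosen = sorted(kept)
    return " ".join(sentences[i] for i in chosen)


doc = ["A b c.", "D e.", "F g h i."]
print(extract_summary(doc, [0.9, 0.2, 0.7]))  # A b c. F g h i.
```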

Experimental results and discussions
The performance of the proposed MDSBSO document summarization algorithm is compared with PSO [36] and BFO [14]. The performance measures are calculated using ROUGE, a well-known tool for evaluating document summarization [37]. The experiments were run in MATLAB R2015 on Windows 10 with an Intel Core i3 processor and 4 GB of RAM.

Dataset collections
Two benchmark datasets, DUC 2006 and DUC 2007, are used to analyze the performance of the document summarization algorithms. Table-1 describes the datasets.

Parameter settings
The parameter settings of nature-inspired optimization algorithms strongly influence whether they produce optimal results. The parameter settings used in this work are shown in Table-2.

Performance measures
ROUGE is a well-known tool for evaluating the performance of document summarization algorithms. It is a software package that measures the similarity between a human-generated summary and a machine-generated summary. A high ROUGE score indicates a highly informative summary, and a low ROUGE score indicates a less informative one. Several ROUGE variants are used: ROUGE-1, ROUGE-L, ROUGE-S, and ROUGE-SU. ROUGE-1 assesses the unigram overlap between the manual summary and the system summary. ROUGE-L calculates the ratio between the length of the longest common subsequence (LCS) of the two summaries and the length of the reference summary. ROUGE-S assesses the skip-bigram overlap between the set of reference summaries and the candidate summary. ROUGE-SU extends ROUGE-S by adding unigrams as a counting unit. Precision (7), Recall (8), and F-Score (9) are the three criteria, generated by the ROUGE metric, that are used to compare performance (Mirshojaee et al., 2020).
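A simplified, pure-Python illustration of how the ROUGE variants above turn overlap counts into Precision, Recall, and F-Score follows. It uses clipped counts for ROUGE-1, ROUGE-S, and ROUGE-SU, an LCS table for ROUGE-L, and a balanced F-score. It is a sketch under these assumptions, not a replacement for the official ROUGE package, which adds options such as stemming, stop-word handling, and weighted F-measures.

```python
from collections import Counter
from itertools import combinations


def prf(overlap, cand_total, ref_total):
    """Precision, recall and balanced F-score from an overlap count."""
    p = overlap / cand_total if cand_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def overlap_prf(cand_counts, ref_counts):
    # Counter & Counter keeps the minimum count of each unit (clipped matches).
    overlap = sum((cand_counts & ref_counts).values())
    return prf(overlap, sum(cand_counts.values()), sum(ref_counts.values()))


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def skip_bigrams(tokens, with_unigrams=False):
    """Ordered word pairs with any gap (ROUGE-S); adding unigrams gives ROUGE-SU."""
    grams = Counter(combinations(tokens, 2))
    if with_unigrams:
        grams.update((t,) for t in tokens)
    return grams


def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_1(cand, ref):
    return overlap_prf(ngrams(cand, 1), ngrams(ref, 1))


def rouge_l(cand, ref):
    return prf(lcs_length(cand, ref), len(cand), len(ref))


def rouge_s(cand, ref, with_unigrams=False):
    return overlap_prf(skip_bigrams(cand, with_unigrams), skip_bigrams(ref, with_unigrams))


cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(rouge_1(cand, ref))  # each value is about 0.833 (5/6)
```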

Results analysis and discussions
The proposed MDSBSO summarization method obtains the best results compared with the PSO- and BFO-based summarization methods. Table-3 shows the experimental results of Precision, Recall, and F-Score using ROUGE-1. From Table-3, it is evident that the proposed MDSBSO summarization algorithm achieves higher scores than PSO and BFO. According to ROUGE-L, the proposed MDSBSO summarization algorithm produces a slight improvement over PSO and BFO; the performance results are shown in Table-4.

Conclusion
In this research paper, the BSO algorithm is applied to multi-document summarization to extract an optimal summary (MDSBSO). The proposed MDSBSO is compared with the PSO and BFO summarization algorithms, and the performance of all compared summarization algorithms is assessed using different ROUGE scores. The experimental results show that the proposed MDSBSO-based summarizer produces significantly better outcomes than the PSO- and BFO-based summarization algorithms.

Figure 1: Overview of proposed multi-document summarization

Algorithm 1: Conventional BSO algorithm
1: Initialization phase
   Randomly initialize n ideas and the required parameters
2: Clustering phase
   Step 2.1: Cluster the n ideas into m clusters using a clustering algorithm
   Step 2.2: Rank the ideas in each cluster and record the best idea in each cluster as its cluster center
   Step 2.3: If (rand() < P_replace)
                Randomly choose a cluster center
                Randomly generate an idea to replace the chosen cluster center
             End
3: Generation phase
   Step 3.1: For i = 1 to N
                If ( …
4: Selection phase
   Step 4.1: Newly generated ideas are compared with existing ideas, and the better ideas are stored as new ideas
   Step 4.2: If new ideas are generated, go to Step 4.3; otherwise, go to Step 3.3
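The phases of the conventional BSO loop in Algorithm 1 can be tied together in one runnable sketch that minimizes a toy sphere function. This is a simplified illustration, not the authors' implementation. The probabilities `p_replace`, `p_one` (generate from one cluster rather than two), and `p_center` (use the cluster center rather than a random member), the bounds, and all function names are hypothetical, and new ideas are not clipped to the bounds.

```python
import math
import random


def logsig(a):
    return 1.0 / (1.0 + math.exp(-a))


def kmeans_labels(ideas, m, rng, iters=10):
    """Plain k-means; returns a cluster label for every idea."""
    cents = [list(c) for c in rng.sample(ideas, m)]
    labels = [0] * len(ideas)
    for _ in range(iters):
        labels = [min(range(m), key=lambda c: math.dist(x, cents[c])) for x in ideas]
        for c in range(m):
            members = [x for x, lab in zip(ideas, labels) if lab == c]
            if members:
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bso(fitness, dim, n=20, m=4, max_iter=100, lower=-5.0, upper=5.0,
        p_replace=0.2, p_one=0.8, p_center=0.4, k=20.0, seed=0):
    """Minimize `fitness` with a simplified conventional BSO loop."""
    rng = random.Random(seed)
    # 1: Initialization phase
    ideas = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n)]
    fit = [fitness(x) for x in ideas]
    for it in range(max_iter):
        # 2: Clustering phase; the best idea of each cluster is its center
        clusters = {}
        for i, lab in enumerate(kmeans_labels(ideas, m, rng)):
            clusters.setdefault(lab, []).append(i)
        centers = {lab: min(idx, key=lambda i: fit[i]) for lab, idx in clusters.items()}
        if rng.random() < p_replace:
            c = centers[rng.choice(list(centers))]
            ideas[c] = [rng.uniform(lower, upper) for _ in range(dim)]
            fit[c] = fitness(ideas[c])
        keys = list(clusters)

        def pick(lab):
            # Cluster center with probability p_center, otherwise a random member.
            return centers[lab] if rng.random() < p_center else rng.choice(clusters[lab])

        # 3: Generation phase
        for i in range(n):
            if len(keys) < 2 or rng.random() < p_one:
                base = ideas[pick(rng.choice(keys))]
            else:
                la, lb = rng.sample(keys, 2)
                w = rng.random()
                base = [w * a + (1 - w) * b for a, b in zip(ideas[pick(la)], ideas[pick(lb)])]
            xi = logsig((0.5 * max_iter - it) / k) * rng.random()
            new = [v + xi * rng.gauss(0.0, 1.0) for v in base]
            # 4: Selection phase; keep the new idea only if it is better
            f_new = fitness(new)
            if f_new < fit[i]:
                ideas[i], fit[i] = new, f_new
    best = min(range(n), key=lambda i: fit[i])
    return ideas[best], fit[best]


best, value = bso(lambda x: sum(v * v for v in x), dim=2, seed=1)
print(best, value)
```

For summarization, the sphere function would be replaced by a summary objective over sentence-selection vectors; this section does not specify that objective.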

Figure 2: Performance comparison based on precision values for DUC 2006

4: Selection phase
   Step 4.1: Newly generated ideas are compared with existing ideas, and the better ideas are stored as new ideas
   Step 4.2: If new ideas have been generated, go to Step 3.1; otherwise, go to Step 4.3

Table-5 shows the Precision, Recall, and F-Score results using ROUGE-S. From Table-5, it is evident that the proposed MDSBSO summarization algorithm achieves higher scores than the PSO and BFO summarization algorithms. Similarly, Table-6 shows the performance of the proposed MDSBSO summarization model using ROUGE-SU. Figures 2-4 show the performance comparisons of the proposed MDSBSO summarization model on the DUC 2006 dataset, and Figures 5-7 show the corresponding comparisons on the DUC 2007 dataset. Hence, the experimental results confirm that the proposed MDSBSO summarization method yields higher ROUGE scores and better document summaries.