Improving the Performance of Multidimensional Clinical Data for OLAP using an Optimized Data Clustering approach

Data clustering combined with OLAP (Online Analytical Processing) gives medicine a new way to support the detection, analysis, and treatment of diseases. However, the large volume of multidimensional clinical data reduces the efficiency of OLAP query processing by increasing query access time. The performance of an OLAP model can therefore be improved through data clustering, in which the data is divided into several groups (clusters) with cluster heads so that queries are processed in less time. In this paper, a Dragon Fly Optimization based Clustering (DFOC) approach is proposed to enhance the efficiency of data clustering by generating optimal clusters from multidimensional clinical data for OLAP. The results are evaluated in MATLAB 2019a and show better performance of DFOC than the clustering methods ACO, GA and K-Means in terms of intra-cluster distance, purity index, F-measure, and standard deviation.


Introduction
Large amounts of data are collected in data warehouses [1,2,3] to combine all the information about an organisation. Because of its volume, this data is difficult to access quickly for OLAP. To improve the performance of OLAP, the data is organised into several groups to reduce access time and query processing cost. This organisation of data into groups is known as data clustering. KPIs (Key Performance Indicators) [4] are also merged with OLAP [5] to perform fast query processing.
OLAP is also used for power cost analysis in commercial areas to reduce expenses while increasing power performance [6,7]. Multidimensional data is used to calculate the power expenses at various levels of simplification. Association rules are therefore developed with OLAP to obtain efficient results on numerous building data using UML (Unified Modeling Language) [8] and SQL (Structured Query Language) [9,10]. Decision support systems are also developed with data clustering for fast access to huge data [11,12], with maximum accuracy of information with respect to future aspects [13,14].
Several researchers have introduced data clustering techniques to improve the efficiency of multidimensional data models [15,16]. K-Means is one of the most widely used clustering techniques because it is simple and easy to develop for huge amounts of data. However, K-Means still has drawbacks, such as a strong dependence on the initial clusters. We therefore apply optimization to data clustering on huge multidimensional datasets to obtain optimal results and remove this limitation of K-Means. GA (Genetic Algorithm) and ACO (Ant Colony Optimization) are two of the most popular optimization approaches used with data clustering to improve clustering quality. In this work, we implement a DFOC (Dragon Fly Optimization based Clustering) approach on clinical multidimensional datasets to generate optimal clusters with cluster centroids, and compare the results with ACO, GA and K-Means in terms of several parameters.

A. DFOC approach
The Dragon Fly Optimization (DFO) approach is a nature-inspired methodology motivated by the static and dynamic swarming behaviour of dragonflies, which correspond to exploration and exploitation. DFO uses three primary principles, Severance (SR), Configuration (CF) and Consistency (CS), together with two further swarm behaviours, Foodstuff sources Appeal (FA) and Opponent Escaping (OE), represented in (1) to (5). In the dragonfly algorithm literature these five behaviours are usually called separation, alignment, cohesion, attraction to food and distraction from enemies.
Here, X is the location of an individual dragonfly. The speed vector is evaluated using (6), and the dragonfly's location is then updated through (7).
Here, sr, cf, cs, fa, oe and wt are constant coefficients.
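Equations (1) to (7) are not reproduced in this text. As a hedged sketch, assuming DFOC follows the standard dragonfly algorithm formulation, they would take the form below. The symbols N (number of neighbours), X_j and V_j (location and speed vector of the j-th neighbour), X^+ (food source location), X^- (opponent location) and t (iteration counter) are assumptions and are not taken from the source.

```latex
\begin{align}
SR_i &= -\sum_{j=1}^{N} \left( X - X_j \right) \tag{1}\\
CF_i &= \frac{1}{N}\sum_{j=1}^{N} V_j \tag{2}\\
CS_i &= \frac{1}{N}\sum_{j=1}^{N} X_j - X \tag{3}\\
FA_i &= X^{+} - X \tag{4}\\
OE_i &= X^{-} + X \tag{5}\\
\Delta X_{t+1} &= \left( sr\,SR_i + cf\,CF_i + cs\,CS_i + fa\,FA_i + oe\,OE_i \right) + wt\,\Delta X_t \tag{6}\\
X_{t+1} &= X_t + \Delta X_{t+1} \tag{7}
\end{align}
```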

DFOC approach
START
Assign N data entities as cluster centroids randomly.
For each dragonfly
    While the stopping criterion is not met
        Evaluate SR, CF, CS, FA and OE by (1) to (5)
        Update neighbour's area
        If (at least 1 neighbour is located in the dragonfly's area)
            Update speed vector by (6)
            Update location vector by (7)
        Else
            Update location vector by (7)
        End If
        Check and correct the next location of the dragonfly based on the variable bounds
    End While
End For
STOP
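The inner update of the procedure above can be sketched in Python as follows. This is a minimal illustration and not the authors' MATLAB implementation: it assumes that (1) to (7) take the standard dragonfly algorithm form, that neighbours are the dragonflies within a Euclidean radius, and that all dragonflies are updated from the same previous state. The function name `dfo_step` and its parameter names are hypothetical.

```python
import numpy as np

def dfo_step(X, V, food, opponent, radius, coef, lower, upper):
    """Apply one DFO update to all dragonflies (assumed standard form).

    X, V: (n, d) arrays of locations and speed vectors.
    food, opponent: (d,) locations of the best and worst solutions so far.
    coef: dict with the weights sr, cf, cs, fa, oe and wt.
    lower, upper: bounds used to correct the new locations.
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)
    X_new, V_new = X.copy(), V.copy()
    n = X.shape[0]
    for i in range(n):
        # Neighbours: other dragonflies within `radius` (Euclidean distance).
        dist = np.linalg.norm(X - X[i], axis=1)
        nbr = (dist <= radius) & (np.arange(n) != i)
        if nbr.any():
            SR = -np.sum(X[i] - X[nbr], axis=0)   # (1) severance
            CF = V[nbr].mean(axis=0)              # (2) configuration
            CS = X[nbr].mean(axis=0) - X[i]       # (3) consistency
            FA = food - X[i]                      # (4) food appeal
            OE = opponent + X[i]                  # (5) opponent escaping
            # (6) speed vector update
            V_new[i] = (coef["sr"] * SR + coef["cf"] * CF + coef["cs"] * CS
                        + coef["fa"] * FA + coef["oe"] * OE + coef["wt"] * V[i])
        # (7) location update; without neighbours the old speed vector is reused.
        # The clip corrects locations that leave the variable bounds.
        X_new[i] = np.clip(X[i] + V_new[i], lower, upper)
    return X_new, V_new
```

In a full run this step would be repeated until a stopping criterion is met, with `food` and `opponent` refreshed from the best and worst fitness values after each pass.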
In DFOC, DFO is applied to multidimensional clinical datasets to obtain optimal clusters with cluster heads (centroids) by minimizing the intra-cluster distances among data elements. Every cluster is treated as a dragonfly and each data entity as a search agent. The positions of all dragonflies are updated according to their fitness values, reducing the intra-cluster distances among data entities until the optimal clusters and centroids are found.
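The source does not give the exact fitness function. As a minimal sketch, assuming Euclidean distance and assuming the fitness is the summed distance of every data entity to its nearest centroid, a candidate set of centroids could be scored as follows. The function name `assign_and_score` is hypothetical.

```python
import numpy as np

def assign_and_score(data, centroids):
    """Assign each data entity to its nearest centroid and score the result.

    Returns the cluster label of every entity and the summed distance of
    all entities to their assigned centroids (lower is better).
    """
    data = np.asarray(data, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # dists[i, k] = Euclidean distance from entity i to centroid k
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    fitness = dists[np.arange(len(data)), labels].sum()
    return labels, float(fitness)
```

For example, with entities (0, 0), (0, 2), (10, 0) and centroids (0, 1), (10, 0), the first two entities fall into the first cluster and the summed distance is 2.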

B. Multidimensional Clinical Datasets
DFOC is applied to several multidimensional clinical datasets, described in Table 1.

Result and Analysis
DFOC is implemented on all four clinical datasets (Table 1) in MATLAB 2019a. The results are obtained in terms of intra-cluster distance, purity index, F-measure, and standard deviation over 1000 repetitions.

A. Intra-cluster distance
It is defined as the mean distance among the data entities in the same cluster. It should be as small as possible for optimized clustering.
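A minimal sketch of this measure, assuming Euclidean distance and assuming that "mean distance among data entities" means the mean pairwise distance within each cluster, averaged over clusters (the source gives no formula). The function name is hypothetical.

```python
import numpy as np

def mean_intra_cluster_distance(data, labels):
    """Mean pairwise Euclidean distance inside each cluster, averaged over clusters."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    per_cluster = []
    for k in np.unique(labels):
        members = data[labels == k]
        m = len(members)
        if m < 2:
            continue  # a singleton cluster has no pairs to measure
        pair = np.linalg.norm(members[:, None, :] - members[None, :, :], axis=2)
        # The matrix counts each unordered pair twice and has a zero diagonal,
        # so dividing by m * (m - 1) gives the mean over distinct pairs.
        per_cluster.append(pair.sum() / (m * (m - 1)))
    return float(np.mean(per_cluster)) if per_cluster else 0.0
```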

B. Purity Index
It measures how consistently data entities are grouped into the correct clusters, computed using (8). It should be as large as possible for optimized clustering.
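Equation (8) is not reproduced in this text. A minimal sketch, assuming the commonly used definition of purity (the fraction of data entities that belong to the majority class of their cluster) and assuming true class labels are available for each dataset. The function name `purity_index` is hypothetical.

```python
import numpy as np

def purity_index(true_classes, cluster_labels):
    """Fraction of entities that belong to the majority class of their cluster."""
    true_classes = np.asarray(true_classes)
    cluster_labels = np.asarray(cluster_labels)
    majority_total = 0
    for k in np.unique(cluster_labels):
        # Count how often each true class occurs inside cluster k.
        _, counts = np.unique(true_classes[cluster_labels == k], return_counts=True)
        majority_total += counts.max()
    return majority_total / len(true_classes)
```

A value of 1 means every cluster contains entities of a single class only.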

C. F-Measure
It is obtained from precision (prec) and recall (rcl) for data retrieval by (9).
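Equation (9) is not reproduced in this text. A minimal sketch, assuming the balanced F-measure (the harmonic mean of precision and recall) and the usual count-based definitions of precision and recall. The function names and the `tp`, `fp`, `fn` parameters (true positives, false positives, false negatives) are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    rcl = tp / (tp + fn) if (tp + fn) else 0.0
    return prec, rcl

def f_measure(prec, rcl):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if prec + rcl == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * prec * rcl / (prec + rcl)
```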

D. Standard Deviation
It describes how tightly the data clustering is concentrated about the mean value, using (13). It should be as small as possible for optimal clustering.
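Equation (13) is not reproduced in this text. A minimal sketch, assuming the population form of the standard deviation over the n data entities of a dataset (the symbols n, i and the choice of 1/n rather than 1/(n-1) are assumptions):

```latex
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left( de_i - \overline{de} \right)^{2}} \tag{13}
```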
Here, de is a data entity in the dataset and $\overline{de}$ is the mean of the data entities in the dataset.
The results in Tables II to VI and Figures 1 to 4 show that DFOC gives better-quality results than K-Means, GA and ACO on all four multidimensional clinical datasets in terms of intra-cluster distance, F-measure, purity index and standard deviation. Owing to its better exploration and exploitation, DFOC searches the global search space more effectively when generating optimal clusters, and hence produces better outputs than the prior approaches.

Conclusion
In this work, a Dragon Fly Optimization based Clustering (DFOC) approach is implemented to improve the performance of data clustering by obtaining optimized clusters from multidimensional clinical data for OLAP. The outcomes are examined in MATLAB 2019a and show the superior efficiency of DFOC compared with the prior approaches ACO, GA and K-Means in terms of intra-cluster distance, purity index, F-measure, and standard deviation.