Performance Analysis of Big Data with Data models using Artificial Intelligence

L. Umarani, Dr. A. John Sanjeev Kumar

Abstract

With the proliferation of new information sources such as medical images, financial data, sales data, radio frequency identification, and web tracking data, it is a challenge to decipher trends and make sense of data that is orders of magnitude larger than ever before. One of the technologies most often associated with the era of big data is Hadoop. Although there is much expert information about Hadoop, there is little information about how to structure data effectively in a Hadoop environment. Although parallel processing and the MapReduce system provide an optimal environment for processing big data quickly, the structure of the data itself plays a vital role. This paper explores techniques used for data modeling in a Hadoop environment. Specifically, the purpose of the experiments described in this paper was to determine the best structure and physical modeling techniques for storing data in a Hadoop cluster using Hive to enable efficient data access. Although other software interacts with Hadoop, the experiments focused on Hive, whose infrastructure is best suited to traditional data warehousing-type applications. The experiments do not cover HBase. This paper explores a data partitioning strategy and investigates the role that indexing, data types, file types, and other data architecture decisions play in designing data structures in Hive. To test the different data structures, the experiments focused on typical queries used for analyzing web traffic data. These tests included web analyses such as counts of visitors, most-referring sites, and other typical business questions asked of weblog data. The primary measure for selecting the optimal structure of data in Hive is the performance of web analysis queries. For comparison purposes, performance was measured both in Hive and in an RDBMS. The reason for this comparison is to better understand how the techniques that we are accustomed to using in an RDBMS work in the Hive environment. The paper also explored techniques that are particular to the Hive architecture, such as storing data as a compressed sequence file. Through these experiments, it sought to show that how data is structured (in effect, data modeling) is just as important in a big data environment as it is in the traditional database world.
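
As a rough illustration of the partitioning idea mentioned in the abstract, the following Python sketch groups weblog rows under Hive-style `dt=<date>` partition keys and answers a "most referring site" question for one day by reading only that day's partition. It is not the paper's experiment and does not use Hive; the sample records, the field layout, and the function names `partition_by_date` and `top_referrer` are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical weblog rows: (date, referring site, visitor id).
LOGS = [
    ("2013-01-01", "search.example", "v1"),
    ("2013-01-01", "news.example", "v2"),
    ("2013-01-02", "search.example", "v1"),
    ("2013-01-02", "search.example", "v3"),
    ("2013-01-03", "mail.example", "v4"),
]


def partition_by_date(rows):
    """Group rows under keys shaped like Hive partition directories (dt=...)."""
    parts = defaultdict(list)
    for date, referrer, visitor in rows:
        parts[f"dt={date}"].append((referrer, visitor))
    return dict(parts)


def top_referrer(parts, date):
    """Return (referrer, hits, rows_scanned) for one day.

    Only the partition for the requested date is read, which mimics
    partition pruning: rows for other days are never scanned.
    """
    rows = parts.get(f"dt={date}", [])
    counts = defaultdict(int)
    for referrer, _visitor in rows:
        counts[referrer] += 1
    if not counts:
        return None, 0, 0
    best = max(counts, key=counts.get)
    return best, counts[best], len(rows)


parts = partition_by_date(LOGS)
result = top_referrer(parts, "2013-01-02")
print(result)
```

In this toy data set the query for a single day scans two rows rather than all five, which is the kind of saving a date-based partitioning strategy is meant to provide for weblog queries.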

Article Details

How to Cite
L.Umarani, Dr. A. John Sanjeev Kumar. (2023). Performance Analysis of Big Data with Data models using Artificial Intelligence. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 10(12), 69–76. https://doi.org/10.17762/turcomat.v10i12.13373
Section
Articles