A Framework of Big Data as a Service Platform for Access Control & Privacy Protection Using Blockchain Network

Big Data as a Service (BDaaS) is used today to handle and process the large volumes of data generated every day from different sources. Because the data is stored on a cloud platform, a system failure can give attackers the opportunity to launch various categories of attacks. Much research has been done in this domain to secure and protect data in the cloud. Blockchain technology is a secure, distributed, privacy-preserving decentralized ledger in which transactions are recorded in a flexible, secure, verifiable and permanent way. The transaction data is encrypted and kept in a wrapped block (i.e., a record), and the blocks are replicated across the entire network in a verifiable and tamper-resistant manner to enhance information security and data privacy. In this paper we propose a framework for access control with privacy protection in BDaaS based on blockchain technology. The blockchain is used only to store transaction log information whenever an event occurs in the system.

Keywords: Big Data as a Service, blockchain, privacy, access control


INTRODUCTION
Data is today a key asset for any organization, since many decisions are based on it. Data is generated from diverse sources such as sensors, social networking sites (Facebook, Twitter, WhatsApp, etc.), educational institutions, government bodies, and so on. Such data can be structured, semi-structured or unstructured. To handle and process data at this scale, the concept of Big Data as a Service (BDaaS) was presented in [1]. It is a conceptual model that combines the storage and computation abilities of cloud computing with the processing power of big data to deliver data, database, data analysis, and processing platform services alongside the traditional service models (PaaS, SaaS and IaaS). BDaaS is a cloud-based framework for delivering point-to-point Big Data solutions to business organizations on demand. It is also defined as the combined capabilities of Data as a Service (DaaS), Hadoop as a Service (HaaS) and Data Analytics as a Service (DAaaS), and it includes different service models to fulfill the specific demands of big data systems. A BDaaS cloud infrastructure must offer Big Data Storage, Computing, Data, Database and Analytics Software as a Service [1]. Although BDaaS offers many benefits, access control and privacy become critical issues whenever data is at rest, in motion or in process, because the data is kept on cloud storage that is distributed across locations. According to a report published by the Cloud Security Alliance in 2017, access control is one of the most critical security issues [12]. Researchers have developed various methods for access control and data privacy, but problems such as data violation, data exposure, and malicious activities by cloud users still need to be analyzed [2] [3].
Moreover, cloud service providers do not give assurances about what levels and types of protection are needed for suitable big data security and privacy. The access control and privacy issues mentioned above must therefore be taken into account when adopting cloud computing services.
In recent years, Blockchain technology has emerged as a promising way to provide a secure decentralized environment for information sharing [4] [5]. Blockchain was originally designed as the underlying technology for exchanging cryptocurrency, but it can also provide security and privacy for data in other application areas, for example educational systems [9], the Internet of Things (IoT) [6], smart cities [8], smart homes [7], and healthcare [10]. Bitcoin is the first and most popular of the many Blockchain applications in the real world. Technically, a Blockchain is a distributed and decentralized shared ledger (record) that holds, grouped in blocks, all the transactions ever completed in the network. Blockchain technology runs on a peer-to-peer (P2P) network in which each node keeps a copy of the Blockchain record. There is no central regulatory authority to manage the Blockchain database. Blockchain technology is designed to protect the data kept in the Blockchain database and to guard it against security attacks. In this paper we propose a framework for access control and data protection in BDaaS based on Blockchain technology, aimed at users who need data storage, processing, computing, analytics, etc. as a service from a Big Data as a Service platform.
The rest of the paper is organized as follows. The next sections briefly describe the background of BDaaS and Blockchain technology, followed by a review of the literature and the proposed BDaaS framework based on a Blockchain network. We then discuss the security and privacy implications of the proposed framework, and finally we conclude our work.

BIG DATA AS A SERVICE (BDAAS)
BDaaS is a new way to obtain valuable and clear insight from big data, and it is also a new class of service. By encapsulating diverse data as a service, it hides the variations in data structure and description, so users need only be concerned with their own requirements and can obtain the service whenever and wherever they want to store, analyze and visualize data [11]. It offers users popular Big Data-related services to improve productivity and minimize costs. It provides different levels of abstraction and typically includes three layers: Big Data Infrastructure as a Service (Storage as a Service and Computing as a Service), Big Data Platform as a Service (Data as a Service and Database as a Service) and Big Data Analytics Software as a Service [1]. A Big Data as a Service platform offers benefits such as cost reduction, better and faster decision making, better data visualization, better quality, quick response, data management and data analytics. Today, companies such as IBM, EMC, Amazon, Microsoft, Google, Oracle, SAP and SnapLogic occupy the Big Data as a Service market and mainly provide big data storage and analysis services. For example, EMC offers services for big data storage and data analysis: Greenplum, EMC's tool set for data storage and analysis, provides storage services and allows users to use Hadoop for big data analytics (BDA). Amazon provides independent BDA services through Amazon Work Space Marketplace. Microsoft provides BDA services through Windows Azure Marketplace, and Google provides them through Google BigQuery.
Since BDaaS is a cloud-based service platform and data is distributed across different servers, users are very much concerned about the security and privacy of their data. Cloud computing also suffers from multiple kinds of security issues, such as data theft, data manipulation, data loss, denial of service, and suspicious or malicious insiders, which mostly arise from multi-tenancy and from loss of control and trust over data [12] [13]. Moreover, most cloud service providers do not guarantee levels of data security and access control in their SLAs as part of the prescribed terms and conditions between service providers and customers. It is therefore necessary for all parties involved to consider data security and access control when using Big Data as a Service.

BLOCKCHAIN TECHNOLOGY
Blockchain is a new and rising technology that has grown quickly in recent years, and Bitcoin is its most successful application [14]. It is decentralized and distributed by nature. A Blockchain is defined as a peer-to-peer (P2P) distributed database (ledger) used to maintain a continuously growing list of transaction records (known as blocks). These blocks are linked to each other, and public key cryptography (PKC) is normally used to secure them [4]. Formally, a Blockchain is considered a combination of two parts: blocks, or storage units, which store the transaction records ever completed in the system, and the chain, which links all time-stamped transaction records into one continuous chain [15]. Unlike in a centralized system, in a blockchain network new data or transactions are inserted into blocks that are distributed to all the nodes participating in the distributed system. Every block is identified by a hash value securely created with SHA-256, a cryptographic hash algorithm [4]. The present block (parent) is connected to the subsequent block (child) by storing the hash value of the parent in the child block, as in figure 1. Consequently, if the contents of any block are changed, its hash value changes as well; the mismatch becomes apparent to the whole network, which nullifies the block. Each participant of the Blockchain network holds a private key for digitally signing the transactions they make, and the corresponding public key allows other nodes to validate those signatures. As in figure 1, a block contains a header and a long list of transactions. The block header generally contains a timestamp (the time of block creation), a nonce that is varied under the consensus algorithm when computing the block's hash value, the block version number, and PoD, a securely created threshold value; in Bitcoin-style chains the block's hash value must always fall below such a difficulty threshold.
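The parent-child hash linking described above can be illustrated with a minimal sketch. It is not the implementation of any particular blockchain: the block layout, the JSON encoding and the function names (`block_hash`, `append_block`, `chain_is_valid`) are assumptions made for illustration, and only SHA-256 and the idea of storing the parent's hash in the child come from the text.

```python
import hashlib
import json

GENESIS_PARENT = "0" * 64  # assumed placeholder parent hash for the first block


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = {k: block[k] for k in ("parent_hash", "timestamp", "transactions")}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, transactions: list, timestamp: float) -> dict:
    """Create a child block that stores the hash of the current last block."""
    parent_hash = chain[-1]["hash"] if chain else GENESIS_PARENT
    block = {
        "parent_hash": parent_hash,
        "timestamp": timestamp,
        "transactions": list(transactions),
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def chain_is_valid(chain: list) -> bool:
    """Recompute every hash and check every parent link."""
    expected_parent = GENESIS_PARENT
    for block in chain:
        if block["parent_hash"] != expected_parent:
            return False  # link to the parent is broken
        if block_hash(block) != block["hash"]:
            return False  # contents no longer match the stored hash
        expected_parent = block["hash"]
    return True
```

In this sketch, editing a transaction in an earlier block makes `chain_is_valid` fail, and recomputing that block's own hash does not help, because the next block still stores the old parent hash.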
To summarize all the transactions within a block, Blockchain technology uses a Merkle tree, as in figure 2, which links all the transactions together [4]. To validate transactions, all nodes in the network run a consensus algorithm [16] [17]. In the Proof of Work (PoW) consensus algorithm, miner nodes must solve a difficult mathematical puzzle before a new block can be added to the network, a process that requires great computational power.
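As a rough illustration of these two ideas, the sketch below computes a Merkle root and solves a toy proof-of-work puzzle. Several simplifications are assumed: a single SHA-256 per node (Bitcoin applies SHA-256 twice), duplication of the last hash when a level has an odd number of entries, and a difficulty expressed as a number of leading zero bits. The function names are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list) -> str:
    """Hash transactions pairwise, level by level, until one root remains."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed rule: duplicate the last hash
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


def pow_is_valid(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """The puzzle is solved when the hash, read as an integer, is below the target."""
    target = 1 << (256 - difficulty_bits)
    digest = sha256(header + nonce.to_bytes(8, "big"))
    return int.from_bytes(digest, "big") < target


def mine(header: bytes, difficulty_bits: int) -> int:
    """Brute-force search for a nonce; cost doubles with each extra difficulty bit."""
    nonce = 0
    while not pow_is_valid(header, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

Changing any single transaction changes the root, which is why the root can stand in for the whole transaction list in the block header. The mining loop shows why PoW is costly: finding a nonce takes many hash attempts, while checking one takes a single hash.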
In the Proof of Stake (PoS) consensus algorithm, a pseudo-random selection method chooses a node to be the validator of the next block according to its wealth (stake) [17]. No computational puzzle has to be solved; only the validator's stake is needed to validate the transaction and block. Delegated Proof of Stake (DPoS) is an extension of PoS that relies on an election system to choose the nodes (called witnesses) that verify blocks. The witnesses are responsible for creating blocks and adding them to the blockchain network, and for preventing malicious nodes from taking part in adding blocks to the network.
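Stake-weighted pseudo-random selection can be sketched as follows. This is a simplified assumption of how such a choice might be made, not the selection rule of any specific PoS chain: the seed is derived from the previous block hash so that every node computes the same validator, and the function name `choose_validator` is hypothetical.

```python
import hashlib
import random


def choose_validator(stakes: dict, prev_block_hash: str) -> str:
    """Pick the next validator with probability proportional to its stake."""
    if not stakes or any(s < 0 for s in stakes.values()):
        raise ValueError("stakes must be non-empty and non-negative")
    if sum(stakes.values()) <= 0:
        raise ValueError("total stake must be positive")
    # Deterministic seed: all nodes that see the same previous block agree.
    seed = int(hashlib.sha256(prev_block_hash.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    nodes = sorted(stakes)  # fixed order so the choice is reproducible
    weights = [stakes[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]
```

A node holding 90% of the total stake is selected for roughly 90% of blocks, and a node with zero stake is never selected, which is the sense in which only the validator's wealth matters.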
Blockchain networks can be categorized as public, private and consortium chains [17], [18]. A public chain (also known as a permissionless chain) is an entirely decentralized network in which any node can take part in writing, reading, verifying, and reaching consensus on the records on the chain. A private chain (also known as a permissioned chain) is a centralized Blockchain in which access to the records on the chain is regulated by a central authority and only restricted nodes are allowed to join the network. A consortium chain (hybrid chain) is a partly decentralized blockchain in which pre-selected nodes jointly determine the generation of each block. Other nodes of the network may access the Blockchain for transactions, but they are not authorized to participate in the consensus process.

REVIEW OF LITERATURE
The literature shows that BDaaS was first proposed in [1], where the authors presented three service layers (Big Data Analytics as a Service, Big Data Platform as a Service and Big Data Infrastructure as a Service) without detailing how to design and provide the services in the cloud; the security aspect is also missing. Another framework was introduced in [19], but it did not cover the aspects of the big data life cycle (acquisition, storage, processing and visualization). A further framework in [20] defined all aspects of the big data life cycle but left out security. A great deal of research has been done on securing data in cloud-based scenarios.
In [21] the author proposed an access control algorithm for the big data cloud that maintains user privacy using a role-based access control prototype, symmetric encryption, and ciphertext attribute-based encryption. It achieves privacy of the user's data along with access control, but it is limited to small data sizes and needs to be extended to larger data sets. In [22] the authors proposed an attribute-based encryption scheme that maintains privacy using a secure hash algorithm, a symmetric key approach and the Paillier algorithm. It achieves anonymous authentication, supports user revocation and prevents replay attacks, but the protection of data in the cloud may be compromised because the access policy for each record is known to the cloud. Another privacy-preserving approach for cloud computing was proposed in [23] using the Paillier encryption algorithm, an elliptic curve encryption algorithm and an Eigenface encoding algorithm. This system takes more time when matching encrypted facial and image data, and it fails when the database is very small; it needs further improvement to become an automatic biometrics-based authentication system. In [24] a novel and efficient system was proposed for sharing information in a cloud computing environment using an ABE algorithm, a distributed hash table network and an identity-based timed-release encryption algorithm. Although this system provides security against various attacks, users have to depend on the data owner for access. In [25] a 1024-bit DNA-based encryption scheme was proposed for data security in a cloud computing environment. Experimental findings indicate that this method performs better than other existing systems, but improvements are still needed. In [26] CP-ABE with an effective authority test was proposed to preserve privacy and enhance access control. The scheme has many advantages, but it supports only the "and" policy and depends on a weak security model.
In [27] a blockchain-based framework for e-governance was proposed to provide privacy and access control using digital signature and encryption techniques. Only high-level concepts are proposed, and its full potential remains to be explored. In [28] the authors designed Ushare, a blockchain-based social media network that uses a Turing-complete relationship system, a blockchain, a hash table with encrypted content and a local personal certification authority (PCA). This system is also only a framework and needs further mathematical evaluation. In [29] a framework for secure big data sharing was designed based on blockchain technology and smart contracts. This model addresses architectural security and protects against forged-block attacks, but every node needs more storage and computing power to store data in the blockchain, which requires further investigation. In [30] a secure distributed vehicular network architecture for smart cities was proposed based on blockchain and smart contracts. The authors proposed a trust management system in which the blockchain stores the trust values of the nodes, which are used to determine the authenticity of the nodes involved in the network. This system shares information efficiently in a vehicular network, but its cost may increase as the data size grows. In [31] an access control ecosystem for big data security was proposed using identity-based access control, the Hyperledger Fabric blockchain and role-based access control. The proposed system is auditable, highly secure and flexible enough for big data security, but as a new concept it has not yet proven stable. In [32] the authors analyzed privacy-preserving techniques for big data analysis on cloud platforms and compared the different techniques.
In [33] the authors proposed a micro-blockchain-based intrusion detection system (IDS) that configures IDSs dynamically according to differences in their location. BPay, a payment framework for outsourced cloud computing services based on blockchain technology, was introduced in [34]; it is compatible with the Bitcoin and Ethereum blockchains. In [35] a distributed security architecture for cloud storage was proposed in which files are partitioned into encrypted blocks of data before uploading, and the file copy placement problem is then solved using a genetic algorithm. In [36] a blockchain-based cloud database design was proposed to address integrity and reliability problems in the cloud environment. In [37] BlockDS was proposed, a secure technique for distributed record storage and keyword search that removes the conventional reliance on a trusted node in a cloud storage system. In [38] the reliability of data sources in cloud systems was addressed through blockchain consensus algorithms. In [39] a cloud data deletion protocol was introduced to counter corrupt users who tamper with data deletion results when the cloud server is not reliable. In [40] a cloud forensics scheme was proposed that combines blockchain and cryptographic signature techniques.

PROPOSED FRAMEWORK
As discussed in the previous section, a great deal of research has been done on securing data in cloud computing and in big data. There is still a need, however, to analyze the security requirements of the Big Data as a Service platform. Many researchers have proposed good solutions for access control in cloud computing or big data, but none has extended them to the BDaaS platform. We therefore propose a framework for access control in BDaaS that provides data security using blockchain as a tool. Frameworks and architectures for BDaaS have already been proposed in [1], [11], [19], [20], but the security aspect was not properly analyzed in them. Blockchain networks are widely regarded as highly tamper-resistant, which motivates their use here. Our proposed framework is illustrated in figure 3.

Fig. 3: Proposed Blockchain-Based BDaaS Framework
As shown in figure 3, our proposed framework has four components: the data owner or producer, the BDaaS platform, the blockchain network and the consumers of the services. The data owner or producer can be any source of data, such as sensor networks, educational data, social networks, research institutions, enterprises, government, and so on. Such data is collected in massive volumes and can be structured, semi-structured or unstructured. The BDaaS platform provides different kinds of services, such as analytics, storage and computation services. The blockchain network ensures data security and access control during the processes in the network. The consumers of the services can be individuals or organizations. A consumer may initiate a service transaction request, which asks the data producer or service producer to supply data or service usage privileges through the blockchain, and then obtains an approved data set or service.
All components of this framework must register on the blockchain network. The data owner or producer must register on the blockchain network before uploading data to the cloud-based BDaaS. The Big Data as a Service platform registers on the blockchain network to ensure the security and privacy of the user's data. Consumers of the services also register on the blockchain network before requesting a specific service. In this framework the consumer sends a request for a service or resource to the BDaaS, which forwards the query to the blockchain to check whether the requester is a legitimate user. Finally, if the requester's identity is authenticated by the blockchain network, the BDaaS provides the requested service to the consumer.
The following steps demonstrate how the service requester is authenticated and verified by the BDaaS through the blockchain network; they are also illustrated in figure 5. In the same way, process 2 defines the steps involved in granting the data owner permission to upload data contents to the BDaaS, which is illustrated in figure 6.
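The register-then-verify flow of the framework can be sketched in code as one possible reading of the description, not as the authors' protocol. Everything below is hypothetical: `LedgerStub` is an in-memory stand-in for the blockchain network, `BDaaSPlatform` stands for the service platform, the role names "owner" and "consumer" are assumed, and real key-based authentication is replaced by a simple registration lookup. Only the overall sequence (register, request, check against the blockchain, log the event, grant or deny) comes from the text.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class LedgerStub:
    """In-memory stand-in for the blockchain network (hypothetical)."""
    identities: dict = field(default_factory=dict)  # identity -> role
    log: list = field(default_factory=list)         # hash-chained event log

    def record(self, event: str, identity: str, detail: str) -> None:
        """Append an event whose hash covers the previous entry's hash."""
        prev = self.log[-1]["hash"] if self.log else "0" * 64
        entry = {"event": event, "identity": identity, "detail": detail, "prev": prev}
        encoded = json.dumps(entry, sort_keys=True).encode("utf-8")
        entry["hash"] = hashlib.sha256(encoded).hexdigest()
        self.log.append(entry)

    def register(self, identity: str, role: str) -> None:
        self.identities[identity] = role
        self.record("register", identity, role)

    def is_registered(self, identity: str, role: str) -> bool:
        return self.identities.get(identity) == role


class BDaaSPlatform:
    """Service platform that defers every access decision to the ledger."""

    def __init__(self, ledger: LedgerStub, services: list):
        self.ledger = ledger
        self.services = set(services)

    def request_service(self, consumer_id: str, service: str) -> bool:
        allowed = service in self.services and self.ledger.is_registered(consumer_id, "consumer")
        self.ledger.record("grant" if allowed else "deny", consumer_id, service)
        return allowed

    def upload_data(self, owner_id: str, dataset_name: str) -> bool:
        allowed = self.ledger.is_registered(owner_id, "owner")
        self.ledger.record("upload" if allowed else "deny", owner_id, dataset_name)
        return allowed
```

Both granted and denied requests are written to the log, which reflects the framework's use of the blockchain purely as a store for transaction log information.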

ANALYSIS AND DISCUSSION
The current internet platform architecture is largely based on centralized servers, where activities are monitored by a central server. The security of user content therefore depends mostly on the security of that central server, which is exposed to attacks such as DoS, DDoS and SQL injection [41]. In a distributed blockchain network, by contrast, the block information is maintained by every node in the network, so attackers can typically control only a few nodes rather than all of them. In addition, the information stored in the blockchain network is encrypted with the users' cryptographic keys, which protects the confidentiality of the user's data. If any hacking activity is reported, the consensus model of the blockchain network ensures that the activity is rejected.
Our proposed framework works on the principle of public key cryptography: the log records stored in the blockchain network are secured with public key cryptography, which protects them against modification and unauthorized access. Consumers and data producers are each assigned a private key through which they are validated. To ensure data security and track access to the stored log information, the blockchain network uses digital signature and encryption algorithms. For many of the consensus algorithms used in blockchain networks, an attacker would need to control at least 51% of the network nodes to attempt unauthorized access or modify the records [42], which is generally impractical in a large network. Furthermore, an attacker who wishes to modify any block must edit every copy of that block in the network and convince all the nodes that the newly created block is valid, which is practically infeasible because the blocks stored in the blockchain are linked by their hashes.
Our proposed framework also fulfills the requirements of confidentiality, integrity, authenticity and accountability. The model ensures that the user's data stored on the BDaaS is not revealed to or used by unauthorized users. The information transmitted among the data consumer, data owner, blockchain and BDaaS is encrypted, as are the access control permissions among them, which provides confidentiality. When an authorized user uploads data, the user is first verified by the blockchain through its protection mechanism. An unauthorized user or a malicious system administrator therefore cannot enter the network or modify the user's data, which provides integrity. Since all the main components must be verified and trusted through a public and private key pair, the model provides authenticity: the public keys of the data owner, consumer, blockchain and BDaaS are publicly available, and received information must be decrypted with the private key of the receiver. Finally, the model keeps all the log information of the components in the blockchain so that security analysis can track down the entities or events responsible for security breaches, which provides accountability.

CONCLUSION
In this article we have proposed a Big Data as a Service platform framework that enhances access control and privacy using blockchain technology as a tool. Protecting users' data and modernizing data handling are among the most important concerns of any organization. Companies also need timely and targeted analytics on existing big data in a secure manner; our proposed system is designed to support this and to increase the confidence of companies that wish to move to the cloud. We have also discussed the security and privacy analysis of our proposed framework. All transaction log data is stored on the blockchain network, which is intended to prevent any unauthorized user, or even the service provider, from making changes. Companies such as Google, Amazon, Microsoft and Oracle could benefit from using our proposed framework. The framework is only a high-level design and can be mathematically validated in future work. Machine learning algorithms could also be used to automatically find and report suspicious transactions.