A Method for Improving DNN Performance Using 2-bit Quantization
Abstract
Background/Objectives: Recently, interest in AI (Artificial Intelligence) has increased, and many studies aim to make AI usable in embedded and mobile environments. Quantization is one method for reducing model size, but quantization below 8 bits generally cannot be implemented without additional hardware such as an FPGA. With this in mind, we propose two new algorithms that implement 2-bit quantization purely in software.
Methods/Statistical analysis: We propose a packing operation that quantizes weights consisting of 32-bit floating-point values into 2 bits and stores four 2-bit quantized weights in a single 8-bit memory location, together with a Masking Matrix Multiplication function that performs computation on the packed weights and the input values. These functions operate in parallel in GPU memory.
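The packing and masked-multiplication idea can be sketched as follows. This is a minimal CPU illustration, not the paper's GPU implementation: the four-level codebook (`LEVELS`), the function names, and the packing order are all assumptions, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical 2-bit codebook: codes 0..3 map to four representative
# weight values (the paper does not specify its actual levels).
LEVELS = np.array([-1.0, -0.33, 0.33, 1.0], dtype=np.float32)

def quantize_2bit(w):
    """Map each 32-bit float weight to the index (0..3) of the nearest level."""
    return np.abs(w[..., None] - LEVELS).argmin(axis=-1).astype(np.uint8)

def pack(codes):
    """Pack four 2-bit codes into one uint8 (length must be a multiple of 4)."""
    c = codes.reshape(-1, 4)
    return (c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)).astype(np.uint8)

def unpack(packed, n):
    """Recover n 2-bit codes by masking and shifting each byte."""
    p = packed[:, None]
    codes = np.concatenate([(p >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return codes.reshape(-1)[:n]

def masked_matmul(packed, shape, x):
    """Multiply a packed 2-bit weight matrix (logical shape `shape`) by input x."""
    w = LEVELS[unpack(packed, shape[0] * shape[1])].reshape(shape)
    return w @ x
```

Storing four 2-bit codes per byte is where the 16x memory saving comes from: a 32-bit float weight shrinks to 2 bits (32 / 2 = 16).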
Findings: Compared with the existing 32-bit model, the quantization model using the above functions used about 16 times less memory and computed about 4 times faster. Nevertheless, the DNN model showed an error of only around 1% when trained on the MNIST and Handwritten datasets, and the CNN model showed an error of around 1% when trained on EEG (Electroencephalography) data.
Improvements/Applications: The functions used in this study target the DNN domain; although they were extended to CNNs, quantization could be applied only to the FC (Fully Connected) layers. Applying them to convolution layers requires an additional function, and future work should verify that the difference in accuracy remains small on more complex datasets.