Implementation and Comparison of Radix-8 Booth Multiplier by using 32-bit Parallel prefix adders for High Speed Arithmetic Applications

This paper presents the design and implementation of a radix-8 Booth multiplier using 32-bit parallel prefix adders. High-performance processors are in high demand in the industrial market, and the multiplier plays a key role in the computational speed and overall performance of a digital system. Its major drawback is that it consumes considerable power and area and introduces delay. Many algorithms and techniques exist to improve performance while reducing area and delay. In this paper we design a radix-8 Booth multiplier with each of two parallel prefix adders and compare the two designs to find the better-optimized multiplier. Radix-8 encoding of the multiplier reduces the number of partial products to n/3. To further reduce the number of additions we use the Booth recoding mechanism. The design is implemented with a Kogge-Stone adder and with a Brent-Kung adder. We observe that using parallel prefix adders reduces the delay further, which results in a significant increase in the speed of the digital system. The simulations are carried out in Xilinx Vivado.


Introduction
Nowadays, high-performance processors are in high demand in the industrial market. To achieve high performance, arithmetic operations such as addition, subtraction and multiplication are invoked in various digital circuits to enhance the computational speed [1]. The multiplier is typically the slowest element in the system, so the overall system performance is determined by the performance of the multiplier. The multiplier also consumes considerable area and dissipates considerable power, which affects the overall performance of the system. Multipliers are used in DSP, lossy applications, artificial neural networks, machine learning and many other areas. Many power reduction algorithms and multiplier architectures exist to enhance multiplier performance. The Booth algorithm requires less computation time, less area and less power, and the Booth recoding algorithm is used to reduce power consumption [2][3][4]. Performance can be improved further by a modified design that uses parallel prefix adders [2]. Delay is another factor that affects the overall performance of the multiplier, and parallel prefix adders also reduce the delay of the multiplication process. Hence a high-speed, high-performance, low-power multiplier can be achieved.
The multiplication operation has two major steps:
a. Partial product generation.
b. Partial product addition.
Multiplier speed can therefore be improved by reducing the number of partial products and by speeding up their summation. In the Booth algorithm, the generation of partial products depends on the recoding mechanism: the multiplier bits are examined as sets of 0's and 1's, and each set selects one partial product. This is called Booth recoding. The main aim of the algorithm is to generate partial products efficiently. The number of partial products that arise [6] depends on the radix used for recoding. The recoding mechanism reduces power and area. The Booth multiplier has four blocks, which work as follows:

i. Encoder:
The encoder is a digital block that performs the opposite function of a decoder: it has 2^n inputs and n outputs. In the Booth algorithm, the multiplicand B is the input to the encoder, and the encoder output is fed to the partial product generator. The encoder output is the binary code corresponding to the input value.

ii. Partial Product Generator:
The main purpose of the partial product generator is to simplify the computational process in multiplication. Here the partial products are generated for each set of 0's and 1's. Reducing the number of partial products helps to speed up their summation.

iii. Adder:
After the generation and reduction of the partial products, the next step is their summation. A carry-save adder or a carry-lookahead adder can be used for this step. Many other options are available, including compressor-based adders, Wallace tree adders, and parallel prefix adders such as the Kogge-Stone adder. Choosing the adder properly reduces the delay.

iv. Final product:
The inputs, multiplicand A and multiplicand B, are taken as two signed binary numbers; the final product is their product expressed as a 2's complement number.
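
As a small illustration of the signed interpretation used for the inputs and the final product, the following Python sketch converts between bit strings and 2's complement values. The helper names `to_signed` and `to_bits` are hypothetical and chosen only for this example; they are not part of the design described in this paper.

```python
def to_signed(bits: str) -> int:
    """Interpret a bit string as a 2's complement signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":              # sign bit set: subtract 2**width
        value -= 1 << len(bits)
    return value


def to_bits(value: int, width: int) -> str:
    """Encode a signed integer as a width-bit 2's complement string."""
    return format(value & ((1 << width) - 1), f"0{width}b")


# Two 8-bit signed operands and their 16-bit 2's complement product.
a = to_signed("11110110")          # -10
b = to_signed("00000011")          # 3
product_bits = to_bits(a * b, 16)  # -30 encoded on 16 bits
```

An n x n-bit signed multiplication needs 2n bits for the product, which is why the product is encoded on 16 bits here.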

Radix-8 Booth algorithm:
This algorithm is similar to the radix-4 Booth algorithm; the only difference is that quartets of bits are considered rather than triplets. It reduces the number of partial products to n/3, whereas radix-4 reduces it to n/2. This reduction in partial products leads to faster computation and lower power and area. Table 1 shows the encoding table of the radix-8 Booth recoding algorithm.
For example [5], consider an 8x8-bit radix-8 multiplication of signed operands, with x and y as the 8-bit inputs and k as the 16-bit output. If x = 11111111 and y = 11111111, both with the sign bit set to 1, each operand represents -1 and the product is k = 0000000000000001.
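
The recoding step can be sketched in Python as follows. This is a minimal behavioural model, not the Verilog used in this work. Since Table 1 is not reproduced here, it assumes the commonly used radix-8 rule in which each overlapping quartet (y[3i+2], y[3i+1], y[3i], y[3i-1]) maps to the digit -4*y[3i+2] + 2*y[3i+1] + y[3i] + y[3i-1]. The function names are hypothetical.

```python
def booth_radix8_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit 2's complement multiplier y into radix-8 Booth digits.

    Each digit lies in {-4, ..., 4}; digit i has weight 8**i.
    """
    num_digits = -(-n // 3)            # ceil(n / 3) partial products
    width = 3 * num_digits             # sign-extend to a multiple of 3 bits
    y &= (1 << n) - 1
    if y >> (n - 1):                   # sign bit set: make y negative
        y -= 1 << n
    y &= (1 << width) - 1              # re-wrap, which sign-extends y

    def bit(k: int) -> int:
        return 0 if k < 0 else (y >> k) & 1   # y[-1] is an implicit 0

    digits = []
    for i in range(num_digits):
        b = 3 * i
        # Overlapping quartet: y[b+2], y[b+1], y[b], y[b-1]
        digits.append(-4 * bit(b + 2) + 2 * bit(b + 1) + bit(b) + bit(b - 1))
    return digits


def booth_radix8_multiply(x: int, y: int, n: int) -> int:
    """Multiply two n-bit signed integers by summing radix-8 partial products."""
    product = 0
    for i, d in enumerate(booth_radix8_digits(y, n)):
        product += (d * x) << (3 * i)  # partial product d * x with weight 8**i
    return product
```

For an 8-bit multiplier, `booth_radix8_digits(10, 8)` returns `[2, 1, 0]`, that is 10 = 2 + 1*8, so only three partial products are summed. For n = 32 the recoding yields 11 digits, which is the n/3 figure rounded up.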

Parallel Prefix Adders
Parallel prefix adders are a form of carry-lookahead adder. Their main advantage is that they avoid the long delay observed in existing carry adders [13][14][15]. Prefix adders can be designed in many different ways. Nowadays, tree-structured adders are used to speed up addition in processors. Parallel prefix adders are also known as logarithmic-delay adders.
Addition in a parallel prefix adder takes place in three stages. In the first stage the propagate (P) and generate (G) signals are computed from the inputs Ai and Bi; the generate signal is formed with AND logic.
Because P and G are computed in parallel, there is no increase in area consumption, but the delay depends entirely on the bit length considered.


In the intermediate stage, carry propagation and carry generation take place. Because the carry signal depends on more than two inputs, an increase in delay may be observed.


In the final computation stage, the sum of the given inputs is computed and the output is generated. To enhance the performance of the multiplier, the number of additions must be reduced. Booth multiplication is an algorithm that provides fast multiplication by recoding the multiplier bits, whereas a conventional multiplier uses a large number of partial products. The Booth recoder decreases the number of additions required to produce the output. To obtain high performance we therefore use the Booth recoding mechanism, and to reduce the number of partial products to n/3 we use a radix-8 Booth multiplier [9][10][11]. Parallel prefix adders also reduce the delay of the multiplication process. Hence a high-speed, high-performance, low-power multiplier can be achieved [12][13][14].
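
The three stages of a parallel prefix adder described above are commonly written as follows. This is the standard textbook formulation, given here as an assumption since the original equations are not reproduced; the carry-in is taken as zero, and the group notation G_{i:j}, P_{i:j} (with G_{i:i} = G_i and P_{i:i} = P_i) is introduced only for this summary.

```latex
\begin{align*}
\text{Stage 1:}\quad & G_i = A_i \cdot B_i, \qquad P_i = A_i \oplus B_i \\
\text{Stage 2:}\quad & G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j}, \qquad
                       P_{i:j} = P_{i:k} \cdot P_{k-1:j} \qquad (i \ge k > j) \\
\text{Stage 3:}\quad & C_i = G_{i:0}, \qquad S_i = P_i \oplus C_{i-1}, \qquad C_{-1} = 0
\end{align*}
```

The Kogge-Stone and Brent-Kung adders differ only in the order in which Stage 2 combines the groups.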

KOGGE STONE ADDER:
• The Kogge-Stone adder is a parallel prefix adder.
• It is regarded as one of the best-performing adders in VLSI design implementations.
• It is widely known as a parallel prefix adder that performs very fast addition.
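
A bit-level Python model of a Kogge-Stone adder is sketched below. It is a behavioural illustration under the usual textbook structure (prefix levels with spans 1, 2, 4, ...) and a zero carry-in, not the Verilog implementation used in this work. The function name `kogge_stone_add` is hypothetical.

```python
def kogge_stone_add(a: int, b: int, width: int = 32) -> tuple[int, int]:
    """Add two width-bit unsigned integers; return (sum mod 2**width, carry_out)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # Stage 1: bitwise propagate and generate signals.
    p = [((a ^ b) >> i) & 1 for i in range(width)]
    g = [((a & b) >> i) & 1 for i in range(width)]
    # Stage 2: every bit position is updated at every level.
    G, P = g[:], p[:]
    dist = 1
    while dist < width:
        new_G, new_P = G[:], P[:]
        for i in range(dist, width):
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
    # Stage 3: the carry into bit i is the group generate of bits i-1..0.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[width - 1]
```

For a 32-bit adder the loop runs for five levels (spans 1, 2, 4, 8, 16), which is the source of the logarithmic delay.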

BRENT KUNG ADDER:
• Richard P. Brent and H. T. Kung developed the Brent-Kung adder in 1982. It is a parallel prefix adder that uses a minimal number of prefix cells.
• It uses fewer propagate and generate functions than the other adders.
• The efficiency of the Brent-Kung adder comes from its tree structure, which also leads to lower power consumption because fewer cells are needed.

The proposed design implements and simulates the radix-8 Booth multiplier using a 32-bit Kogge-Stone adder and a 32-bit Brent-Kung adder [16-18] and compares the results in terms of power, area and delay. The simulations are carried out in the Xilinx Vivado tool.
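
The Brent-Kung prefix network can be sketched in Python in the same style. The model below assumes the usual textbook structure (an up-sweep followed by a down-sweep), a power-of-two width and a zero carry-in. It is a behavioural illustration rather than the Verilog used in this work, and the names `brent_kung_prefix` and `brent_kung_add` are hypothetical. The sketch also counts prefix cells and levels so that the structure can be inspected.

```python
def brent_kung_prefix(g: list[int], p: list[int]) -> tuple[list[int], int, int]:
    """Return (group generates G[i:0], number of levels, number of prefix cells)."""
    n = len(g)
    assert n >= 2 and n & (n - 1) == 0, "sketch assumes a power-of-two width"
    G, P = list(g), list(p)
    levels = cells = 0

    def combine(i: int, j: int) -> None:
        # Prefix cell: merge the group ending at i with the lower group ending at j.
        nonlocal cells
        G[i] |= P[i] & G[j]
        P[i] &= P[j]
        cells += 1

    d = 1
    while d < n:                        # up-sweep: spans 2, 4, 8, ...
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        levels += 1
        d *= 2
    d = n // 4
    while d >= 1:                       # down-sweep: fill in remaining positions
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        levels += 1
        d //= 2
    return G, levels, cells


def brent_kung_add(a: int, b: int, width: int = 32) -> tuple[int, int]:
    """Add two width-bit unsigned integers; return (sum mod 2**width, carry_out)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    p = [((a ^ b) >> i) & 1 for i in range(width)]
    g = [((a & b) >> i) & 1 for i in range(width)]
    G, _, _ = brent_kung_prefix(g, p)
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[width - 1]
```

In this formulation a 32-bit Brent-Kung network uses 57 prefix cells spread over 9 levels, whereas a 32-bit Kogge-Stone network of the same textbook form uses 129 cells in 5 levels. The measured delay on an FPGA also depends on routing and fan-out, so the synthesis results need not follow the level count alone.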

Experimental Observations:
The experiments were carried out in Xilinx Vivado 2016.4, with the HDL code written in Verilog. The technology schematics, RTL schematics and waveforms of the radix-8 Booth multiplier using the 32-bit Kogge-Stone adder and of the radix-8 Booth multiplier using the 32-bit Brent-Kung adder are shown below. Figure 4 shows the simulation result of the radix-8 Booth multiplier using the 32-bit Kogge-Stone adder. The inputs are {a, b, x} and the output is {s}. With a = 1 and b = 1, the addition a + b produces 2. The input x is set to 10, so the multiplication of 2 by 10 produces the final result s = 20.

1.Results of Radix-8 Booth
ii. RTL Schematic: Fig. 5 shows the RTL schematic of the radix-8 Booth multiplier using the 32-bit Kogge-Stone adder. It describes the logic of the design in terms of symbols such as adders and multipliers.

• Fig. 6 shows the technology schematic of the radix-8 Booth multiplier using the 32-bit Kogge-Stone adder. It describes the logic of the design when it is targeted to a specific device. Fig. 9 shows the technology schematic of the radix-8 Booth multiplier using the 32-bit Brent-Kung adder. It likewise describes the logic of the design when it is targeted to a specific device.

Performance Comparison
The device utilization summaries report how many LUTs, slices, muxes, flip-flops, bonded IOBs, etc. are used by the design and how many are available, which indicates how much of the chip area the design occupies. The device utilization summary of the radix-8 Booth multiplier using the 32-bit Kogge-Stone adder is given in Table 2, and that of the radix-8 Booth multiplier using the 32-bit Brent-Kung adder in Table 3. Table 4 compares the radix-8 Booth multiplier using the 32-bit Kogge-Stone adder with the radix-8 Booth multiplier using the 32-bit Brent-Kung adder in terms of LUTs, delay and power. The radix-8 Booth multiplier using the 32-bit Brent-Kung adder (proposed design 2) clearly gives better results in terms of LUTs and delay than proposed design 1. Table 6 compares the results with existing work in terms of LUTs, power and delay. It shows that the proposed designs give better-optimized results than the existing work [16] in terms of LUTs, delay and power.

Conclusion:
In this paper, an efficient high-speed [23-24] radix-8 Booth multiplier using a 32-bit Kogge-Stone adder and a 32-bit Brent-Kung adder was designed successfully in Xilinx Vivado. The radix-8 Booth recoding algorithm reduced the number of partial products to n/3 and decreased the number of addition operations, giving lower area consumption and better performance. The parallel prefix adders helped to reduce the delay, which is a great advantage [19][20][21]. The 32-bit Kogge-Stone adder and the 32-bit Brent-Kung adder in the proposed design showed a significant decrease in delay and power consumption. The synthesis report shows that, of the two, the radix-8 Booth multiplier using the 32-bit Brent-Kung adder achieved less delay and less area consumption. This work can be extended to a higher number of bits.