Resourceful Fast Discrete Hartley Transform to Replace Discrete Fourier Transform with Implementation of DHT Algorithm for VLSI Architecture

Deepti Gautam 1, Anshuman Singh2

1Noida Institute of Engineering & Technology, Greater Noida, India
2Noida Institute of Engineering & Technology, Greater Noida, India

Abstract: Discrete Fourier Transform (DFT) is used in a range of digital signal processing applications, such as signal and image compression systems, filter banks, signal representation or harmonic analysis. The Discrete Hartley Transform (DHT) can be used to effectively replace the DFT when the input sequence is real. This paper introduces and applies the Discrete Hartley Transform (DHT) algorithm, which is well suited to the VLSI architecture. The purpose of using the DHT algorithm is to reduce the complexity of the VLSI architecture. With the introduction of VLSI architectures, 100,000 transistors can be mounted on a single chip. For manufacturers, the verification of these circuits on breadboards is not feasible. Computer-aided design methods have now come into being, but programmers also had to make connections manually at the gate level. Subsequently, Hardware Description Language (HDL) such as Verilog and VHDL came into being. With the integration of the DHT algorithm and its execution, Verilog authors have worked to reduce the complexity of the VLSI circuits in this article. With reduced complexity, the power consumption is reduced and thus the area is therefore reduced. DHT is an algorithm for the Radix-2. It is especially useful for image and signal processing. The DHT algorithm for length N=8 is introduced here.

Keywords: Discrete Hartley Transform (DHT), Discrete Fourier Transform (DFT), DHT Domain Processing.

1. Introduction

Discrete Hartley Transform is a Radix-2 algorithm and is especially useful in DSP applications such as image compression, signal compression, filter banks[1] signal representation and harmonic analysis, etc.[2]. The key goal behind the use of DHT is to reduce the complexity of VLSI hardware architecture [3]. While DHT computing has become an intensive hardware complexity, it is reduced. The explanation behind this is that the forward and inverse of DHT is absolutely the same except for the scaling factor. Or DHT is the opposite of its own. The same hardware will also be used for both forward and inverse measurements. In other terms, we can assume that the same number of adders, multipliers, etc. are used by both forward and inverse implementations.

DHT is somewhat similar to the Discrete Fourier Transform (DFT) or DHT is derived from DFT. The DFT includes both actual and imaginary parts, while the DHT only has real parts. The calculation of imaginary parts is negligible and therefore complex arithmetic is avoided. In other words, DHT is used as a replacement of DFT when the sequence is real [4] [5].

In other terms, we can assume that the same number of adders, multipliers, etc. are used by both forward and inverse implementations.

DHT is somewhat similar to the Discrete Fourier Transform (DFT) or DHT is derived from DFT. The DFT includes both actual and imaginary parts, while the DHT only has real parts.

There are also many split radix algorithms available for computing. These can be implemented with even low arithmetic cost [10]-[11]. The disadvantage with the classical split-radix algorithm is its irregular computational structure.

Delays in VLSI architecture are mainly introduced by multipliers, as they consume a large portion of the chip area. Hence, to implement multipliers, memory-based solutions are being taken into consideration. Implementation of multipliers with look-up table-based solutions, it is required that one operand should be constant [12]. It then becomes possible to store complete result in partial ROM. And then the memory words are reduced from $2^\text{2L}$ to $2^\text{L}$ [1].
With DHT algorithm, VLSI architecture can be implemented based on high parallelism. Moreover, by sharing the multipliers by same constant and using sub-expression sharing technique, hardware complexity gets reduced [13][14].

2. Discrete Hartley Transform

Discrete Hartley Transform is abbreviated for DHT and this transform was proposed by R. V. L. Hartley in 1942. DHT is the analogous to Fast Fourier transform which provides the only real value at any cost. The main difference from the DFT is that it transforms the real inputs to real outputs with no intrinsic involvement of complex value. DFT can be used to compute the DHT, and vice versa.

**Figure 1: DHT Architecture.**

**Forward and Inverse DHT**

Let \( N \geq 4 \) be a power of 2. Since, it’s a Radix-2 algorithm. For any given real input sequence such as \( \{ y(i) \} i = 0,1,2, \ldots N-1 \)

DHT \((N)\) is defined by

\[
Y(k) = \frac{1}{N} \sum_{i=0}^{N-1} y(i) \cos \left( \frac{2\pi kn}{N} \right) \quad (1)
\]

For \( k = 0, 1, 2, 3, 4, \ldots \ldots N-1 \)

Inverse relation is defined as:

\[
Y(n) = \sum_{k=0}^{N-1} X(k) \cos \left( \frac{2\pi kn}{N} \right) + \sin \left( \frac{2\pi kn}{N} \right) \quad (2)
\]

For, \( n=0,1, \ldots \ldots , N-1 \)

Also, \( \text{cas}(y) = \cos(y) + \sin(y) \)

**DHT algorithm for length \( L=8 \)**

Here, algorithm is depicted for length 8.

\[
Y(0) = [(y(0) + y(4)) + y(2) + y(6))] + [(y(1) + y(5)) + y(3) + y(7))]
\]

\[
Y(2) = [(y(0) + y(4)) - y(2) + y(6))] + [(y(1) + y(5)) - y(3) + y(7))]
\]

\[
Y(4) = [(y(0) + y(4)) + y(2) + y(6))] \quad - [(y(1) + y(5)) + y(3) + y(7))]
\]

\[
Y(6) = [(y(0) + y(4)) - y(2) + y(6))] - [(y(1) + y(5)) - (y(3) + y(7))]
\]

\[
Y(1) = [y(0) - y(4)] + [y(2) - y(6)] + c[y(1) - y(5)]
\]

\[
Y(3) = [y(0) - y(4)] - [y(2) - y(6)] + c[y(3) - y(7)]
\]

\[
Y(5) = [y(0) - y(4)] + [y(2) - y(6)] - c[y(1) - y(5)]
\]

\[
Y(7) = [y(0) - y(4)] - [y(2) - y(6)] - c[y(3) - y(7)]
\]

With, \( c = \sqrt{2} \)

It can further reduce the number of multipliers since we are multiplying with the same constant ‘c’. For the same purpose we can use same multiplier.
Therefore, Radix–8 FHT algorithm divides an $N$–point DHT given by (1) into four $N/8$–point DHTs defined by equations below and one $N/2$–point DHT defined by (4). The next step in deriving the algorithm, is by applying further the Radix–4 and Radix–2 FHT algorithms to the odd- and even-indexed samples of (4) respectively. Therefore, we get the following decompositions

i. For even-indexed samples

\[
Y(4k) = \sum_{n=0}^{N/4-1} x(n) + x(n + \frac{N}{2}) + x(n + \frac{N}{2} + \frac{N}{4}) + x(n + \frac{N}{2} + \frac{N}{4} + \frac{N}{8}) (\cos[2k\pi]) \quad \text{………………..(3)}
\]

ii. For odd-indexed samples

\[
Y(8k+2) = \sum_{n=0}^{N/8-1} x(n) + x(n + \frac{N}{8} - n) - x(n + \frac{N}{8} - n + \frac{N}{4}) - x(n + \frac{N}{8} - n + \frac{N}{4} + \frac{N}{8}) (\cos[2k\pi]) \quad \text{………………..(4)}
\]

Finally, the preferred algorithm (radix–2/4/8 FHT) can be obtained by applying radix–2 FHT algorithm into (3), resulting in the remaining decompositions:

\[
Y(8k+6) = \sum_{n=0}^{N/8-1} x(n) + x(n + \frac{N}{8} - n) - x(n + \frac{N}{8} - n + \frac{N}{4}) - x(n + \frac{N}{8} - n + \frac{N}{4} + \frac{N}{8}) (\cos[2k\pi]) \quad \text{………………..(5)}
\]

The Figure 2 shows that 4 point Radix 2 FFT butterfly diagram to compute the DFT points. The radix 2 algorithms are the simplest FFT algorithms. The decimation-in-time (DIT) radix 2 FFT recursively partitions a DFT into two half-length DFTs of the even-indexed and odd-indexed time samples. The outputs of these shorter FFTs are reused to compute many outputs, thus greatly reducing the total computational cost.

\[
Y = \sum_{n=0}^{N-1} x(n) - \sum_{n=0}^{3N-1} x(n) + x(n + \frac{N}{8}) + x(n + \frac{N}{8} + \frac{N}{4}) + x(n + \frac{N}{8} + \frac{N}{4} + \frac{N}{8}) + x(n + \frac{N}{8} + \frac{N}{4} + \frac{N}{8} + \frac{N}{8}) (\cos[8k\pi]) \quad \text{………………..(6)}
\]
Performing the computation in-place using double butterfly with 16-points provides an in-place butterfly of the proposed radix−2/4/8 algorithm.

Therefore, the odd harmonic components are calculated from $u_o$ after some adjustment terms. Evaluating the multiplicative complexity. The above proceed can iterative be applied since that $N$ is a power of two. We arrive then to the block diagram shown in Figure 2 below.

3. Hardware flow diagram

The equations (3) – (7) can be implemented in hardware by considering the flow diagram depicted in Figure 1. The block simply consists of few ADDERs' and two MULTIPLIER blocks for multiplication with 'c'.

![Figure 4: Hardware Flow Diagram.](image-url)
In the above figure 4 the hardware flow diagram of the DHT is shown with the 2N multiplications.

1. **RTL Flow Diagram**
   The Figure 2 shows the synthesis of the block diagram in the form of an RTL, synthesized using XILINX ISE 14.7. All the blocks are simple adders and subtractors.

   ![RTL Flow Diagram](image1)

   Figure 5: RTL Flow Diagram.

   The RTL shown in Figure 5 can also be represented as:

   ![RTL Block Diagram](image2)

   Figure 6: RTL Block Diagram
4. Literature review

Shirali Parsaie et al. [1], Discrete Hartley Transform (DHT) is one of the transformations used to convert data from a time - frequency domain to a fourier transform just by using true values. DHT can be used for highly modular as well as parallel data analysis in VLSI applications. We have proposed a new algorithm for 2N duration DHT computation, where N=3 and 4. We also added a multiplier to update to a simple multiplication used in conventional DHT. This paper provides a comparison between the standard DHT algorithm as well as the proposed DHT algorithm in terms of time and area.

Doru Florin Chiper [2], Efficient algorithms for forward as well as inverse relationship discreet Hartley transforms have been developed over the last 30 years (DHTs). They are similar to the Fast Fourier Transform (FFT) algorithms used to measure the discreet Fourier Transform (DFT). Any of these methods are designed to minimise the complexity of calculations and/or the number of operations. A new approach to estimating the fast Hartley Transform (FHT) radix -2 has been implemented. The proposed algorithm, based on a two-band decomposition of input data, has a very normal form, prevents input or output data from shuffling, requires considerably less multiplication than the existing approaches, but increases the number of additions.

Michela Svaluto Moreolo [3], Real transforms require less computational complexity and much less physical memory than complex transformations. But the discreet fractional transformations of Fourier as well as Hartley are complex transformations. In this paper, we suggest reality-preserving the fractional variants of the discreet transformations of Fourier, Hartley, Generalized Fourier and Generalized Hartley. All the suggested discreet fractional transformations have as many parameters as possible and are therefore very robust. The proposed real discreet fractional transformations have arbitrary proprietary converters and have only two distinct proprietary values 1 and 1. The properties as well as relations of potential real discreet fractional transformations are analysed. In addition, we propose their alternative reality-preserving fractionalizations based on diagonal-like matrices to further boost their flexibility for true, conventionally discreet Hartley and generalised discreet Hartley transformations. Proposed real transition share all the positive properties needed to be discreet fractional transformations. Finally, because the proposed new transformations have random outputs and several parameters, they are all suitable for data protection applications such as image encryption and watermarking.

Chin-Kuo Jao, Syu-Siang Long [4], In this brief, we assume the quantum-dot cellular automation (QCA) of the discreet Hadamard transform (DHT). The study of a full-parallel solution based on an efficient multibit extension to QCA is discussed for the first time. We're seeing that this contributes to a wide region as well as a delay. We then suggest a bit-serial pipelined architecture for QCA-based DHT. The proposed architecture is based on a new one-bit adder-subtractor needing only six majority gates and a feedback lock requiring only one majority gate and minimal wiring. The method results in a reduction in the area-delay-cycle product of 74 per cent and 91 per cent (over a full-parallel solution) for 4 and 8 word-lengths, respectively. The findings of the QCADesigner simulations are also discussed.

Saad Bouguezel, M. Omair Ahmad [5], A modern, very large-scale integration (VLSI) algorithm for the 2Nlength discrete transformation of Hartley (DHT) that can be efficiently implemented on a highly modular and parallel VLSI architecture with a standard structure is introduced. The DHT algorithm can be easily divided into many parallel sections that can be performed at the same time. In addition, the proposed algorithm is well adapted for sub-expression sharing approaches that can be used to greatly minimise the hardware complexity of the extremely parallel VLSI implementation. Using the benefits of the proposed algorithm and the fact that we can easily share multipliers with the same constant, the number of multipliers has been substantially decreased in such a way that the number of multipliers is very small relative to the current algorithms. Or at any time, constant multipliers can be easily applied in VLSI.

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>No. of Symbols</td>
<td>398</td>
<td>350</td>
<td>317</td>
</tr>
<tr>
<td>No. of 4 input LUTs</td>
<td>684</td>
<td>617</td>
<td>593</td>
</tr>
<tr>
<td>No. of bounded LUTs</td>
<td>256</td>
<td>256</td>
<td>256</td>
</tr>
<tr>
<td>Maximum combinational path delay (in ns)</td>
<td>39.931 ns</td>
<td>29.531 ns</td>
<td>37.574 ns</td>
</tr>
</tbody>
</table>

Table 1: Summary of Literature Review.
In the above table 1 the summary of the literature review is given on that basis our proposed work is carried forward.

5. Simulation & results

The comparison of DHT algorithm with existing algorithms is presented in Table 2, this has been observed that architectures based on DHT algorithm used less number of multipliers and adders. Hence, the complexity of VLSI architecture gets reduced.

![Figure 7: Block Diagram of VLSI [11].](image)

In the above figure 7 the VLSI architecture has been shown.

Table 2 Comparison with other Algorithms

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>N</td>
<td>MUL</td>
<td>AD</td>
<td>MUL</td>
<td>AD</td>
</tr>
<tr>
<td>8</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
</tbody>
</table>

MUL-Multiplier  ADD-Adder
6. Conclusion

In this paper, a DHT algorithm for highly parallel and regular VLSI structure is presented for the length $N = 8$. The power consumption remains a major concern while dealing with VLSI circuits. The application of DHT algorithm leads to reduced hardware complexity, and hence reduced power consumption. The DHT algorithm implements sub-sharing expression technique with sharing of multipliers which have same constants. The DHT algorithm is mainly used in signal processing and image compression applications.

7. Future scope

The new algorithm can enhance the discovery efficiency and coverage. It is useful for service discovery of mobile application.

References

A. Shirali Parsaian and Swapnil Jain, “VHDL Implementation of Discrete Hartley Transform Using Urdhwa Multiplier”, 978-1-4673-9542-7/15/$31.00 ©2015 IEEE.

B. Doru Florin Chiper, Senior Member, IEEE “Radix-2 Fast Algorithm for Computing Discrete Hartley Transform of Type III” IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 59, NO. 5, MAY 2012

D. Chin-Kuo Jao, Syu-Siang Long, and Muh-Tian Shiue, Member, IEEE “DHT-Based OFDM System for Passband Transmission Over Frequency-Selective Channel” IEEE SIGNAL PROCESSING LETTERS, VOL. 17, NO. 8, AUGUST 2010


F. Liang Tao and Hon Keung Kwan, Senior Member, IEEE “Multirate-Based Fast Parallel Algorithms for 2-D DHT-Based Real-Valued Discrete Gabor Transform” IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 7, JULY 2012

G. Pramod Kumar Meher, Senior Member, IEEE, Thambipillai Srikanthan, Senior Member, IEEE, and Jagdish C. Patra, Member, IEEE “Scalable and Modular Memory-Based Systolic Architectures for Discrete Hartley Transform” IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 53, NO. 5, MAY 2006


K. R. Nicole, “Title of paper with only first word capitalized,” J. Name Stand. Abbrev., in press.
