Design and Implementation of Pipelined Floating Point Fast Fourier Transform Processor


IJIRST International Journal for Innovative Research in Science & Technology, Volume 1, Issue 11, April 2015, ISSN (online): 2349-6010

Veera Kamat Dhakankar
Department of Electronics & Telecommunications Engineering, Goa College of Engineering, Farmagudi, Goa, India

Prof. Sonia Kuwelkar
Department of Electronics & Telecommunications Engineering, Goa College of Engineering, Farmagudi, Goa, India

Abstract
Several methodologies and techniques already offer hardware and software solutions for computing the Fast Fourier Transform (FFT), each with advantages for specific applications. These solutions run on platforms such as GPUs, DSPs, FPGAs and ASICs, and are usually described in C/C++ or in an HDL; implementations intended for reconfigurable logic are usually described in an HDL such as VHDL or Verilog. Different FFT algorithms have been proposed that exploit certain signal properties to improve the trade-off between computation time and hardware requirements. For years the common practice has been to run FFTs on a digital signal processor (DSP), but as FPGAs have evolved and begun to accommodate function-specific computing cores, they are beginning to displace DSPs as the preferred FFT solution. This project discusses the design and FPGA implementation of a floating point FFT based on the radix-2 algorithm. The VHDL design was tested using ModelSim to measure its performance in terms of speed and scalability, and the processor was synthesized on an FPGA to measure performance parameters such as maximum operating frequency, area and power consumption. The synthesis tool is Xilinx ISE Design Suite 10.1 and the FPGA is a Spartan III.

Keywords: FFT processor, Pipelined processor

I. INTRODUCTION
The FFT is an efficient algorithm for calculating the DFT.
FFT algorithms use the symmetry and periodicity properties of the DFT to divide the calculation recursively. The radix is the size of the FFT decomposition. Twiddle factors are the coefficients used to combine results from a previous stage to form inputs to the next stage. The number of stages depends on the number of inputs to the algorithm. In 1965 Cooley and Tukey re-developed the algorithm and published the first well-known version in Mathematics of Computation [1].

Algorithm
Computing the DFT/FFT of a time domain digital signal x(n) converts it into a frequency domain signal. Analysis and processing of a discrete signal in the frequency domain is often more efficient than its analysis in the time domain. The FFT algorithm of Cooley and Tukey [1] was developed to reduce the number of complex multiplications and additions in the DFT. An N-point DFT is given by eq. (1):

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \qquad W_N = e^{-j 2\pi / N} \qquad (1)$$

According to eq. (1), the DFT requires N^2 - N complex additions and N^2 complex multiplications. An N-point radix-2 FFT is given by eq. (2):

$$X(k) = \sum_{r=0}^{N/2-1} x(2r)\, W_{N/2}^{rk} + W_N^{k} \sum_{r=0}^{N/2-1} x(2r+1)\, W_{N/2}^{rk} \qquad (2)$$

According to eq. (2), applied recursively, the number of complex multiplications and additions is reduced to (N/2)log2(N) and N log2(N) respectively. Choosing the FFT over direct DFT computation for hardware implementation therefore means increased speed and reduced power consumption, area and cost.

In the decimation-in-time (DIT) algorithm the inputs are in bit-reversed order and the outputs are in natural order, whereas in the decimation-in-frequency (DIF) algorithm the inputs are in natural order and the outputs are in bit-reversed order. The DIT algorithm provides a better signal-to-noise ratio than the DIF algorithm for a finite word length.

Based on the size of the FFT decimation, the algorithm can be radix-2, radix-4, radix-8 or split-radix. In a radix-2 algorithm the FFT size is a power of two, in radix-4 a power of four, and in radix-8 a power of eight. The split-radix type mixes the specified radices. The radix-2 DIT FFT algorithm can be depicted as a butterfly diagram, as shown in figure 1.
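As a rough illustration of the operation counts quoted above, the following Python sketch evaluates the DFT directly from its definition, X(k) = sum over n of x(n)·exp(-j2πnk/N), and tabulates the complex operation counts for the direct DFT and for the radix-2 FFT. The function names `dft_direct` and `op_counts` are illustrative assumptions and are not part of the original design.

```python
import cmath
import math


def dft_direct(x):
    """Direct N-point DFT: X(k) = sum_n x(n) * exp(-j*2*pi*n*k/N)."""
    n_points = len(x)
    result = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            acc += x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
        result.append(acc)
    return result


def op_counts(n_points):
    """Complex operation counts quoted in the text (N must be a power of two)."""
    stages = int(math.log2(n_points))
    return {
        "dft_mults": n_points ** 2,
        "dft_adds": n_points ** 2 - n_points,
        "fft_mults": (n_points // 2) * stages,
        "fft_adds": n_points * stages,
    }


if __name__ == "__main__":
    for n_points in (256, 512, 1024):
        print(n_points, op_counts(n_points))
```

For N = 1024 these formulas give 1,048,576 complex multiplications for the direct DFT against 5,120 for the radix-2 FFT.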
For an N-point FFT there are log2(N) stages, and each stage requires N/2 butterfly operations. The butterfly operation is shown pictorially in figure 2 and can be written as:

X = P + W·Q and Y = P - W·Q

where P and Q are the complex inputs, W is the complex twiddle factor, and X and Y are the complex outputs.

All rights reserved by www.ijirst.org
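The stage and butterfly structure described above can be sketched in software as follows. This is a minimal Python model of a radix-2 DIT FFT with bit-reversed input ordering, assuming complex arithmetic is available directly; it is not the authors' VHDL, and the names `bit_reverse`, `butterfly` and `fft_radix2_dit` are hypothetical.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    reversed_index = 0
    for _ in range(bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def butterfly(p, q, w):
    """Radix-2 butterfly: X = P + W*Q, Y = P - W*Q."""
    t = w * q
    return p + t, p - t


def fft_radix2_dit(x):
    """Radix-2 DIT FFT; len(x) must be a power of two."""
    n_points = len(x)
    bits = n_points.bit_length() - 1
    if n_points == 0 or (1 << bits) != n_points:
        raise ValueError("length must be a power of two")
    # Bit-reversed input ordering gives natural output ordering.
    data = [complex(x[bit_reverse(i, bits)]) for i in range(n_points)]
    half = 1
    while half < n_points:  # one pass per stage, log2(N) stages in total
        step = n_points // (2 * half)  # twiddle index stride for this stage
        for start in range(0, n_points, 2 * half):
            for j in range(half):  # N/2 butterflies per stage overall
                w = cmath.exp(-2j * cmath.pi * (j * step) / n_points)
                a, b = start + j, start + j + half
                data[a], data[b] = butterfly(data[a], data[b], w)
        half *= 2
    return data
```

Calling `fft_radix2_dit([1, 1, 1, 1])` concentrates all energy in the first output bin, which is a quick sanity check on the stage ordering.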

II. LITERATURE REVIEW
FFT processors can be divided into three main classes. A simple FFT has a simple structure but requires large memory and has high power consumption. Pipelined FFTs process different stages concurrently to achieve high throughput. Fully parallel FFTs are hardware intensive and not practical for large FFTs; they are used for real-time operations.

Different types of architecture have been proposed for FFT processors. Each has its own limitations and benefits and is proposed with its potential application in mind.

Single memory architecture is the simplest FFT system: the FFT processor reads from a memory and writes back to the same location. The memory holds N data words, and the entire data set is written once for each of the log2(N) stages of the FFT. This kind of design has the minimum hardware requirement, at the cost of high computation time.

Some FFT processors use two memories, one for input and the other for output, so that while data from one memory is being processed, the processed data is fed into the other. In a ping-pong memory design, the first-stage data is read from memory one and, after computation, written to memory two. At the next stage the data is read from memory two and written to memory one.

Ref [1]: J. Cooley and J. Tukey presented the FFT algorithm in their 1965 paper, which gives the basic derivation. Subsequent research has focused on the FFT algorithm and on efficient hardware solutions targeting specific wireless standards [6], execution speed [9], area [7], power efficiency, hardware complexity [10], flexibility and precision.

Ref [6], [7], [9]: all presented FFT processors with two butterfly units and a two-port block RAM, i.e. using a pipelined architecture.

Ref [9]: proposed a reconfigurable FFT with a butterfly unit having three multipliers and a carry look-ahead adder to increase speed, at the cost of increased area.

Ref [10]: used a generalized conflict-free memory addressing scheme for continuous addressing in a parallel memory based architecture, but for a limited number of inputs.

The majority of this work focuses on a specific standard; to meet various standards, a scalable FFT is the need of the hour. The aim here is to design a balanced, scalable FFT processor that can be scaled according to the application and meets the requirements of multiple wireless standards. An attempt is made to find a reasonable balance between low area, low cost, high speed, flexibility and scalability.

The IEEE standard format for floating point data: the IEEE 754 floating point standard is used. The target device is the Spartan III FPGA used in the VDO PCI board (Spartan III PCI Board, MXS3FA-PCI-FG900).

III. DESIGN OF FFT PIPELINED PROCESSOR
Considering the architectures proposed in the literature, we propose a novel architecture that combines the advantages of the dual memory architecture and the pipelined architecture. The data flow of the pipelined dual memory architecture is shown in figure 3. The block diagram of one stage of the FFT processor is shown in figure 4. Three main processes control this design:
1) Loading the twiddle coefficients into the internal memory of the FPGA.
2) Loading data into internal memory.
3) Computation of the FFT.

Fig. 1: 8 Point DIT with Bit Reversed Inputs
Fig. 2: Single Butterfly Unit

On the initial reset of the system, the twiddle coefficients are stored in the twiddle RAM and, simultaneously, the incoming data samples are stored in data memory M0. The FFT processor begins computation at the same time by accessing data from M1, which is initialized to zeros. Once all the data has been stored in M0, the FFT processor will have accessed every location of M1; it then switches to M0. At this point all odd stages begin computation with memory M0 while all even stages begin computation with memory M1. It takes 10 clock cycles after reset for the first processed data to be available at the output of the first stage. The corresponding address of the data is also available at the output of the stage, and a stage-done signal is asserted, which the next stage uses to synchronize the storing of its input data (the output of the previous stage). The address generator block generates, for each stage, the addresses of the memory locations to be read according to the butterfly structure, and also the address in the next stage's memory where each output sample is to be stored. While the FFT computation is being carried out, new sets of data are loaded into the alternate memories of the first stage, and this process repeats until the FFT of each set of data points has been calculated.

The 1024-point FFT processor is shown in figure 5. The data samples are accepted in sequence and given to the bit reversal module, which reverses the address and stores the samples in memory M0, and similarly in memory M1. These samples are then accessed by stage 1 sequentially. Once the first FFT output is ready at the stage output, the stage 1 done signal is generated for synchronization with stage 2. Similarly, every stage generates its own stage-done signal to synchronize the full process.

Fig. 3: Dual Memory Pipelined Data

A. Design of Building Blocks of FFT:
The main components of the FFT are the butterfly, which carries out the floating point computations, and the address generating unit. The detailed design of these units is described in the subsequent sections.

B. Pipelined Floating Point Butterfly Unit:
The butterfly is the basic computational block with which FFTs are calculated. It is given by the butterfly equations X = P + W·Q and Y = P - W·Q of Section I. One butterfly operation requires four real multiplications and six real additions. The pipelined butterfly unit shown in figure 6 has only two multipliers and two adder/subtractors, compared with the four multipliers and six adders required for parallel operation. The complex multiplication of two numbers (Re1 + j·Img1)(Re2 + j·Img2) can be broken down into real and imaginary parts as follows:

Real part = (Re1 × Re2) - (Img1 × Img2)
Imaginary part = (Re1 × Img2) + (Img1 × Re2)

Fig. 4: Single Stage of Pipelined FFT

Butterfly design
The clock-by-clock functioning of the pipelined complex butterfly can be explained as follows:

1) Clock 1: Tw_real (the real part of the twiddle factor) is selected by multiplexer0 and Tw_img by multiplexer1, and they are applied to the two multipliers along with In2_real and In2_img respectively.
2) Clock 2: The multipliers provide the outputs (In2_real × Tw_real) and (In2_img × Tw_img), which are stored in registers R1 and R2. A new set of values is also applied through the multiplexers: Tw_img and In2_real to multiplier1, and In2_img and Tw_real to multiplier2.
3) Clock 3: R1 and R2 are given to the subtractor, since (In2_img × Tw_img) must be subtracted from (In2_real × Tw_real).
4) Clock 4: The output of the subtractor is stored in R3. The new values in R1 and R2, i.e. (In2_real × Tw_img) and (In2_img × Tw_real), are given to the adder.
5) Clock 5: The result of the adder is stored in R4, and the output of R3 is selected through the multiplexer and given to the adder to be added to IN1 coming from the delay pipe.
6) Clock 6: IN2 real goes through a delay of 4 clock cycles and is given as input to the adder along with the output of R3.
7) Clock 7: The result of this addition is stored in R5. In2_img and the output of R4 are then applied to the adder.
8) Clock 8: The result of this addition is stored in R7. The other set of outputs is stored in R6 and R8 (the second set of inputs IN1 and IN2).
9) Clock 9: The values in R5 (real) and R6 (imaginary) are combined to form OUT1.
10) Clock 10: The values in R7 (real) and R8 (imaginary) are combined to form OUT2.

C. Pipelined Floating Point Multiplier Unit:
The multiplier is designed as a three-stage pipeline. It carries out one multiplication every clock and has a latency of 3 clocks. The basic operation involves multiplying the mantissas of the two numbers and adding their exponents. The exception handler checks for the following exceptions: infinity, NaN and zero.
Any number greater than 1.0 × 2^128 is set to infinity and the overflow bit is set. Any multiplication by a NaN or infinity results in the output being set to infinity. Any multiplication by zero results in zero. Any number less than 1.0 × 2^-127 results in the output being set to zero.

Fig. 5: Full design of FFT pipelined processor

D. Pipelined Floating Point Adder Unit:
The adder is also designed as a three-stage pipeline and carries out one addition every clock with a latency of 3 clocks. The floating point addition operation includes an exception handler that checks for exceptions similar to those in the floating point multiplier design.

E. Address Generator:
The address generator unit is one of the important blocks of the FFT design. It has to be synchronized perfectly with the butterfly operation. For each butterfly operation, a pair of data inputs is read from memory and, after the required computation, the results are written back to the same memory locations. Each butterfly operation requires a particular twiddle factor, so the unit must generate the addresses of the input data as well as the corresponding twiddle address. The address generating unit should be able to generate addresses for any FFT size in the radix-2 algorithm and different addresses for different stages. The address generator designed here is for a 1024-point FFT and generates all the required addresses of the radix-2 algorithm. The IN2 address is generated first and the IN1 address on the next clock. The twiddle address is available for both clock periods.
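The text does not give the internal logic of the address generator, so the following Python sketch only illustrates the standard in-place radix-2 DIT address pattern that such a unit would need to produce. It assumes bit-reversed input data, stages numbered from 1, and a twiddle table holding W_N^k for k = 0 to N/2 - 1; the function name `stage_addresses` and this table layout are hypothetical, and the clock-level ordering (IN2 address first, IN1 on the next clock) is not modelled.

```python
def stage_addresses(n_points, stage):
    """Yield (in1_addr, in2_addr, twiddle_addr) for one radix-2 DIT stage.

    Assumes bit-reversed input data, stages numbered from 1, and a
    twiddle table indexed by k for W_N^k (hypothetical layout).
    """
    half = 1 << (stage - 1)          # distance between the two butterfly inputs
    stride = n_points // (2 * half)  # twiddle index step within a group
    for group_start in range(0, n_points, 2 * half):
        for j in range(half):
            in1_addr = group_start + j
            in2_addr = in1_addr + half
            yield in1_addr, in2_addr, j * stride


if __name__ == "__main__":
    for stage in (1, 2, 3):
        print(stage, list(stage_addresses(8, stage)))
```

Under these assumptions every stage of a 1024-point FFT yields 512 address triples, one per butterfly, and only the pairing distance and the twiddle stride change from stage to stage.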
Fig. 6: Single Stage Pipelined Butterfly [9]

IV. RESULT AND EVALUATION

A. Simulation:
The FFT processor was simulated before being synthesized on an FPGA device. Simulation was carried out to verify the functionality of the processor and to validate the results.

B. Pre-Simulation:
Before the FFT processor is simulated, the necessary environment needs to be set up. Simulation of the FFT processor requires RAM and ROM inputs, which are provided to the simulation as text files. The RAM stores the data samples, whereas the ROM stores the twiddle factors required for the FFT computation. The simulation tool used was ModelSim SE 5.7 from Mentor Graphics Inc.
Load the design: after compilation, a design file was created with the name of the top-level entity inside the work directory.
Add ports and signals to the wave window: to observe the state and values of various ports and signals, they were loaded into the wave window.
Synthesis and power analysis: for synthesis, Xilinx ISE Design Suite 10.1 was used.

C. Synthesis and Place and Route Timing Report:
The single precision (32-bit IEEE 754 format) FFT design was synthesized for various data sizes. The device selected was the Xilinx Spartan III. The synthesis and timing report is given in the table below.

Table: Synthesis and timing report
No. of Points   Min Period (ns)   Max Freq (MHz)   No. of Slices   Slice FF
256             27                37.03            19930           19930
512             29                34.48            22452           22452
1024            29                34.48            25059           14458

V. CONCLUSION
A novel radix-2, 1024-point FFT processor architecture based on a reasonable balance between performance, power and area is proposed. The FFT processor was designed and implemented in VHDL, simulated using ModelSim and synthesized on a Spartan III FPGA (board MXS3FA-PCI-FG900). The architecture can be adopted in a number of devices where a reasonable balance between specific parameters is required.
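As a small cross-check on the synthesis and timing figures reported in Section IV, the maximum clock frequency is the reciprocal of the minimum clock period. The Python sketch below recomputes it from the reported periods; the helper name `max_frequency_mhz` is an assumption for illustration, and the numbers in `reported` are copied from the table.

```python
def max_frequency_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a minimum clock period in ns."""
    return 1000.0 / min_period_ns


# (min period in ns, reported max frequency in MHz), keyed by FFT size
reported = {256: (27, 37.03), 512: (29, 34.48), 1024: (29, 34.48)}

for points, (period_ns, freq_mhz) in reported.items():
    print(points, round(max_frequency_mhz(period_ns), 2), freq_mhz)
```

The 256-point entry evaluates to about 37.04 MHz when rounded to two decimals, so the reported 37.03 MHz appears to be truncated rather than rounded.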
REFERENCES
[1] J. W. Cooley and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series, Mathematics of Computation, vol. 19, no. 90, pp. 297-301, 1965.
[2] Architectures, Modeling, and Simulation (SAMOS XIII), 2013 International Conference on, Date of Conference: 15-18 July 2013, Page(s): 19-27, 2013.
[3] S. Shalivahanan, A. Vallavraj, C. Gnanapriya, Digital Signal Processing, Tata McGraw-Hill Publishing Co. Ltd.
[4] S. Palani, Anoop K. Jairath, Signals and Systems, Ane Books Pvt. Ltd.
[5] Z. H. Derafshi, J. F. Hamed Taghipour, A High Speed FPGA Implementation of a 1024-Point Complex FFT Processor, 978-0-7695-4042-9/10, 2010 IEEE.
[6] Q. Zhang, N. Meng, A Low Area Pipelined FFT Processor for OFDM Based Systems, 978-1-4244-3693-4/09/, 2009 IEEE.
[7] S. Huang, S. Chen, A High-Throughput Radix-16 FFT Processor with Parallel and Normal Input/Output Ordering for IEEE 802.15.3c Systems, IEEE Transactions on Circuits and Systems, vol. 59, no. 8, Aug. 2012.
[8] F. Liu, X. Pan, Research on implementation of FFT based on FPGA, ICCASM, vol. 7, pp. V7-152-155, 2010.
[9] J. Takala and K. Punkka, Butterfly unit supporting radix-4 and radix-2 FFT, in Proc. Int. Workshop Spectral Methods and Multirate Signal Process., Riga, Latvia, June 20-22, 2005, pp. 47-53.
[10] Spartan III Platform FPGAs: Introduction and Overview. (http://www.xilinx.com/partinfo)