Performance Evaluation on FFT Software Implementation

Proceedings of the International MultiConference of Engineers and Computer Scientists 2009 Vol II, IMECS 2009, March 18 - 20, 2009, Hong Kong

Xiangyang Liu, Xiaoyu Song, and Yuke Wang

Abstract: It is known that FFT algorithms have complexity $O(n \log n)$, where $n$ is the input size. Many new algorithms claim certain theoretical advantages; however, their real performance in applications is questionable. This paper presents a systematic performance evaluation of different FFT software implementations. Different code techniques such as recursive, iterative, TFBBGM, and TFRM with further expansion are used for real application sizes from 512 to 2048 points. Contrary to the common belief that recursive programs are slower, we find that recursive programs are not necessarily slower for commonly used FFTs. Our comparative study constitutes the first attempt to evaluate the real performance of different FFT approaches.

Index Terms: FFT, DFT.

I. INTRODUCTION

In digital signal processing, the discrete Fourier transform (DFT) plays an important role in the analysis, design and implementation of discrete-time signal-processing algorithms and systems [1, 2]. The fast Fourier transforms (FFT) are efficient algorithms to compute the DFT. The FFT is used widely in digital signal processing, and its performance is critical for many real-time applications. FFT algorithms are based on the principle of decomposing the computation of a DFT into sequences of smaller DFTs. The first FFT algorithm was discovered by Gauss around 1805 and rediscovered by Cooley and Tukey [3] in the 1960s. Significant advances include higher-radix FFT algorithms [4], mixed-radix FFT algorithms [5], split-radix FFT algorithms [6][7], the recursive FFT algorithm [8], and the decimation-in-time (DIT) and decimation-in-frequency (DIF) FFT algorithms [9].
Most of these algorithms illustrate the FFT with similar FFT diagrams, which evolve from the nature of the FFT algorithms and are constructed from basic butterfly structures, such as the 8-point radix-2 FFT diagram shown in Figure 1. FFT algorithms can be implemented on multiple platforms. For example, FFT algorithms have been implemented on application-specific integrated circuits (ASICs) as FFT processors [10] for high-speed or low-power hardware design. However, FFT algorithms designed into a hardware processor are often tailored to a specific application and hence are not flexible. The FFT has also been implemented in software on general-purpose processors as a building block of simulation and data processing systems [11]. Software-based implementations of the FFT on general processors are cheaper and more flexible, but they are typically slower than hardware on comparable technologies. Digital signal processors (DSPs) are specific processors optimized for various signal-processing applications such as FIR and IIR filters and the FFT. Software implementations of FFTs on DSPs are becoming popular for their excellent tradeoff among cost, performance, flexibility, and implementation complexity.

Figure 1. The 8-pt radix-2 DIT FFT diagram.

It is known that FFT algorithms have complexity $O(n \log n)$, where $n$ is the input size. Many new algorithms claim some advantage in terms of a constant improvement; however, their real performance is unknown. This paper presents a systematic and synergic study of the efficiency of different ways of programming the FFT. In particular, we propose different ways to program the FFT.

Xiangyang Liu is with the Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA (e-mail: xxl63@utdallas.edu). Xiaoyu Song is with the Department of Electrical and Computer Engineering, Portland State University, P.O. Box 751, Portland, OR 97207-0751, USA (e-mail: song@ee.pdx.edu). Yuke Wang is with the Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA (e-mail: yuke@utdallas.edu).
Different code techniques such as recursive, iterative, TFBBGM [12], and TFRM [13] with further expansion are explored. An extensive experiment is conducted for input sizes from 512 to 2048 points. Some important findings are obtained on FFT codes on existing major DSP architectures. Further manual tuning optimizations are possible. Contrary to the common belief that recursive programs are slower, we find that recursive programs are not necessarily slower for commonly used FFTs. Instead, their performance is determined by many other factors. Our comparative study constitutes the first attempt to understand the real performance of the different approaches.

The paper is organized as follows. In Section II, we give the preliminaries of DIF/DIT FFT algorithms and of code techniques such as TFRM and TFBBGM. Section III describes the implementations of twenty FFT codes. Experimental results are shown in Section IV. Section V concludes the paper.

II. PRELIMINARIES

We first present the basic ideas of the DIT FFT and the DIF FFT. Then we describe two code structures: iterative and recursive codes. Two methods, TFRM and TFBBGM, are introduced to reduce the number of memory references due to twiddle factors.

ISBN: 978-988-17012-7-5 IMECS 2009

A. DIT and DIF FFT

The DFT of a discrete signal $x[n]$ can be directly computed as

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \tag{1}$$

where $W_N^{nk} = e^{-j(2\pi/N)nk}$, $x[n]$ and $X[k]$ are sequences of complex numbers, and $j = \sqrt{-1}$. The basic idea of the DIT and DIF FFT algorithms is to decompose the input sequence $x[n]$ and the output sequence $X[k]$ of (1) into smaller sequences. For example, the radix-2 DIT and DIF FFT algorithms are obtained by splitting the input sequence $x[n]$ and the output sequence $X[k]$, respectively, into odd- and even-indexed elements. Figures 1(b) and 2(b) show the computation diagrams of the DIT and DIF algorithms, respectively.

Figure 2. The 8-pt radix-2 DIF FFT diagram.

The butterflies are computed according to the index order of the stages and groups obtained by partitioning the radix-2 DIT and DIF FFT diagrams. Within the same group, the butterflies are computed from top to bottom. Figure 3(a) shows the iterative C code implementation of the n-point radix-2 DIF FFT algorithm taken from TI's DSP library [14], where n is given as an input parameter to the C code. Figure 3(b) shows the corresponding iterative C code implementation of the n-point radix-2 DIT FFT algorithm.

Figure 3. The C code of radix-2 DIF FFT and radix-2 DIT FFT.

The C code in Figure 3 shows a three-loop iterative structure:
1) the outermost loop, the k-loop, counts the stages and runs $\log_2 n$ times;
2) the second loop, the j-loop, counts the groups within each stage and decides which twiddle factor to load;
3) the innermost loop, the i-loop, counts the butterflies within each group.
Variables k and j indicate the stage and group number, respectively. Variables i and l indicate the upper and lower input indexes of the butterfly computed by the innermost loop, respectively. Variable ia indicates the index of the twiddle factor to be loaded.

B. Recursive and Iterative Code Structures

Recursion plays an elegant role in solving problems in the design and analysis of computer algorithms and in complexity theory [15]. A complex problem can be decomposed into smaller problems of the same structure. Figure 4(a) shows the recursive code of the factorial function. Only the multiplication process determines the code complexity. Hence, the complexity of the original problem can be decreased.

Figure 4. Example of recursive code and iterative code.

As Figure 4(a) illustrates, the recursive code structure involves a call to the function from inside the same function, so it needs memory stack operations to fulfill this task. Because memory operations are expensive in clock cycles, the recursive code structure also increases the number of clock cycles to some extent.

The iterative code structure is the common structure in which the same piece of code is executed multiple times. Figure 4(b) shows the iterative code of the factorial function. It does not incur the memory stack operations caused by a function calling itself, and hence it requires fewer clock cycles than recursive code in that respect. However, the iterative structure is more complex than the recursive structure in code size, which also makes it require more clock cycles to some extent. We explore the overall performance of these two code structures by performing a thorough experiment on various iterative and recursive FFT codes.

C. TFBBGM

The TFBBGM (twiddle-factor-based butterfly grouping method) groups the butterflies in the radix-2 FFT diagram according to their twiddle factor. Each twiddle factor is loaded only once in the computation order, so the number of redundant memory references due to twiddle factors in the conventional radix-2 DIF FFT algorithm can be reduced. Since, as Figure 2(b) shows, the twiddle factors of an $N$-point radix-2 DIF FFT algorithm fall into $\log_2 N$ groups, the computation requires only $\log_2 N$ steps.
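As a rough illustration of the two code structures discussed so far, the following Python sketch shows a three-loop iterative radix-2 DIF FFT next to a recursive one. It is not the TI library code of Figure 3, which is not reproduced in this text; the function names, the `loads` counter, and the choice of Python rather than C are assumptions made purely for illustration. The counter records how often the conventional loop structure looks up a twiddle factor, which is the redundancy TFBBGM is meant to remove.

```python
import cmath


def bit_reverse_permute(a):
    """Reorder a bit-reversed DIF result into natural order."""
    n = len(a)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        r = 0
        for b in range(bits):
            if (i >> b) & 1:
                r |= 1 << (bits - 1 - b)
        out[r] = a[i]
    return out


def fft_dif_iterative(x):
    """Three-loop iterative radix-2 DIF FFT; returns (X, twiddle table lookups)."""
    a = [complex(v) for v in x]
    n = len(a)
    assert n >= 2 and n & (n - 1) == 0, "length must be a power of two"
    table = [cmath.exp(-2j * cmath.pi * m / n) for m in range(n // 2)]
    loads = 0
    n2, ie, k = n, 1, n
    while k > 1:                        # k-loop: one pass per stage
        n1, n2, ia = n2, n2 // 2, 0
        for j in range(n2):             # j-loop: chooses the twiddle factor
            tw = table[ia]
            loads += 1
            ia += ie
            for i in range(j, n, n1):   # i-loop: butterflies sharing tw
                l = i + n2
                diff = a[i] - a[l]
                a[i] = a[i] + a[l]
                a[l] = diff * tw
        ie *= 2
        k //= 2
    return bit_reverse_permute(a), loads


def fft_dif_recursive(x):
    """Recursive radix-2 DIF FFT: split the output into even and odd bins."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    half = n // 2
    top = [x[i] + x[i + half] for i in range(half)]
    bot = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
           for i in range(half)]
    out = [0j] * n
    out[0::2] = fft_dif_recursive(top)  # X[0], X[2], ...
    out[1::2] = fft_dif_recursive(bot)  # X[1], X[3], ...
    return out
```

In this sketch the iterative version performs one table lookup per j-iteration, that is n/2 + n/4 + ... + 1 = n - 1 lookups for an n-point transform, even though only n/2 distinct twiddle factors exist.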
There are $N/2$ different twiddle factors in the first stage of the radix-2 $N$-point DIF FFT diagram, expressed as $W_N^m$, where $m = 0, 1, 2, 3, \ldots, N/2-1$. The twiddle factors with odd $m$ among the $N/2$ twiddle factors in the first stage do not occur in later stages. At any stage $s$, the twiddle factor for any butterfly is $W_N^{(n \bmod N/2^s)\,2^{s-1}}$, so it is clear that $m$ will not be odd when $s$ is greater than 1. Thus, butterflies with twiddle factor $W_N^m$, $m = 1, 3, 5, \ldots, N/2-1$, can be grouped, and thus $N/4$ butterflies are computed in the first stage of the TFBBGM.

In the $s$-th stage, apart from the butterflies computed in the first stage, the butterflies whose twiddle factors do not occur after Stage $s$ of the radix-2 $N$-point DIF FFT diagram in Figure 2(b) are computed. There are $N/4$ such butterflies in Stage $s$, $N/8$ butterflies in Stage $s-1$, $N/16$ butterflies in Stage $s-2$, ..., and $N/2^{s+1}$ butterflies in Stage 1. The twiddle factors of the corresponding butterflies are $W_N^m$, where $m = 2^{s-1}, 3\cdot 2^{s-1}, 5\cdot 2^{s-1}, \ldots, (N/2^s - 1)\cdot 2^{s-1}$. In particular, when $s$ is 2, $N/4$ butterflies in Stage 2 and $N/8$ butterflies in Stage 1 of the radix-2 DIF FFT diagram in Figure 2(b) are computed in stage 2 of the new method. The twiddle factors of these butterflies are $W_N^m$, where $m = 2, 6, 10, \ldots, (N/2)-2$. The last stage computes in total $N-1$ butterflies with twiddle factor $W_N^0$ together in the radix-2 $N$-point DIF FFT diagram. By using these new stages, the new method loads each twiddle factor only once in the computation.

We redraw the computation diagram of the radix-2 DIF FFT shown in Figure 2(b) as Figure 5. From the new diagram, it is easy to see that in total $N-1$ butterflies with twiddle factor $W_N^0 = 1$ are computed without multiplication, which helps to reduce the number of clock cycles.

Figure 5. The radix-2 DIF FFT diagram with TFBBGM.

Similarly, TFBBGM can also be applied to the radix-2 DIT FFT. Because of the difference between the radix-2 DIT FFT and the radix-2 DIF FFT, however, butterflies with twiddle factor $W_N^0$ need to be grouped and computed before butterflies with other twiddle factors are grouped and computed. In a later step $s$, TFBBGM groups and computes butterflies with twiddle factor $W_N^m$, where $m = N/2^s, 3N/2^s, 5N/2^s, \ldots, (2^{s-1}-1)N/2^s$. Figure 6 shows the radix-2 DIT FFT diagram redrawn by grouping the butterflies with identical twiddle factors.

Figure 6. The 8-pt radix-2 DIT FFT diagram with TFBBGM.

D. TFRM

Based on the complex properties of the twiddle factor, TFRM (Twiddle Factor Reduce Method) can reduce the number of twiddle factors to be referenced. For example, $W_8^3$ can be replaced by $-jW_8^1$ in Figure 1 and Figure 2 by using a property of complex numbers. The derivation is:

$$W_8^3 = e^{-j(2\pi/8)\cdot 3} = e^{-j(2\pi/8)\cdot 2}\, e^{-j(2\pi/8)} = -j\,W_8^1 .$$

A similar derivation can be applied to $W_8^2$. Hence, only $W_8^0$ and $W_8^1$ are actually required in the computation of the 8-point radix-2 DIF and DIT FFT.
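The derivation above generalizes to any $N$ divisible by 4, since $W_N^{N/4} = e^{-j\pi/2} = -j$. The short Python sketch below checks this numerically; the helper names `twiddle` and `twiddle_via_quarter_turn` are hypothetical and are not taken from [13]. It also shows why the replacement is cheap: multiplying by $-j$ only swaps the real and imaginary parts and negates one of them.

```python
import cmath


def twiddle(n, m):
    """Reference value W_n^m = exp(-j * 2*pi * m / n)."""
    return cmath.exp(-2j * cmath.pi * m / n)


def twiddle_via_quarter_turn(n, m):
    """W_n^m for n/4 <= m < n/2, derived from the stored W_n^(m - n/4)."""
    assert n % 4 == 0 and n // 4 <= m < n // 2
    base = twiddle(n, m - n // 4)           # the only value that must be loaded
    # -j * (a + jb) = b - ja: a swap and a sign change, no multiplication
    return complex(base.imag, -base.real)
```

For N = 8 this reproduces both cases named in the text: `twiddle_via_quarter_turn(8, 3)` is derived from $W_8^1$ and `twiddle_via_quarter_turn(8, 2)` from $W_8^0$.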
By using this property of complex numbers, the twiddle factor has the following property:

$$W_N^m = \begin{cases} W_N^{m}, & m \in [0, N/4) \\ -j\,W_N^{m-N/4}, & m \in [N/4, N/2) \\ -W_N^{m-N/2}, & m \in [N/2, 3N/4) \\ j\,W_N^{m-3N/4}, & m \in [3N/4, N) \end{cases}$$

Also, as observed from the radix-2 DIF FFT diagram in Figure 2(b), any single butterfly in Stage $s$ of the radix-2 $N$-point DIF FFT can be drawn in the diagram format of Figure 7.

Figure 7. Single butterfly in Stage $s$ of the $N$-pt radix-2 DIF FFT.

The butterfly with $x[n]$ as the upper input and $x[n+N/2^s]$ as the lower input uses $W_N^{(n \bmod N/2^s)\,2^{s-1}}$ as its twiddle factor. For example, in Stage 2 of Figure 2, the butterfly with upper input $x[4]$ and lower input $x[4+8/4]$, namely $x[6]$, uses twiddle factor $W_8^{(4 \bmod 8/2^2)\cdot 2^{2-1}} = W_8^0$. Hence, we have the following property for the radix-2 DIF FFT diagram: two butterflies in stage $s$, as illustrated in Figure 8(a), can be computed by loading one twiddle factor $W_N^m$, where $m = (n \bmod N/2^s)\,2^{s-1}$, as illustrated in Figure 8(b).

Figure 8. Two butterflies computation using one twiddle factor in the DIF FFT diagram: (a) inputs $x[n]$, $x[n+N/2^{s+1}]$, $x[n+N/2^s]$, $x[n+3N/2^{s+1}]$ with twiddle factors $W_N^m$ and $W_N^{m+N/4}$; (b) the same inputs with $W_N^m$ and $-j$.

Likewise, after applying TFRM to the DIT FFT diagram, we have a similar property for the radix-2 DIT FFT diagram: two butterflies in stage $s$, as illustrated in Figure 9(a), can be computed by loading one twiddle factor $W_N^m$, where $m$ depends on $n$ and $s$, as illustrated in Figure 9(b).

Figure 9. Two butterflies computation using one twiddle factor in the DIT FFT diagram: (a) inputs $x[n]$, $x[n+N/2^s]$, $x[n+N/2^{s+1}]$, $x[n+3N/2^{s+1}]$ with twiddle factors $W_N^m$ and $W_N^{m+N/4}$; (b) the same inputs with $W_N^m$ and $-j$.

III. FFT IMPLEMENTATIONS

TFRM and TFBBGM can reduce the number of memory references due to twiddle factors, thus decreasing the number of clock cycles. However, they also increase the code complexity to some extent, thus increasing the number of clock cycles. In this section, these methods are implemented with iterative and recursive codes, respectively. To perform a thorough study of the performance of different FFT codes, we further expand the iterative codes that use TFBBGM into different iterative structures which require fewer loops but take more code space.

A. DIF/DIT FFT with TFBBGM and TFRM

TFRM reduces the memory references due to twiddle factors by half in each stage of the computation, and computes the butterflies in the last two stages without twiddle factors [13], so the number of memory references due to twiddle factors is reduced to $(\log_2 N - 2)\,N/4$. TFBBGM reduces the number of memory references due to twiddle factors by grouping the butterflies with identical twiddle factors, and the butterflies with twiddle factor $W_N^0$ do not need a twiddle factor to complete the multiplication, so the number of memory references due to twiddle factors is $N/2-1$. After applying TFRM and TFBBGM together, the number of memory references due to twiddle factors in the new radix-2 $N$-point DIF FFT code is reduced to $N/4-1$. Figure 10 shows the computation diagram of an 8-point radix-2 DIF FFT with TFRM and TFBBGM. Since the butterflies computed in the last step do not need a twiddle factor, because $W_8^0 = 1$, only one twiddle factor is required.

Figure 10. Radix-2 DIF FFT diagram with TFBBGM and TFRM.

We also apply TFBBGM and TFRM to the radix-2 DIT FFT. As for the radix-2 DIF FFT with TFBBGM and TFRM, the number of memory references due to twiddle factors is greatly reduced. However, because of the difference between the DIF and DIT FFT, the input of the radix-2 DIT FFT should be in bit-reversed order before TFBBGM and TFRM are applied together. Figure 11 shows the computation diagram of the 8-point radix-2 DIT FFT with TFBBGM and TFRM.

Figure 11. Radix-2 DIT FFT diagram with TFBBGM and TFRM.

From the diagram, it is easy to see that only the twiddle factors $W_8^0$ and $W_8^1$ are used during the computation. Since $W_8^0 = 1$, twiddle factor $W_8^0$ is not loaded in Stage 1.
Hence, only one twiddle factor is loaded during the computation.

B. Recursive DIF and DIT FFT implementation

The $N$-point radix-2 DIF and DIT FFT can be programmed in recursive structures, which decreases the code complexity to some extent. As with the iterative code structure, TFBBGM and TFRM can still be applied to the recursive code implementation of the $N$-point radix-2 DIF and DIT FFT. The recursive C code implementation is also based on the FFT diagram, but the way butterflies are grouped differs from the iterative C code. Figure 12 shows the computation and the partitioning of the 8-point radix-2 DIF and DIT FFT diagram.

Figure 12. Partitioning of DIF and DIT FFT diagram according to overlapping twiddle factors.

The diagram is also partitioned into stages, and the butterflies in the same stage are grouped according to their positions, not according to a shared twiddle factor. Thus, butterflies which overlap with each other are grouped. The butterflies in Stage $s$ of the $N$-point radix-2 DIF and DIT FFT are divided into $2^{s-1}$ groups. All butterflies are computed in group order: for example, butterflies in group 3 are computed after the butterflies in group 2, and butterflies in the same group are computed from top to bottom. In Figure 12(a), twiddle factor indices increase by $2^{s-1}$ for butterflies in the same group in Stage $s$.

We apply TFRM and TFBBGM to the recursive DIF FFT and the recursive DIT FFT respectively, which gives the recursive DIF FFT with TFRM, the recursive DIF FFT with TFBBGM, the recursive DIT FFT with TFRM, and the recursive DIT FFT with TFBBGM. Since TFBBGM and TFRM can be applied together, we apply both of them to get the recursive DIF FFT with TFBBGM & TFRM and the recursive DIT FFT with TFBBGM & TFRM. Experiments with these codes at different input sizes are performed to obtain the clock cycle data in the following section.

C. DIF and DIT FFT with expansion

In order to study completely how the code techniques affect performance, we further expand the iterative DIF and DIT FFT with TFBBGM (illustrated in the computation diagrams in Figures 5 and 6) into 3 steps. For the iterative DIF FFT with TFBBGM in Figure 5, the first step computes the butterflies with twiddle factor $W_N^m$ where $m$ is an odd number; the second step groups the butterflies with identical twiddle factors and computes them in only two loops; the third step computes the butterflies with twiddle factor $W_N^0 = 1$, that is, it computes butterflies without multiplication. For the iterative DIT FFT with TFBBGM in Figure 6, the first step computes butterflies without multiplication, like the third step of the iterative DIF FFT with TFBBGM; the second step is similar to the second step of the iterative DIF FFT with TFBBGM with only the order changed; and the third step is similar to step 1 of the DIF version, with the order changed. Thus we get another two codes: the iterative DIF FFT with TFBBGM in 3 steps and the iterative DIT FFT with TFBBGM in 3 steps.
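The three-step ordering described above for the DIF case can be sketched in Python as follows. This is only an illustration of the grouping idea under stated assumptions, not the authors' C code: the function name `fft_dif_tfbbgm`, the explicit butterfly list, and the sort key are hypothetical, and a real DSP implementation would use dedicated loops rather than sorting. Each butterfly of the conventional diagram is tagged with its twiddle exponent $m = (n \bmod N/2^s)\,2^{s-1}$; butterflies with odd $m$ run first, then the remaining nonzero exponents grouped by identical twiddle factor, and finally all butterflies with $m = 0$, which need no multiplication.

```python
import cmath


def fft_dif_tfbbgm(x):
    """Radix-2 DIF FFT with butterflies ordered by twiddle factor.

    Returns (X in natural order, number of twiddle factors loaded).
    """
    a = [complex(v) for v in x]
    n = len(a)
    assert n >= 2 and n & (n - 1) == 0, "length must be a power of two"
    stages = n.bit_length() - 1

    # Tag every butterfly of the conventional diagram: (m, stage, upper, lower).
    flies = []
    for s in range(1, stages + 1):
        span = n >> s                           # N / 2^s
        for top in range(n):
            if (top // span) % 2 == 0:          # top is an upper input in stage s
                m = (top % span) << (s - 1)
                flies.append((m, s, top, top + span))

    def step_key(f):
        m, s, top, _ = f
        if m == 0:
            return (stages, 0, s, top)          # last step, earlier stages first
        trailing_zeros = (m & -m).bit_length() - 1
        return (trailing_zeros, m, s, top)      # odd m first, then 2*odd, ...

    flies.sort(key=step_key)

    loads, current, tw = 0, None, 1
    for m, s, top, bot in flies:
        if m == 0:                              # W^0 = 1: add and subtract only
            a[top], a[bot] = a[top] + a[bot], a[top] - a[bot]
            continue
        if m != current:                        # each twiddle is loaded once
            tw = cmath.exp(-2j * cmath.pi * m / n)
            loads += 1
            current = m
        a[top], a[bot] = a[top] + a[bot], (a[top] - a[bot]) * tw

    # Undo the bit-reversed output order of the DIF diagram.
    out = [0j] * n
    for i in range(n):
        r = int(format(i, "0{}b".format(stages))[::-1], 2)
        out[r] = a[i]
    return out, loads
```

With this ordering the sketch loads $N/2 - 1$ twiddle factors, matching the count stated in Section III.A for TFBBGM. The reordering is valid because the inputs of a butterfly whose exponent has $t$ trailing zero bits are produced by butterflies whose exponents have $t-1$ trailing zero bits, so they always belong to an earlier step.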

We also apply the same further expansion to the iterative DIF FFT with TFBBGM & TFRM and the iterative DIT FFT with TFBBGM & TFRM to get the iterative DIF FFT with TFBBGM & TFRM in 3 steps and the iterative DIT FFT with TFBBGM & TFRM in 3 steps.

IV. PERFORMANCE EVALUATION

We have made a total of 20 different FFT codes based on different combinations of iterative, recursive, TFBBGM, and TFRM techniques with further expansion. Because of space limitations, the FFT codes are identified by the following short names:

Code 1: Iterative DIF FFT
Code 2: Iterative DIF FFT with TFBBGM
Code 3: Iterative DIF FFT with TFBBGM in 3 steps
Code 4: Iterative DIF FFT with TFRM
Code 5: Iterative DIF FFT with TFBBGM and TFRM
Code 6: Iterative DIF FFT with TFBBGM and TFRM in 3 steps
Code 7: Recursive DIF FFT
Code 8: Recursive DIF FFT with TFBBGM
Code 9: Recursive DIF FFT with TFRM
Code 10: Recursive DIF FFT with TFBBGM and TFRM
Code 11: Iterative DIT FFT
Code 12: Iterative DIT FFT with TFBBGM
Code 13: Iterative DIT FFT with TFBBGM in 3 steps
Code 14: Iterative DIT FFT with TFRM
Code 15: Iterative DIT FFT with TFBBGM and TFRM
Code 16: Iterative DIT FFT with TFBBGM and TFRM in 3 steps
Code 17: Recursive DIT FFT
Code 18: Recursive DIT FFT with TFBBGM
Code 19: Recursive DIT FFT with TFRM
Code 20: Recursive DIT FFT with TFBBGM and TFRM

These codes are tested on four major DSP processors in the industry: the TI TMS320C64xx, the ARM ARM7TDMI processor, the ADSP-TS TigerSHARC processor, and the Freescale SC3400 StarCore DSP. After testing each FFT code on the DSP processors at input sizes of 512, 1024 and 2048, we sort the FFT codes for each DSP processor in terms of clock cycles. As illustrated in Figure 13, we find that the recursive codes always interleave with the iterative codes. This is contrary to the belief that recursive code is always slower than iterative code. Because of space limitations, we use the digits 1 to 20 to identify Code 1 to Code 20 in Figure 13.
We also compare the slowest code and the fastest code within the recursive FFT codes and within the iterative FFT codes. For lack of space we cannot list a table comparing their speed, so the fastest code is compared with the slowest code through Figure 13 only. For instance, comparing the number of clock cycles of the fastest recursive code with that of the slowest recursive code on the Freescale SC3400 StarCore DSP with a 2048-point input, we find that the fastest one is [...] times faster than the slowest one, and similar results apply to the remaining cases. For example, for the FFT codes tested on the TI TMS320C64xx processor at an input size of 512, the fastest iterative code is [...].8 times faster than the slowest iterative code. Hence, it is clear that the code techniques presented in this paper can increase the FFT code execution speed by around [...] times within iterative codes or within recursive codes.

Figure 13. FFT codes clock cycles on DSP architectures at input of 512, 1024 and 2048 pts.

V. CONCLUSION

We presented a systematic and synergic study of the efficiency of different FFT software implementations. Different code techniques such as recursive, iterative, TFBBGM, and TFRM with further expansion were explored. An extensive experiment was conducted for input sizes from 512 to 2048 points. We implemented the FFT codes on different DSP processors. Contrary to the common belief that recursive programs are slower than iterative programs, we find that recursive programs are not necessarily slower for commonly used FFTs. Instead, their performance is determined by many factors. We also find that the code techniques presented in this paper can increase FFT code execution speed by [...] times for both iterative codes and recursive codes.

REFERENCES

[1] C. S. Burrus and T. W. Parks, DFT/FFT and Convolution Algorithms and Implementation, NY: John Wiley & Sons, 1985.
[2] A. V. Oppenheim and C. M. Rader, Discrete-Time Signal Processing, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 1989.
[3] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, 1965.
[4] G. D. Bergland, "A Radix-Eight Fast-Fourier Transform Subroutine for Real-Valued Series," IEEE Trans. Electroacoust., vol. 17, no. 2, pp. 138-144, June 1969.
[5] R. C. Singleton, "An Algorithm for Computing the Mixed Radix Fast Fourier Transform," IEEE Trans. Audio Electroacoust., vol. 17, no. 2, pp. 93-103, June 1969.
[6] P. Duhamel and H. Hollmann, "Split Radix FFT Algorithm," Electronics Letters, vol. 20, pp. 14-16, Jan. 5, 1984.
[7] D. Takahashi, "An Extended Split-Radix FFT Algorithm," IEEE Signal Processing Letters, vol. 8, no. 5, pp. 145-147, May 2001.
[8] A. R. Varkonyi-Koczy, "A Recursive Fast Fourier Transform Algorithm," IEEE Trans. Circuits and Systems II, vol. 42, pp. 614-616, Sep. 1995.
[9] A. Saidi, "Decimation-in-Time-Frequency FFT Algorithm," Proc. ICASSP, pp. III:453-456, April 1994.
[10] B. M. Baas, "A low-power, high-performance, 1024-point FFT processor," IEEE J. Solid-State Circuits, vol. 34, issue 3, pp. 380-387, March 1999.
[11] MathWorks Inc., Matlab function reference: FFT, http://www.mathworks.com/access/helpdesk/help/techdoc/ref/fft.shtml?bb=
[12] Y. Jiang, Y. Tang, and Y. Wang, "Twiddle-factor-based FFT algorithm with reduced memory access," Proc. IPDPS, p. 653 ff., 2002.
[13] Y. Tang, L. Qian, Y. Wang, and Y. Jiang, "Twiddle Factor Based Memory Reduction Method for FFT Implementation on DSP," to be published.
[14] Texas Instruments, TMS320C64x DSP Library Programmer's Reference (Rev. B), SPRU565A, Oct. 2003.
[15] R. Tanner, "A recursive approach to low complexity codes," IEEE Transactions on Information Theory.