DESIGN OF LOW POWER FFT PROCESSORS USING MULTIPLIER LESS ARCHITECTURE

Senthil Sivakumar M. 1, Gurumekala T. 2, Sundaram A. 3, Thandaiah Prabu R. 4, Arputharaj T. 5 and Banupriya M. 6
1 Department of Electronics, Madras Institute of Technology, Chennai, India
2 Department of Computer Science Engineering, PSR Engineering College, Sivakasi, India
3 Department of Electronics Engineering, University of Wolkite, Ethiopia
4 Department of Electronics and Communication Engineering, Prathyusha Institute of Technology and Management, Chennai, India
5 Department of Electronics and Communication Engineering, St. Joseph College of Engineering and Technology, Tanzania
6 Department of Computer Science Engineering, National Engineering College, Kovilpatti, India

ABSTRACT
In this paper, we present a novel 16-point pipelined FFT processor based on restructured coefficient ordering. The proposed FFT is designed with a fixed radix-4, single-path pipelined architecture, which gives a higher throughput rate than an ordinary pipelined architecture. Power consumption is addressed by reducing the switching activity: the coefficients are ordered so that successive coefficients have the least Hamming distance. Through this, the switching activity of the twiddle computation is reduced from 192 to 78, a reduction of 59%. The introduced multiplier-less architecture cuts down the number of computations needed to realize a complex multiplication. The 16-point FFT is implemented in Verilog HDL and synthesized with Cadence RTL Compiler in a 0.18 µm technology. The power evaluation of the FFT has been obtained from the circuit netlist using a clock frequency of 100 MHz.

Keywords: fast Fourier transform (FFT), radix-4, parallel architectures, OFDM, Verilog HDL.

1. INTRODUCTION
The FFT processor is a critical block in orthogonal frequency division multiplexing (OFDM) technology. Because the data must be processed continuously at the sampling clock frequency, the pipelined FFT is usually preferred, especially for low-power or high-throughput solutions.
The commutator and the complex multiplier blocks at each stage contribute a dominating part of the entire power consumption in the pipelined architecture. This paper proposes a design that minimizes one of the significant power-consuming factors, the switching activity. The coefficient ordering method is followed to reduce the amount of switching activity between successive coefficients used by the complex multipliers. The coefficient ordering requires a data sequence consistent with the new ordering of coefficients. In this way we can attain lower hardware complexity and higher efficiency.

The architecture of a digital-processing-based MC-CDMA receiver consists of an FFT block, a combiner and a Viterbi decoder, of which the FFT block consumes more power than the other logic blocks. The total power used by the receiver can therefore be reduced significantly by reducing the power consumption of these blocks, particularly the FFT block. Multiplier-less parallel-pipelined FFT architectures for wireless communication applications are proposed in [7] by Han, Arslan et al. In that work, the multiplier logic block is replaced by adders and shift registers. The logic blocks introduced in place of the complex multiplier reduce the total switching activity of the FFT and the number of computations, which considerably reduces the total power consumption of the FFT.

Senthil Sivakumar M. et al. introduced a parallel-pipelined FFT architecture for reducing the power consumption of the FFT. The complex logic blocks, i.e., the commutator, multiplier and butterfly, are modified for low power consumption. To reduce the switching activity, the low-power IDR commutator is introduced [1], [2], [3] and [5]. To reduce the power consumption, a multiplier-less architecture has been implemented [1], [2] and [5]. The butterfly architecture is replaced with a 2's complement operation and simple adders, which notably diminishes the power consumption of the FFT [1], [2] and [5].
Besides that, low-power architectures are proposed in [4], [6] and [8] for various FFT architectures. The simulation, design and analysis of a low-power MIMO-OFDM system and its implementation on FPGA is performed by Bhattacharjee et al. [9]. FPGA implementations of low-power FFT processors are suggested in [10] and [11], with an orthogonal frequency division multiplexing method that increases the data transfer rate of transceivers on parallel sub-carriers. This can be used in multicarrier transceivers to increase the throughput of the device.

In this paper, the FFT algorithm is described in Section 2. Section 3 describes the implementation of the coefficient-reordered pipelined FFT. The reordering of coefficients, and the corresponding reordering of input data, is explained in detail with reference to the flow graph of the FFT. Section 4 presents the proposed minimum switching activity design and the multiplier-less approach with reference to coefficient reordering. The FFT processor has been implemented with 16-bit complex data and is presented in Section 5.
2. ALGORITHM
The N-point DFT can be expressed as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} \tag{1}$$

where $W_N = \exp(-j2\pi/N)$, so that $\exp(-j2\pi nk/N)$ is written as $W_N^{nk}$. Let N be a composite number of v integers such that $N = r_1 r_2 r_3 \cdots r_v$, and define

$$N_t = N/(r_1 r_2 \cdots r_t), \qquad 1 \le t \le v-1 \tag{2}$$

where t indexes the stages of the decomposed DFT and $r_t$ is the radix of stage t. Using the recursive property and the relationship $W_{N_i}^{N_j k} = W_{N_i/N_j}^{k}$, for radix $r_1$ Equation (1) becomes

$$X(k) = \sum_{q=0}^{N_1-1} W_N^{qk} \sum_{p=0}^{r_1-1} x(N_1 p + q)\, W_{r_1}^{pk} \tag{3}$$

The final stage is stated as follows:

$$X(r_1 r_2 \cdots r_{v-1} m_v + \cdots + r_1 m_2 + m_1) = \sum_{q_{v-1}=0}^{r_v-1} x_{v-1}(q_{v-1}, m_{v-1})\, W_{r_v}^{m_v q_{v-1}} \tag{4}$$

The computations used for the intermediate stages are given by the following recursive equation:

$$x_t(q_t, m_t) = W_{N_{t-1}}^{q_t m_t} \sum_{p=0}^{r_t-1} x_{t-1}(N_t p + q_t, m_{t-1})\, W_{r_t}^{p m_t} \tag{5}$$

The first stage is the case t = 1 of Equation (5), with $N_0 = N$ and $x_0 = x$.

For r = 4, based on the above formulation, the flow graph of a 16-point FFT is constructed as shown in Figure-1. The corresponding equation is

$$X(4m_2 + m_1) = \sum_{q=0}^{3} x_1(q, m_1)\, W_4^{q m_2} \tag{6}$$

where

$$x_1(q, m_1) = W_{16}^{q m_1} \sum_{p=0}^{3} x(4p + q)\, W_4^{p m_1}$$

In Figure-1 each open circle represents a summation, while the dots define the boundaries of the stages. The integer inside each open circle is the value of $m_1$ (for stage 1) or $m_2$ (for stage 2). The number outside the open circle is the FFT coefficient, i.e., the power of the twiddle factor $W_N$ used.

3. REORDERED PIPELINED FFT STRUCTURE
A pipelined N-point radix-4 FFT processor is shown in Figure-1. The structure is based on the previously described algorithm and has $\log_4 N$ stages. Each stage contains a commutator, a butterfly and a complex multiplier, and yields one output within each word cycle. The consecutive outputs of each stage must be ordered in accordance with the value of m. For example, in stage 1 of Figure-1, the outputs related to $m_1 = 0$ are produced in the first four word cycles, then those related to $m_1 = 1$ in the next four cycles, and so on.

Figure-1. Signal flow graph of radix-4 16-point FFT.

4. IMPLEMENTATION OF FFT
The coefficients of the FFT are reordered to minimize the Hamming distance of each coefficient transition, which minimizes the switching activity between successive coefficients. The Hamming distance is the number of 1s present in the XOR of two binary coefficients. 15-bit fixed-point values are used to encode both the original and the ordered coefficient sequences. A transition matrix holding the Hamming distance between each pair of coefficients is developed to find the ordering with minimum switching activity.

Table-1. The transition matrix of switching activity between two coefficients.

        W0   W1   W2   W3   W4   W6   W9
W0       0   15   17   19    3   21   13
W1      15    0   14   16   12   20   24
W2      17   14    0   14   14   16   14
W3      19   16   14    0   16   12   18
W4       3   12   14   16    0   20   16
W6      21   20   16   12   20    0   16
W9      13   24   14   18   16   16    0
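The construction of the transition matrix can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the coefficient words are the hexadecimal (real, imaginary) pairs listed in Table-2, each treated as a 16-bit word, and the names COEFFS, hamming, transition_matrix and switching_activity are hypothetical helpers, not part of the authors' Verilog design.

```python
# Twiddle coefficients as (real, imaginary) 16-bit words, taken from Table-2.
# Assumption: each word is handled as a plain 16-bit pattern.
COEFFS = {
    "W0": (0x4000, 0x0000),
    "W1": (0x3B20, 0xE782),
    "W2": (0x2D41, 0xD2BE),
    "W3": (0x187D, 0xC4DF),
    "W4": (0x0000, 0xC000),
    "W6": (0xD2BF, 0xD2BF),
    "W9": (0xC4E0, 0x187E),
}


def hamming(a, b):
    """Count the bits that differ between two coefficients.

    The XOR of the real words and of the imaginary words is taken,
    and the 1s in both results are counted.
    """
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def transition_matrix(coeffs):
    """Hamming distance between every pair of coefficients."""
    names = list(coeffs)
    return {r: {c: hamming(coeffs[r], coeffs[c]) for c in names} for r in names}


def switching_activity(sequence, coeffs):
    """Sum of the Hamming distances between successive coefficients."""
    return sum(hamming(coeffs[a], coeffs[b])
               for a, b in zip(sequence, sequence[1:]))


if __name__ == "__main__":
    matrix = transition_matrix(COEFFS)
    for row in matrix:
        print(row, [matrix[row][col] for col in matrix])
```

Running the script prints one row of the matrix per coefficient. The helper switching_activity can then be used to compare candidate coefficient orderings by their total number of bit transitions.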
From the transition matrix, the twiddle factors of the FFT can be arranged to minimize the switching activity. In this way the switching activity of the FFT is reduced from 192 to just 78, a reduction of about 59%. A change in coefficient ordering requires the corresponding data to be reordered accordingly, and this reordered data sequence has to be converted back into the normal data sequence for stage 2.

The multiplier-less architecture proposed in this paper uses shifters, adders and subtractors to replace the complex multipliers of the conventional design, using the common subexpression sharing method. The proposed shifters and adders reduce the total power consumption of the FFT processor by reducing the switching activity and the amount of computation associated with the complex multipliers.

Table-2. Ordered and conventional coefficient sequence for a 16-point radix-4 FFT.

Conventional coefficients      Ordered coefficients
W1   3b20,e782                 W0   4000,0000
W2   2d41,d2be                 W0   4000,0000
W3   187d,c4df                 W4   0000,c000
W0   4000,0000                 W1   3b20,e782
W2   2d41,d2be                 W2   2d41,d2be
W4   0000,c000                 W2   2d41,d2be
W6   d2bf,d2bf                 W3   187d,c4df
W0   4000,0000                 W3   187d,c4df
W3   187d,c4df                 W6   d2bf,d2bf
W6   d2bf,d2bf                 W6   d2bf,d2bf
W9   c4e0,187e                 W9   c4e0,187e
Total number of switching transitions: 192 (conventional), 78 (ordered)

Figure-2. Signal flow graph of radix-4 ordered 16-point FFT.

The data ordering can be generated by a commutator, which converts the serial input data into four parallel outputs for a radix-4 butterfly computation.

a) Multiplier-less R4SDC FFT architecture
The number of multiplications is used as a key metric for comparing FFT algorithms, because multiplications require additional evaluation time and have a large impact on the execution time and power consumption of the FFT processor. In this paper, we present a 16-point FFT butterfly which reduces the complexity of the multiplier by using real, constant multiplications.
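As a minimal sketch of how a constant multiplier can be replaced by shifters, adders and subtractors, the Python model below recodes a constant into canonical signed-digit form (digits -1, 0 and +1 with no two adjacent nonzero digits) and then multiplies using only shifts, additions and subtractions. The function names to_csd and csd_multiply are hypothetical, and the constant 0x2D41 is the real part of W2 from Table-2. This is a software illustration, not the hardware of the proposed design, and common subexpression sharing is not modelled.

```python
def to_csd(value):
    """Return the canonical signed-digit form of a non-negative integer.

    The result is a list of digits in {-1, 0, +1}, least significant first,
    such that sum(d * 2**i) equals value and no two adjacent digits are nonzero.
    """
    digits = []
    while value:
        if value & 1:
            # +1 when value % 4 == 1, -1 when value % 4 == 3
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the constant encoded in digits using shifts and add/subtract."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    constant = 0x2D41  # real part of W2 in Table-2
    digits = to_csd(constant)
    adders = sum(1 for d in digits if d) - 1
    print("CSD digits (LSB first):", digits)
    print("adders/subtractors needed:", adders)
    print(csd_multiply(1000, digits) == 1000 * constant)
```

Each nonzero digit corresponds to one shifted copy of the input, so the number of adders and subtractors is one less than the number of nonzero digits, which is never more than the number of 1s in the plain binary form.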
In this approach, Canonical Signed-Digit (CSD) and common subexpression sharing techniques are used to reduce the power consumption of the proposed multiplier. CSD is a widely used signed-digit code that replaces the conventional multiplier digits. Common subexpression sharing shares subexpressions among several multiplication-accumulation operations to reduce the total number of addition and shift operations [1] and [2].

b) Complex multiplication
First, we discuss the cost of realizing complex multiplications with real multiplications. The product of the complex numbers X = A + jB and Y = C + jD is

(A + jB)(C + jD) = (AC - BD) + j(AD + BC).

The direct computation of a complex multiplication employs four real multiplications and two additions, which increases the chip area and power consumption. Alternatively, to reduce the number of multiplications, the original computation is modified as follows:

m0 = (A + B)(C + D)
m1 = AC
m2 = BD
(A + jB)(C + jD) = (m1 - m2) + j(m0 - m1 - m2)

Hence the complex multiplication can be reduced to three real multiplications, at the cost of five additions and subtractions. The above implementations are applicable to general complex multiplications, i.e., where both the data and the coefficients are variable.

5. RESULTS AND ANALYSIS
The conventional and ordered pipelined FFT processor architectures have been implemented and then synthesized with Cadence RTL Compiler in a 0.18 µm technology. Power evaluation was then carried out on the circuit netlist using a clock frequency of 16 MHz. The switching activity decreases from 192 to 78, a reduction of about 59%. The comparative power consumptions for different FIFO implementation modes and various low-power multipliers are given in Table-3. The FIFO was implemented in two different ways, namely SR (shift register based) and DM (dual-port RAM based). A 16-point FFT processor design has been carried
out using three different multipliers, namely Wallace tree, carry save array (CSA) and non-Booth coded Wallace tree (NBW). Table-3 shows that the ordered architecture gives power savings for two multiplier types.

Table-3. Power consumption of the 16-point FFT processor.

Type of multiplier             Carry save array   Non-Booth coded Wallace tree   Wallace tree
Conventional SR based (mW)     76.65              78.03                          75.33
Conventional DM based (mW)     82.97              70.56                          7.6
SR based (mW)                  70.97              68.45                          73.3
DM based (mW)                  124.47             114.47                         121.39
Ordered (mW)                   65.49              58.24                          70.92
% saving (SR)                  8                  14                             3
% saving (DM)                  47                 49                             41

Figure-3. Power consumption of ordered and conventional FFT (bar chart; axis: power consumption of FFT in mW, 0 to 200; groups: Wallace, NBW, CSA; series: Ordered, Conventional DM based, Conventional SR based, DM based, SR based).

6. CONCLUSIONS
In this paper, the modified coefficient ordering technique is applied to a 16-point FFT processor. The introduced multiplier-less architecture is implemented with three different multiplier types: carry save array, non-Booth coded Wallace tree and Wallace tree. The commutator of the FFT is implemented with shift register and dual-port RAM methods, and the power consumptions of the different structures are compared with each other. The FFT processor is implemented with both the conventional and the proposed methods using Verilog HDL and synthesized using the Cadence design tool. The switching activity decreases from 192 to 78 over a whole 16-point cycle, a reduction of about 59%. It is shown that the switching activity is minimized through the transition matrix.

REFERENCES
[1] Senthil Sivakumar M., Banupriya M. and Arockia Jayadhas. 2012. Design of Low Power High Performance 16-Point 2-Parallel Pipelined FFT Architecture. International Journal of Electronics, Communication & Instrumentation Engineering Research and Development (IJECIERD). Vol. 2, No. 3, Sep 2012, pp. 2-26.
[2] Senthil Siva Kumar M., Arockia Jayadhas S., Arputharaj T. and Banupriya M. 2013. Design of adaptive MC-CDMA receiver using low power parallel-pipelined FFT architecture. IEEE Proc. on PACT, pp. 29-33.
[3] Wei Han, Ahmet T. Erdogan, Tughrul Arslan and Mohd. Hasan. 2008. High-Performance Low-Power FFT Cores. ETRI Journal. Vol. 30, No. 3, June.
[4] W. Han, A. T. Erdogan, T. Arslan and M. Hasan. 2004. A Novel Low Power Pipelined FFT Based on Sub expression Sharing for Wireless LAN Applications. IEEE Signal Processing Systems Workshop (SIPS), October.
[5] Senthil Sivakumar M., S. A. Jayadhas, Arputharaj T. and Banupriya M. 2013. Design of Dynamically Reconfigurable Adaptive MC-CDMA Receiver using FFT Architecture. African Journal of Information and Communication Technology (AJICT). Vol. 7, No. 2, pp. 49-6.
[6] Molisch A. 2011. Orthogonal Frequency Division Multiplexing (OFDM). Wiley-IEEE Press eBook, pp. 47-443.
[7] Han, Arslan, Erdogan and Hasan. 2005. Multiplierless based parallel-pipelined FFT architectures for wireless communication applications. (ICASSP '05), IEEE. Vol. 5, pp. 45-48.
[8] Wei Han, Ahmet T. Erdogan, Tughrul Arslan and Mohd. Hasan. 2008. High-Performance Low-Power FFT Cores. ETRI Journal. Vol. 30, No. 3, June.
[9] Bhattacharjee, Sil, Dey, Chakrabarti. 2011. Simulation, design and analysis of low power MIMO-OFDM system and its implementation on FPGA. (ReTIS), IEEE, pp. 88-93.
[10] Zhuo Qian, Nasiri N., Segal O. and Margala M. 2014. FPGA implementation of low-power split-radix FFT processors. 24th International Conference on Field Programmable Logic and Applications (FPL), pp. 2.
[11] In-Gul Jang, Zhe-Yan Piao, Ze-Hua Dong, Jin-Gyun Chung and Kang-Yoon Lee. 2011. Low-power FFT design for NC-OFDM in cognitive radio systems. IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2449-2452.