A 0.75 Million Point Fourier Transform Chip for Frequency Sparse Signals Ezz El Din Hamed Omid Abari, Haitham Hassanieh, Abhinav Agarwal, Dina Katabi, Anantha Chandrakasan, Vladimir Stojanović International Solid-State Circuits Conference 1of 42
Leverage Sparsity Output of FFT is Sparse: Most frequencies have zero energy or noise Amplitude Frequency Only few frequencies are active and have energy Use sparse FFT algorithm Find and compute the active frequencies International Solid-State Circuits Conference 2of 42
Applications Spectrum Sensing GPS Radio Astronomy GHz spectrum sensing and acquisition Pattern matching by convolution with long code (e.g. GPS) Radio astronomy: data compression International Solid-State Circuits Conference 3of 42
Wideband Spectrum Sensing 100 80 Occupancy Graph Seattle January 7, 2013 (Microsoft Spectrum Observatory) Occupancy (%) 60 40 20 0 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 Frequency (GHz) International Solid-State Circuits Conference 4of 42
Sparse FFT Algorithm Outline Micro architecture and Implementation Measurement Results Conclusion International Solid-State Circuits Conference 5of 42
How Does Sparse FFT Work? Values V1 V2 V3 0 1 2 3 4 5 6 7 8 9 10 11 Frequencies 1. Bucketize 2. Estimate 3. Resolve collisions International Solid-State Circuits Conference 6of 42
How Does Sparse FFT Work? Values V1 V2 V3 V2+V3 V1 0 1 2 3 4 5 6 7 8 9 10 11 1. Bucketize Frequencies 0 1 2 3 Buckets Sub sampling time Aliasing the frequencies International Solid-State Circuits Conference 7of 42
How Does Sparse FFT Work? Values V1 V2 V3 V2+V3 V1 0 1 2 3 4 5 6 7 8 9 10 11 2. Estimate Frequencies 0 1 2 3 Buckets International Solid-State Circuits Conference 8of 42
How Does Sparse FFT Work? Values V1 V2 V3 V2+V3 V1 0 1 2 3 4 5 6 7 8 9 10 11 2. Estimate Frequencies Repeat bucketization with a time shift 0 1 2 3 Buckets Time Domain Freq Domain Phase Rotation International Solid-State Circuits Conference 9of 42
How Does Sparse FFT Work? Values V1 V2 V3 V2+V3 V1 0 1 2 3 4 5 6 7 8 9 10 11 Frequencies 3. Resolve Collisions Repeat bucketization with a different subsampling rate (co prime) 0 1 2 3 Buckets V1+V3 V2 0 1 2 Buckets International Solid-State Circuits Conference 10 of 42
Implementation of Sparse FFT Sparse FFT size: N=3 6 x2 10 =746496 Subsample by 3 6 in time FFT of size 1024 (2 10 ) Subsample by 2 10 in time FFT of size 729(3 6 ) Sparsity: outputs up to 750 active frequencies Output resolution: 12 bit real, 12 bit imaginary International Solid-State Circuits Conference 11 of 42
Sparse FFT Chip Block Diagram Memory 1 2 10 3 6 Memory 2 2 10 3 6 Memory Interface Memory 3 2 10 3 6 Subsamples of new signal 2 10 -point FFT Scheduler Update Recovered freq. of processed signal I/O interface 3 6 -point FFT Bucketization Estimation Collision Detection Reconstruction 3 Pipelined Stages International Solid-State Circuits Conference 12 of 42
Sparse FFT Chip Block Diagram Memory 1 2 10 3 6 Memory 2 2 10 3 6 Memory Interface Memory 3 2 10 3 6 Subsamples of new signal 2 10 -point FFT Scheduler Update Recovered freq. of processed signal I/O interface 3 6 -point FFT Bucketization Estimation Collision Detection Reconstruction Input: sub sampled signal (by 2 10 and 3 6 ) Output: positions and complex values of active frequencies International Solid-State Circuits Conference 13 of 42
Sparse FFT Chip Block Diagram Memory 1 2 10 3 6 Memory 2 2 10 3 6 Memory Interface Memory 3 2 10 3 6 Subsamples of new signal 2 10 -point FFT Scheduler Update Recovered freq. of processed signal I/O interface 3 6 -point FFT Bucketization Estimation Collision Detection Reconstruction 3 Memory sets, 2 SRAM blocks each Rotated across the 3 pipeline stages International Solid-State Circuits Conference 14 of 42
Sparse FFT Chip Block Diagram Memory 1 2 10 3 6 Memory 2 2 10 3 6 Memory Interface Memory 3 2 10 3 6 Subsamples of new signal 2 10 -point FFT Scheduler Update Recovered freq. of processed signal I/O interface 3 6 -point FFT Bucketization Estimation In place FFT Radix 4 butterfly for 2 10 point FFT Radix 3 butterfly for 3 6 point FFT Three FFTs of each type sharing the memory Collision Detection Reconstruction International Solid-State Circuits Conference 15 of 42
2 10 FFT Architecture Three radix 4 butterflies running in parallel Two 12b x 12b real multipliers per butterfly Multiplexing Real and Imaginary International Solid-State Circuits Conference 16 of 42
2 10 FFT Architecture BF BF 1024 4 Butterflies Memory BF Memory BF Memory Memory BF Frame log 1024 frames In place 1024 point FFT using a radix 4 butterfly 1024 additions (10 extra bits to account for overflow) International Solid-State Circuits Conference 17 of 42
Dynamic Scaling Block Floating Point Writing to memory every 2 clock cycles Current Frame Exponent 12 Real Imag 12 Decision for next frame Tracking Overflow MSBs Current Frame Adjustment Reduce the required bit width Tracking large values to avoid unnecessary shifting >> >> 14 14 MUX MUX Real 1,2,3 & 4 Imag 1,2,3 & 4 International Solid-State Circuits Conference 18 of 42
Sparse FFT Chip Block Diagram Memory 1 2 10 3 6 Memory 2 2 10 3 6 Memory Interface Memory 3 2 10 3 6 Subsamples of new signal 2 10 -point FFT Scheduler Update Recovered freq. of processed signal I/O interface 3 6 -point FFT Bucketization Estimation Collision Detection Reconstruction Reconstruction stage has 4 sub blocks International Solid-State Circuits Conference 19 of 42
Sparse FFT Chip Block Diagram Memory 1 2 10 3 6 Memory 2 2 10 3 6 Memory Interface Memory 3 2 10 3 6 Subsamples of new signal 2 10 -point FFT Scheduler Update Recovered freq. of processed signal I/O interface 3 6 -point FFT Bucketization Estimation Collision Detection Reconstruction International Solid-State Circuits Conference 20 of 42
Estimation Sub block Estimate frequency position where 0 1 For 0.75 million Use 20 bits to represent International Solid-State Circuits Conference 21 of 42
Estimation Sub block Estimate frequency position where 0 1 For 0.75 million Use 20 bits to represent For 2 10 bucketization: frequency aliases to bucket number : 10 LSBs of can be estimated from bucket number 10 remaining MSBs are estimated using phase rotation International Solid-State Circuits Conference 22 of 42
Estimation Sub block Use phase rotation from a time shift to estimate the 10 MSBs of : 2 2 International Solid-State Circuits Conference 23 of 42
Estimation Sub block Use phase rotation from a time shift to estimate the 10 MSBs of : 2 2 If 1,phase wraps around 2 create same phase rotation loose highest MSBs Use time shift to get MSBs International Solid-State Circuits Conference 24 of 42
Estimation Sub block Use phase rotation from a time shift to estimate the 10 MSBs of : 2 If 1,phase wraps around 2 create same phase rotation loose highest MSBs Use time shift to get MSBs 2 Noise in creates error Use large time shift to average noise in International Solid-State Circuits Conference 25 of 42
Estimation Sub block Use phase rotation from a time shift to estimate the 10 MSBs of : 2 2 If 1,phase wraps around 2 create same phase rotation loose highest MSBs Use time shift to get MSBs Noise in creates error Use large time shift to average noise in Use two different time shifts and to estimate the 10 MSBs of International Solid-State Circuits Conference 26 of 42
Estimation Sub block Frequency Recovery from 2 10 bucketization : 20 bits 5 bits 5 bits 10 bits from shift 1 from shift 32 from bucket number International Solid-State Circuits Conference 27 of 42
Estimation Sub block Frequency Recovery from 2 10 bucketization : 20 bits 5 bits 5 bits 10 bits from shift 1 from shift 32 from bucket number Problem: Small error propagates Affect MSBs Solution: Use 3 bit overlap to detect errors 8 bits 8 bits 3 bits 3 bits 10 bits International Solid-State Circuits Conference 28 of 42
Estimation Sub block Estimation Complex values of 3 FFTs CORDIC Step 1 Frequency Recovery + + 7:3 2:0 7:5 4:0 Frequency Recovery + 12:8 1 7:0 Phase Rotation 2 Estimated Frequency International Solid-State Circuits Conference 29 of 42
Estimation Sub block Estimation Complex values of 3 FFTs CORDIC Step 2 Frequency Recovery 2 3 12:0 Phase Rotation 2 X Bucket number 12:3 2:0 9:7 6:0 Frequency Recovery Estimated Frequency + 1 19:10 9:0 20 bits International Solid-State Circuits Conference 30 of 42
Estimation Sub block For 3 6 bucketization: frequency aliases to bucket number : 6 LSBs cannot be replaced directly by the bucket number Naïve implementation: 1. Estimate from the phase rotation 2. Calculate the remainder mod 3 3. Subtract this remainder from 4. Then add the bucket number b 2 Steps 2 and 3 are done indirectly by truncating the LSBs of International Solid-State Circuits Conference 31 of 42
Estimation Sub block Estimation Complex values of 3 FFTs CORDIC Step 2 2 3 12:3 2:0 Phase Rotation + X 1 Frequency Recovery X 9:7 Bucket number 6:0 2 1 2 3 Frequency Recovery Estimated Frequency + 19:0 20 bits International Solid-State Circuits Conference 32 of 42
Sparse FFT Chip Block Diagram Memory 1 2 10 3 6 Memory 2 2 10 3 6 Memory Interface Memory 3 2 10 3 6 Subsamples of new signal 2 10 -point FFT Scheduler Update Recovered freq. of processed signal I/O interface 3 6 -point FFT Bucketization Estimation Collision Detection Reconstruction International Solid-State Circuits Conference 33 of 42
No Collision: Collision Detection Single active frequency Time shift causes only change in phase and not in magnitude of the bucket International Solid-State Circuits Conference 34 of 42
No Collision: Collision Detection Single active frequency Time shift causes only change in phase and not in magnitude of the bucket Collision: Multiple active frequencies sum up with different phase rotations Time shift causes the magnitude of the bucket to change International Solid-State Circuits Conference 35 of 42
Die photo Technology Core Area Core Supply Voltage 45nm SOI CMOS 0.6mm x 1.0mm 0.66 1.18 V Clock Frequency 0.5 1.5 GHz Core Power Latency 14.6 174.8 mw 63 21 μs International Solid-State Circuits Conference 36 of 42
Measurement Result International Solid-State Circuits Conference 37 of 42
Measurement Result (0.66 V, ) 48 sfft/ms 298 nj/sfft International Solid-State Circuits Conference 38 of 42
Measurement Result 88x Speed up over intel i7 (3.4GHz) (1.18 V,. ) 146 sfft/ms 1.2 µj/sfft International Solid-State Circuits Conference 39 of 42
Measurement Result At 0.66V: Effective throughput 36GS/s Consuming 0.4pJ/sample International Solid-State Circuits Conference 40 of 42
Comparison ISSCC 11 JSSC 08 JSSC 12 This work Technology 65 nm 90 nm 65nm 45 nm Signal Type Any Signal Any Signal Any Signal Freq. Sparse signal Size 2 10 2 8 2 7 to 2 11 3 6 x2 10 Word width 16 bits 10 bits 12 bits 12 bits Area 8.29mm 2 5.1mm 2 1.37mm 2 0.6mm 2 Effective 240MS/s 2.4GS/s 1.25 20MS/s 36 109GS/s Throughput Energy/Sample 17.2pJ 50pJ 19.5 50.6pJ 0.4 1.6pJ International Solid-State Circuits Conference 41 of 42
Comparison ISSCC 11 JSSC 08 JSSC 12 This work Technology 65 nm 90 nm 65nm 45 nm Signal Type Any Signal Any Signal Any Signal Freq. Sparse signal Size 2 10 2 8 2 7 to 2 11 3 6 x2 10 Word width 16 bits 10 bits 12 bits 12 bits Area 8.29mm 2 5.1mm 2 1.37mm 2 0.6mm 2 Effective 240MS/s 2.4GS/s 1.25 20MS/s 36 109GS/s Throughput Energy/Sample 17.2pJ 50pJ 19.5 50.6pJ 0.4 1.6pJ Sparse FFT works only for frequency sparse signals International Solid-State Circuits Conference 42 of 42
Comparison ISSCC 11 JSSC 08 JSSC 12 This work Technology 65 nm 90 nm 65nm 45 nm Signal Type Any Signal Any Signal Any Signal Freq. Sparse signal Size 2 10 2 8 2 7 to 2 11 3 6 x2 10 Word width 16 bits 10 bits 12 bits 12 bits Area 8.29mm 2 5.1mm 2 1.37mm 2 0.6mm 2 Effective 240MS/s 2.4GS/s 1.25 20MS/s 36 109GS/s Throughput Energy/Sample 17.2pJ 50pJ 19.5 50.6pJ 0.4 1.6pJ International Solid-State Circuits Conference 43 of 42
Comparison ISSCC 11 JSSC 08 JSSC 12 This work Technology 65 nm 90 nm 65nm 45 nm Signal Type Any Signal Any Signal Any Signal Freq. Sparse signal Size 2 10 2 8 2 7 to 2 11 3 6 x2 10 Word width 16 bits 10 bits 12 bits 12 bits Area 8.29mm 2 5.1mm 2 1.37mm 2 0.6mm 2 Effective 240MS/s 2.4GS/s 1.25 20MS/s 36 109GS/s Throughput Energy/Sample 17.2pJ 50pJ 19.5 50.6pJ 0.4 1.6pJ International Solid-State Circuits Conference 44 of 42
Comparison ISSCC 11 JSSC 08 JSSC 12 This work Technology 65 nm 90 nm 65nm 45 nm Signal Type Any Signal Any Signal Any Signal Freq. Sparse signal Size 2 10 2 8 2 7 to 2 11 3 6 x2 10 Word width 16 bits 10 bits 12 bits 12 bits Area 8.29mm 2 5.1mm 2 1.37mm 2 0.6mm 2 Effective 240MS/s 2.4GS/s 1.25 20MS/s 36 109GS/s Throughput Energy/Sample 17.2pJ 50pJ 19.5 50.6pJ 0.4 1.6pJ International Solid-State Circuits Conference 45 of 42
Conclusion 0.75 million point sparse FFT in 45nm CMOS IBM SOI More than 40x better energy efficiency than traditional FFTs 88x improvement in run time over the software implementation International Solid-State Circuits Conference 46 of 42
International Solid-State Circuits Conference 47 of 42