Advanced Hardware and Software Technologies for Ultra-long FFT s

Similar documents
CRUSH: Cognitive Radio Universal Software Hardware

An FPGA Spectrum Sensing Accelerator for Cognitive Radio

Implementing Efficient Split-Radix FFTs in FPGAs

CHAPTER 7 APPLICATION OF 128 POINT FFT PROCESSOR FOR MIMO OFDM SYSTEMS. Table of Contents

The Cell Processor Computing of Tomorrow or Yesterday?

OpenCL for FPGAs/HPC Case Study in 3D FFT

CUG2016. On Enhancing 3D-FFT. Performance in VASP. Florian Wende. Zuse Institute Berlin. Martijn Marsman, Thomas Steinke. London, UK, May 8-12

Floating Point Fast Fourier Transform v2.1. User manual

International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE) Volume 6, Issue 6, June 2017

Design and Implementation of Pipelined Floating Point Fast Fourier Transform Processor

Implementation of High Throughput Radix-16 FFT Processor

Programming the Cell BE

A 0.75 Million Point Fourier Transform Chip for Frequency Sparse Signals

Advanced Autoclave Controller with Recording + 4 Channel Mapping + Pressure Indication with PC Software & Printer Module

Temperature + Humidity (%RH) Control & Recording System with 8 / 16 Channel Mapping. Touch Panel PPI

Zilog s ZMOTION Detection and Control Family. Featuring PIR Technology

A Novel VLSI Based Pipelined Radix-4 Single-Path Delay Commutator (R4SDC) FFT

AN10943 Decoding DTMF tones using M3 DSP library FFT function

IOT Based Intelligent Bin for Smart Cities

FIMD: Fine-grained Device-free Motion Detection

CALICE: status of a data acquisition system for the ILC calorimeters. Valeria Bartsch, on behalf of CALICE-UK Collaboration

Calcolatori, Internet e il Web

ECE 4510/5530 Microcontroller Applications Week 10

Cell Software Model. Course Code: L2T1H1-11 Cell Ecosystem Solutions Enablement. Systems and Technology Group

International Journal of Engineering Research-Online A Peer Reviewed International Journal

Reconfigurable FPGA-Based FFT Processor for Cognitive Radio Applications

Construction of Wireless Fire Alarm System Based on ZigBee Technology

Di-440 The Modular Analyser

First results and developments of the multi-cell SDD for Elettra and SESAME XAFS beam lines

Gas Detection Instrument Based on Wireless Sensor Networks

Flexibility right from the start. PAVIRO Public Address and Voice Evacuation System with Professional Sound Quality. boschsecurity.

ENACTMENT OF SCADA SYSTEM FOR JUDICIOUS DWELLING

4/8/12/16 Channel Temperature Scanner with FREE Data Logging PC Software RS485

High Speed Reconfigurable FFT Processor Using Urdhava Thriyambakam

Sensor Technology. Summer School: Advanced Microsystems Technologies for Sensor Applications

Advanced Digital Signal Processing Part 4: DFT and FFT

Towards a Detector Control System for the ATLAS Pixeldetector

1 Introduction - The rating level. 3 Subjective or objective analysis? 2 Determination of the adjustment for annoying tones

THE combination of the multiple-input multiple-output

Laboratory Exercise #6

Cold chain monitoring technologies. Facility monitoring

A Cost Effective Multi-Spectral Scanner for Natural Gas Detection

An Area Efficient 2D Fourier Transform Architecture for FPGA Implementation

A 128-Point FFT/IFFT Processor for MIMO-OFDM Transceivers a Broader Survey

Y-Plant Alert. Alarm Annunciator and Sequence of Event Recording System. BU 04M01A02-01E-A

PORTABLE ISOTOPE IDENTIFIER Search Tool / Sample Counting System

Chapter 1 Introduction

Design and Development of General Purpose Alarm Generator for Automated System

A Novel Architecture for Radix-4 Pipelined FFT Processor using Vedic Mathematics Algorithm

LSM MANUAL PRINCIPLES OF LSM

For Complete Fire and Gas Solutions

Enterprise GIS Architecture Deployment Options

RTD TEMPERATURE SENSING SYSTEM

Features of Shinko advanced package and Optical sensor developed with Fraunhofer ENAS. Yuichiro Shimizu SHINKO ELECTRIC INDUSTRIES CO., LTD.

VLSI Design EEC 116. Lecture 1. Bevan M. Baas Thursday, September 28, EEC 116, B. Baas 1

How SWIR Imaging contributes to the optimization of automatic quality control in the food industry

DeltaV Live. Introduction. Benefits. Modern, built-for-purpose operations experience

Networked Public Address and Voice Evacuation

THERE is significant research activity to minimize energy

Praetorian Fibre Optic Sensing

Thermal challenges in IC s: Hot spots passive cooling and beyond

IS-6500 TCP/IP Video Door Phone System

USER MANUAL FOR OPERATING SYSTEM

Intelligent Fire Detection and Visual Guided Evacuation System Using Arduino and GSM

Praetorian Fibre Optic Sensing

Learning from High Energy Physics Data Acquisition Systems Is this the future of Photon Science DAQ?

On-chip optical detectors in Standard CMOS SOI

i.net Engineered system solutions for

HB0267 CoreFFT v7.0 Handbook

Driver Assistance Systems and Vehicle Environment Sensing

US CMS Trigger. DOE-NSF Review Wesley H. Smith, U. Wisconsin CMS Trigger Project Manager April 9, 2003

Research on Evaluation of Fire Detection Algorithms

LHCb Rich Detectors Control and High Voltage Systems

Performance of Benchmarks on the Cell Processor

The ISP Technologies Advantage

New Seismic Unattended Small Size Module for Foot-Step Detection

Basic Input Data Needed to Develop a High Performance Fan/Blower with Low Noise, Energy Saving and High Efficiency

Expected needs in Electronics for the CERN Experiments. 5 October 2015

The Compact Muon Solenoid Experiment. Conference Report. Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland

Oracle Real Application Clusters (RAC) 12c Release 2 What s Next?

An Overview of Procedural Fire. How to model procedurally spreading fire

Experion LX Direct Station Specification

Packaging in Consumer Goods

Engineering Specification

Bench Mark 6SigmaET Thermal Simulation Software

Smart Sensor in Wide Area Network Environment How can sensors improve your safety? Tadeusz Pietraszek

On a GPU Acceleration of the Stochastic Grid Bundling Method

Research on the Monitor and Control System of Granary Temperature and Humidity Based on ARM

Wireless Leak Detector and Inhibitor System. Group 6: Petar Arnaut Olivier Thomas J Chris Fontaine Barry Zou

2 Overview of the Pixel Detector Control System

Call stations and extensions

Before you install ProSeries Express Edition software for network use

CHAM Case Study Numerical Simulation of a Air-to-air Cross-flow Heat Exchanger

System Description AutroMaster V Presentation System

Application Note 35. Fast Fourier Transforms Using the FFT Instruction. Introduction

The SLAC Detector R&D Program

HEUFT SPECTRUM. Modular HEUFT systems of the new generation. Highly automated, self-explanatory and SIMPLY EASY!

Smart Water Networks for Efficiency Gains

CXI Detector Status, Cost & Schedule

THERMAL VACUUM TESTS OF A FIVE WATT PSEUDO CHIP SEM-X ELECTRONIC MODULE CLAMPED IN A SATELLITE BOX

Transcription:

Advanced Hardware and Software Technologies for Ultra-long FFT s Hahn Kim, Jeremy Kepner, M. Michael Vai, Crystal Kahn HPEC 2005 21 September 2005 HPEC 2005-1 This work is sponsored by the Department of the Air Force under contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the authors and are not necessarily endorsed by the United States Government

Outline Background FPGA-Based Hardware Technology Parallel Software Technology Conclusions HPEC 2005-2

Introduction Real-time Embedded ISR 1 Gigapoint complex FFT 150M ops ~1M words Radar ELINT SIGINT 1 Kilopoint complex FFT 50K ops ~1K words Non Real-Time Data Analysis HSI Monte Carlo IMINT HPEC 2005-3

Motivation for Real-time, Ultra-long FFT s Wide-Band Digital Receiver RF Receiver ADC 1 G-pt FFT Signal Processing Application Channelized Narrow-Band Digital Receiver ADC 100 M-pt FFT RF Receiver Analog Channelizer...... Signal Processing Application ADC 100 M-pt FFT Can use FFT to detect weak signals Reduce noise floor Longer intervals result in higher gain Real-time, ultra-long FFT processor is a critical enabling component HPEC 2005-4

FFT Technology Space 10 GSPS 04 RFEL HyperSpeed IP (FPGA) 32 K-pt @ 3.2 GSPS Sustained Data Rate 1 GSPS 100 MSPS 10 MSPS 04: LL SYCO (FPGA) 8 K-pt @ 1.2 GSPS 99: LL Discoverer II (ASIC) 128 pt @ 0.5 GSPS 03: Xilinx IP (FPGA) 16 K-pt @ 0.2 GSPS OFDM Communication Standards Req. 02: Transtech DSP Quixilica IP (FPGA) 8 K-pt @ 0.16 GSPS 03: DSP Architectures DSP-24 (ASIC) 64 M-pt 1 K-pt @ 0.06 GSPS @ 0.064 GSPS 01: CRI Pathfinder II (ASIC) 1 M-pt @ 0.04 GSPS 03: Analog Devices ADSP-21262 (DSP) 1 K-pt @ 0.02 GSPS 04 RFEL HyperLength IP (FPGA) 256 M-pt @ 0.01 GSPS State-of-The-Art FFT Technology 1G-pt @ 1.2 GSPS 128 M-pt @ 0.128 GSPS 64 G-pt @ 0.04 GSPS 04 LL pmatlab (128 Proc. w/ GigE) 4 G-pt @ 0.01 GSPS HPEC 2005-5 100 10K 1M 100M 10G 1T 100T FFT Length (Points) Objective: Extend state-of-the-art to ultra-long FFT s

Outline Background FPGA-Based Hardware Technology Parallel Software Technology Conclusions HPEC 2005-6

80G Ultra-long FFT Implementation Challenges ory Requirement (Bytes) 8G 0.8G 80M 8M 0.8M 80K 8K 8 K-pt FPGA Embedded ory Only 32 K-pt 100 M-pt Off-Chip ory Required 1 G-pt 10K 100K 1M 10M 100M 1G 10G FFT Length (Points) Beyond ~32 K-pt FFT, off-chip memory for FIFO and twiddle factors is required Full duplex memory access is a challenge Lincoln architecture selected for ultra-long FFT A Systolic FFT Architecture for Real Time FPGA Systems, HPEC 2004. HPEC 2005-7

Real-Time 1G-Point FFT Architecture MN-pt FFT can be implemented with an M-pt FFT and an N-pt FFT E.g. 1 G-pt FFT M = 32 K-pt, N = 32 K-pt Each 32 K-pt FFT fits into an FPGA Corner Turn 32K FFT Corner Turn Twiddle Factors 32K FFT Corner Turn 9 8 7 6 5 4 3 2 1 0 Corner Turn Example (for an 8-Point FFT):. 9 8 3 2 1 0 7 6 5 4 9.. 8 7 3 6 2 5 1 4 0 Corner Turn (Matrix Transpose).. 8.. 9........ 4 0 5 1 6 2 7 3 Lincoln has developed a corner turn architecture that operates at 1 GSPS HPEC 2005-8

Real-Time FFT Architecture 1 Gigapoint FFT Configuration ADC 1 GSPS HOLD DMUX and Switch Cache and MUX FPGA DMUX and Switch Cache and MUX FPGA DMUX and Switch Cache and MUX Real-Time Corner Turn 32K FFT Real-Time Corner Turn 32K FFT Real-Time Corner Turn Reconfigurable architecture allows for multiple implementations: e.g. 1 G-pt @ 1 GSPS or 100 M-pt @ 100 MSPS X 10 channels HPEC 2005-9

Real-time Example FFT Implementation Current FFT Capabilities Future Ultra-long FFT Capabilities HPEC 2005-10 13 6 13 Symbiotic Communications (SYCO) real-time processor 8 K-pt FFT 450 GOPS @ 130 Watts 208 FFT butterflies No on-board memory Rapid Prototyping of a Realtime Range Compression Processor, HPEC 2005 Poster Session * Initial estimate based on double-precision operation Processor enhancement Provides on-board memory banks for performing real-time corner turns Performs form-factor optimization Develop a universal FFT architecture 100M-Pt FFT X 10 channels 1G-Pt FFT

Outline Background FPGA-Based Hardware Technology Parallel Software Technology Conclusions HPEC 2005-11

Motivation for Out-of-Core Technology Data are often larger than memory on a single processor. ory Disk Can address memory limitation with: 1. Multiple CPUs 2. Using disk as memory Hughes, G.F. Wise Drives. IEEE Spectrum. Aug 2002. Out-of-core technology uses memory as a window into data stored on disk. HPEC 2005-12

MatlabMPI & pmatlab Software Layers Application Input Analysis Output Parallel Library Parallel Hardware Vector/Matrix Comp Conduit Library Layer (pmatlab) Messaging (MatlabMPI) Kernel Layer Math (Matlab) Task User Interface Hardware Interface Can build a parallel library with a few messaging primitives MatlabMPI provides this messaging capability: M PI_Send(dest,co m m,tag,x); X = MPI_Recv(source,com m,tag); HPEC 2005-13 Can build a application with a few parallel structures and functions pmatlab provides parallel arrays and functions X = ones(n,mapx); Y = zeros(n,mapy); Y(:,:) = fft(x);

Out-of-Core ory Management: pmatlab extreme Virtual ory (XVM) Virtual ory + Transparent to user (ease of use) Disk access patterns are often sub-optimal for most algorithms Out-of-Core + Provides control of disk at object level (performance) Exposes swapping mechanism to user Goal of pmatlab XVM is to add out-ofcore capability to pmatlab that is: 1. Easy to use 2. Has high performance 3. Can transparently distribute data pmatlab + Provides mechanism for transparently distributing data across multiple CPUs HPEC 2005-14

Data Organization 1. Data starts as a vector with length MN MN 2. Divide into M vectors with length N N N 3. Reorganize as a MxN matrix M 4. Distribute rows across processors P0 P1 HPEC 2005-15

Hierarchical Matrices and Maps Matrix Global map Out-of-core map HPEC 2005-16 32K P0 P1 P2 P3 32K + + P0 P1 P2 P3 Global matrix Out-of-core matrices 32K Core block (out-of-core) Core block (in-core) 8K 2K Hierarchical matrices span processors and the storage hierarchy User constructs hierarchy by specifying global and out-of-core maps

Ultra-long FFT 0 FFT twiddle corner turn FFT Ultra-long FFT 1 2 3 0 1 2 3 Software-based Implementations MATLAB pmatlab C/MPI pmatlab XVM Testbed Lincoln HPC cluster (LLGrid) 80 dual CPU nodes 2.8-3.06 GHz Pentium 4 Xeon 4 GB memory Gigabit Ethernet HPEC 2005-17

Scalability 1 Gigapoint FFT pmatlab XVM supports ultra-long FFT s with little degradation in performance Maximum problem size = size of available disk space 1 TB represents a 64 G-pt double-precision, complex FFT HPEC 2005-18

Outline Background FPGA-Based Hardware Technology Parallel Software Technology Conclusions HPEC 2005-19

System Development Methodologies System Prototyping Modify word size, precision, etc. N Bit-true simulation Satisfy requirements? Y Initial design Architecture Analysis 1 Gigapoint Performance Verification Compare HPEC 2005-20

Summary Presented FPGA architecture for real-time, ultra-long FFT s Can implement 1 G-pt FFT with smaller FFT s Use SYCO processor to implement FFT s Developed real-time corner turn architecture Future Develop a universal ultra-long FFT architecture Allows multiple configurations in same hardware 1 G-Pt FFT 100 M-Pt FFT X 10 channels Application-specific precision and dynamic analysis HPEC 2005-21

Summary Presented parallel software architecture for ultra-long FFT s Added out-of-core capability to pmatlab Supports ultra-long FFT s with little degradation in performance Demonstrated 64 G-pt FFT (1 TB) Future Expand cluster to enable 64 T-pt FFT (1 PB) HPEC 2005-22

Acknowledgements Ken Senne Gerry Banner Leslie Alger Bob Bond Cy Chan Hector Chan Tim Currie Preston Jackson Andy McCabe Peter Michaleas Michael Moore Charles Rader Albert Reuther Jonathan Scalera Nadya Travinin Edmund Wong HPEC 2005-23