The Cell Processor: Computing of Tomorrow or Yesterday?


Computing of Tomorrow or Yesterday? Torsten Höfler Department of Computer Science TU Chemnitz 28th February 2006

Outline 1 The Vision 2 3 4 Cell Applications Conclusion

The Researchers
- started in mid 2000 by Sony, Toshiba and IBM
- Sony has the PS2 architecture, needs a chip for the PS3
- Toshiba has memory experience, needs chips for HDTV
- IBM has technical knowledge in processor manufacturing
- billions of dollars have been invested
- high-throughput multi-purpose processor

Known Problems to Overcome
- memory latency/bandwidth gap
- instruction latency gap (control logic)
- old-fashioned x86 architecture
- hardware architecture other than ISA
- implicit memory hierarchy (caching)
- longest path (clock speed)
- what to do with millions of transistors (Moore's Law)

Where are all these Transistors?
- Superscalar Architecture/Tomasulo
- Speculative/OOO Processing
- Hyper/Deep Pipelining
- Virtual Memory/Caches
- ...
and did it pay off? Partially, because:
- high throughput and clock speed: pipelining
- hiding pipelining effects: speculative execution
- ...
- doubling the number of transistors increases performance only by a factor of 1.4 (P. Gelsinger, Intel)

So what? Losing 30% every 18 months? No! Introduce explicit parallelism:
- Multi Core
- Vector Processing
- SIMD (cmp. Altivec, MMX, SSE...)
- more registers
- explicit SRAM
- GPU parallelism
- ...
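The 1.4 factor and the 30% loss quoted on the two slides above are consistent with a square-root scaling of single-core performance with transistor count (commonly called Pollack's rule). The slides only state the factor for one doubling, so the square-root generalization and the function names in this small sketch are assumptions:

```python
import math


def relative_performance(transistor_factor: float) -> float:
    """Single-core speedup under the ASSUMED square-root scaling rule."""
    return math.sqrt(transistor_factor)


def efficiency_loss(transistor_factor: float) -> float:
    """Fraction of the extra transistor budget that did not become speed."""
    return 1.0 - relative_performance(transistor_factor) / transistor_factor


if __name__ == "__main__":
    # One doubling of the transistor budget (roughly one Moore's Law step).
    print(f"speedup for 2x transistors: {relative_performance(2.0):.2f}")
    print(f"loss per doubling: {efficiency_loss(2.0):.0%}")
```

For one doubling this gives a speedup of about 1.41 and a loss of about 29%, which matches the rounded figures on the slides.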

From x86 to the future...?
- RISC design
- in-order CPU (no OOO)
- SoC design (different cores)
- heterogeneous multiprocessing


- 1st patent: Ken Kutaragi (Sony, 1999)
- cell = software or hardware cell
- software cell = program + data
- hardware cell = execution logic + local memory
- hardware cells process software cells
- no fixed architecture (network distribution)
- cell computer is created ad hoc (multiple devices)
- floating software cells

Heterogeneous Multiprocessing - 9 Core Processor
- SPU - Synergistic Processing Unit
- MFC - Memory Flow Controller
- SPE - Synergistic Processing Element (= SPU + MFC)
- PPE - PowerPC Processing Element
- 1 PPE + 8 SPEs = Cell Processor
- 9 full-blown CPUs

- connected via EIB - Element Interconnect Bus
- FlexIO - IO/Processor Interconnect (BIC)
- MIC - dual port XDR Memory Interface
- hardware DRM :-(
- interrupt controller routes only to PPE
- effective NoC (Network on Chip)
- single precision 256 GFlops peak (FMADD)

source & copyright: IBM

First Prototype
- 90 nm SOI, 8 copper layers
- 241 million transistors, 235 mm² (Rev. DD2)
- 60-80 W (prototype)
- only 6-7 SPEs enabled (manufacturing errors)
- IBM's virtualization technology
- 1.1 V, >4 GHz


[Block diagram: the PPE and eight SPEs attached to the EIB, with FlexIO and the MIC (two XDR channels) as external interfaces]

PPE (PowerPC Processing Element)
- dual-threaded (SMT)
- 64 bit Power Architecture
- includes VMX (aka Altivec) ISA
- simple architecture, only in-order execution
- super-scalar with deep 2-way pipeline (>20 stages)
- 2 instructions issued per cycle
- 32 kB + 32 kB L1, 512 kB L2 cache
- supports virtualization
- logical partitioning (memory, I/O, time)
- simplified Power Architecture


SPE (Synergistic Processing Element)
- full-blown vector CPUs with their own RAM
- ISA: not VMX compatible!
- ISA: 32 bit fixed-length instructions
- 21 million transistors (14 million SRAM, 7 million logic)

- no branch prediction or scheduling logic (left to software)
- two independent short and simple pipes
- can issue two instructions in parallel: one memory and one SIMD computation
- strictly in order
- instructions work with 128 bit compound data
- 4 SP FP per cycle (not fully IEEE 754 compliant, 32 GFlops with FMADD)
- slow DP arithmetic (fully IEEE 754 compliant, 3-4 GFlops)
- 4 INT per cycle (16 GOps)
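The peak figures quoted for one SPE and for the whole chip follow from simple arithmetic. The sketch below reproduces them under two assumptions that the slides imply but do not spell out: a 4 GHz clock (the prototype slide says ">4GHz") and a fused multiply-add counted as two floating-point operations. The helper name `peak_gops` is illustrative only.

```python
CLOCK_GHZ = 4.0      # assumption: nominal clock taken from the ">4GHz" prototype
SIMD_LANES = 4       # 128-bit registers hold four 32-bit values
FMADD_FLOPS = 2      # assumption: one fused multiply-add counts as two flops
NUM_SPES = 8


def peak_gops(clock_ghz: float, lanes: int, ops_per_lane: int = 1, units: int = 1) -> float:
    """Peak throughput in 10^9 operations per second."""
    return clock_ghz * lanes * ops_per_lane * units


spe_sp_gflops = peak_gops(CLOCK_GHZ, SIMD_LANES, FMADD_FLOPS)             # one SPE, SP FP
spe_int_gops = peak_gops(CLOCK_GHZ, SIMD_LANES)                           # one SPE, integer
chip_sp_gflops = peak_gops(CLOCK_GHZ, SIMD_LANES, FMADD_FLOPS, NUM_SPES)  # all 8 SPEs

if __name__ == "__main__":
    print(spe_sp_gflops, spe_int_gops, chip_sp_gflops)
```

With these inputs the three values come out as 32, 16 and 256, matching the numbers on the slides. The PPE's VMX unit is not included in the chip total.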

[Figure: SPE block diagram. 256 kB local storage, instruction fetch and data path, 2-way issue, 128 registers. Even pipe: 4xSP/2xDP FP, 4xINT, 128 bit logical, byte ops. Odd pipe: load/store, branch hints, branch.]

- 256 kB local storage (LS)
- memory accessible in 128 bit lines
- 128 registers of 128 bit each (2 cycles latency)
- one unified register file (registers hold all data types)
- no virtual memory, no coherency
- no processing in main memory
- DMA to move data between LS and main memory
- MFC connects to EIB, acts like MMU + synchronization

Programming
- typically user-level threads on the SPE
- IDL supports RPC programming
- assembler or C/C++ programming
- C/C++ intrinsics + vector datatypes
- no access to system control (page table)
- LS unprotected and untranslated
- load/store architecture
- MFC DMA and SPU computation can overlap
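Because the MFC moves data by DMA while the SPU keeps computing, a common way to use the local storage is double buffering: fetch block i+1 while block i is processed. The following is only a timing model of that idea, not SPE code; the function names and the per-block costs are hypothetical, and write-back of results is ignored.

```python
def serial_time(n_blocks: int, t_dma: float, t_compute: float) -> float:
    """Every block is fetched first and processed afterwards, with no overlap."""
    return n_blocks * (t_dma + t_compute)


def double_buffered_time(n_blocks: int, t_dma: float, t_compute: float) -> float:
    """The fetch of block i+1 runs while block i is being processed."""
    if n_blocks == 0:
        return 0.0
    # The first fetch cannot be hidden. After that, each step costs whichever
    # of transfer or compute is slower, and the last block has only its
    # compute phase left.
    return t_dma + (n_blocks - 1) * max(t_dma, t_compute) + t_compute


if __name__ == "__main__":
    # Hypothetical costs in arbitrary time units: 8 blocks, DMA 1, compute 3.
    print(serial_time(8, 1.0, 3.0))
    print(double_buffered_time(8, 1.0, 3.0))
```

In this model the example drops from 32 to 25 time units. Once the kernel is compute-bound, only the very first transfer remains visible.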

Opteron: 3 GHz, 2 FP/cycle, 114 mm². SPE: 4 GHz, 4 SP FP/cycle, 15 mm².
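The comparison above can be turned into a rough compute-density figure (peak operations per second per unit of die area). This sketch uses only the numbers on the slide; the function name is illustrative, and the comparison is loose because the SPE figure is single precision while the slide does not state the precision of the Opteron figure.

```python
def gflops_per_mm2(clock_ghz: float, flops_per_cycle: int, area_mm2: float) -> float:
    """Peak GFlops per square millimetre of die area."""
    return clock_ghz * flops_per_cycle / area_mm2


opteron_density = gflops_per_mm2(3.0, 2, 114.0)  # figures from the slide
spe_density = gflops_per_mm2(4.0, 4, 15.0)       # figures from the slide
density_ratio = spe_density / opteron_density

if __name__ == "__main__":
    print(f"Opteron: {opteron_density:.3f} GFlops/mm^2")
    print(f"SPE:     {spe_density:.3f} GFlops/mm^2")
    print(f"ratio:   {density_ratio:.1f}x")
```

On these numbers a single SPE delivers roughly 20 times the peak throughput per unit area.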


EIB (Element Interconnect Bus)
- four 128 bit wide concentric rings
- optimized for 1024 bit blocks
- 96 byte/cycle
- buffered point-to-point ring (cmp. SCI)
- each SPE buffers and routes
- scalable (but more SPEs increase latency)
- guaranteed bandwidth of 1/(number of devices)
- real-time capable


FlexIO
- 12 uni-directional byte-wide lanes
- 96 pairs in total
- 6.4 GB/s per lane (76.8 GB/s in total)
- 7 lanes (44.8 GB/s) out, 5 lanes (32 GB/s) in
- coherent (cc-NUMA) and non-coherent links
- also used as processor interconnect (cmp. HT)
- connects two processors gluelessly
- needs a switch for more
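The FlexIO totals are plain multiples of the per-lane rate. The short check below makes the bookkeeping explicit. That a byte-wide lane corresponds to 8 signal pairs is an assumption inferred from the "96 pairs" figure, and the names are illustrative.

```python
LANE_GB_PER_S = 6.4   # per-lane rate from the slide
PAIRS_PER_LANE = 8    # assumption: one signal pair per bit of a byte-wide lane
LANES_OUT = 7
LANES_IN = 5


def link_bandwidth(lanes: int) -> float:
    """Aggregate bandwidth in GB/s for a number of byte-wide lanes."""
    return lanes * LANE_GB_PER_S


total_lanes = LANES_OUT + LANES_IN
total_pairs = total_lanes * PAIRS_PER_LANE

if __name__ == "__main__":
    print(f"pairs: {total_pairs}")
    print(f"out:   {link_bandwidth(LANES_OUT):.1f} GB/s")
    print(f"in:    {link_bandwidth(LANES_IN):.1f} GB/s")
    print(f"total: {link_bandwidth(total_lanes):.1f} GB/s")
```

The results agree with the slide: 96 pairs, 44.8 GB/s outbound, 32.0 GB/s inbound and 76.8 GB/s in total.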


MIC (Memory Interface Controller)
- two-channel Rambus XDR memory, 25.6 GB/s
- ECC protected (why?)
- PPE/SPE have a conventional protection system to access main memory (cmp. MMU)
- PPE/SPE use virtual addresses
- no cache (moved to software)

- no abstraction layer like x86
- difficult programming and optimization
- same problems as for multi-core or SMPs
- direct programming of SPEs, 256 kB storage + 128 registers
- no need for assembly, but it is available :)
- programmable in C/C++
- SPEs are allocated in software
- SPE virtualization by OS? (more SPE tasks than SPEs)
- PPE code only needs to be PPC970 code
- Linux already running
- SPE code must be self-contained
- autovectorizing compiler from IBM: Octopiler

Programming Models:
- Job Queue
- Self-Multitasking SPE
- Stream Processing
- Software-Managed Cache
- MPI?
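As an illustration of the first model in the list, the sketch below shows a job queue in which a fixed pool of workers (standing in for SPEs) pulls self-contained jobs until the queue is empty. It uses ordinary Python threads and is not Cell code; `run_job_queue`, `kernel` and the worker count are hypothetical names chosen for this example.

```python
import queue
import threading

NUM_WORKERS = 8  # illustrative: one worker per SPE


def run_job_queue(jobs, kernel, num_workers=NUM_WORKERS):
    """Distribute independent jobs over a pool of workers via a shared queue."""
    work = queue.Queue()
    for index, job in enumerate(jobs):
        work.put((index, job))
    results = [None] * work.qsize()

    def worker():
        # Each worker keeps fetching jobs until none are left, so faster
        # workers automatically take on more of the load.
        while True:
            try:
                index, job = work.get_nowait()
            except queue.Empty:
                return
            results[index] = kernel(job)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_job_queue(list(range(10)), lambda x: x * x))
```

The PPE would play the role of the code that fills the queue. Each job has to carry everything its kernel needs, which mirrors the requirement that SPE code be self-contained.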


Gaming...
- PS3
- only 3.2 GHz
- only 7 SPEs (?)
- preinstalled Linux (?)
Server...
- Mercury Cell blades
- 2 Cell processors
- 8 SPEs
- Linux
???...
- find it :)


Conclusion
- general-purpose Power CPU with 8 vector processors (SPEs)
- RISC approach: moving functionality to software
- more complicated compiler (cmp. VLIW)
- very fast I/O system coupled with task distribution
- 14116 Cells in double precision (1434 in single precision) to beat BlueGene/L (131072 procs)
- real-time capabilities because of independent SPEs
- intended to be the standard of tomorrow (hopefully)
- integrated in PS3 and Toshiba TVs (HDTV) in 2006
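The BlueGene/L comparison can be reproduced with a short peak-performance calculation. Two inputs in this sketch are assumptions, not taken from the slides: a BlueGene/L peak of 2.8 GFlops per processor (700 MHz with 4 flops per cycle) and a double-precision peak of 26 GFlops per Cell (8 SPEs at 3.25 GFlops each, inside the 3-4 GFlops range given earlier). The single-precision figure of 256 GFlops is from the slides.

```python
import math

BGL_PROCESSORS = 131072        # from the slide
BGL_GFLOPS_PER_PROC = 2.8      # assumption: 700 MHz x 4 flops/cycle
CELL_SP_GFLOPS = 256.0         # from the slides (single precision, FMADD)
CELL_DP_GFLOPS = 26.0          # assumption: 8 SPEs x 3.25 GFlops


def chips_needed(target_gflops: float, gflops_per_chip: float) -> int:
    """Smallest number of chips whose combined peak reaches the target."""
    return math.ceil(target_gflops / gflops_per_chip)


bgl_peak_gflops = BGL_PROCESSORS * BGL_GFLOPS_PER_PROC
cells_sp = chips_needed(bgl_peak_gflops, CELL_SP_GFLOPS)
cells_dp = chips_needed(bgl_peak_gflops, CELL_DP_GFLOPS)

if __name__ == "__main__":
    print(f"BlueGene/L peak: {bgl_peak_gflops / 1000:.0f} TFlops")
    print(f"Cells needed (SP): {cells_sp}")
    print(f"Cells needed (DP): {cells_dp}")
```

Under these assumptions the calculation yields 1434 Cells in single precision and 14116 in double precision, the two numbers quoted in the conclusion. These are peak figures only and say nothing about sustained performance.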

Resources / Additional Information:
- http://www.research.scea.com/research/html/cellgdc05/
- http://www.blachford.info/computer/cells/cell0.html
- http://www.research.ibm.com/cell/
- http://www.cell-industries.com/
- http://www-306.ibm.com/chips/techlib/techlib.nsf/products/Cell
- http://www-128.ibm.com/developerworks/power/cell/

Let the Cells glow!