SBCCI 2006 - the 19th Symposium on Integrated Circuits and System Design
Ouro Preto, Minas Gerais, Aug 28 - Sept 1, 2006
SBMICRO 2006 - the 21st Symposium on Microelectronics Technology and Devices

Reiner Hartenstein, TU Kaiserslautern
Re-definition of Low Power Design for HPC: a Paradigm Shift

>> Outline <<
• Preface
• The Supercomputing Crisis
• The Solution ignored for decades
• Fine-grained vs. coarse-grained
• The wrong Road Map for CS Curricula
• Conclusions
http://www.uni-kl.de © 2006, [email protected] http://hartenstein.de

Pervasiveness of FPGA application
More recently, FPGAs as accelerators have also gone into every area of scientific computing
Compute-intensive: my talk does not really cover the performance of bulk storage, discs, etc.
It highlights the supercomputing paradigm trap and a fully ignored early solution
It illustrates why there is a hidden paradigm shift behind the FPGA success
What we learn for Low Power Design

Up to 4 orders of magnitude
For many published speed-up factors obtained from software-to-FPGA migration, see Jürgen Becker's part of the Monday tutorial
But before FPGAs came up, DPLA* (a programmable PLA) was successful inside the MoM computer architecture
*) designed at Kaiserslautern and fabricated via the German multi-university E.I.S. project infrastructure

1986: Xputer Lab at Kaiserslautern: MoM I and II

The Reconfigurable Computing paradox
The effective integration density of FPGAs is behind the Gordon Moore curve by more than 4 orders of magnitude
• wiring overhead
• reconfigurability overhead
• routing congestion
• Low clock frequency
• Power-hungry
• Getting worse for larger FPGAs

An Example: FPGAs in Oil and Gas .... (1) [Herb Riley, R.
Associates]
“Application migration [from supercomputer] has resulted in a 17-to-1 increase in performance”
For this example, speed-up is not my key issue (Jürgen Becker's tutorial showed much higher speed-ups, going up to a factor of 6000)
For this oil and gas example, a side effect is much more interesting than the speed-up

An Example: FPGAs in Oil and Gas .... (2) [Herb Riley, R. Associates]
“Application migration [from supercomputer] has resulted in a 17-to-1 increase in performance”
Saves more than $10,000 in electricity bills per year (7¢ / kWh) - .... per 64-processor 19" rack
Did you know …
… 25% of Amsterdam's electric energy consumption goes into server farms?
… a quarter square-kilometer of office floor space within New York City is occupied by server farms?

Oil and Gas as a strategic issue
Low power design: not only to keep the chips cool
Do you know the amount of Google's electricity bill?
It should be investigated how far the migration achievements obtained for computationally intensive applications can also be utilized for servers
Recently the US Senate ordered a study on the energy consumption of servers

FPGA use: A new direction in low power Design
as a panelist at:
http://www.patmos-conf.org Father of ISLPED Sept. 13-15, 2006, Montpellier, France
2006 International Symposium on Low Power Electronics and Design (ISLPED), October 4-6, 2006, Rottach-Egern, Tegernsee, Germany http://www.islped.org/

Reconfigurability per se is not the key
It's the paradigm coming along with it
Note: no instruction fetch at run time!
Data streams instead of instruction streams
Enabling technology for data sequencers brings further performance improvements
A non-reconfigurable example is the BEE project (Bob Brodersen et al., UC Berkeley)

Petaflops by GRAPE (non-reconfigurable)
              Earth Simulator   MDGrape-3   factor
W/Gflops      128               0.2         640
$/Gflops      8000              15          533
massive pipelining and on-chip distributed memory
GRAvity PipE: special purpose computer for astrophysical N-body simulations and Molecular Dynamics Simulations
MDGRAPE-3 (aka Protein Explorer): Petaflops-GRAPE [Univ. of Tokyo & Genomic Sciences Center at RIKEN institute]

>> Outline <<
• Preface
• The Supercomputing Crisis
• The Solution ignored for decades
• Fine-grained vs. coarse-grained
• The wrong Road Map for CS Curricula
• Conclusions

Explanation of the RC paradox
Any technology that provides an improvement of a factor of 10 or more over an established one can be expected to become disruptive [Andy Grove].
The analysis of the Supercomputing crisis explains why the “bad” FPGAs are so disruptive

Going toward “connected thinking”
The heyday of reductionism has passed. [pwc.com]
Impenetrable obstacles have been encountered which cannot be solved by the classical simple reductionist approach.
This is the reason for the growing worldwide significance of transdisciplinary notions
We need coherence instead of fragmentation into specialists' niche areas
This is heralding a new era

The basic model paradigm trap
frustrates interdisciplinary education efforts
fragmentation in CS even between
subdisciplines
High performance computing stalled for decades by the von Neumann paradigm trap: the wrong road map.
The right road map: held back by another trap for decades!
(stolen from Bob Colwell)

Transdisciplinary Education?
Computer Science not prepared
Lacking intradisciplinary cohesion between the mind sets of:
• Theoreticians (Math background)
• Hardware People
• Computer Architects
• Embedded Syst. Designers
• Software People (Application Development)
for decades: the Hardware / Software chasm
turns into: the Configware / Software chasm

migration of the lemmings [David Padua, John Hennessy, et al.]
Flagship conference series: IEEE ISCA
Jean-Loup Baer

The Dead Supercomputer Society
Research 1985 – 1995 [Gordon Bell, keynote ISCA 2000]
• ACRI
• Alliant
• American Supercomputer
• Ametek
• Applied Dynamics
• Astronautics
• BBN
• CDC
• Convex
• Cray Computer
• Cray Research
• Culler-Harris
• Culler Scientific
• Cydrome
• Dana/Ardent/Stellar/Stardent
• DAPP
• Denelcor
• Elexsi
• ETA Systems
• Evans and Sutherland Computer
• Floating Point Systems
• Galaxy YH-1
• Goodyear Aerospace MPP
• Gould NPL
• Guiltech
• ICL
• Intel Scientific Computers
• International Parallel Machines
• Kendall Square Research
• Key Computer Laboratories
• MasPar
• Meiko
• Multiflow
• Myrias
• Numerix
• Prisma
• Tera
• Thinking Machines
• Saxpy
• Scientific Computer Systems (SCS)
• Soviet Supercomputers
• Supertek
• Supercomputer Systems
• Suprenum
• Vitesse Electronics

Monstrous Steam Engines of Computing
Crossbar weight: 220 t, 3000 km of thick cable, 5120 Processors, 5000 pins each
ES 20: TFLOPS peak or sustained?
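As a rough plausibility check of the power and cost figures quoted so far, the arithmetic can be sketched in a few lines. This is an illustration, not part of the original slides: the function names are made up, and the conversion of the electricity saving into average power assumes the rack runs around the clock (8760 hours per year), which the slides do not state.

```python
# Sketch only: names and the continuous-operation assumption are illustrative.
HOURS_PER_YEAR = 365 * 24  # assumption: the rack runs 24 hours a day, all year


def improvement_factor(baseline, improved):
    """Ratio between an established figure and an improved one."""
    return baseline / improved


def average_power_kw(dollars_per_year, dollars_per_kwh):
    """Average power whose yearly energy cost equals dollars_per_year."""
    kwh_per_year = dollars_per_year / dollars_per_kwh
    return kwh_per_year / HOURS_PER_YEAR


# Earth Simulator vs. MDGrape-3, figures from the GRAPE comparison
watts_factor = improvement_factor(128, 0.2)   # W/Gflops
cost_factor = improvement_factor(8000, 15)    # $/Gflops

# Oil and gas example: more than $10,000 saved per year at 7 cents per kWh
saved_kw = average_power_kw(10_000, 0.07)

print(f"W/Gflops factor: {watts_factor:.0f}")
print(f"$/Gflops factor: {cost_factor:.0f}")
print(f"average power saved per rack: {saved_kw:.1f} kW")
```

Under these assumptions, both quoted factors are well beyond the factor of 10 that the Andy Grove remark names as the threshold for a disruptive technology.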
Illustrating the von Neumann paradigm trap: the watering can model [Hartenstein]
The instruction-stream-based approach
The data-stream-based approach has no von Neumann bottleneck
[Figure labels: many watering cans; von Neumann bottleneck]

The Memory Wall (1)
Moving data to the processor:

Data meeting the Processing Unit (PU)
We have 2 choices:
by Software: routing the data by memory-cycle-hungry instruction streams
by Configware: placement of the execution locality (optimize a pipe network: place PU in data stream)

The Memory Wall (2)
Key problem is the inefficiency and complexity of moving data, not processor performance.
Most important goal is the minimization of the number of main memory cycles.
Tear down this Wall!
Supercomputing urgently needs a fundamentally different approach toward interconnect efficiency.

>> Outline <<
• Preface
• The Supercomputing Crisis
• The Solution ignored for decades
• Fine-grained vs. coarse-grained
• The wrong Road Map for CS Curricula
• Conclusions

The right road map to HPC: there, but ignored for decades
massively reducing memory cycles
DPU operation is transport-triggered
no instruction streams, no message passing, nor through common memory
where were the supercomputing people?
[Figure: DPA with input data streams and output data streams]

The Systolic Array
nice time/space notation - defines: ...
which data item, at which time, at which port
DPA* (pipe network)
*) DataPath Array (array of DPUs)
DPU: DataPath Unit, has no program counter! It's no CPU!
(H. T. Kung paradigm)
CS Mathematicians' hobby, early 1980s
[Figure: DPA with input data streams and output data streams; axes: time, port #]

Terminology
term     program counter   execution triggered by   paradigm
CPU      yes               instruction fetch        instruction-stream-based
DPU**    no                data arrival*            data-stream-based
*) “transport-triggered”
**) does not have a program counter

The new paradigm: how the data are traveling
better not by instruction execution
An old hat: transport-triggered [Jack Lipovski, EUROMICRO, Nice, 1975]
[Figure labels: vN Move Processor (instruction-driven); DPU pipeline, or chaining; super systolic array]
P&R: move locality of operation, not data!

Mathematicians X-ing
Systolic Synthesis
Mathematicians like the beauty and elegance of Systolic Arrays.
Lacking an intradisciplinary view, their efforts yielded poor synthesis algorithms.
Reiner Hartenstein

Synthesis Method?
Of course, algebraic!
Algebraic means linear projection, restricted to uniform arrays, only with linear pipes
Useful only for applications with strictly regular data dependencies:
Mathematicians caught by their own paradigm trap for more than a decade
rDPA: Generalization* by a transdisciplinary hardware guy: Rainer Kress discarded their algebraic synthesis methods and replaced them by simulated annealing.
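The time/port notation of the systolic array slides (which data item, at which time, at which port) can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions, not Kung's or Kress's actual design: it assumes a linear array of DPUs computing a matrix-vector product, where row i of the matrix enters at port i delayed by i time steps and the vector travels from DPU to DPU. The names `skewed_schedule` and `systolic_matvec` are hypothetical.

```python
def skewed_schedule(a):
    """Time/port schedule: row i enters port i delayed by i steps.

    schedule[i][t] is the data item at port i at time t; None marks a
    bubble (the "-" entries in the slide's time/port diagrams).
    """
    m, n = len(a), len(a[0])
    steps = n + m - 1
    return [
        [a[i][t - i] if 0 <= t - i < n else None for t in range(steps)]
        for i in range(m)
    ]


def systolic_matvec(a, x):
    """Compute y = a * x on a simulated linear array of m DPUs.

    There is no program counter and no instruction fetch: a DPU fires
    only when a matrix item and a vector item arrive at the same time
    (transport-triggered operation).
    """
    m, n = len(a), len(x)
    schedule = skewed_schedule(a)
    acc = [0] * m          # one accumulator per DPU
    pipe = [None] * m      # pipe[i]: vector item currently at DPU i
    for t in range(n + m - 1):
        # the vector stream advances one DPU per time step
        pipe = [x[t] if t < n else None] + pipe[:-1]
        for i in range(m):
            item = schedule[i][t]
            if item is not None and pipe[i] is not None:
                acc[i] += item * pipe[i]   # data arrival triggers the DPU
    return acc


if __name__ == "__main__":
    a = [[1, 2, 3],
         [4, 5, 6]]
    x = [1, 0, 2]
    for port, stream in enumerate(skewed_schedule(a)):
        print(f"port {port}:", ["-" if v is None else v for v in stream])
    print("y =", systolic_matvec(a, x))
```

The schedule here is strictly regular, which is exactly the case the algebraic (linear projection) synthesis methods can handle; irregular data dependencies would need a different mapping approach such as the simulated annealing mentioned above.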
*) super-systolic (rDPA generalization: 1995)

Generating the Data Streams
Who generates the data streams?
Mathematicians: it's not our job (it's not algebraic)
[Figure: DPA with input data streams and output data streams]

No machine paradigm
Only one half of the machine
Defined only the data path, however, without the sequencing resources
Mathematicians considered that providing the enabling technology is somebody else's job

Disclaimer
But there are mathematicians who are no reductionists
e. g., fully spanning the transdisciplinary cohesion from Term Rewriting Systems over to dynamically reconfigurable system design & synthesis

Data stream generators
use data counters, no program counter
ASM: AutoSequencing Memory (GAG, RAM, data counter)
implemented by distributed on-chip memory
50 & more on-chip ASM are feasible
32 ports, or n x 32 ports
non-von-Neumann machine paradigm
[Figure: reconfigurable rDPA (pipe network) surrounded by ASM blocks; label: other example]

ASM: AutoSequencing Memory (GAG, RAM, data counter)
(anti-von-Neumann machine paradigm)
Data Counter instead of Program Counter
Generalization of the DMA
GAG & enabling technology: published 1989 [by TU-KL]
Survey paper: [M. Herz et al.*: IEEE ICECS 2003, Dubrovnik]
patented by TI** 1995
Storage Scheme optimization methodology, etc.
*) IMEC & TU-KL
**) -

Compilation: Software vs.
Configware
Software Engineering: source program -> software compiler -> software code (instruction streams)
Configware Engineering: source “program” (C, FORTRAN, MATLAB, …) -> configware compiler (mapper: placement & routing; data scheduler) -> configware code (configuration) and flowware code (data streams)

Educational Deficits
Educational deficits have stalled Reconfigurable Computing (RC) as well as classical supercomputing
Transdisciplinary fragmentation: each application domain uses its own trick boxes
Too many sophisticated, very clever architectures
We need a fundamental model with a methodology which all application domains have in common
Transdisciplinary education & basic research needed

>> Outline <<
• Preface
• The Supercomputing Crisis
• The Solution ignored for decades
• Fine-grained vs. coarse-grained
• The wrong Road Map for CS Curricula
• Conclusions

Coarse-grained vs. fine-grained
device          granularity                    path width                eff've density   flexibility
FPGA            fine-grained                   ~ 1 bit                   very low         general purpose
DPA             coarse-grained                 multi bit: e.g. 32 bits   very high        specialized
rDPA            coarse-grained                 multi bit: e.g. 32 bits   very high        domain-specific
platform FPGA   fine-grained & embedded hdw.   mixed                     high

Why coarse grain
instead of rLB (~1 bit wide) use rDPU (e. g. 32 bits wide)
much more area-efficient
much less reconfigurability overhead
rDPU: reconfigurable Data Path Unit (e. g.
rALU)
instead of FPGA use rDPA: much more MOPS/milliWatt
mind set close to classical computing background
[Figure: rDPA shown as a 4 x 4 array of rDPUs]

Coarse grain is about computing, not logic
Example: mapping onto rDPA by DPSS: based on simulated annealing
SNN filter on KressArray (mainly a pipe network) [Ulrich Nageldinger]
array size: 10 x 16 = 160 rDPUs
no CPU
rDPU: reconfigurable function block, e. g. 32 bits wide
Legend: rDPU not used; used for routing only (rout thru only); operator and routing; port location marker; backbus connect; backbus connect not used

(r)DPA
commercial rDPA example: PACT XPP - XPU128
• Full 32 or 24 Bit Design
• working silicon
• 2 Configuration Hierarchies
• Evaluation Board available, and
• XDS Development Tool with Simulator
[Figure: XPP128 rDPA; labels: ALU, Ctrl, CFG, rDPU, PAE core; buses not shown; © PACT AG, http://pactcorp.com]

e. g.: array with
56 rDPUs: running under 500 MHz
World TV & game console & multi media center
• Variable resolutions and refresh rates
• Variable scan mode characteristics
• Noise Reduction and Artifact Removal
• High performance requirements
• Variable file encoding formats
• Variable content security formats
• Variable Displays
• Luminance processing
• Detail enhancement
• Color processing
• Sharpness Enhancement
• Shadow Enhancement
• Differentiation
• Programmable de-interlacing heuristics
• Frame rate detection and conversion
• Motion detection & estimation & compensation
• Different standards (MPEG2/4, H.264)
• A single device handles all modes
[Figure: SMeXPP rDPA connected to Games, Camera, SD/MMC Cards, Radio-Interface, Videos, Music, LCD Display, Baseband Processor, Audio-Interface; http://pactcorp.com]

>> Outline <<
• Preface
• The Supercomputing Crisis
• The Solution ignored for decades
• Fine-grained vs. coarse-grained
• The wrong Road Map for CS Curricula
• Conclusions

Curricula?
Joint Task Force for Computing Curricula 2004 fully ignores Reconfigurable Computing
FPGA & synonyms: 0 hits (Google: 10 million hits)
not even here

Curriculum Recommendations, v. 2005
Upon my complaints* the only change: including at the end of the last paragraph of the survey volume: "programmable hardware (including FPGAs, PGAs, PALs, GALs, etc.)."
However, no structural changes at all
v. 2005 intended to be the final version (?)
torpedoing the transdisciplinary responsibility of CS curricula
This is criminal!
Peter Denning …
*) no reply

with ACM and IEEE-CS: not in good hands
works towards the development of principles and ideas for multidisciplinary modes of research and education.
We need SDPS
to identify intra-disciplinary communication gaps in CS
to develop a roadmap for CS
to assume intradisciplinary responsibility for education

SDPS, the first transdisciplinary society
The transdisciplinary genie is out of the bottle.
There is no turning back from interdisciplinary cohesion and integrative attempts to solve the complex problems of mankind in this century.
The era of individual disciplinary successes and accumulating disciplinary silos of locally functional knowledge has ended with the 20th century.

IDPT - Call for Papers
http://hartenstein.de/IDPT2007/

IDPT 2007
IDPT 2006 Speakers:
8 University Presidents (1 founding president)
10 Deans (1 founding dean)
1 Nobel Prize Laureate
6 Directors
and many others …

>> Outline <<
• Preface
• The Supercomputing Crisis
• The Solution ignored for decades
• Fine-grained vs. coarse-grained
• The wrong Road Map for CS Curricula
• Conclusions

Conclusion
We need a Re-definition of Low Power Design
not only for microprocessors and embedded systems, but also for HPC and supercomputing:
as a Paradigm Shift and a strategic issue

thank you

END

Backup for Discussion:

Here is the common model
it's not von Neumann
most accumulated MIPS have been migrated here
mainly just for running legacy software
[Figure labels: software code; configware code; etc.]
[Figure labels: instruction-stream-based CPU; data-stream-based reconfigurable accelerator; hardwired accelerator]
the tail is wagging the dog

Dual Paradigm Application Development
Juergen Becker's CoDe-X, 1996
[Figure: a high level language source (C language) enters the Partitioner of the software/configware co-compiler; the SW compiler produces software code for the instruction-stream-based CPU; the CW compiler produces configware code for the data-stream-based reconfigurable accelerator (array of rDPUs); a hardwired accelerator is also shown]

For Transdisciplinary CS Education
The von-Neumann-only mind set is obsolete
[Figure labels: structural; procedural; procedural-only; data-stream-based; instruction-stream-based]
We need a curricular dual-paradigm approach

The supercomputing paradigm trap
this did not prevent supercomputing from following the wrong road map for decades, imprisoned by the von Neumann paradigm trap
No technology transfer from Mathematics: caught by the algebraic paradigm trap (systolic array scene)

The language and tool disaster
End of April: a DARPA brainstorming conference
Software people do not speak VHDL
Hardware people do not speak MPI
Bad quality of the application development tools
A poll at FCCM'98 revealed that 86% of hardware designers hate their tools

Escaping the paradigm trap
The underground success story of FPGAs
The fastest growing segment of the semiconductor market
Massive speed-up
Slashing the electricity bill
However, this is not supported by our education systems

The end of Moore's Law
The growth of complexity and clock frequency of single-core microprocessors is coming to an end
Multi-core microprocessor chips
emerging: 32 cores on a chip from AMD by 2010.
Just more CPUs on the chip is not the way to go for very high performance.
We have learnt this lesson from the supercomputing community, which paid an extremely high price for monstrous installations by having followed the wrong road map for decades.
Such fundamental bottlenecks in computer science will necessitate new breakthroughs

Algorithms: fundamental misconception
Instead of hitting physical limits, we found that further progress is limited by a fundamental misconception in the theory of algorithmic complexity.
It is not processing data that is costly, but moving data.
We have to rethink the basic assumptions behind computing.

Taxonomy of Algorithm Migration (1)
(Instruction-stream-based algorithm taxonomy: partially existing, not really systematic)
Algorithms migrated to the time-space domain (for RC): a taxonomy does not exist
Computationally intensive applications are the best candidates for migration to FPGA
A few algorithms (e. g.
Turbocode or Viterbi) require a massive amount of interconnect
bulk data bases might be subject to FPGA usage to avoid memory cycles for address computation
Steadily coming and going data streams are the best candidates

Taxonomy of Algorithm Migration (2)
Migration efficiency (reducing memory cycles):
Servers: to be investigated - what is certain:
• loop transformations: efficient, deterministic
• caches: nondeterministic and energy guzzlers
• much less local memory needed
• secondary data memory: distributed on-chip memory architectures highly promising
• address computations: efficient migration

configware solution: computing in space
for demo: a tiny section of the pipe network
inter-rDPU-communication: no memory cycles needed
[Figure: array of rDPUs; one rDPU configured as + operator, with output S]

Compare it to software solution on CPU
S = R + (if C then A else B endif);
on a very simple CPU (C = 1), Clock 200
[Figure: R, B, A and C = 1 feeding a + operator that produces S]
Steps (columns: memory cycles, nanoseconds):
if C then read A: read instruction; instruction decoding; read operand*; operate & register transfers
if not C then read B: read instruction; instruction decoding
add & store: read instruction; instruction decoding; operate & register transfers; store result
total

section of a major pipe network on rDPU
hypothetical branching example to illustrate software-to-configware migration
S = R + (if C then A else B endif);
[Figure: R, B, A and C = 1 feeding a + operator that produces S]
clock 200 MHz (5
nanosec)
simple conservative CPU example (C = 1)
Steps:
if C then read A: read instruction; instruction decoding; read operand*; operate & reg. transfers
if not C then read B: read instruction; instruction decoding
add & store: read instruction; instruction decoding; operate & reg. transfers; store result
Five of these steps cost 1 memory cycle (100 nanoseconds) each
total: 5 memory cycles, 500 nanoseconds
*) if no intermediate storage in register file

The wrong mind set ....
S = R + (if C then A else B endif);
section of a very large pipe network:
[Figure: R, B, A and C = 1 feeding a + operator]
“but you can't implement decisions!”
not knowing this solution: symptom of the hardware / software chasm and the configware / software chasm

Co-Compiler Enabling Technology
is available from academia
only a small team needed for commercial re-implementation
on the road map to the Personal Supercomputer

Flowware Languages vs. Software

Compilation: Software vs.
Configware
Software Engineering: source program -> software compiler -> software code
Configware Engineering: source “program” (C, FORTRAN, MATLAB) -> configware compiler (mapper: placement & routing; data scheduler) -> configware code and flowware code

Nick Tredennick's Paradigm Shifts explain the differences
Software Engineering (CPU): resources: fixed; algorithm: variable; software: 1 programming source needed
Configware Engineering: resources: variable; algorithm: variable; configware and flowware: 2 programming sources needed

Co-Compilation
[Figure: C, FORTRAN, MATLAB source enters the Software / Configware Co-Compiler; an automatic SW / CW partitioner feeds the software compiler (producing software code) and the configware compiler with mapper and data scheduler (producing configware code and flowware code)]

Co-Compiler for Hardwired Kress/Kung Machine [e. g. Brodersen]
[Figure: source enters the Software / Flowware Co-Compiler; an automatic SW / CW partitioner feeds the software compiler (producing software code) and the flowware compiler with data scheduler (producing flowware code)]

The Pervasiveness of Reconfigurable Computing (RC)
FPGAs are used everywhere
Nov.
2005
[Table: # of hits by Google for searches of the form “FPGA and ….”; the search terms did not survive extraction. Hit counts listed: 647,000; 1,490,000; 171,000; 194,000; 398,000; 1,620,000; 127,000; 113,000; 158,000; 162,000; 915,000; 272,000]

some published speed-up factors
The RC paradox: although the effective integration density of FPGAs is by 4 orders of magnitude behind the Moore curve (wiring overhead, reconfigurability overhead, routing congestion)
[Figure: published speed-up factors plotted as relative performance over the years 1980 to 2010, compared with the microprocessor curve from 8080 to Pentium 4. Application areas shown: DSP and wireless, image processing, pattern matching, multimedia, crypto, bioinformatics, molecular dynamics simulation, astrophysics (GRAPE). Individual labels include Reed-Solomon decoding, Viterbi decoding, FFT, MAC, real-time face detection, video-rate stereo vision, pattern recognition, SPIHT wavelet-based image compression, protein identification, BLAST, and Smith-Waterman pattern matching, with speed-up factors up to 6000]

Transdisciplinary Research and Education
• working towards the development of principles and ideas for multidisciplinary modes of research and education.
• There are challenges that cannot be overcome using methods within a single discipline [A. M. Madni, Ph. C-Y Sheu]
• The transdisciplinary way of acquiring knowledge means that education, research, development, production, and training are intertwined in such a way that we obtain a better picture and a higher level of abstraction.
• This allows us to overcome the shortcomings of the classical, Cartesian-mechanistic, reductionist foundations, and methods of traditional sciences and engineering. [A. Ertas, M. M. Tanik]

how science will revolutionize the 21st century
The heyday of reductionism has passed.
This is the reason for the growing worldwide significance of transdisciplinary notions
Impenetrable obstacles have been encountered which cannot be solved by the classical simple reductionist approach.
This is heralding a new era

Holistic Thinking
work towards the development of principles and ideas for multidisciplinary modes of research and education.
provides us with the necessary tools and methods to maintain intellectual control over large projects
overcome the shortcomings of the classical, Cartesian-mechanistic, reductionist foundations and methods
Herbert A. Simon (Nobel Laureate): The Sciences of the Artificial; 3rd Edition
holistic thinking vs. mechanistic thinking, disciplinary vs. transdisciplinary thinking, reductionism vs. holism

Scientific Revolutions
Thomas S. Kuhn: The Structure of Scientific Revolutions; University of Chicago Press, 1962; 3rd edition: 1996
http://www.des.emory.edu/mfp/Kuhn.html
Outline and Study Guide, prepared Aug. 2004 by Professor Frank Pajares, Emory University

More Books
Michio Kaku: Visions: How Science Will Revolutionize the 21st Century; ANCHOR, September 1998
Everett M. Rogers, Nancy Singer Olaguera: Diffusion of Innovations; Fifth Edition, Simon & Schuster, August 2003

Conclusions
excellent results proven for computationally intensive applications
highly promising for servers
improvements likely for bulk data & storage applications
tool and language scenario needs an urgent transdisciplinary clean-up
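The hypothetical branching example from the backup slides, S = R + (if C then A else B endif), can be sketched to make the memory-cycle comparison concrete. This is an illustration under stated assumptions, not the talk's own code: the step list follows the simple conservative CPU table for C = 1, but which steps touch main memory is an assumption chosen to match the slide's total of 5 memory cycles, and modelling the pipe network as a selector DPU feeding an adder DPU is likewise an assumed reading of the diagram. All names are hypothetical.

```python
MEMORY_CYCLE_NS = 100  # from the slide: 1 memory cycle costs 100 nanoseconds

# (step, touches_main_memory) for C = 1.
# Assumption: instruction reads, the operand read and the result store each
# cost one memory cycle; decoding and register transfers cost none.
CPU_STEPS_C_TRUE = [
    ("read instruction", True),               # if C then read A
    ("instruction decoding", False),
    ("read operand A", True),
    ("operate & register transfers", False),
    ("read instruction", True),               # if not C then read B (B not read)
    ("instruction decoding", False),
    ("read instruction", True),               # add & store
    ("instruction decoding", False),
    ("operate & register transfers", False),
    ("store result", True),
]


def cpu_memory_cycles(steps):
    """Count the steps that go to main memory."""
    return sum(1 for _, touches_memory in steps if touches_memory)


def cpu_memory_time_ns(steps):
    """Time spent in memory cycles on the simple CPU."""
    return cpu_memory_cycles(steps) * MEMORY_CYCLE_NS


def pipe_network(r, a, b, c):
    """Configware view: the operands stream through two DPUs.

    A selector DPU forwards A or B depending on C, and an adder DPU adds R.
    The DPUs are wired to each other directly, so no memory cycles are
    needed between them.
    """
    selected = a if c else b   # the decision is implemented as a data selector
    return r + selected


if __name__ == "__main__":
    print("CPU memory cycles:", cpu_memory_cycles(CPU_STEPS_C_TRUE))
    print("CPU memory time (ns):", cpu_memory_time_ns(CPU_STEPS_C_TRUE))
    print("pipe network, C = 1:", pipe_network(r=10, a=3, b=4, c=True))
    print("pipe network, C = 0:", pipe_network(r=10, a=3, b=4, c=False))
```

The selector shows why the objection "but you can't implement decisions!" does not hold: the branch becomes a data path choice inside the pipe network instead of an instruction fetched from memory.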