Slide 1: HPCS Application Analysis and Assessment
Dr. Jeremy Kepner / MIT Lincoln Laboratory
Dr. David Koester / MITRE
This work is sponsored by the Department of Defense under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

Slide 2: Outline
• Introduction
  – Motivation
  – Productivity Framework
• Workflows
• Metrics
• Models & Benchmarks
• Schedule and Summary

Slide 3: High Productivity Computing Systems - Program Overview
Goal: create a new generation of economically viable computing systems and a procurement methodology for the security/industrial community (2007-2010).
• Phase 1 (2002, $20M): concept study; validated procurement evaluation methodology
• Phase 2 (2003-2005, $180M): advanced design and prototypes; test evaluation framework
• Phase 3 (2006-2010): full-scale development of petascale systems by two vendors; new evaluation framework

Slide 4: Motivation: Metrics Drive Designs
"You get what you measure."
[Figure: two example tradeoff charts. The first plots applications by spatial vs. temporal locality: Top500 Linpack (high Rmax) sits at high locality on both axes; large FFTs (reconnaissance) have low spatial locality; Table Toy/GUPS (intelligence) is low on both axes; adaptive multi-physics codes (weapons design, vehicle design, weather) fall in between. The second plots language expressiveness vs. language performance, from Matlab/Python (high-level, lower performance) through UPC/CAF and C/Fortran with MPI/OpenMP, down to streams/SIMD/DMA and assembly/VHDL.]
• Current metrics favor caches and pipelines; systems are ill-suited to applications with
  – Low spatial locality
  – Low temporal locality
• No development-time metrics are widely used
  – Least common denominator standards
  – Difficult to use
  – Difficult to optimize
• HPCS needs a validated assessment methodology that values the "right" vendor innovations and allows tradeoffs between execution time and development time

Slide 5: Phase 1: Productivity Framework
Productivity is defined as a ratio of utility to cost. Activity & purpose benchmarks and workflows feed the productivity metrics; development time (cost) and execution time (cost) are measured on an actual system or a model through a common modeling interface.
System parameters (examples):
• BW bytes/flop (balance), memory latency, memory size, ...
• Processor flop/cycle, processor integer op/cycle, bisection BW, ...
• Size (ft³), power/rack, facility operation, ...
• Code size, restart time (reliability), code optimization time, ...
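To make the utility/cost ratio concrete, here is a minimal illustrative sketch in Python. This is not the official HPCS model: the hourly rates and the linear cost form are assumptions chosen only to show the shape of the tradeoff.

    # Illustrative sketch only: productivity as utility divided by a cost
    # that combines development time and execution time. The rates and
    # the linear cost model are assumptions, not the HPCS framework.
    def productivity(utility, dev_hours, exe_hours,
                     dev_rate=200.0, exe_rate=20.0):
        """Utility per unit cost for one complete workflow."""
        cost = dev_hours * dev_rate + exe_hours * exe_rate
        return utility / cost

    # Hypothetical comparison: a quick high-level prototype vs. a tuned
    # parallel code delivering the same utility (numbers are made up).
    print(productivity(1.0, dev_hours=40, exe_hours=100))   # prototype
    print(productivity(1.0, dev_hours=400, exe_hours=2))    # tuned code

Even this toy form exposes the tradeoff the framework is after: which approach wins depends entirely on how development time and execution time are weighted against each other.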
Slide 6: Phase 2: Implementation
The same framework, with responsibilities assigned:
• Activity & purpose benchmarks: MITRE, ISI, LBL, Lincoln, HPCMO, LANL & Mission Partners
• Workflows and productivity metrics: Lincoln, OSU, CodeSourcery
• Analysis of current and new codes: Lincoln, UMD & Mission Partners
• University development-time experiments: MIT, UCSB, UCSD, UMD, USC
• Performance analysis: ISI, LLNL & UCSD
• Common modeling interface: ANL & Pmodels Group

Slide 7: Outline
• Introduction
• Workflows
  – Lone Researcher
  – Enterprise
  – Production
• Metrics
• Models & Benchmarks
• Schedule and Summary

Slide 8: HPCS Mission Work Flows
• Researcher: overall theory/experiment cycle of days to hours; development cycle (design, code, prototype, test, execute, visualize) of hours to minutes
• Enterprise: port legacy software in months to days; development cycle of design, code, prototype, test, scale, optimize, and simulate
• Production: initial product development in years to months; development (design, code, test, port, scale, optimize, maintain) feeds an operational observe-orient-decide-act loop with response times of hours to minutes
The HPCS productivity factors (performance, programmability, portability, and robustness) are very closely coupled with each workflow.

Slide 9: Lone Researcher
• Missions (development): cryptanalysis, signal processing, weather, electromagnetics
• Process overview
  – Goal: solve a compute-intensive domain problem: crack a code, incorporate new physics, refine a simulation, detect a target
  – Starting point: inherited software framework (~3,000 lines)
  – Modify framework to incorporate new data (~10% of code base)
  – Make algorithmic changes (~10% of code base); test on data; iterate
  – Progressively increase problem size until success
  – Deliver: code, test data, algorithm specification
• Environment overview
  – Duration: months; team size: 1
  – Machines: workstations (some clusters); HPC use decreasing
  – Languages: FORTRAN, C, Matlab, Python
  – Libraries: math (external) and domain (internal)
• Software productivity challenges
  – Focus on rapid iteration cycle
  – Frameworks/libraries often serial

Slide 10: Domain Researcher (special case)
• Scientific research: DoD HPCMP Challenge Problems, NNSA/ASCI Milestone Simulations
• Process overview
  – Goal: use HPC to perform domain research
  – Starting point: running code, possibly from an Independent Software Vendor (ISV)
  – NO modifications to codes
  – Repeatedly run the application with user-defined optimization (see the sweep sketch below), then visualize the simulation
• Environment overview
  – Duration: months; team size: 1-5
  – Machines: workstations (some clusters), HPC
  – Languages: FORTRAN, C
  – Libraries: math (external) and domain (internal)
• Software productivity challenges: none!
• Productivity challenges
  – Robustness (reliability)
  – Performance
  – Resource center operability
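Since the domain researcher never modifies the code, the workflow is essentially a batch sweep over user-defined inputs. A minimal sketch follows; the solver binary and its flags are invented for illustration, and real runs would typically go through a batch scheduler.

    # Hypothetical sketch of the domain-researcher workflow: re-run an
    # unmodified (possibly ISV) application over a parameter sweep.
    import itertools
    import subprocess

    grids = [128, 256, 512]
    steps = [1_000, 10_000]
    for g, s in itertools.product(grids, steps):
        # each invocation is an independent run of the unmodified code
        subprocess.run(["./isv_solver", f"--grid={g}", f"--steps={s}"],
                       check=True)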
Slide 11: Enterprise Design
• Missions (development): weapons simulation, image processing
• Process overview
  – Goal: develop or enhance a system for solving a compute-intensive domain problem: incorporate new physics, process a new surveillance sensor
  – Starting point: software framework (~100,000 lines) or module (~10,000 lines)
  – Define sub-scale problem for initial testing and development
  – Make algorithmic changes (~10% of code base); test on data; iterate
  – Progressively increase problem size until success
  – Deliver: code, test data, algorithm specification; iterate with user
• Environment overview
  – Duration: ~1 year; team size: 2-20
  – Machines: workstations, clusters, HPC
  – Languages: FORTRAN, C, C++, Matlab, Python, IDL
  – Libraries: open math and communication libraries
• Software productivity challenges
  – Legacy portability is essential
  – Avoid machine-specific optimizations (SIMD, DMA, ...)
  – High-level language code must later be converted

Slide 12: Design Production
• Missions (production): cryptanalysis, sensor processing, weather
• Process overview
  – Goal: develop a system for fielded deployment on an HPC system
  – Starting point: algorithm specification, test code, test data, development software framework
  – Rewrite test code into the development framework; test on data; iterate
  – Port to HPC; scale; optimize (incorporate machine-specific features)
  – Progressively increase problem size until success
  – Deliver: system
• Environment overview
  – Duration: ~1 year; team size: 2-20
  – Machines: workstations and HPC target
  – Languages: FORTRAN, C, C++
• Software productivity challenges
  – Conversion of higher-level languages
  – Parallelization of serial library functions
  – Parallelization of the algorithm
  – Sizing of the HPC target machine
The fielded system then runs inside an observe-orient-decide-act production loop.

Slide 13: HPC Workflow SW Technologies
• Many technologies target specific pieces of the production workflow (algorithm development → spec → design, code, test → port, scale, optimize → run), spanning workstation to supercomputer
• Need to quantify workflows (stages and % of time spent)
• Need to measure technology impact on stages
Example technologies, from mainstream software to HPC software: operating systems (Linux, RT Linux); languages and compilers (Matlab, Java, C++, F90, UPC, Co-array Fortran, OpenMP); libraries (ATLAS, BLAS, VSIPL, FFTW, PETE, PAPI, MPI, DRI, ||VSIPL++, POOMA, PVL); problem-solving environments and frameworks (Globus, CORBA, CCA, ESMF); tools (TotalView, UML).

Slide 14: Example: Coding vs. Testing
Workflow breakdown (NASA SEL; Boehm/TRW), in percent of effort:

  Project      Analysis & Design   Coding & Auditing   Checkout & Test
  NTDS                30                  20                  50
  Gemini              36                  17                  47
  Saturn V            32                  24                  44
  Sage                39                  14                  47
  OS/360              33                  17                  50
  TRW Survey          46                  20                  34

Testing techniques (UMD):
• Code reading: reading by stepwise abstraction
• Functional testing: boundary value and equivalence partition testing
• Structural testing: achieving 100% statement coverage
What is the HPC testing process? Development typically moves through environments and problem sizes:

  Environment           Small (Workstation)   Medium (Cluster)   Full (HPC)
  Prototype (Matlab)             X
  Serial (C/Fortran)             X
  Parallel (OpenMP)              X                   X                X

At full scale the key question is: new result, or new bug? (One possible check is sketched below.)
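As one possible answer to "new result or new bug?", here is a sketch of a scale-up regression check. The kernel functions are stand-ins; a real test would invoke the Matlab prototype, the serial code, and the parallel port on identical inputs.

    # Hypothetical sketch: before trusting a full-scale parallel run,
    # compare the ported code against the trusted smaller baseline.
    import numpy as np

    def serial_reference(n, seed=0):
        rng = np.random.default_rng(seed)
        return np.cumsum(rng.random(n))   # stand-in for the serial kernel

    def parallel_port(n, seed=0):
        rng = np.random.default_rng(seed)
        return np.cumsum(rng.random(n))   # stand-in for the OpenMP port

    n = 1_000
    ref, out = serial_reference(n), parallel_port(n)
    # Parallel reductions reorder floating-point operations, so compare
    # with a tolerance rather than bit-for-bit.
    assert np.allclose(out, ref, rtol=1e-10), "new bug, not a new result"
    print("parallel result matches serial baseline")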
Slide 15: Outline
• Introduction
• Workflows
• Metrics
  – Existing Metrics
  – Dev. Time Experiments
  – Novel Metrics
• Models & Benchmarks
• Schedule and Summary

Slide 16: Example Existing Code Analysis
Analysis of existing codes is used to test metrics and identify important trends in productivity and performance.
[Figure: NAS MG line counts, roughly 200-1,200 SLOC, broken into computation, declarations, and comm/sync/directives for the MPI, Java, HPF, OpenMP, serial, and A-ZPL implementations.]

Slide 17: NPB Implementations
The NAS Parallel Benchmarks (BT, CG, EP, FT, IS, LU, MG, SP) are available in multiple language implementations: serial Fortran, serial C, Fortran/MPI, C/MPI, Fortran/OpenMP, C/OpenMP, HPF, and Java.

Slide 18: Source Lines of Code (SLOC) for the NAS Parallel Benchmarks (NPB)
[Figure: SLOC of the serial (Fortran/C) implementations of BT, CG, EP, FT, IS, LU, MG, and SP, ranging from roughly 500 to 3,000 lines.]

Slide 19: Normalized SLOC for All Implementations of the NPB
[Figure: SLOC normalized with respect to the serial implementation for the serial, MPI, OpenMP, HPF, and Java versions of each benchmark; values range from roughly 0.5 to 3.0.]

Slide 20: NAS FT Performance vs. SLOCs
[Figure: performance (Mops) vs. development effort (SLOC) for NAS FT: serial Fortran on 1 processor at the low end, Java on 16 processors near 1,000 Mops, and Fortran/MPI on 16 processors near 3,000 Mops at the highest SLOC count.]

Slide 21: Example Experiment Results (N=1)
• Same application (image filtering), same programmer, different languages/libraries: Matlab, BLAS, BLAS/OpenMP, BLAS/MPI*, PVL/BLAS/MPI*, MatlabMPI, pMatlab* (*estimated)
[Figure: performance (speedup × efficiency) vs. development time (lines of code), with single-processor, shared-memory, and distributed-memory implementations spanning the current-practice and research regions.]
Controlled experiments can potentially measure the impact of different technologies and quantify development-time and execution-time tradeoffs.

Slide 22: Novel Metrics
• HPC software development often involves changing code (∆x) to change performance (∆y)
  – 1st-order size metrics measure the scale of change, E(∆x)
  – 2nd-order metrics would measure the nature of change, E(∆x²)
• Example: 2-point correlation function (sketched below)
  – Looks at the "distance" between code changes
  – Determines whether changes are localized (good) or distributed (bad)
[Figure: correlation of changes vs. code distance for localized, distributed, and random change patterns.]
• Other zany metrics: see Cray talk
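A minimal sketch of what such a 2-point correlation metric could look like; this illustrates the idea, not the project's actual tool. Histogram the pairwise distances between changed lines: mass near zero indicates localized changes, while a flat histogram indicates distributed ones.

    # Illustrative 2-point correlation of code changes: given the line
    # numbers a change touched, bin the pairwise distances.
    from collections import Counter
    from itertools import combinations

    def change_distance_histogram(changed_lines, bin_width=50):
        hist = Counter()
        for a, b in combinations(sorted(changed_lines), 2):
            hist[abs(b - a) // bin_width] += 1
        return dict(sorted(hist.items()))

    localized = [100, 102, 105, 110, 111, 118]      # clustered in one routine
    distributed = [10, 480, 950, 1400, 1875, 2300]  # scattered across the code
    print(change_distance_histogram(localized))     # all mass in bin 0
    print(change_distance_histogram(distributed))   # spread over many bins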
Slide 23: Outline
• Introduction
• Workflows
• Metrics
• Models & Benchmarks
  – Prototype Models
  – A&P Benchmarks
• Schedule and Summary

Slide 24: Prototype Productivity Models
• Efficiency and Power (Kennedy, Koelbel, Schreiber)
• Special Model with Work Estimator (Sterling)
• Utility (Snir): utility U(T(S, A, Cost)) of the time to solution, with productivity P(S, A, U(·)) obtained by minimizing cost
• Productivity Factor Based (Kepner): productivity = mission factor (spanning GUPS, ..., Linpack) × productivity factor (availability, language level, parallel model, portability, maintenance)
• CoCoMo II (software engineering community): Effort = A × Size^E × Π(effort multipliers), with the exponent E built from scale factors
• Least Action (Numrich): S = ∫ [w_dev + w_comp] dt, with δS = 0
• Time-To-Solution (Kogge)
[Figure: programming time (week, month, year) vs. execution time (hour, day, week, month, year). Programming-bounded missions (surveillance, intelligence, weather research, weapons design, cryptanalysis) sit high on the programming-time axis; execution-bounded missions (operational weather) sit high on the execution-time axis; the HPCS goal pushes both toward hours.]
HPCS has triggered groundbreaking activity in understanding HPC productivity:
• The community is focused on quantifiable productivity (potential for broad impact)
• Numerous proposals provide a strong foundation for Phase 2

Slide 25: Code Size and Reuse Cost
• Code size is the most important software productivity parameter
  – Measured in lines of code or function points (converted to lines of code)
  – Code = New + Reused + Re-engineered + Maintained
Approximate lines of code per function point:

  C, Fortran77    ~100
  C++              ~30
  Java             ~30
  Matlab           ~10
  Python           ~10
  Spreadsheet       ~5

• The non-HPC world reduces code size through higher-level languages and reuse
• HPC challenge areas
  – Function points: high-productivity languages are not available on HPC
  – Reuse: nonlinear reuse effects; performance requirements dictate a "white box" reuse model
[Figure: relative cost vs. fraction of a module modified (Selby 1988). Cost rises steeply and nonlinearly from black-box reuse toward white-box modification, well above the linear interpolation.]
HPC performance requirements currently limit the exploitation of these approaches.
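A small sketch of the nonlinear reuse effect cited above. The curve below is a made-up concave shape chosen to mimic Selby's qualitative finding, not his fitted data; the constants and exponent are assumptions.

    # Illustrative only: relative cost of reusing a module as a function
    # of the fraction modified (0 = black-box reuse, 1 = full rewrite).
    def relative_reuse_cost(fraction_modified):
        if fraction_modified == 0.0:
            return 0.05                    # small integration-only cost
        # cost jumps once the box is opened, then grows sublinearly
        return min(1.0, 0.05 + 0.95 * fraction_modified ** 0.4)

    for f in [0.0, 0.1, 0.25, 0.5, 1.0]:
        print(f"modify {f:4.0%} -> {relative_reuse_cost(f):.2f}x new-code cost")

The point of the shape: modifying even 10% of a module costs a large fraction of writing it from scratch, which is why performance-driven "white box" reuse erodes the savings the non-HPC world gets from reuse.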
Slide 26: Activity & Purpose Benchmarks
• Activity benchmarks define a set of instructions (i.e., source code) to be executed
• Purpose benchmarks define requirements, inputs, and outputs
Together they address the entire development workflow (algorithm development → spec → design, code, test → port, scale, optimize → run). The benchmark environment spans written, executable, and parallel specifications; serial and parallel source code; standard interfaces at each activity level; data generation and validation infrastructure; and output, accuracy, and data-point requirements.

Slide 27: HPCS Phase 1 Example Kernels and Applications
Mission-area applications:
• Stockpile stewardship: UMT2000 (unstructured grids) and SAGE3D (adaptive mesh refinement), from the ASCI Purple benchmarks
• Operational weather and ocean forecasting: NLOM (finite difference model), DoD HPCMP TI-03
• Army future combat weapons systems: ALEGRA (adaptive mesh refinement; Sandia National Labs) and CTH (Eulerian hydrocode with adaptive mesh; DoD HPCMP TI-03)
• Crashworthiness simulations: LS-DYNA (multiphysics nonlinear finite element), available to vendors
Kernels (mostly paper-and-pencil specifications; LINPACK available on the web):
• Random memory access: Table Toy (GUP/s)
• Lower/upper triangular matrix decomposition: LINPACK
• Conjugate gradient solver; QR decomposition
• Multiple precision mathematics; dynamic programming
• Matrix transpose [binary manipulation]; integer sort [with large multiword key]; binary equation solution
• Various convolutions, coordinate transforms, and block data transfers
• 1D FFT; 2D FFT
• Graph extraction ((breadth-first) search); sorting a large set; constructing a relationship graph based on proximity
Systems biology kernels:
• Whole genome analysis / sequence comparison: Needleman-Wunsch (http://www.med.nyu.edu/rcr/rcr/course/sim-sw.html), BLAST (http://www.ncbi.nlm.nih.gov/BLAST/), FASTA (http://www.ebi.ac.uk/fasta33/), HMMER (http://hmmer.wustl.edu/)
• Quantum and molecular mechanics: macromolecular dynamics, energy minimization, and Monte Carlo simulation with CHARMM (http://yuri.harvard.edu/)
• Functional genomics / biological pathway analysis: BioSpice (Arkin, 2001; http://genomics.lbl.gov/~aparkin/Group/Codebase.html)
Together these form a set of scope benchmarks representing Mission Partner and emerging bio-science high-end computing requirements.
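For concreteness, a minimal sketch of the "Table Toy" (random memory access, GUP/s) kernel listed above. The real HPC Challenge GUPS benchmark XOR-updates a huge table from a 64-bit random stream; this tiny NumPy version only illustrates the access pattern and the giga-updates-per-second metric.

    # Table Toy / GUPS sketch: XOR-updates at random table locations.
    # There is no spatial or temporal locality, which is exactly why
    # cache- and pipeline-oriented machines score poorly on it.
    import time
    import numpy as np

    table_size = 1 << 20                   # 2^20 entries: tiny by HPC standards
    n_updates = 4 * table_size
    table = np.arange(table_size, dtype=np.uint64)

    rng = np.random.default_rng(1)
    idx = rng.integers(0, table_size, size=n_updates)
    val = rng.integers(0, 1 << 63, size=n_updates, dtype=np.uint64)

    t0 = time.perf_counter()
    np.bitwise_xor.at(table, idx, val)     # the random-access update loop
    gups = n_updates / (time.perf_counter() - t0) / 1e9
    print(f"{gups:.6f} GUP/s")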
Slide 28: Outline
• Introduction
• Workflows
• Metrics
• Models & Benchmarks
• Schedule and Summary

Slide 29: Phase II Productivity Forum Tasks and Schedule (FY03-FY06)
Development-time track:
• Workflow models (Lincoln/HPCMO/LANL) and dev-time experiments (UMD): analyze existing codes, design experiments, and run pilot studies; then controlled baseline experiments; then mission-specific and new-technology demonstrations, yielding a data-validated development-time assessment methodology
• Dev & exe interfaces (HPC SW/FFRDC): prototype interfaces v0.1 → v0.5 → v1.0
• A&P benchmarks (Missions/FFRDC): requirements and specs (~6) plus executable specs (~2), revised in later years, covering intelligence, weapons design, surveillance, environment, and bioinformatics workflows
• Unified model interface (HPC modelers): prototype interface v0.1 → v0.5 → v1.0
Execution-time track:
• Machine experiments (modelers/vendors): existing HPC systems, then next-generation HPC systems, then HPCS designs
• Models & metrics (modelers/vendors): competing execution-time models, yielding a data-validated execution-time assessment methodology
• HPC Productivity Competitiveness Council: productivity workshops, productivity evaluations, roll-out of productivity metrics, and broad commercial acceptance

Slide 30: Summary
• The goal is to develop an acquisition-quality framework for HPC systems that includes
  – Development time
  – Execution time
• A team has been assembled that will develop models, analyze existing HPC codes, develop tools, and conduct HPC development-time and execution-time experiments
• Measures of success
  – Acceptance by users, vendors, and the acquisition community
  – Quantitatively explain HPC rules of thumb:
    "OpenMP is easier than MPI, but doesn't scale as high"
    "UPC/CAF is easier than OpenMP"
    "Matlab is easier than Fortran, but isn't as fast"
  – Predict the impact of new technologies

Slide 31: Backup Slides

Slide 32: HPCS Phase II Teams
• Industry PIs: Elnozahy, Gustafson, Smith
  – Goal: provide a new generation of economically viable high-productivity computing systems for the national security and industrial user community (2007-2010)
• Productivity team (Lincoln lead): PIs include Kepner (MIT Lincoln Laboratory), Lucas, Basili, Benson & Snavely, Koester, Vetter, Lusk, Post, Bailey, Gilbert, Edelman, Ahalt, and Mitchell, with participants including LCS and Ohio State
  – Goal: develop a procurement-quality assessment methodology that will be the basis of 2010+ HPC procurements

Slide 33: Productivity Framework Overview
• Phase I: define the framework and scope petascale requirements
• Phase II: implement the framework and perform design assessments
• Phase III: transition to an HPC procurement-quality framework
Value metrics (execution and development), multilevel workflows (production, enterprise, researcher), and system models and benchmarks (activity and purpose) feed evaluation experiments and acceptance-level tests, moving from preliminary to final prototypes. Participants include the HPCS vendors, FFRDC and government R&D partners, the mission agencies (SN001), and a commercial or nonprofit productivity sponsor.
HPCS needs to develop a procurement-quality assessment methodology that will be the basis of 2010+ HPC procurements.