RISC Pipeline
Han Wang
CS3410, Spring 2010
Computer Science, Cornell University
See: P&H Chapter 4.6

Homework 2
[Figure: chart with axis ticks 0 through 9]

Announcements
- Homework 2 due tomorrow midnight
- Programming Assignment 1 released tomorrow
  - Pipelined MIPS processor (topic of today)
  - Subset of MIPS ISA
- Feedback
  - We want to hear from you!
  - Content?

Absolute Jump
[Datapath diagram: Prog. Mem, Reg. File, ALU, Data Mem, PC, control, +4 adders, cmp (=?), ext; signals inst, imm, offset, tgt, addr]

op    mnemonic     description
0x3   JAL target   r31 = PC+8 (+8 due to branch delay slot)
                   PC = (PC+4)[31..28] || (target << 2)

Could have used ALU for link add

A Processor
Review: Single cycle processor
[Datapath diagram: PC, instruction memory, register file, ALU, data memory (addr, din, dout), control, imm extend, cmp (=?), +4 adders; signals inst, offset, target, new pc]

Single Cycle Processor
Advantages
• A single cycle per instruction makes the logic and clocking simple
Disadvantages
• Since instructions take different amounts of time to finish, memory and functional units are not efficiently utilized
• Cycle time is set by the longest delay
  – Load instruction
• Best possible CPI is 1

Pipeline Hazards
[Figure: timeline with axis 0h, 1h, 2h, 3h…]

A Processor
[Datapath diagram divided into five stages: Instruction Fetch, Instruction Decode, Execute (compute jump/branch targets), Memory, WriteBack]

Basic Pipeline
Five stage “RISC” load-store architecture
1. Instruction fetch (IF) – get instruction from memory, increment PC
2. Instruction Decode (ID) – translate opcode into control signals and read registers
3. Execute (EX) – perform ALU operation, compute jump/branch targets
4. Memory (MEM) – access memory if needed
5.
Writeback (WB) – update register file

Slides thanks to Sally McKee & Kavita Bala

Pipelined Implementation
• Break instructions across multiple clock cycles (five, in this case)
• Design a separate stage for the execution performed during each clock cycle
• Add pipeline registers to isolate signals between different stages

A Pipelined Processor
[Datapath diagram: stages Instruction Fetch, Instruction Decode, Execute (compute jump/branch targets), Memory, WriteBack, separated by pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; each register carries ctrl plus data values (A, B, D, M, imm, inst)]

IF
Stage 1: Instruction Fetch
Fetch a new instruction every cycle
• Current PC is index to instruction memory
• Increment the PC at end of cycle (assume no branches for now)
Write values of interest to pipeline register (IF/ID)
• Instruction bits (for later decoding)
• PC+4 (for later computing branch targets)

IF
[Diagram of the IF stage; labels: PC, WE, +4, instruction memory (addr, inst; mc 00 = read word), PC+4, new pc, pcsel, pcrel, pcabs, pcreg, IF/ID, Rest of pipeline]

ID
Stage 2: Instruction Decode
On every cycle:
• Read IF/ID pipeline register to get instruction bits
• Decode instruction, generate control signals
• Read from register file
Write values of interest to pipeline register (ID/EX)
• Control information, Rd index, immediates, offsets, …
• Contents of Ra, Rb
• PC+4 (for computing branch targets later)

ID
[Diagram of the ID stage between IF/ID and ID/EX; labels: inst, PC+4, decode, extend, register file (Ra, Rb, Rd, WE, A, B, D), ctrl, imm, result, dest, Stage 1: Instruction Fetch, Rest of pipeline]

EX
Stage 3: Execute
On every cycle:
• Read ID/EX pipeline register to get values and control bits
• Perform ALU operation
• Compute targets (PC+4+offset, etc.)
in case this is a branch
• Decide if jump/branch should be taken
Write values of interest to pipeline register (EX/MEM)
• Control information, Rd index, …
• Result of ALU operation
• Value in case this is a memory store instruction

EX
[Diagram of the EX stage between ID/EX and EX/MEM; labels: ctrl, A, B, imm, PC+4, alu, +, ||, D, branch?, pcsel, pcrel, pcabs, pcreg, Stage 2: Instruction Decode, Rest of pipeline]

MEM
Stage 4: Memory
On every cycle:
• Read EX/MEM pipeline register to get values and control bits
• Perform memory load/store if needed
  – address is ALU result
Write values of interest to pipeline register (MEM/WB)
• Control information, Rd index, …
• Result of memory operation
• Pass result of ALU operation

MEM
[Diagram of the MEM stage between EX/MEM and MEM/WB; labels: ctrl, B, D, M, memory (addr, din, dout), mc, Stage 3: Execute, Rest of pipeline]

WB
Stage 5: Write-back
On every cycle:
• Read MEM/WB pipeline register to get values and control bits
• Select value and write to register file

WB
[Diagram of the WB stage after MEM/WB; labels: ctrl, M, D, result, dest, Stage 4: Memory]

[Diagram: full pipelined datapath; labels: PC, +4, inst mem, register file (Ra, Rb, Rd, A, B, D), imm, data mem (addr, din, dout), and pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB carrying OP, Rd, PC+4, A, B, D, M]

add  r3, r1, r2;
nand r6, r4, r5;
lw   r4, 20(r2);
add  r5, r2, r5;
sw   r7, 12(r3);

[Animated diagram: the five instructions above moving through the pipelined datapath; instruction memory holds 0:add, 1:nand, 2:lw, 3:add, 4:sw; register file r0 to r7]

Time Graphs

Clock cycle   1    2    3    4    5    6    7    8    9
add           IF   ID   EX   MEM  WB
nand               IF   ID   EX   MEM  WB
lw                      IF   ID   EX   MEM  WB
add                          IF   ID   EX   MEM  WB
sw                                IF   ID   EX   MEM  WB

Latency:
Throughput:
Concurrency:
CPI =

Pipelining Recap
Powerful technique for masking latencies
• Logically, instructions execute one
at a time
• Physically, instructions execute in parallel
  – Instruction level parallelism
Abstraction promotes decoupling
• Interface (ISA) vs. implementation (Pipeline)

The end

Sample Code (Simple)
Assume eight-register machine
Run the following code on a pipelined datapath

add  3 1 2    ; reg 3 = reg 1 + reg 2
nand 6 4 5    ; reg 6 = ~(reg 4 & reg 5)
lw   4 20(2)  ; reg 4 = Mem[reg2+20]
add  5 2 5    ; reg 5 = reg 2 + reg 5
sw   7 12(3)  ; Mem[reg3+12] = reg 7

Slides thanks to Sally McKee

[Datapath diagram: PC, Inst mem, register file R0 to R7 (regA, regB, valA, valB), ALU, Data mem, muxes, and pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB carrying op, dest, offset, target, PC+1, ALU result, mdata, data; instruction fields Bits 0-2, Bits 15-17, Bits 21-23]

[Datapath snapshots, Time: 1 through Time: 9. Instruction in each stage per snapshot; "-" marks a stage holding no instruction (nop). After Time 5 the fetch label reads "No more instructions".]

Time   IF           ID           EX           MEM          WB
1      add 3 1 2    -            -            -            -
2      nand 6 4 5   add 3 1 2    -            -            -
3      lw 4 20(2)   nand 6 4 5   add 3 1 2    -            -
4      add 5 2 5    lw 4 20(2)   nand 6 4 5   add 3 1 2    -
5      sw 7 12(3)   add 5 2 5    lw 4 20(2)   nand 6 4 5   add 3 1 2
6      -            sw 7 12(3)   add 5 2 5    lw 4 20(2)   nand 6 4 5
7      -            -            sw 7 12(3)   add 5 2 5    lw 4 20(2)
8      -            -            -            sw 7 12(3)   add 5 2 5
9      -            -            -            -            sw 7 12(3)

Register file contents shown in each snapshot:

Time   R0   R1   R2   R3   R4   R5   R6   R7
1-4    0    36   9    12   18   7    41   22
5      0    36   9    45   18   7    41   22
6      0    36   9    45   18   7    -3   22
7      0    36   9    45   99   7    -3   22
8-9    0    36   9    45   99   16   -3   22
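
The sample run above can be reproduced with a small script. What follows is only a sketch under stated assumptions, not the course's simulator: the names `STAGES`, `PROGRAM`, `time_graph`, `total_cycles`, and `execute` are made up for illustration, the pipeline is assumed to be stall-free, and the value 99 at memory address 29 is inferred from the snapshots (R4 holds 99 once the lw has written back). Because this particular program never reads a register before an earlier instruction has written it back, running the instructions one at a time gives the same register values as the pipelined snapshots.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Sample program from the slides, as (op, a, b, c) in the slides' operand order.
PROGRAM = [
    ("add", 3, 1, 2),    # reg3 = reg1 + reg2
    ("nand", 6, 4, 5),   # reg6 = ~(reg4 & reg5)
    ("lw", 4, 20, 2),    # reg4 = Mem[reg2 + 20]
    ("add", 5, 2, 5),    # reg5 = reg2 + reg5
    ("sw", 7, 12, 3),    # Mem[reg3 + 12] = reg7
]


def time_graph(program):
    """For each instruction, map clock cycle -> stage (assumes no stalls)."""
    return [
        {issue + 1 + k: stage for k, stage in enumerate(STAGES)}
        for issue in range(len(program))
    ]


def total_cycles(program):
    """Cycles until the last instruction finishes WB in a stall-free pipeline."""
    return len(program) + len(STAGES) - 1


def execute(program, regs, mem):
    """Run the program one instruction at a time (the ISA-level view)."""
    regs, mem = list(regs), dict(mem)
    for op, a, b, c in program:
        if op == "add":
            regs[a] = regs[b] + regs[c]
        elif op == "nand":
            regs[a] = ~(regs[b] & regs[c])
        elif op == "lw":
            regs[a] = mem.get(regs[c] + b, 0)  # unset memory reads as 0 (assumption)
        elif op == "sw":
            mem[regs[c] + b] = regs[a]
        else:
            raise ValueError(f"unknown op: {op}")
    return regs, mem


if __name__ == "__main__":
    cycles = total_cycles(PROGRAM)
    for inst, row in zip(PROGRAM, time_graph(PROGRAM)):
        cells = "".join(f"{row.get(c, ''):5}" for c in range(1, cycles + 1))
        print(f"{inst[0]:5}{cells}")
    # Initial register values as shown in the Time 1 snapshot.
    regs, mem = execute(PROGRAM, [0, 36, 9, 12, 18, 7, 41, 22], {29: 99})
    print("registers:", regs)
    print("memory:", mem)
```

Under these assumptions the five instructions occupy nine clock cycles, which matches the last snapshot (Time: 9), and the final register file matches the Time 8 and Time 9 snapshots. A program with dependent back-to-back instructions would need hazard handling that this sketch deliberately leaves out.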