RISC Pipeline
Han Wang
CS3410, Spring 2010
Computer Science
Cornell University
See: P&H Chapter 4.6
1
Homework 2
2
Announcements
- Homework 2 is due tomorrow at midnight
- Programming Assignment 1 will be released tomorrow
- Pipelined MIPS processor (the topic of today's lecture)
- Subset of the MIPS ISA
Feedback
- We want to hear from you!
- Content?
3
Absolute Jump
[Datapath diagram: PC, Prog. Mem (inst), two +4 adders, control, Reg. File, ALU, cmp (=?), Data Mem (addr), imm/ext, offset adder (+), tgt concatenation (||)]
Could have used ALU for link add
op    mnemonic     description
0x3   JAL target   r31 = PC+8 (+8 due to branch delay slot)
                   PC = (PC+4)[31..28] || (target << 2)
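A minimal sketch of the two description lines above, in Python. The helper name `jal`, the example addresses, and the explicit 32-bit masks are assumptions made for illustration; `target` stands for the 26-bit field of the instruction.

```python
MASK32 = 0xFFFFFFFF

def jal(pc, target):
    """Return (link, new_pc) for JAL; target is the 26-bit instruction field."""
    link = (pc + 8) & MASK32                    # r31 = PC+8 (skips the delay slot)
    upper = (pc + 4) & 0xF0000000               # bits 31..28 of PC+4
    new_pc = upper | ((target << 2) & 0x0FFFFFFF)
    return link, new_pc

link, new_pc = jal(0x00400000, 0x0100020)
print(hex(link), hex(new_pc))   # 0x400008 0x400080
```

The jump can only reach addresses that share the top four bits of PC+4, which is why the upper nibble is copied rather than encoded.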
4
A Processor
Review: Single cycle processor
[Datapath diagram: PC, +4 adders, instruction memory (inst), control, register file, imm extend, alu, cmp (=?), offset and target inputs to new pc, data memory (addr, din, dout)]
5
Single Cycle Processor
Advantages
• One cycle per instruction keeps the logic and clocking simple
Disadvantages
• Instructions take different amounts of time to finish, so memory
and functional units are not used efficiently.
• Cycle time is set by the longest delay.
– Load instruction
• Best possible CPI is 1
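To make the "cycle time is the longest delay" point concrete, here is a small sketch. The per-unit delays and the unit names are hypothetical numbers chosen for illustration, not measurements from the slides.

```python
# Hypothetical per-unit delays in picoseconds (made up for illustration).
DELAY = {"imem": 200, "regread": 100, "alu": 200, "dmem": 200, "regwrite": 100}

# Units each instruction class passes through in a single-cycle datapath.
PATHS = {
    "alu-op": ["imem", "regread", "alu", "regwrite"],
    "lw":     ["imem", "regread", "alu", "dmem", "regwrite"],
    "sw":     ["imem", "regread", "alu", "dmem"],
    "branch": ["imem", "regread", "alu"],
}

def path_delay(kind):
    """Total delay of the units used by one instruction class."""
    return sum(DELAY[unit] for unit in PATHS[kind])

# The clock must be long enough for the slowest instruction.
cycle_time = max(path_delay(kind) for kind in PATHS)
slowest = max(PATHS, key=path_delay)
print(slowest, cycle_time)   # lw 800
```

With these assumed delays a branch needs only 500 ps, yet it still occupies the full 800 ps cycle dictated by the load.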
6
Pipeline Hazards
[Figure: timeline marked 0h, 1h, 2h, 3h, …]
7
A Processor
[Datapath diagram split into five stages: Instruction Fetch, Instruction Decode, Execute, Memory, WriteBack. Components: PC, +4, new pc, instruction memory (inst), control, register file, imm extend, alu, compute jump/branch targets, data memory (addr, din, dout)]
8
Basic Pipeline
Five stage “RISC” load-store architecture
1. Instruction Fetch (IF)
– get instruction from memory, increment PC
2. Instruction Decode (ID)
– translate opcode into control signals and read registers
3. Execute (EX)
– perform ALU operation, compute jump/branch targets
4. Memory (MEM)
– access memory if needed
5. Writeback (WB)
– update register file
Slides thanks to Sally McKee & Kavita Bala
9
Pipelined Implementation
Break instructions across multiple clock cycles
(five, in this case)
Design a separate stage for the execution
performed during each clock cycle
Add pipeline registers to isolate signals between
different stages
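One way to picture the pipeline registers is as a chain of latches that all update on the same clock edge. The sketch below models only that shifting behaviour; the function name `clock` and the use of instruction names as register contents are assumptions for illustration.

```python
NAMES = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]

def clock(regs, fetched):
    """One clock edge: each register latches what the stage before it held.
    Returns (new_regs, retired), where retired leaves MEM/WB for write-back."""
    retired = regs[-1]
    return [fetched] + regs[:-1], retired

regs = ["nop"] * len(NAMES)
program = ["add", "nand", "lw", "add", "sw"]
retired = []
for inst in program + ["nop"] * 4:      # four extra cycles to drain the pipeline
    regs, done = clock(regs, inst)
    retired.append(done)

print(retired[4:])   # ['add', 'nand', 'lw', 'add', 'sw']
```

Because every register is rebuilt from the old values before anything is overwritten, a stage never sees a value produced in the same cycle, which is the isolation the slide refers to.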
10
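Pipelined Processor
[Datapath diagram: the stages Instruction Fetch, Instruction Decode, Execute, Memory, WriteBack separated by pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Components: PC, +4, new pc, instruction memory (inst), control, register file, imm extend, alu, compute jump/branch targets, data memory (addr, din, dout). Pipeline registers carry ctrl, A, B, D, M]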
11
IF
Stage 1: Instruction Fetch
Fetch a new instruction every cycle
• Current PC is index to instruction memory
• Increment the PC at end of cycle (assume no branches for now)
Write values of interest to pipeline register (IF/ID)
• Instruction bits (for later decoding)
• PC+4 (for later computing branch targets)
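A minimal sketch of the fetch stage as described above, assuming no branches. The function name, the dictionary layout of IF/ID, and the use of assembly strings in place of instruction bits are assumptions for illustration.

```python
def instruction_fetch(pc, imem):
    """Return (IF/ID contents, next PC), assuming no branches."""
    inst = imem[pc // 4]                 # current PC indexes instruction memory
    pc_plus_4 = pc + 4                   # incremented PC, latched at end of cycle
    if_id = {"inst": inst, "pc_plus_4": pc_plus_4}
    return if_id, pc_plus_4

imem = ["add r3, r1, r2", "nand r6, r4, r5", "lw r4, 20(r2)"]
pc = 0
if_id, pc = instruction_fetch(pc, imem)
print(if_id, pc)   # {'inst': 'add r3, r1, r2', 'pc_plus_4': 4} 4
```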
12
[IF stage diagram: PC, +4 adder producing PC+4, instruction memory (addr, inst, WE, mc where 00 = read word), IF/ID register; signals new pc, pcsel, pcrel, pcabs, pcreg connect to the rest of the pipeline]
13
ID
Stage 2: Instruction Decode
On every cycle:
• Read IF/ID pipeline register to get instruction bits
• Decode instruction, generate control signals
• Read from register file
Write values of interest to pipeline register (ID/EX)
• Control information, Rd index, immediates, offsets, …
• Contents of Ra, Rb
• PC+4 (for computing branch targets later)
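A rough sketch of the decode stage for standard 32-bit MIPS R-type and I-type encodings. The function names and the dictionary layout of ID/EX are assumptions; "control signals" are reduced here to the raw op and funct fields, and the register values are borrowed from the sample program later in the deck.

```python
def sign_extend16(x):
    """Interpret a 16-bit field as a signed integer."""
    return x - 0x10000 if x & 0x8000 else x

def instruction_decode(if_id, regfile):
    inst = if_id["inst"]
    op = (inst >> 26) & 0x3F
    ra = (inst >> 21) & 0x1F                       # rs field
    rb = (inst >> 16) & 0x1F                       # rt field
    rd = (inst >> 11) & 0x1F if op == 0 else rb    # R-type writes rd, I-type writes rt
    return {
        "ctrl": {"op": op, "funct": inst & 0x3F},
        "rd": rd,
        "imm": sign_extend16(inst & 0xFFFF),
        "A": regfile[ra],                          # contents of Ra
        "B": regfile[rb],                          # contents of Rb
        "pc_plus_4": if_id["pc_plus_4"],
    }

regfile = [0, 36, 9, 12, 18, 7, 41, 22] + [0] * 24
id_ex = instruction_decode({"inst": 0x00221820, "pc_plus_4": 4}, regfile)  # add $3, $1, $2
print(id_ex["A"], id_ex["B"], id_ex["rd"])   # 36 9 3
```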
14
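[ID stage diagram: sits between Stage 1 (Instruction Fetch) and the rest of the pipeline. IF/ID supplies inst and PC+4; decode and extend produce ctrl and imm; register file with read ports Ra, Rb (outputs A, B) and write port Rd (WE, input D) fed by result and dest; ID/EX holds ctrl, PC+4, imm, A, B]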
15
EX
Stage 3: Execute
On every cycle:
• Read ID/EX pipeline register to get values and control bits
• Perform ALU operation
• Compute targets (PC+4+offset, etc.) in case this is a branch
• Decide if jump/branch should be taken
Write values of interest to pipeline register (EX/MEM)
• Control information, Rd index, …
• Result of ALU operation
• Value in case this is a memory store instruction
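A minimal sketch of the execute stage. Symbolic op names stand in for decoded control bits, the field names are hypothetical, and the branch offset is assumed to be the immediate shifted left by two, as in MIPS.

```python
def execute(id_ex):
    op, a, b, imm = id_ex["op"], id_ex["A"], id_ex["B"], id_ex["imm"]
    if op == "add":
        d = a + b
    elif op == "nand":
        d = ~(a & b)
    elif op in ("lw", "sw"):
        d = a + imm                                  # effective address
    else:
        d = 0
    target = id_ex["pc_plus_4"] + (imm << 2)         # PC+4+offset
    taken = op == "beq" and a == b                   # branch decision
    # B is passed along in case this is a store
    return {"op": op, "rd": id_ex["rd"], "D": d, "B": b,
            "target": target, "taken": taken}

ex_mem = execute({"op": "nand", "A": 18, "B": 7, "imm": 0, "rd": 6, "pc_plus_4": 8})
print(ex_mem["D"])   # -3
```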
16
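[EX stage diagram: sits between Stage 2 (Instruction Decode) and the rest of the pipeline. ID/EX supplies ctrl, PC+4, imm, A, B; alu produces D; an adder (+) and a concatenation (||) form targets; branch? logic; signals pcrel, pcabs, pcreg, pcsel; EX/MEM holds ctrl, B, D]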
17
MEM
Stage 4: Memory
On every cycle:
• Read EX/MEM pipeline register to get values and control bits
• Perform memory load/store if needed
– address is ALU result
Write values of interest to pipeline register (MEM/WB)
• Control information, Rd index, …
• Result of memory operation
• Pass result of ALU operation
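A minimal sketch of the memory stage, with data memory modelled as a Python dictionary. The function and field names are assumptions; the example values (Mem[29] = 99, store of 22 to address 57) come from the sample program later in the deck.

```python
def memory_stage(ex_mem, dmem):
    op, addr = ex_mem["op"], ex_mem["D"]     # address is the ALU result
    m = None
    if op == "lw":
        m = dmem.get(addr, 0)                # load
    elif op == "sw":
        dmem[addr] = ex_mem["B"]             # store the value carried in B
    # the ALU result D is passed through for non-memory instructions
    return {"op": op, "rd": ex_mem["rd"], "M": m, "D": ex_mem["D"]}

dmem = {29: 99}
mem_wb = memory_stage({"op": "lw", "rd": 4, "D": 29, "B": 18}, dmem)
memory_stage({"op": "sw", "rd": 0, "D": 57, "B": 22}, dmem)
print(mem_wb["M"], dmem[57])   # 99 22
```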
18
[MEM stage diagram: sits between Stage 3 (Execute) and the rest of the pipeline. EX/MEM supplies ctrl, B, D; data memory with addr, din, dout and mode mc; MEM/WB holds ctrl, D, M]
19
WB
Stage 5: Write-back
On every cycle:
• Read MEM/WB pipeline register to get values and control bits
• Select value and write to register file
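A minimal sketch of write-back: choose between the memory result M and the ALU result D, then update the register file. The function and field names are hypothetical, and the op names stand in for control bits.

```python
def write_back(mem_wb, regfile):
    op = mem_wb["op"]
    if op == "lw":
        regfile[mem_wb["rd"]] = mem_wb["M"]      # loads write the memory result
    elif op in ("add", "nand"):
        regfile[mem_wb["rd"]] = mem_wb["D"]      # ALU ops write the ALU result
    # sw and nop write nothing
    return regfile

regfile = [0, 36, 9, 12, 18, 7, 41, 22]
write_back({"op": "add", "rd": 3, "D": 45, "M": None}, regfile)
write_back({"op": "lw", "rd": 4, "D": 29, "M": 99}, regfile)
write_back({"op": "sw", "rd": 0, "D": 57, "M": None}, regfile)
print(regfile)   # [0, 36, 9, 45, 99, 7, 41, 22]
```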
20
[WB stage diagram: follows Stage 4 (Memory). MEM/WB supplies ctrl, M, D; outputs result and dest go back to the register file]
21
[Full pipelined datapath diagram: PC, +4, inst mem, register file (Ra, Rb, Rd, A, B, D), imm, data mem (addr, din, dout); pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB carrying inst, PC+4, OP, Rd, imm, A, B, D, M]
22
add  r3, r1, r2;
nand r6, r4, r5;
lw   r4, 20(r2);
add  r5, r2, r5;
sw   r7, 12(r3);
23
[Animated pipeline diagram: the five instructions (instruction memory slots 0:add, 1:nand, 2:lw, 3:add, 4:sw) step through the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; the register file r0..r7 is shown alongside]
24
Time Graphs
Clock cycle  1    2    3    4    5    6    7    8    9
add          IF   ID   EX   MEM  WB
nand              IF   ID   EX   MEM  WB
lw                     IF   ID   EX   MEM  WB
add                         IF   ID   EX   MEM  WB
sw                               IF   ID   EX   MEM  WB
Latency:
Throughput:
Concurrency:
CPI =
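The time graph can be generated mechanically, which also gives numbers for this particular five-instruction run. This is a sketch under the assumption of one instruction issued per cycle with no stalls; the function name `time_graph` is made up for the example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def time_graph(program):
    """For each instruction, map clock cycle -> stage (one issue per cycle, no stalls)."""
    return [{i + 1 + s: name for s, name in enumerate(STAGES)}
            for i in range(len(program))]

program = ["add", "nand", "lw", "add", "sw"]
graph = time_graph(program)

total_cycles = max(max(row) for row in graph)        # 9
latency = len(STAGES)                                # cycles for one instruction: 5
cpi = total_cycles / len(program)                    # 1.8 for this short run
in_flight = max(sum(c in row for row in graph)       # most instructions active at once: 5
                for c in range(1, total_cycles + 1))

for name, row in zip(program, graph):
    print(f"{name:5}" + "".join(f"{row.get(c, ''):5}" for c in range(1, total_cycles + 1)))
```

For a program of n instructions the same schedule takes n + 4 cycles, so the CPI approaches 1 as n grows.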
25
Pipelining Recap
Powerful technique for masking latencies
• Logically, instructions execute one at a time
• Physically, instructions execute in parallel
– Instruction level parallelism
Abstraction promotes decoupling
• Interface (ISA) vs. implementation (Pipeline)
26
The end
27
Sample Code (Simple)
Assume eight-register machine
Run the following code on a pipelined datapath
add  3 1 2     ; reg 3 = reg 1 + reg 2
nand 6 4 5     ; reg 6 = ~(reg 4 & reg 5)
lw   4 20(2)   ; reg 4 = Mem[reg 2 + 20]
add  5 2 5     ; reg 5 = reg 2 + reg 5
sw   7 12(3)   ; Mem[reg 3 + 12] = reg 7
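For checking the cycle-by-cycle slides that follow, here is a small reference interpreter that runs the same five instructions one at a time, with no pipelining. The tuple encoding and the function name `run` are assumptions; the initial register contents and Mem[29] = 99 are read off the trace slides.

```python
def run(program, regs, mem):
    """Execute the program sequentially, updating regs and mem in place."""
    for op, *args in program:
        if op == "add":
            d, a, b = args
            regs[d] = regs[a] + regs[b]
        elif op == "nand":
            d, a, b = args
            regs[d] = ~(regs[a] & regs[b])
        elif op == "lw":
            d, off, base = args
            regs[d] = mem[regs[base] + off]
        elif op == "sw":
            s, off, base = args
            mem[regs[base] + off] = regs[s]
    return regs, mem

program = [
    ("add", 3, 1, 2),     # reg 3 = reg 1 + reg 2
    ("nand", 6, 4, 5),    # reg 6 = ~(reg 4 & reg 5)
    ("lw", 4, 20, 2),     # reg 4 = Mem[reg 2 + 20]
    ("add", 5, 2, 5),     # reg 5 = reg 2 + reg 5
    ("sw", 7, 12, 3),     # Mem[reg 3 + 12] = reg 7
]
regs = [0, 36, 9, 12, 18, 7, 41, 22]
mem = {29: 99}
run(program, regs, mem)
print(regs)      # [0, 36, 9, 45, 99, 16, -3, 22]
print(mem[57])   # 22
```

Since none of these instructions reads a register that an earlier one is still computing, the pipelined run ends in the same state.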
28
Slides thanks to Sally McKee
[Pipelined datapath diagram: PC, Inst mem, PC+1 adder, Register file (R0..R7, regA, regB, valA, valB), MUXes, target adder (+), ALU (ALU result), Data mem (mdata, data); instruction fields Bits 0-2, Bits 15-17, Bits 21-23; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB carrying instruction, PC+1, valA, valB, offset, target, ALU result, mdata, op, dest]
29
[Pipelined datapath diagram, repeated: data, dest, IF/ID, ID/EX, EX/MEM, MEM/WB]
30
Time: 1
Fetch: add 3 1 2
Stages: IF add 3 1 2 | ID nop | EX nop | MEM nop
Register file (R0..R7): 0, 36, 9, 12, 18, 7, 41, 22
31
Time: 2
Fetch: nand 6 4 5
Stages: IF nand 6 4 5 | ID add 3 1 2 | EX nop | MEM nop
Register file (R0..R7): 0, 36, 9, 12, 18, 7, 41, 22
32
Time: 3
Fetch: lw 4 20(2)
Stages: IF lw 4 20(2) | ID nand 6 4 5 | EX add 3 1 2 | MEM nop
ALU result: 45
Register file (R0..R7): 0, 36, 9, 12, 18, 7, 41, 22
33
Time: 4
Fetch: add 5 2 5
Stages: IF add 5 2 5 | ID lw 4 20(2) | EX nand 6 4 5 | MEM add 3 1 2
ALU result: -3
Register file (R0..R7): 0, 36, 9, 12, 18, 7, 41, 22
34
Time: 5
Fetch: sw 7 12(3)
Stages: IF sw 7 12(3) | ID add 5 2 5 | EX lw 4 20(2) | MEM nand 6 4 5 | WB add 3 1 2
ALU result: 29
Register file (R0..R7): 0, 36, 9, 45, 18, 7, 41, 22
35
Time: 6
Fetch: No more instructions
Stages: ID sw 7 12(3) | EX add 5 2 5 | MEM lw 4 20(2) | WB nand 6 4 5
ALU result: 16
Data mem read: Mem[29] = 99
Register file (R0..R7): 0, 36, 9, 45, 18, 7, -3, 22
36
Time: 7
Fetch: No more instructions
Stages: IF nop | ID nop | EX sw 7 12(3) | MEM add 5 2 5 | WB lw 4 20(2)
ALU result: 57
Register file (R0..R7): 0, 36, 9, 45, 99, 7, -3, 22
37
Time: 8
Fetch: No more instructions
Stages: IF nop | ID nop | EX nop | MEM sw 7 12(3) | WB add 5 2 5
Data mem write: Mem[57] = 22
Register file (R0..R7): 0, 36, 9, 45, 99, 16, -3, 22
38
Time: 9
Fetch: No more instructions
Stages: IF nop | ID nop | EX nop | MEM nop | WB sw 7 12(3)
Register file (R0..R7): 0, 36, 9, 45, 99, 16, -3, 22
39