Improving Multi-Core Performance
Using Mixed-Cell Cache Architecture
Samira Khan*†, Alaa R. Alameldeen*,
Chris Wilkerson*, Jaydeep Kulkarni* and Daniel A. Jiménez§
*Intel Labs
†Carnegie Mellon University
§Texas A&M University
Summary
• Problem:
– Cache cells become unreliable at low voltage
– Mixed-cell cache: Use some larger robust cells [Ghasemi 2011]
– Smaller non-robust cells are turned off at low voltage
– Capacity loss leads to performance loss
• Goal:
– No capacity loss at low voltage to gain high performance
• Observation:
– A clean line has a duplicate copy in the memory hierarchy
– A modified line is the only existing copy
• Our Approach:
– Protect a modified line in larger robust cells
– Store a clean line in smaller non-robust cells
– Fetch data from the lower level on an error in a clean line
Significantly improves performance and reduces power
compared to prior work
Outline
• Summary
• Background and Motivation
• Mixed-Cell Cache Architecture
• Methodology and Results
• Conclusion
Background and Motivation
• Multi-core designs are power-limited
• Can activate more cores by lowering the voltage
[Figure: voltage scale; more cores are active at low voltage]
Ensuring Resiliency at Lower Voltage
• Cache cells begin to fail at lower voltage
[Figure: Cache vs. Mixed-Cell Cache; legend: Non-robust, Robust, Error]
• Mixed-Cell Cache [Ghasemi 2011]
– Some ways built with robust cells
+ Resilient to error at low voltage
- Area and power overhead
– Only robust cells are operational at low voltage
Cache capacity loss at lower voltage
can degrade performance significantly
Effect of Cache Capacity Reduction in a 4-Core System
[Figure: speedup at 590 mV, 825 MHz; bars: Full Capacity, Disable]
In our experiments, 75% reduction in cache capacity
leads to 20% performance loss on average
Goal:
Improve performance using the whole cache
at low voltage
Outline
• Summary
• Background and Motivation
• Mixed-Cell Cache Architecture
• Methodology and Results
• Conclusion
Our Mixed-Cell Architecture
• Observation:
– A Clean line has a duplicate copy in the memory hierarchy
• On an error, can get the data from the duplicate copy
– A Modified line is the only copy in the system
• Critical to keep the data error free
• Idea:
– Protect a modified line using larger robust cells
– Store a clean line in smaller non-robust cells
– Use parity/ECC to detect errors in clean lines
– Fetch data from the lower level on an error in clean lines
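The detect-and-refetch idea for clean lines can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `parity`, `CleanLine`, and `read_clean` are hypothetical, a single even-parity bit stands in for the slide's "parity/ECC" (one parity bit detects only an odd number of flipped bits), and the lower level is modeled as a plain callback.

```python
def parity(data):
    """Even-parity bit computed over every bit of the line."""
    return sum(bin(b).count("1") for b in data) % 2


class CleanLine:
    """A clean line held in non-robust cells, guarded by one parity bit."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.parity = parity(data)


def read_clean(line, fetch_from_lower_level):
    """Return the line's data, refetching it when the parity check fails."""
    if parity(line.data) != line.parity:
        # A clean line has a duplicate copy lower in the hierarchy,
        # so a detected error is repaired by fetching that copy again.
        fresh = fetch_from_lower_level()
        line.data = bytearray(fresh)
        line.parity = parity(fresh)
    return bytes(line.data)


# Usage: flip one bit to mimic a low-voltage cell failure, then read.
backing = bytes(range(64))          # hypothetical copy in the lower level
line = CleanLine(backing)
line.data[5] ^= 0x10                # single-bit error in a non-robust cell
recovered = read_clean(line, lambda: backing)
```

The same recovery path is not available for a modified line, which is why the design keeps modified lines in robust cells.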
Our Mixed-Cell Architecture
[Figure: Mixed-cell (Disable) [Ghasemi 2011] vs. Our Design; legend: Non-robust, Robust, Clean, Modified]
• Use both robust and non-robust ways at low voltage
• A modified line is stored only in a robust way
• A clean line is stored only in a non-robust way
Modify cache management techniques to ensure
clean and modified lines are stored appropriately
Mixed-Cell Architecture:
Cache Miss
• Write miss: Allocate line in a robust way
• Read miss: Allocate line in a non-robust way
[Figure: timeline of Write miss X, Write miss Y, Read miss A, and Read miss B, with the LRU victim marked for each allocation]
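The miss policy above can be sketched as a small model of one cache set. It is a minimal illustration, not the authors' implementation: the class `MixedSet`, its method, and the line `Z` are hypothetical, and keeping a separate LRU order for the robust and non-robust ways is an assumption suggested by the LRU labels in the figure.

```python
from collections import OrderedDict


class MixedSet:
    """One cache set: a few robust ways plus several non-robust ways."""

    def __init__(self, robust_ways=2, non_robust_ways=6):
        self.capacity = {"robust": robust_ways, "non_robust": non_robust_ways}
        # Each OrderedDict maps tag -> dirty flag, ordered from LRU to MRU.
        self.ways = {"robust": OrderedDict(), "non_robust": OrderedDict()}

    def allocate_on_miss(self, tag, is_write):
        """Place the missing line; return the evicted (tag, dirty) or None."""
        # Write miss -> robust way, read miss -> non-robust way.
        kind = "robust" if is_write else "non_robust"
        group = self.ways[kind]
        victim = None
        if len(group) == self.capacity[kind]:
            victim = group.popitem(last=False)  # evict the LRU line of that kind
        group[tag] = is_write                   # written lines start out dirty
        return victim


# Usage: replay the slide's sequence, then add one more write miss (Z).
cache_set = MixedSet()
for tag, is_write in [("X", True), ("Y", True), ("A", False), ("B", False)]:
    cache_set.allocate_on_miss(tag, is_write)
evicted = cache_set.allocate_on_miss("Z", True)
```

With two robust ways, X and Y fill the robust ways, A and B go to non-robust ways, and the extra write miss evicts X, the LRU robust line.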
Mixed-Cell Architecture:
Cache Hit
[Figure: a cache set holding lines E through L, with example accesses Write Hit E, Write Hit G, and Read Hit J]
• Read hit: No change
• Write hit:
– Write hit in robust: No change
– Write hit in non-robust: We propose three mechanisms
• Writeback
• Swap
• Duplicate
Write to a Non-Robust Line:
Writeback
• Write the line back to the next level of the memory hierarchy
• The data in the non-robust cells is then clean
[Figure: Write Hit G in a cache set holding lines E through L. Callouts: "Dirty block in non-robust way is vulnerable, writeback G"; "Now this block contains clean data"]
+ Simple
- An extra writeback at each write in a non-robust way
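A minimal sketch of the Writeback mechanism, assuming a hypothetical `Line` record and a dictionary standing in for the next level of the hierarchy:

```python
class Line:
    """A cached line; in_robust_way records which kind of way holds it."""

    def __init__(self, tag, in_robust_way):
        self.tag = tag
        self.in_robust_way = in_robust_way
        self.data = None
        self.dirty = False


def write_hit_writeback(line, data, next_level):
    """Handle a write hit under the Writeback mechanism."""
    line.data = data
    if line.in_robust_way:
        line.dirty = True              # robust cells may hold the only copy
    else:
        next_level[line.tag] = data    # the extra writeback on every such write
        line.dirty = False             # the non-robust copy is clean again


# Usage: E sits in a robust way, G in a non-robust way (as in the figure).
next_level = {}
e, g = Line("E", True), Line("G", False)
write_hit_writeback(e, "e1", next_level)
write_hit_writeback(g, "g1", next_level)
```

Only the write to G reaches the next level, which is the cost the slide lists for this mechanism.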
Write to a Non-Robust Line:
Swap
• Swap the modified line with the LRU robust line
• Write back the robust line's data to the next cache level
[Figure: Write Hit G in a cache set holding lines E through L. Callouts: "Swap E and G, E is now vulnerable, writeback E"; "Now this block contains clean data"]
+ Increases write hits in robust cells
- Extra latency for swap
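The Swap mechanism can be sketched as follows. The function name and data layout are hypothetical, the next cache level is modeled as a dictionary, and the sketch assumes the robust ways are already full, as in the slide's example.

```python
from collections import OrderedDict


def write_hit_swap(robust, non_robust, tag, data, next_level):
    """Handle a write hit under the Swap mechanism.

    robust:     OrderedDict tag -> data for modified lines, LRU first
    non_robust: dict tag -> data for clean lines
    """
    if tag in robust:
        robust[tag] = data             # write hit in a robust way: no change
        robust.move_to_end(tag)
        return
    # Write hit in a non-robust way: trade places with the LRU robust line.
    del non_robust[tag]
    victim_tag, victim_data = robust.popitem(last=False)
    next_level[victim_tag] = victim_data   # write the robust data back
    non_robust[victim_tag] = victim_data   # victim is now clean, so it may stay
    robust[tag] = data                     # the newly modified line is protected


# Usage: E and F are in robust ways, G and H in non-robust ways.
robust = OrderedDict([("E", "e1"), ("F", "f1")])
non_robust = {"G": "g0", "H": "h0"}
next_level = {}
write_hit_swap(robust, non_robust, "G", "g1", next_level)
```

After the call, G occupies a robust way, E remains cached as a clean line in a non-robust way, and E's data has been written back.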
Write to a Non-Robust Line:
Duplicate
• Pair two non-robust ways
• Static pairing: way <0,1>, <2,3>…
• Duplicate the data in the partner way
• On an error in one way, use data from the partner way
[Figure: Write Hit G in a cache set holding lines E through L. Callout: "Duplicate G in the partner way"]
+ Simple, no extra writeback
- Capacity loss, extra latency for duplication
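A small sketch of the Duplicate mechanism, assuming ways are numbered so that the static pairs are <0,1>, <2,3>, and so on; flipping the lowest bit of the way index then yields the partner. All function names are hypothetical, and `has_error` stands in for whatever parity/ECC check flags a bad way.

```python
def partner_way(way):
    """Static pairing <0,1>, <2,3>, ...: flip the lowest bit of the index."""
    return way ^ 1


def write_hit_duplicate(ways, way, data):
    """Write the modified data to the hit way and to its partner way."""
    ways[way] = data
    # The partner's previous line is overwritten: this is the capacity loss.
    ways[partner_way(way)] = data


def read_duplicated(ways, way, has_error):
    """Read a duplicated line, falling back to the partner copy on an error."""
    for candidate in (way, partner_way(way)):
        if not has_error(candidate):
            return ways[candidate]
    raise RuntimeError("both copies of the modified line have errors")


# Usage: four non-robust ways; write G in way 0, then way 0 develops an error.
ways = ["g0", "h0", "i0", "j0"]
write_hit_duplicate(ways, 0, "g1")
value = read_duplicated(ways, 0, has_error=lambda w: w == 0)
```

The read still returns the modified data because the partner way holds an intact copy.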
Outline
• Summary
• Background and Motivation
• Mixed-Cell Cache Design
• Methodology and Results
• Conclusion
Evaluation Methodology
• Simulator: CMP$im, a Pin-based x86 simulator [Jaleel 2008]
• Benchmarks: 20 4-core multi-programmed mixes from SPEC 2006
• Each cache has 2 robust ways
– L1D 32KB, 2 robust, 6 non-robust ways, 3 cycles
– L2 256KB, 2 robust, 6 non-robust ways, 10 cycles
– L3 shared 4MB, 2 robust, 14 non-robust ways, 25 cycles
• Memory latency 80 cycles
• Vmin 590 mV, 825 MHz
Comparison Points
• Robust: Cache uses only robust cells
– Smaller capacity: L1D 20KB, L2 160KB, L3 2.25MB
• Disable: Mixed-Cell Cache [Ghasemi 2011]
– Only ¼ of the cache works at low voltage: L1D 8KB, L2 64KB, L3 1MB
• Ideal: Cache uses only non-robust cells
– Larger capacity: L1D 40KB, L2 320KB, L3 4.5MB
– Cannot work at low voltage
– Can provide higher voltage to the cache using a separate Vcc
• Increases complexity
• Adds latency to signals crossing voltage domains
4-Core Performance at Low Voltage
[Figure: weighted speedup for Disable, Robust, Writeback, Swap, Duplicate, and Ideal; annotations: 17%, 2.6%]
Swap provides 17% speedup over Disable
Swap performs within 2.6% of Ideal
Normalized Memory Bandwidth vs. Disable
[Figure: normalized memory bandwidth for Disable, Robust, Writeback, Swap, Duplicate, and Ideal; y-axis from 0 to 2, with one bar labeled 6.15; annotations: 21%, 28.5%, 3%]
Duplicate increases memory bandwidth
by only 3% compared to Ideal
Normalized LLC Static Power at Vmin (590mV)
[Figure: LLC static power normalized to Disable for Disable, Robust, Swap, Duplicate, and Ideal; annotations: 2.3X, 10%]
Swap and Duplicate reduce LLC static power by 10%
compared to Ideal
Normalized L1D Dynamic Power at Vmin (590mV)
[Figure: L1D dynamic power normalized to Disable for Disable, Robust, Swap, Duplicate, and Ideal; annotations: 22%, 50%, 30%]
Duplicate reduces dynamic power by 50% compared to Disable
Duplicate is within 30% of the Ideal
Conclusion
• Problem:
– Cache cells become unreliable at low voltage
– Mixed-cell cache: Use some larger robust cells
– Smaller non-robust cells are turned off at low voltage
– Capacity loss leads to performance loss
• Goal:
– No capacity loss at low voltage to gain high performance
• Observation:
– A clean line has a duplicate copy in the memory hierarchy
– A modified line is the only existing copy
• Our Approach:
– Protect a modified line in larger robust cells
– Store a clean line in smaller non-robust cells
– Fetch data from the lower level on an error in a clean line
Improves performance by 17% and reduces L1D dynamic
power by 50% compared to prior work
Thank you