Improving Multi-Core Performance Using Mixed-Cell Cache Architecture
Samira Khan*†, Alaa R. Alameldeen*, Chris Wilkerson*, Jaydeep Kulkarni* and Daniel A. Jiménez§
*Intel Labs †Carnegie Mellon University §Texas A&M University

Summary
• Problem:
  – Cache cells become unreliable at low voltage
  – Mixed-cell cache: Use some larger robust cells [Ghasemi 2011]
  – Smaller non-robust cells are turned off at low voltage
  – Capacity loss leads to performance loss
• Goal:
  – No capacity loss at low voltage to gain high performance
• Observation:
  – A clean line has a duplicate copy in the memory hierarchy
  – A modified line is the only existing copy
• Our Approach:
  – Protect a modified line in larger robust cells
  – Store a clean line in smaller non-robust cells
  – Fetch data from the lower level on an error in a clean line
Significantly improves performance and reduces power compared to prior work

Outline
• Summary
• Background and Motivation
• Mixed-Cell Cache Architecture
• Methodology and Results
• Conclusion

Background and Motivation
• Multi-core designs are power-limited
• Can activate more cores by lowering the voltage
[Figure: voltage scaling. More active cores at low voltage]

Ensuring Resiliency at Lower Voltage
• Cache cells begin to fail at lower voltage
[Figure: Cache vs. Mixed-Cell Cache; legend: Non-robust, Robust, Error]
• Mixed-Cell Cache [Ghasemi 2011]
  – Some ways built with robust cells
    + Resilient to error at low voltage
    - Area and power overhead
  – Only robust cells are operational at low voltage
Cache capacity loss at lower voltage can degrade performance significantly

Effect of Cache Capacity Reduction in a 4-Core System
[Figure: Speedup at 590mV, 825MHz; bars: Full Capacity, Disable]
In our experiments, 75% reduction in cache capacity leads to 20% performance loss on average

Goal: Improve performance using the whole cache at low voltage

Our Mixed-Cell Architecture
• Observation:
  – A clean line has a duplicate copy in the memory hierarchy
    • On an error, can get the data from the duplicate copy
  – A modified line is the only copy in the system
    • Critical to keep the data error free
• Idea:
  – Protect a modified line using larger robust cells
  – Store a clean line in smaller non-robust cells
  – Use parity/ECC to detect errors in clean lines
  – Fetch data from the lower level on an error in clean lines

Our Mixed-Cell Architecture
[Figure: Mixed-cell (Disable) [Ghasemi 2011] vs. Our Design; legend: Non-robust, Robust, Clean, Modified]
• Use both robust and non-robust ways at low voltage
• A modified line is stored only in a robust way
• A clean line is stored only in a non-robust way
Modify cache management techniques to ensure clean and modified lines are stored appropriately

Mixed-Cell Architecture: Cache Miss
• Write miss: Allocate line in a robust way
• Read miss: Allocate line in a non-robust way
[Figure: timeline of Write miss X, Write miss Y, Read miss A, Read miss B, with the LRU position marked at each step]

Mixed-Cell Architecture: Cache Hit
[Figure: cache set holding lines E through L; Write Hit E, Write Hit G, Read Hit J]
• Read hit: No change
• Write hit:
  – Write hit in robust: No change
  – Write hit in non-robust: We propose three mechanisms
    • Writeback
    • Swap
    • Duplicate

Write to a Non-Robust Line: Writeback
• Write it back in the next level of memory hierarchy
• Make data clean in the non-robust cell
[Figure: Write Hit G in a set holding lines E through L. Dirty block in non-robust way is vulnerable, writeback G. Now this block contains clean data]
+ Simple
- An extra writeback at each write in a non-robust way

Write to a Non-Robust Line: Swap
• Swap modified line with the LRU robust line
• Writeback the robust data to next cache level
[Figure: Write Hit G in a set holding lines E through L. Swap E and G, E is now vulnerable, writeback E. Now this block contains clean data]
+ Increases write hits in robust cells
- Extra latency for swap

Write to a Non-Robust Line: Duplicate
• Pair two non-robust ways
• Static pairing: way <0,1>, <2,3>…
• Duplicate the data in the partner way
• On an error in one way, use data from the partner way
[Figure: Write Hit G in a set holding lines E through L. Duplicate G in the partner way]
+ Simple, no extra writeback
- Capacity loss, extra latency for duplication

Evaluation Methodology
• Simulator: CMP$im, a Pin-based x86 simulator [Jaleel 2008]
• Benchmarks: 20 4-core multi-programmed mixes from SPEC 2006
• Each cache has 2 robust ways
  – L1D 32KB, 2 robust, 6 non-robust ways, 3 cycles
  – L2 256KB, 2 robust, 6 non-robust ways, 10 cycles
  – L3 shared 4MB, 2 robust, 14 non-robust ways, 25 cycles
• Memory latency 80 cycles
• Vmin 590 mV, 825 MHz

Comparison Points
• Robust: Cache uses only robust cells
  – Smaller capacity, L1D 20KB, L2 160KB, L3 2.25MB
• Disable: Mixed-Cell Cache [Ghasemi 2011]
  – Only ¼ of the cache works at low voltage, L1D 8KB, L2 64KB, L3 1MB
• Ideal: Cache uses only non-robust cells
  – Larger capacity, L1 40KB, L2 320KB, L3 4.5MB
  – Cannot work at low voltage
  – Can provide higher voltage to cache using a separate Vcc
    • Increases complexity
    • Adds latency to signals crossing voltage domains

4-Core Performance at Low Voltage
[Figure: Weighted Speedup for Disable, Robust, Writeback, Swap, Duplicate, Ideal; annotations: 17%, 2.6%]
Swap provides 17% speedup over Disable
Swap performs within 2.6% of Ideal

Normalized Memory Bandwidth vs. Disable
[Figure: Normalized Memory Bandwidth for Disable, Robust, Writeback, Swap, Duplicate, Ideal; annotations: 6.15, 21%, 28.5%, 3%]
Duplicate increases memory bandwidth by only 3% compared to Ideal

Normalized LLC Static Power at Vmin (590mV)
[Figure: Normalized LLC Static Power vs. Disable for Disable, Robust, Swap, Duplicate, Ideal; annotations: 10%, 2.3X]
Swap and Duplicate reduce LLC static power by 10% compared to Ideal

Normalized L1D Dynamic Power at Vmin (590mV)
[Figure: Normalized L1 Dynamic Power vs. Disable for Disable, Robust, Swap, Duplicate, Ideal; annotations: 22%, 50%, 30%]
Duplicate reduces dynamic power by 50% compared to Disable
Duplicate is within 30% of the Ideal

Conclusion
• Problem:
  – Cache cells become unreliable at low voltage
  – Mixed-cell cache: Use some larger robust cells
  – Smaller non-robust cells are turned off at low voltage
  – Capacity loss leads to performance loss
• Goal:
  – No capacity loss at low voltage to gain high performance
• Observation:
  – A clean line has a duplicate copy in the memory hierarchy
  – A modified line is the only existing copy
• Our Approach:
  – Protect a modified line in larger robust cells
  – Store a clean line in smaller non-robust cells
  – Fetch data from the lower level on an error in a clean line
Improves performance by 17% and reduces L1D dynamic power by 50% compared to prior work

Thank you
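
Appendix sketch: the allocation rules from the Cache Miss and Cache Hit slides, plus the Writeback and Swap mechanisms, can be illustrated for a single cache set as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `MixedCellSet` and `WriteHitPolicy` are hypothetical. It assumes separate LRU ordering within the robust and non-robust ways, and it assumes that a line moved by Swap enters the non-robust ways as most recently used. The Duplicate mechanism and parity/ECC error handling are left out.

```python
from collections import OrderedDict
from enum import Enum


class WriteHitPolicy(Enum):
    """What to do on a write hit in a non-robust way (hypothetical names)."""
    WRITEBACK = "writeback"
    SWAP = "swap"


class MixedCellSet:
    """One cache set: robust ways hold modified lines, non-robust ways hold clean lines."""

    def __init__(self, robust_ways=2, non_robust_ways=6, policy=WriteHitPolicy.SWAP):
        self.robust_ways = robust_ways
        self.non_robust_ways = non_robust_ways
        self.policy = policy
        # Each OrderedDict keeps LRU order: the first key is least recently used.
        self.robust = OrderedDict()
        self.non_robust = OrderedDict()
        self.writebacks = []  # tags sent to the next level, in order

    @staticmethod
    def _insert(part, capacity, tag):
        """Insert tag as most recently used; return the evicted LRU tag, if any."""
        victim = None
        if len(part) >= capacity:
            victim, _ = part.popitem(last=False)
        part[tag] = True
        return victim

    def read(self, tag):
        for part in (self.robust, self.non_robust):
            if tag in part:
                part.move_to_end(tag)  # read hit: only the LRU state changes
                return "hit"
        # Read miss: allocate in a non-robust way. The victim is clean,
        # so it is dropped without a writeback.
        self._insert(self.non_robust, self.non_robust_ways, tag)
        return "miss"

    def write(self, tag):
        if tag in self.robust:
            self.robust.move_to_end(tag)  # write hit in robust: no change
            return "hit"
        if tag in self.non_robust:
            if self.policy is WriteHitPolicy.WRITEBACK:
                # Write the new data through so the non-robust copy is clean again.
                self.non_robust.move_to_end(tag)
                self.writebacks.append(tag)
            else:
                # Swap: the written line takes the place of the LRU robust line.
                # That line is written back and kept, now clean, in a non-robust way.
                del self.non_robust[tag]
                victim = self._insert(self.robust, self.robust_ways, tag)
                if victim is not None:
                    self.writebacks.append(victim)
                    self._insert(self.non_robust, self.non_robust_ways, victim)
            return "hit"
        # Write miss: allocate in a robust way; a modified victim is written back.
        victim = self._insert(self.robust, self.robust_ways, tag)
        if victim is not None:
            self.writebacks.append(victim)
        return "miss"


if __name__ == "__main__":
    # Trace from the Cache Miss slide, followed by a write hit on a clean line.
    s = MixedCellSet(policy=WriteHitPolicy.SWAP)
    for op, tag in [("write", "X"), ("write", "Y"), ("read", "A"), ("read", "B"), ("write", "A")]:
        result = getattr(s, op)(tag)
        print(op, tag, result, list(s.robust), list(s.non_robust), s.writebacks)
```

Keeping two separate LRU lists is the simplest way to guarantee that a modified line never sits in a non-robust way; a real design could share replacement state across all ways.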