Transcript 740_2.ppt
Resource Replication
- 6 Integer Units
- 4 FP units
- 8 Sets of architectural registers
- 100+100 Renaming registers (Int/FP)
- HW Context (PC, Return Stack etc.)
- Ports in I-cache

Replication (Contd)
- Per-thread mechanism for
  - Pipeline Flushing
  - Instruction Retirement
  - Trapping
  - Precise Interrupts
- Thread Identifier in BTB, TLB

Inter-thread Interference
- Increases with #threads
  - 1.4% (2 threads)
  - 4.8% (4)
  - 5.3% (8)
- Does not hurt much
  - 0.1% performance degradation
- Why?
  - L1 misses covered by L2 misses
  - Out of order execution, write buffer, multi thread

Memory Requirement
- Increases with number of threads
- Mostly for L1

Bank Conflict
- Memory requirement doubles as number of threads goes from 1 to 8
  - Multiple Thread
  - Long L1 cache line
- Longer cache line has better locality
- Overall performance degrades by 3.4%
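
The "Thread Identifier in BTB, TLB" item can be illustrated with a small software model. The sketch below is not taken from the slides: the class name TaggedTLB, its methods, the capacity, and the LRU replacement policy are all assumptions made for illustration. It shows one way a shared TLB could tag each entry with a thread ID so that threads never see each other's translations, while still competing for the same entries (one source of inter-thread interference).

```python
from collections import OrderedDict


class TaggedTLB:
    """Toy fully associative TLB shared by all hardware threads.

    Hypothetical model: entries are keyed by (thread_id, virtual page number)
    and replaced in LRU order. Real hardware may differ in every detail.
    """

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # (thread_id, vpn) -> physical frame number

    def lookup(self, thread_id, vpn):
        """Return the frame number on a hit, or None on a miss."""
        key = (thread_id, vpn)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        return None

    def insert(self, thread_id, vpn, pfn):
        """Install a translation, evicting the LRU entry if the TLB is full."""
        key = (thread_id, vpn)
        self.entries[key] = pfn
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # victim may belong to any thread

    def flush_thread(self, thread_id):
        """Drop only one thread's entries, leaving the other threads intact."""
        for key in [k for k in self.entries if k[0] == thread_id]:
            del self.entries[key]
```

With a two-entry TLB, two threads can map the same virtual page to different frames without conflict, yet an insert by one thread can still evict an entry that either thread depends on. The per-thread flush mirrors the idea of a per-thread mechanism for flushing rather than a global one.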