A Unified WCET Analysis Framework for Multi-core Platforms
Sudipta Chattopadhyay, Chong Lee Kee, Abhik Roychoudhury (National University of Singapore)
Timon Kelter, Peter Marwedel (TU Dortmund, Germany)
Heiko Falk (Ulm University, Germany)
RTAS 2012, Beijing

Timing Analysis
  Hard real-time systems require absolute timing guarantees
  System level analysis
  Single task analysis
  Worst case execution time (WCET) analysis
    An upper bound on execution time for all possible inputs
    Sound over-approximation is obtained by static analysis

WCET Analysis
  [Figure labels: Program; Control flow graph; Micro-architectural modeling; WCET of basic blocks; Path analysis; IPET; Infeasible path constraints; Loop bound; Control flow graph constraints]
  IPET = Implicit Path Enumeration Technique

Architecture
  [Figure labels: Core 1 ... Core n; L1 cache (one per core); Shared bus; Shared L2 cache; Memory]

Micro-architectural Modeling
  [Figure labels: branch predictor; cache; pipeline; shared cache; shared bus; Interactions; Single Core; Multi Core]
  Prior work cited: Li et al. RTSS'09; Chattopadhyay et al. SCOPES'10; Kelter et al. ECRTS'11; Rosen et al. RTSS'07
  Unified multi-core timing analysis

Timing Anomaly (Shared Cache)
  [Figure: sequences of cache hit / miss outcomes along alternative paths]
  May not be the worst case path

Timing Anomaly (Shared Bus)
  [Figure: alternative paths labeled with delay_max and delay_min]
  May not be the worst case path

Background
  Representing each pipeline stage as a timing interval
    Example: start [3,7], latency [1,3], finish [4,10]
  [Figure: pipeline stages IF ID EX WB CM for the instructions R1 := R2 + 5, R5 := R1 * R7, R3 := R5 * 5; edges labeled Structural dependency and Contention]
  A fixed-point analysis derives the timing of each stage as an interval

Shared Cache + Pipeline
  Abstract interpretation: each access is classified as hit, miss or unclear
  Timing interval
    L1 hit:                      T := T + [1, 1]
    L1 miss, shared L2 hit:      T := T + [miss1 + 1, miss1 + 1]
    L1 miss, shared L2 unclear:  T := T + [miss1 + 1, miss1 + miss2 + 1]
    L1 unclear:                  T := T + [1, miss1 + miss2 + 1]
  hit latency = 1 cycle
  miss1 = L1 cache miss penalty
  miss2 = L2 cache miss penalty

Shared Bus Analysis
  Time Division Multiple Access (TDMA)
  Offset abstraction
  [Figure: TDMA rounds made of Core 0 and Core 1 slots; a request from core 1 at offset T in the round waits for a delay; a request from core 0 at offset T' in the round has delay = 0]

Shared bus + pipeline
  [Figure: nodes IF1, ID1, IF2, ID2, IF3, ID3 with bus offsets O1, O2 and incoming offset Oin]
  Two cases: IF2 finishes after ID1; ID1 finishes after IF2
    Oin = O1 in one case, Oin = O2 in the other
  Order unknown (approximate timing by static analysis): Oin = O1 ∪ O2
  Property: Offset content monotonically decreases over different iterations

Loop Construct
  Ci = bus context of the loop body at the i-th iteration
  Bus contexts: C1, C2, C3, ..., C100
  Unrolling loop iterations: EXPENSIVE

Loop Construct
  Bus context flow graph
  [Figure: bus context flow graph over C1, C2, C3, C4, C5]
  How do we define bus context?
  Property: If Ci Cj, then Ci+k Cj+k for any k > 0 [relation symbol between the contexts lost in extraction]

Loop Construct
  Bus context flow graph
  [Figure: bus context flow graph over C1, C2, C3, C4]
  Bus offsets of all pipeline stages of all instructions?
  There could be thousands of nodes
  How do we define bus context?

Loop Construct
  [Figure: pipeline stages IF ID EX WB CM of instructions in the previous iteration and the current iteration, connected by cross-iteration edges]
  How do we define bus context?
  Property: If the bus offsets of the cross-iteration edges do not change, the WCET of the loop iteration cannot change

Loop Construct
  Bus context flow graph
  [Figure: bus context flow graph over C1, C2, C3, C4]
  Compute WCET for each bus context
  Generate ILP flow constraints:
    E(C1) + E(C2) + E(C3) + E(C4) ≤ loop bound
    E(C1) ≥ E(C2)
  E(C1) = number of times context C1 is executed

Branch prediction + Cache
  [Figure labels: m; m'; Cache conflict; Cache hit; Cache miss; m evicted from cache; branch correctly predicted; branch incorrectly predicted]

Branch prediction + Cache
  [Figure labels: Cache content; JOIN; Branch location; Maximum number of speculated instructions; m; m'; Unclear cache access]

Overall Picture
  [Figure labels: branch predictor; cache; pipeline; shared cache; shared bus; Multi Core; WCET of basic blocks; IPET; Loop bound; Bus context constraints; Infeasible path constraints; Path analysis]

Experimental Setup (Chronos Toolkit)
  [Figure labels: C source; GCC; SimpleScalar; Binary code; CFG; Micro-architectural modeling (Private cache, Shared cache, Pipeline, Branch prediction, Shared bus); Flow constraints; Micro-architectural constraints; ILP; WCET]

Cache Sharing vs Cache Partitioning
  [Figure labels: Shared cache between 2 cores; Vertically partition; Horizontally partition; Core 1; Core 2; size labels 4 and 8]

Evaluation (Cache + pipeline)
  Imprecision of shared cache analysis
  [Plots for jfdctint and statemate; data not recovered]

Evaluation (Cache + pipeline + Speculation)
  Imprecision of modeling speculation
  [Plot; data not recovered]

Evaluation (Bus + pipeline)
  Imprecision of path analysis
  Imprecision of shared bus analysis
  [Plot; data not recovered]

Evaluation (Bus + pipeline + Speculation)
  Imprecision of path analysis
  Imprecision of shared bus analysis
  [Plot; data not recovered]

Conclusion
  A unified WCET analysis framework
  Handles interaction of shared cache and bus with pipeline and branch prediction
  Timing anomaly is possible; state explosion is handled by the timing interval abstraction
  Detailed information on the tool and extensive results are available at:
  http://www.comp.nus.edu.sg/~rpembed/chronos-multi-core.html

Questions
  Thank You
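
Supplementary sketch (Shared Cache + Pipeline): the timing-interval updates on that slide can be written as a small lookup. This is only an illustration, not the Chronos implementation. The function names `add` and `access_latency`, the string labels for the classifications, and the numeric penalties in the usage note are assumptions made for the example; combinations the slide does not list are rejected rather than guessed.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (lower bound, upper bound) in cycles

HIT_LATENCY = 1  # "hit latency = 1 cycle" from the slide


def add(a: Interval, b: Interval) -> Interval:
    """Interval addition: T := T + [lo, hi]."""
    return (a[0] + b[0], a[1] + b[1])


def access_latency(l1: str, l2: str, miss1: int, miss2: int) -> Interval:
    """Latency interval for one memory access.

    l1, l2 are abstract classifications: "hit", "miss" or "unclear".
    miss1 / miss2 are the L1 / L2 miss penalties.
    Only the four cases shown on the slide are covered.
    """
    if l1 == "hit":
        return (HIT_LATENCY, HIT_LATENCY)
    if l1 == "miss" and l2 == "hit":
        return (miss1 + HIT_LATENCY, miss1 + HIT_LATENCY)
    if l1 == "miss" and l2 == "unclear":
        return (miss1 + HIT_LATENCY, miss1 + miss2 + HIT_LATENCY)
    if l1 == "unclear":
        return (HIT_LATENCY, miss1 + miss2 + HIT_LATENCY)
    raise ValueError(f"case not covered by the slide: L1={l1}, L2={l2}")
```

Usage, with hypothetical penalties miss1 = 6 and miss2 = 30: `access_latency("miss", "unclear", 6, 30)` gives the interval to add to T for an L1 miss with an unclear shared L2 access, and `add((3, 7), (1, 3))` reproduces the start / latency / finish example from the Background slide.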
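
Supplementary sketch (Shared Bus Analysis): the TDMA offset abstraction can be illustrated by computing the bus delay from a request's offset within the TDMA round. The slides give no schedule parameters, so everything concrete here is an assumption: equal-length slots assigned to cores in index order, a request served with zero delay whenever it arrives inside its own core's slot (the check that the access also finishes inside the slot is ignored), and the names `tdma_delay` and `delay_bounds`.

```python
from typing import Iterable, Tuple


def tdma_delay(offset: int, core: int, slot_len: int, n_cores: int) -> int:
    """Cycles a request from `core` waits when issued at `offset`.

    Assumed schedule: core i owns [i * slot_len, (i + 1) * slot_len)
    in every round of length slot_len * n_cores.
    """
    round_len = slot_len * n_cores
    off = offset % round_len          # position inside the current round
    start = core * slot_len           # start of this core's slot
    if start <= off < start + slot_len:
        return 0                      # already inside the core's own slot
    return (start - off) % round_len  # wait until the slot next begins


def delay_bounds(offsets: Iterable[int], core: int,
                 slot_len: int, n_cores: int) -> Tuple[int, int]:
    """Delay interval for an abstract offset given as a set of possible
    offsets, e.g. the union O1 ∪ O2 when the order is unknown."""
    delays = [tdma_delay(o, core, slot_len, n_cores) for o in offsets]
    return (min(delays), max(delays))
```

For example, with two cores and a hypothetical slot length of 10 cycles, `tdma_delay(3, 0, 10, 2)` is 0 because core 0 is inside its own slot, while `tdma_delay(3, 1, 10, 2)` is 7 because core 1 has to wait for its slot to begin at offset 10. A smaller offset set can only tighten the interval returned by `delay_bounds`.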