Physical Synthesis Comes of Age
  Chuck Alpert, IBM Corp.
  Chris Chu, Iowa State University
  Paul Villarrubia, IBM Corp.

Physical Synthesis Family Tree
  [Figure: family tree with Synthesis and Layout as parents of Physical Synthesis]
  Roles of layout as a parent:
    Clean up the mess created by physical synthesis
      (implement the netlist generated by physical synthesis)
    Provide guidance to physical synthesis so that it will do things right
  Is layout mature enough to serve the role?
  Is there still room for layout to grow?

New Requirements of Placement
  1. Super fast
       4 to 8 million objects now
       Provide quick feedback to physical synthesis to refine the netlist
  2. Stable in handling incremental placement
       Physical synthesis constantly makes changes to the netlist
  3. Flexible objective function
       Timing, power, routability
  4. Handle mixed-size modules
       Hierarchical design and use of IP blocks are common

Placement As a Baby
  Simulated annealing based placement
    Popularized by Timberwolf [DAC-86]
  Greedy Algorithm:
    • You only have 1 chance.
    • If you get stuck, I will terminate you!
  Simulated Annealing:
    • OK to make mistakes. Keep trying!
    • Evaluation/Feedback is important.
  Strength:
    Good quality for small designs
    Easy to consider different objective functions
    Handles incremental changes well
  Weakness:
    Very slow (crawling)
    Non-trivial to handle modules of different sizes

Placement As a Kid
  Min-cut placement (or partitioning-based placement)
    An old idea [Breuer, DAC-77]
  [Figure labels: Circuit, Placement Region]
  Strength:
    Capo [DAC-00] leverages the breakthrough in partitioning using multilevel techniques (e.g., hMetis [DAC-97], MLFM [DAC-97])
    Dragon [ICCAD-00] combines hierarchical partitioning with annealing
    Efficient and scalable
    Very good wirelength, but can we do better?
  Weakness:
    More difficult to handle other objectives
    Not stable in handling incremental changes
    Not good at white space management

White Space in Min-Cut Placement
  [Figure: two placements of adaptec2. Capo (Min-Cut): HPWL=9955. APlace (Analytical): HPWL=8715.]
  Courtesy: IBM

Placement Maturing
  Analytical placement
    Used by 4 of the top 5 placers in the ISPD-05 Placement Contest and the top 5 placers in the ISPD-06 Placement Contest
  Strength:
    Fastest and scalable
    Best wirelength
    Robust framework to incorporate different objectives and constraints
    Stable in handling incremental changes
    Good at white space management
  Why would analytical placement work so well?
    Can see the big picture
  Why was it not popular in the past?
    Hard to spread modules evenly in the placement region

Attempt Still Relying on Partitioning
  Gordian: Global Optimization and Rectangle Dissection [TCAD-91]
  [Figure label: Centers of mass]
  Artificial center-of-mass constraints disturb the global optimal solution too drastically

Another Partitioning-based Spreading
  Quadratic optimization with quadrisection [Vygen, DAC-97]
  Courtesy: IBM

Spreading by Density-based Force
  Kraftwerk [DAC-98]
  Great idea:
    Quadratic wirelength minimization:
      Min $f(p) = \frac{1}{2} p^T C p + d^T p + \text{const}$, which gives $C p + d = 0$
    Spread cells by additional forces:
      $C p + d + f = 0$
    Density-based force to push cells away from dense to sparse regions:
      $f(r) = \frac{k}{2\pi} \int D(r') \frac{r - r'}{|r - r'|^2} \, dr'$
  Spread cells smoothly
  Very good wirelength
  But not too fast:
    Constant force, hard to control convergence
    Density-based force is expensive to compute

Dramatic Speedup
  FastPlace [ISPD-04]
    repeat
      Solve quadratic program to minimize wirelength
      Spread the cells
    until cell distribution is roughly even
    Reduce wirelength by iterative heuristic
  Hybrid Net Model
    Speeds up solving of the QP
  Cell Shifting
    Simple technique to compute spreading force
    Fast convergence due to the use of pseudo-nets [Hu et al., ISPD-02] instead of constant force
  Iterative Local Refinement
    More efficient than using QP to refine the solution
    Minimizes wirelength based on a linear objective

Linearization of Quadratic Wirelength
  New Kraftwerk [ICCAD-06]
  BoundingBox net model for multi-pin nets:
    Needs to know the outermost pins of a net
    Accurately models HPWL
    Faster and uses less memory than the clique model
  [Figure labels: BoundingBox, Clique]
  Two fundamental components of spreading force:
    Hold force: constant force
    Move force: enforced by pseudo-net to a fixed point

Relaxation Rather than Linearization
  RQL [DAC-07]
    Adds Force Vector Modulation to the FastPlace framework
    Currently fastest and best wirelength
  Rank modules based on the spreading force magnitude
  Nullify the spreading force for the top 5-10% of modules
  [Plot: spreading force magnitude (y-axis) vs. module index (x-axis)]

An Alternative Analytical Approach
  APlace [ISPD-04], mPL5 [ISPD-05], NTUPlace3 [ICCAD-06]

                                      APlace                   NTUP3                    mPL6                     RQL
  Wirelength model                    Log-sum-exponential      Log-sum-exponential      Log-sum-exponential      Quadratic
  Density potential / spreading force Bell-shaped              Bell-shaped              Poisson smoothed         Fixed-point based
  Objective function                  Non-linear & non-convex  Non-linear & non-convex  Non-linear & non-convex  Quadratic

  Log-sum-exponential function to approximate HPWL [Naylor et al., US Patent 2001]:
    $\mathrm{lse}(x_1, \ldots, x_n) = \alpha \ln \sum_{i=1}^{n} e^{x_i / \alpha} \approx \max(x_1, \ldots, x_n)$, where $\alpha > 0$ is a smoothing parameter
  Density constraint is directly formulated into the objective function
  Very competitive wirelength and runtime

Placement: Getting Old or Still Young?
  Better approach than the quadratic / analytical approach?
  Massive parallelism to speed up placement
  Better clustering technique
  Macro placement / floorplanning
  True timing-driven placement

Sufficient Parental Guidance?
  All physical synthesis gets from placement is distance info
  Physical synthesis has a distorted world view!
  Wirelength estimation is inaccurate (especially for nets with high pin count)
  Congestion estimation is inaccurate
    [Figure: three panels showing a bus from sources S0-S3 to targets T0-T3: "Routing of a Bus", "A Simple Solution", and "Probabilistic Estimation" (probabilistic usage follows a harmonic series)]
  Area estimation is inaccurate
    Without buffering and gate sizing
  Timing estimation is very inaccurate

Routing-Driven Physical Synthesis
  Need a more integrated approach
    Past: Placement-Driven Physical Synthesis
    Future: Routing-Driven Physical Synthesis
  Main obstacle: Runtime
  Two possibilities:
    1. Construct Steiner trees to guide synthesis and placement
    2. Perform global routing to guide synthesis and placement

Fast Steiner Tree Construction
  FLUTE (Fast LookUp Table Estimation) [ICCAD-04, ISPD-05]
    An extremely fast and accurate rectilinear Steiner tree algorithm
    Very suitable for VLSI applications:
      Optimal up to degree 9
      Very accurate up to degree 100
  [Plot: Error (%) vs. Runtime (s) over all 1.57 million nets in 18 IBM circuits [ISPD-98], comparing RMST, RSTT, SPAN, FLUTE, BGA and BI1S]

Is Steiner Tree Sufficient?
  Steiner trees do not consider detours due to routing congestion or buffering congestion
  Can we predict the impact of congestion on routing?
  There is no way for generic estimators to accurately estimate the congestion of arbitrary global routers!

             Labyrinth(70%)   Labyrinth(50%)     Chi Dispersion
  Circuit    #cong            #cong   #match     #cong   #match
  ibm01      238              268     54         122     44
  ibm02      368              390     89         46      7
  ibm03      247              214     47         1       0
  ibm04      588              596     261        273     161
  ibm06      367              391     81         9       1
  ibm07      568              643     162        122     55
  ibm08      486              655     138        30      18
  ibm09      377              399     69         12      3
  ibm10      501              376     93         27      16

  [Figure: "match" is the overlap between congestion by router 1 and congestion by router 2]

Traditional Global Routing
  Simultaneous approach (e.g., ILP)
    Very slow
  Sequential approach
    Net-by-net routing, rip-up and reroute
    Maze routing for a net: Lee's, Dijkstra's, A*-search algorithms
    Reasonably fast
    Reasonably good quality
  Is it good enough to handle the demands of physical synthesis?
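As a rough illustration of the maze routing mentioned for the sequential approach, here is a minimal sketch of Lee's algorithm for a single two-pin net. It is not taken from the slides: the function name `lee_route`, the grid encoding (0 = free, 1 = blocked) and the demo grid are assumptions made for this example. Real global routers use edge capacities and cost functions (Dijkstra's or A*) rather than a plain unit-cost breadth-first search.

```python
from collections import deque


def lee_route(grid, source, target):
    """Return a shortest path of (row, col) cells from source to target, or None.

    Hypothetical encoding for this sketch: grid[r][c] == 1 is blocked, 0 is free.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {source: None}      # doubles as the visited set
    frontier = deque([source])
    while frontier:
        cell = frontier.popleft()
        if cell == target:
            # Backtrace phase: follow parent links back to the source.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        # Wave expansion phase: visit the four rectilinear neighbours.
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None  # target unreachable


# Demo: a wall forces the route around the right-hand side of the grid.
demo_grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
demo_path = lee_route(demo_grid, (0, 0), (2, 0))
```

The breadth-first wave guarantees a shortest path on a unit-cost grid, but it may visit every cell in the grid for each net, which is one reason heavy reliance on maze routing is slow for designs with millions of nets.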
Progress in Global Routing
  Pattern Routing [Kastner et al., ICCAD-00]
    L-shaped, Z-shaped routes
    Faster
  Better cost functions for maze routing [Hadsell & Madden, DAC-03; Pan & Chu, ICCAD-06]
    Reduce overflow significantly
  Congestion-driven Steiner tree construction [Pan & Chu, ICCAD-06]
    Much faster because of much less reliance on maze routing
  Negotiated Congestion by PathFinder [FPGA-95]
    Used by BoxRouter [ICCAD-07], FGR [ICCAD-07], Archer [ICCAD-07]
    Excellent routing ability
    Very slow because it takes a long time to build congestion history
  Wanted: Techniques that are both fast and high quality

What Should We Do Next?
  Integration of global routing into placement
    An initial attempt: IPR [DAC-07]
    Integration of FastPlace, FastDP, FLUTE and FastRoute
    Significantly improves routability & wirelength in good runtime
  Incorporate buffering and gate sizing into integrated placement & routing
    Much more accurate timing information
    Should also help congestion and placement density control
  Integration with logic synthesis
  In other words, we need:
    Better basic algorithms: placement, Steiner tree, global routing, buffering, gate sizing, etc.
    Clever ways of integration
  It is an (EDA) family problem. Let's work together!

Thank You