## **ISPD 2013 Discrete Gate Sizing Contest**

Speakers: Mustafa Ozdal, Gustavo Wilke

Organizers: Chirayu Amin, Andrey Ayupov, Steve Burns, Cheng Zhuo

Intel Corporation, Hillsboro, OR

## People

### Contest Organizers

- Cheng Zhuo
- Gustavo Wilke
- Steve Burns
- Andrey Ayupov
- Chirayu Amin
- Mustafa Ozdal

### Responsibilities

- Communications + timing scripts
- Evaluations
- Cell library
- Benchmarks
- Timing models
- Contest organization + parsers

### Special Thanks To:

- Troy Wood, Robert Hoogenstryd (Synopsys)
- Ted Schroeder, Jack Ho, Dan McMullen, Jack Erickson (Cadence)
- Noel Menezes, Jason Xu, Alaena Young, Chris Forn, Kabiru Ahmed, Rohit Vachher, Shishpal Rawat (Intel)

# Participation Statistics

- 25 initial registrations
  - Asia: 14 teams
  - North America: 9 teams
  - South America: 1 team
  - Europe: 1 team
- Overall, 8 different countries
- 12 alpha binary submissions
- 9 final submissions

# **ISPD 2013 Contest Overview**

## Discrete Gate Sizing Contest: Background

- Gate sizing contests in ISPD 2012 & 2013
  - ISPD contests are traditionally organized 2 years in a row.
- Major changes compared to the ISPD 2012 Contest:
  - A realistic distributed RC model for interconnect
  - More challenging benchmarks
  - More emphasis on runtime in the evaluation metrics

## Discrete Gate Sizing Contest: An Overview

- Simultaneous gate sizing and Vt assignment to optimize power under performance constraints
- Problem formulation
  - Inputs:
    - Standard cell library
    - Netlist
    - Timing constraints
    - Interconnect parasitics
  - Outputs:
    - Cell sizes and types
  - Objective:
    - Satisfy all performance constraints
    - Minimize total leakage power
- An industrial timing engine is used as the reference timer

### Gate Sizing and Threshold Voltage Selection

- Choose the cell sizes and device types from the library such that:
  - All timing constraints are satisfied
  - Total power is minimized

# Contest Objectives

- Main objective: expose industrial challenges in the gate sizing problem to academia
- Common industrial challenges:
  - Discrete cell sizes
    - Continuous optimization + rounding: typically suboptimal
  - Non-convex cell timing models
    - Due to transistor folding in the layout, etc.
  - Slew dependencies and constraints
  - Realistic interconnect models
  - Scalability for large design sizes
  - Complex timing constraints (not captured in the contest)
    - Multiple clock domains, false paths

### Benchmark Features

- Each benchmark circuit consists of:
  - A netlist
    - Structured Verilog format
    - Sanitized (no hierarchy, no buses, no unconnected pins, etc.)
  - Interconnect parasitics
    - IEEE SPEF format
    - Distributed RC model
  - Timing constraints
    - Synopsys Design Constraints (SDC) format
    - Single clock period, no false paths, no latches
    - Circuit interface (driving cells at PIs, loads at POs, etc.)
- Standard industrial formats
- C++ parser helpers provided by the organizers

### Benchmark Creation: Netlists

- 4 netlists from the ISPD 2012 Contest
  - usb\_phy
  - pci\_bridge32
  - des\_perf
  - netcard (derived from IWLS-2005)
- 4 netlists created using high-level and logic synthesis tools
  - We implemented 4 algorithms in SystemC:
    - cordic: CORDIC sine/cosine functions
    - fft: Fast Fourier transform
    - matrix\_mult: Matrix multiplication
    - edit\_dist: Dynamic-programming-based edit distance algorithm
  - Cadence® C-to-Silicon Compiler was used to generate the RTL
  - Cadence® RTL Compiler was used to generate the netlists

### Benchmark Creation: Parasitics

- All netlists were placed and routed to generate realistic parasitics.
- Cadence® Encounter Digital Implementation System was used.

Special thanks to Ted Schroeder, Jack Ho, Dan McMullen, and Jack Erickson for enabling Cadence® tool use and valuable support!

# Benchmark Creation: Timing Constraints

- All netlists were sized with varying clock frequencies.
- Two clock frequencies with feasible results were chosen:
  - Fast corner: for high-performance designs
  - Slow corner: for low-power designs
- Overall, 16 different benchmarks were generated.
  - 8 designs with 2 corners each

# Example: Path Distribution after Sizing

More near-critical paths in the new benchmarks.

# Example (cont'd): Zoom-In to Critical Paths

Many more near-critical paths for cordic\_fast.

### Statistics for Evaluation Benchmarks

| Benchmark | Input pins | Output pins | Comb. cells | Seq. cells | Total cells |
|--------------|------|------|------|-----|------|
| usb_phy | 15 | 19 | 510 | 98 | 608 |
| pci_bridge32 | 160 | 201 | 28K | 3K | 31K |
| des_perf | 234 | 140 | 104K | 9K | 113K |
| netcard | 1836 | 10 | 884K | 98K | 982K |
| cordic | 34 | 64 | 42K | 1K | 43K |
| fft | 1026 | 1026 | 31K | 2K | 33K |
| matrix_mult | 3202 | 1600 | 153K | 3K | 156K |
| edit_dist | 2562 | 12 | 121K | 6K | 127K |

Each netlist has 2 different clock periods (fast and slow).

# Standard Cell Library

- Cell library created specifically for the gate sizing contest
  - Realistic non-convex timing models
  - Realistic discrete levels
- The same library was used for both the ISPD'12 and ISPD'13 contests
- 11 combinational functions + 1 flip-flop
- For each combinational cell family:
  - 30 different cell types/sizes:
    - 3 threshold voltages (Vt)
    - 10 sizes for each Vt
- Synopsys Liberty™ format with lookup tables for delay and slew

### Cell Library: Delay and Slew Tables

Delay and output slew are defined as functions of input slew and output load.

# Timing Infrastructure

- Synopsys PrimeTime® was used for the final evaluations
- Contestants had 3 choices:
  1. Implement their own STA
  2. Call Synopsys PrimeTime® using the infrastructure we provided
  3. Implement their own scripts for Synopsys PrimeTime®

Special thanks to Troy Wood and Robert Hoogenstryd from Synopsys for providing academic licenses to Synopsys PrimeTime® and valuable support!
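The delay and output-slew lookup tables described in the cell library section are two-dimensional: one axis is input slew and the other is output load. A timer must evaluate these tables between breakpoints, and bilinear interpolation is a common choice for Liberty NLDM tables. The sketch below rests on several assumptions. It assumes bilinear interpolation and assumes a table template whose first index is input slew and whose second is output load. Points outside the table reuse the edge segment, which amounts to linear extrapolation. The 3x3 table is made up. The helper names (`lookup`, `_segment`) are hypothetical and are not part of the contest parsers or of PrimeTime.

```python
from bisect import bisect_right


def _segment(axis: list[float], x: float) -> int:
    """Index i of the breakpoint segment [axis[i], axis[i + 1]] used for x.

    Points outside the axis reuse the first or last segment, which amounts
    to linear extrapolation (an assumption about the reference timer).
    """
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup(index_1, index_2, values, slew, load):
    """Bilinearly interpolate values[i][j] at (input slew, output load)."""
    i = _segment(index_1, slew)
    j = _segment(index_2, load)
    # Fractional position of the query inside the bracketing grid cell.
    ts = (slew - index_1[i]) / (index_1[i + 1] - index_1[i])
    tl = (load - index_2[j]) / (index_2[j + 1] - index_2[j])
    # Interpolate along the load axis on both slew rows, then along slew.
    low = values[i][j] + tl * (values[i][j + 1] - values[i][j])
    high = values[i + 1][j] + tl * (values[i + 1][j + 1] - values[i + 1][j])
    return low + ts * (high - low)


# Hypothetical 3x3 delay table (ps); rows: input slew (ps), columns: load (fF).
SLEWS = [5.0, 30.0, 100.0]
LOADS = [1.0, 4.0, 16.0]
DELAY = [
    [10.0, 16.0, 40.0],
    [14.0, 20.0, 44.0],
    [25.0, 31.0, 55.0],
]

print(lookup(SLEWS, LOADS, DELAY, 17.5, 2.5))  # 15.0
```

The real tables are not linear in either argument, so interpolation can add error between breakpoints. This is one reason a contestant's own STA could disagree slightly with the reference timer.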
# **ISPD 2013 Contest Evaluation**

### **Contest Evaluation**

- Basic evaluation metrics
  - Violations
  - Power
  - Runtime
- Two separate rankings
  - Primary: Quality
  - Secondary: Tradeoff between quality and runtime

### **Evaluation Metrics: Violations**

- Violations are divided into three different types:
  - Negative slack (ps)
    - Sum of violations at POs and sequential inputs
  - Slew (ps)
    - Sum of violations at POs and cell input pins
  - Maximum capacitance (fF)
    - Sum of violations at cell output pins
- All benchmarks can be sized without any violations

### **Evaluation Metrics: Power**

- Only leakage power is considered
- Total leakage power is the sum of the leakage power of all cells

### **Evaluation Metrics: Runtime**

- Runtime is the wall-clock time from the beginning to the end of the execution of the submitted binary
- All jobs still running when the runtime limit was reached were killed

$$\text{Runtime limit} = 3\,\text{h} + 1\,\text{h} \times \left\lceil \frac{\#\text{gates}}{40\text{K}} \right\rceil$$

| benchmark | runtime limit (h) |
|--------------|-------------|
| usb_phy | 4 |
| pci_bridge32 | 4 |
| des_perf | 6 |
| netcard | 28 |
| cordic | 5 |
| fft | 4 |
| matrix_mult | 7 |
| edit_dist | 7 |

- Machine specification
  - 16 cores available for parallel execution
  - 49152 MB (48 GB) RAM

# Primary Metric: Quality

- The ranking metric for a benchmark is defined in lexicographic order as:
  - First: ∑violations
  - Second: ∑power (when violations are tied)
  - Third: runtime (when violations and power are tied)
- The sum of the ranks over all benchmarks defines the final score for each team
- The lowest rank sum wins the contest!
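The runtime limit formula and the lexicographic ranking above can be sketched in a few lines. This is a minimal illustration, not the organizers' evaluation scripts, and it makes the following assumptions:

- `#gates` is taken as the total cell count from the statistics table.
- The names `runtime_limit_hours`, `Result`, `rank_benchmark` and `total_scores` are hypothetical.
- Fully tied results share a rank, which the slides do not specify.
- The teams and numbers in the demo are invented.

```python
import math
from dataclasses import dataclass


def runtime_limit_hours(num_gates: int) -> int:
    """Primary-metric limit: 3 h + 1 h * ceil(#gates / 40K)."""
    return 3 + math.ceil(num_gates / 40_000)


@dataclass(frozen=True)
class Result:
    team: str
    violations: float  # sum of slack, slew and max-cap violations
    power: float       # total leakage power
    runtime: float     # wall-clock runtime in seconds


def _key(r: Result):
    # Lexicographic order: violations, then power, then runtime.
    return (r.violations, r.power, r.runtime)


def rank_benchmark(results: list[Result]) -> dict[str, int]:
    """Return team -> rank (1 = best) for a single benchmark.

    Fully tied results share a rank (an assumption; not specified).
    """
    ordered = sorted(results, key=_key)
    ranks: dict[str, int] = {}
    for pos, r in enumerate(ordered, start=1):
        prev = ordered[pos - 2] if pos > 1 else None
        if prev is not None and _key(prev) == _key(r):
            ranks[r.team] = ranks[prev.team]
        else:
            ranks[r.team] = pos
    return ranks


def total_scores(per_benchmark: dict[str, list[Result]]) -> dict[str, int]:
    """Sum each team's per-benchmark ranks; the lowest sum wins."""
    scores: dict[str, int] = {}
    for results in per_benchmark.values():
        for team, rank in rank_benchmark(results).items():
            scores[team] = scores.get(team, 0) + rank
    return scores


# Gate counts from the statistics table (rounded to thousands except usb_phy).
for name, gates in [("usb_phy", 608), ("des_perf", 113_000), ("netcard", 982_000)]:
    print(name, runtime_limit_hours(gates))  # 4, 6 and 28 hours

# Hypothetical teams and numbers, for illustration only.
bench = {
    "b1": [Result("A", 0.0, 90.0, 50.0), Result("B", 0.0, 100.0, 80.0),
           Result("C", 2.5, 50.0, 10.0)],
    "b2": [Result("A", 0.0, 70.0, 20.0), Result("B", 0.0, 70.0, 30.0),
           Result("C", 0.0, 80.0, 5.0)],
}
print(total_scores(bench))  # {'A': 2, 'B': 4, 'C': 6}
```

Team C has the lowest power on `b1` but still ranks last there, because violations are compared first. On `b2`, teams A and B tie on violations and power, so runtime breaks the tie. The computed limits match the runtime table for all eight benchmarks.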
# Secondary Metric: Quality/Runtime

- Encourage multi-threading and optimization efficiency
- Runtime limits are reduced to 1/5 of the original limits
- All solutions with the same number of violations are ranked by:

$$\text{cost} = \text{Power} \times \left(0.95 + 0.05 \, \frac{\text{Runtime}}{\text{Runtime}_{\text{REF}}}\right)$$

- A runtime reduction of 20% of Runtime<sub>REF</sub> can compensate for a 1% power degradation
- Runtime<sub>REF</sub> is half of the runtime limit for each benchmark
- The maximum benefit of a fast program is 10% (the runtime factor ranges from 0.95 at zero runtime to 1.05 at the runtime limit)

# **ISPD 2013 Contest Results**

### **Contest Awards**

- Recognition and cash prizes for:
  - Primary metric:
    - 1<sup>st</sup> place: \$1000
    - 2<sup>nd</sup> place: \$500
    - 3<sup>rd</sup> place: \$300
  - Secondary metric:
    - 1<sup>st</sup> place: \$700

Note: Cash awards for future ISPD contests will depend on the availability of funding.

# Results Comparison: Small and Easy

7 out of 9 teams completed without violations.

# Results Comparison: Small and Hard

No team was able to complete without violations.

# Results Comparison: Fast vs Slow

5 out of 9 teams completed without violations.

# Results Comparison: Fast vs Slow

4 out of 9 teams completed without violations.

# Primary Ranking: Winner Teams (Slow)

### **Normalized Leakage (Slow Corner)**

Missing data points are runs that completed with violations.

# Primary Ranking: Winner Teams (Fast)

### **Normalized Leakage (Fast Corner)**

Missing data points are runs that completed with violations.

# Primary Metric: Detailed Ranking

Ranks of the top 3 teams for each benchmark:

| Benchmark | First team | Second team | Third team |
|-------------------|----|----|----|
| cordic_fast | 1 | 2 | 3 |
| cordic_slow | 1 | 3 | 4 |
| des_perf_fast | 1 | 2 | 5 |
| des_perf_slow | 1 | 2 | 4 |
| edit_dist_fast | 1 | 2 | 4 |
| edit_dist_slow | 1 | 2 | 4 |
| fft_fast | 1 | 2 | 3 |
| fft_slow | 1 | 2 | 4 |
| matrix_mult_fast | 9 | 2 | 3 |
| matrix_mult_slow | 1 | 4 | 2 |
| netcard_fast | 1 | 3 | 2 |
| netcard_slow | 1 | 2 | 3 |
| pci_bridge32_fast | 1 | 2 | 5 |
| pci_bridge32_slow | 1 | 2 | 4 |
| usb_phy_fast | 1 | 2 | 6 |
| usb_phy_slow | 2 | 1 | 4 |
| Sum | 25 | 35 | 60 |

# Primary Metric: 3<sup>rd</sup> Place Winner

2013 ACM International Symposium on Physical Design

### **Discrete Gate Sizing Contest**

Primary Metric

### **Third Place**

Team GoodTime: Li-Chung Hsu and Simon Yi-Hung Chen

Cheng-Kok Koh, General Chair; Cliff C. N. Sze, Technical Program Chair; Mustafa Ozdal, Contest Chair

# Primary Metric: 2<sup>nd</sup> Place Winner

### **Discrete Gate Sizing Contest**

Primary Metric

### **Second Place**

Team Trident: SeokHyeong Kang, Pankit Thapar, Hyein Lee, Benjamin Vandersloot, Igor Markov

# Primary Metric: 1<sup>st</sup> Place Winner

### **Discrete Gate Sizing Contest**

Primary Metric

### **First Place**

Team South Brazil: Guilherme Flach, Tiago Reimann, Gracieli Posser, Marcelo Johann, Ricardo Reis, Vinicius Livramento, Chrystian Guth, Renan Oliveira Netto, José Luís Güntzel

# Secondary Metric: 1<sup>st</sup> Place Winner

### **Discrete Gate Sizing Contest**

Secondary Metric

### **First Place**

Team Trident: SeokHyeong Kang, Pankit Thapar, Hyein Lee, Benjamin Vandersloot, Igor Markov

# Primary Metric: Top 5

| Team | Affiliation | Members | Score |
|--------------|-------------|---------|-------|
| South Brazil | UFRGS & UFSC, Brazil | Guilherme Flach, Tiago Reimann, Gracieli Posser, Marcelo Johann, Ricardo Reis, Vinicius Livramento, Chrystian Guth, Renan Oliveira Netto, José Luís Güntzel | 25 |
| Trident | University of California San Diego, University of Michigan | Seokhyeong Kang, Pankit Thapar, Hyein Lee, Benjamin VanderSloot, Igor L. Markov | 35 |
| GoodTime | Keio University, National Chiao-Tung University | Li-Chung Hsu, Simon Yi-Hung Chen | 60 (-5.71E5)\* |
| Team_8 | The Chinese University of Hong Kong | Wing-Kai Chow, Xu He, Jian Kuang, Ka-Chun Lam, Wenzan Cai, Evangeline F. Y. Young | 60 (-6.07E7)\* |
| Team_14 | Northwestern University | Peng Kang, Li Li, Yuankai Chen | 75 |

\* The tie was broken based on the total sum of violations across all benchmarks.

# Secondary Metric: Top 5

| Team | Affiliation | Members | Score |
|--------------|-------------|---------|-------|
| Trident | University of California San Diego, University of Michigan | Seokhyeong Kang, Pankit Thapar, Hyein Lee, Benjamin VanderSloot, Igor L. Markov | 22 |
| Team_8 | The Chinese University of Hong Kong | Wing-Kai Chow, Xu He, Jian Kuang, Ka-Chun Lam, Wenzan Cai, Evangeline F. Y. Young | 54 |
| South Brazil | UFRGS & UFSC, Brazil | Guilherme Flach, Tiago Reimann, Gracieli Posser, Marcelo Johann, Ricardo Reis, Vinicius Livramento, Chrystian Guth, Renan Oliveira Netto, José Luís Güntzel | 65 |
| Team_14 | Northwestern University | Peng Kang, Li Li, Yuankai Chen | 67 |
| GoodTime | Keio University, National Chiao-Tung University | Li-Chung Hsu, Simon Yi-Hung Chen | 85 |

# Thank you!

# **BACKUP SLIDES**

# Results Summary for Top 4 Teams

### **Primary Ranking**

| Benchmark | South Brazil (1<sup>st</sup>) Power (uW) | Runtime (s) | Trident (2<sup>nd</sup>) Power (uW) | Runtime (s) | GoodTime (3<sup>rd</sup>) Power (uW) | Runtime (s) | Team_8 (4<sup>th</sup>) Power (uW) | Runtime (s) |
|-------------------|-----------|-------|-----------|--------|------------|--------|-----------|-------|
| cordic_fast | | | | | | | | |
| cordic_slow | 323791 | 5682 | 443612.5* | 5545* | 1077730.5* | 14684* | 1629373 | 18015 |
| des_perf_fast | | | | | | | | |
| des_perf_slow | 353005.5 | 5763 | 380437.5 | 4150 | 2391830 | 21012 | 2382234 | 21611 |
| edit_dist_fast | 596322.5 | 11130 | 639013.5 | 9964 | | | 4915710 | 8108 |
| edit_dist_slow | 447402.5 | 6974 | 468454 | 6476 | | | 2517290 | 25219 |
| fft_fast | 226207.5 | 3129 | 321449.5 | 2115 | 637814.5 | 13763 | 723907 | 14419 |
| fft_slow | 90340.5 | 2198 | 94715.5 | 1390 | 106681 | 7264 | 189880.5 | 14410 |
| matrix_mult_fast | | | | | | | | |
| matrix_mult_slow | 469732.5 | 14604 | 512847.5* | 12947* | 1381369 | 24977 | 3578623 | 25217 |
| netcard_fast | 5317839.5 | 36795 | | | 19152003 | 99302 | | |
| netcard_slow | 5302372.5 | 32964 | 5371103 | 100816 | 5245666* | 99325* | | |
| pci_bridge32_fast | 96511 | 5222 | 106929.5 | 697 | | | 337526.5 | 14418 |
| pci_bridge32_slow | 57894.5 | 857 | 59265.5 | 396 | 77184.5 | 13522 | 111570 | 14408 |
| usb_phy_fast | 1608 | 35 | 1680.5 | 25 | 6550 | 405 | 2374 | 383 |
| usb_phy_slow | 1078.5 | 35 | 1074.5 | 15 | 1112.5 | 15 | 1099.5 | 132 |

- \* The solution had very small violations (< 0.1).
- Empty cells denote benchmarks that were not sized without violations.

# Results Summary for Top 4 Teams

### Secondary Ranking

| Benchmark | Trident (1<sup>st</sup>) Power (uW) | Runtime (s) | Team_8 (2<sup>nd</sup>) Power (uW) | Runtime (s) | South Brazil (3<sup>rd</sup>) Power (uW) | Runtime (s) | Team_14 (4<sup>th</sup>) Power (uW) | Runtime (s) |
|-------------------|-----------|-------|-----------|------|-----------|-------|-----------|-----|
| cordic_fast | | | | | | | | |
| cordic_slow | 563093 | 1570 | 1961092 | 3604 | 323707* | 2979* | | |
| des_perf_fast | | | | | | | | |
| des_perf_slow | 395868 | 1229 | 2823880 | 4336 | 353807* | 4071* | | |
| edit_dist_fast | 704822.5 | 2655 | 9769384 | 5051 | | | | |
| edit_dist_slow | 489476 | 1993 | 7485664 | 5051 | 448158.5 | 4741 | | |
| fft_fast | 361299.5 | 416 | 974608 | 2882 | 224530 | 1827 | | |
| fft_slow | 98149.5 | 246 | 418988 | 2882 | 90316 | 1367 | 104802.5 | 739 |
| matrix_mult_fast | | | | | | | | |
| matrix_mult_slow | 570741.5 | 3910 | 7540344 | 5053 | | | | |
| netcard_fast | | | | | | | | |
| netcard_slow | 5371103 | 20178 | | | | | | |
| pci_bridge32_fast | 112641.5 | 175 | 504979 | 2882 | 96511 | 1417 | 198791 | 758 |
| pci_bridge32_slow | 60171.5 | 135 | 136955.5 | 2881 | 57894.5 | 575 | 62013 | 438 |
| usb_phy_fast | 1644 | 15 | 2628.5 | 173 | 1608 | 35 | 2644.5 | 72 |
| usb_phy_slow | 1075.5 | 15 | 1095.5 | 42 | 1078.5 | 25 | 1241 | 33 |

- \* The solution had very small violations (< 0.1).
- Empty cells denote benchmarks that were not sized without violations.
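The secondary-metric cost from the Quality/Runtime section can be evaluated with a small helper. The sketch rests on one assumption: that "half of the runtime limit" means half of the *reduced* limit, which is 1/5 of the primary limit. This reading matches the stated 10% maximum benefit. It also matches Team_8's secondary runtimes of 2881 to 2882 s on fft and pci_bridge32, benchmarks whose primary limit is 4 h, since 14400 s / 5 = 2880 s. The function names are hypothetical.

```python
def secondary_runtime_ref_s(primary_limit_hours: float) -> float:
    """Runtime_REF in seconds.

    ASSUMPTION: "half of the runtime limit" refers to the reduced limit,
    i.e. 1/5 of the primary-metric limit.
    """
    reduced_limit_s = primary_limit_hours * 3600 / 5
    return reduced_limit_s / 2


def secondary_cost(power: float, runtime_s: float, runtime_ref_s: float) -> float:
    """cost = Power * (0.95 + 0.05 * Runtime / Runtime_REF); lower is better."""
    return power * (0.95 + 0.05 * runtime_s / runtime_ref_s)


# pci_bridge32 and fft have a 4 h primary limit: 2880 s reduced, 1440 s REF.
ref = secondary_runtime_ref_s(4)
print(ref)  # 1440.0

# The runtime factor spans 0.95 (zero runtime) to 1.05 (at the reduced limit).
print(secondary_cost(1.0, 0.0, ref), secondary_cost(1.0, 2 * ref, ref))

# 1% more power is roughly offset by a runtime cut of 20% of Runtime_REF.
print(secondary_cost(101.0, 0.8 * ref, ref), secondary_cost(100.0, ref, ref))
```

The last line gives 99.99 against 100. This illustrates the "20% runtime reduction compensates 1% power degradation" rule of thumb: lowering the runtime by 0.2 × Runtime<sub>REF</sub> reduces the factor by 0.05 × 0.2 = 0.01.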