Debug and Root Causing

Post-silicon validation is a crucial component of contemporary System-on-Chip (SoC) validation and debug. It is often performed in an unsystematic, ad hoc way under aggressive time-to-market schedules, and it incurs disproportionate costs. Post-silicon validation is the final gateway to detect, localize, and debug deep state-space bugs before mass production begins. Limited observability is a key obstacle: it severely restricts which internal signals can be observed during post-silicon execution. Consequently, post-silicon validation of an SoC requires intense a priori planning to instrument the design so that relevant signals are observable. The price of insufficient instrumentation is a silicon re-spin, which is typically infeasible. This raises several natural questions. Which signals should be instrumented? At what design abstraction should the observability selection analysis be performed? How can observability selection scale to industrial-scale SoCs? Which design collaterals can be exploited to make observability selection informed? Our current research addresses these questions and bridges the widening gap between academic research and current and future industrial requirements for SoC post-silicon validation.

Our paper demonstrating the futility of traditional metrics from the literature, such as state restoration, for post-silicon signal selection received a best paper nomination at ICCAD 2015. Our paper scaling a top-down post-silicon verification methodology to a contemporary SoC design was nominated for best paper at DAC 2018, the premier conference in design automation; it was one of 6 nominated papers out of over 700 submissions. This work was done in collaboration with Intel and IBM.

Our work on diagnosing system-level bugs was carried out in collaboration with Huawei Technologies and, later, Samsung Technologies. We started out with 7-10 GB trace files and no systematic approach to analyze them. By the end of the collaboration, we had a push-button approach that isolates the causes of transaction delays and latencies and shortlists them as candidates for inspection. In some cases, our analysis was the difference between two weeks of manual inspection and a couple of hours with the RootSys methodology.

We use a mutual-information-gain-based method to select observables at the application level. We also proposed a new method to select observables at the behavioral level by extending Google's PageRank algorithm; unlike prior approaches, this method does not optimize for traditional metrics. Our application-level solution was the first to scale hardware tracing from small ISCAS89 benchmarks containing a few thousand flops to the industrial-scale OpenSPARC T2 SoC containing more than a million flops. Using observables selected at the application level, we were able to prune up to 89% of potential root causes and focus debugging on no more than 55% of the IP pairs participating in the post-silicon execution.
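To make the two selection ideas in the paragraph above concrete, here is a minimal illustrative sketch; it does not reproduce the published algorithms. It assumes each candidate signal (or application-level message) has already been sampled into a discrete trace aligned with a discrete target variable, and it shows two generic building blocks: greedy selection that maximizes the mutual information I(S; Y) = H(S) + H(Y) - H(S, Y) under a trace-buffer budget, and plain, unextended PageRank over a signal-dependency graph. The function names `greedy_mi_selection` and `pagerank`, the `budget` parameter, the edge convention, and the toy data are all hypothetical; the behavioral-level method extends PageRank in ways not shown here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of hashable samples."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Estimate I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def greedy_mi_selection(candidates, target, budget):
    """Greedily pick up to `budget` signals that maximize I(selected; target).

    candidates: dict mapping a (hypothetical) signal name to its sampled trace,
    aligned sample-by-sample with `target`. Returns (selected names, final MI).
    """
    selected = []
    joint = [()] * len(target)  # per-sample tuple of already-selected values
    current_mi = 0.0
    for _ in range(budget):
        best_name, best_gain, best_joint = None, 0.0, None
        for name, trace in candidates.items():
            if name in selected:
                continue
            trial = [j + (v,) for j, v in zip(joint, trace)]
            gain = mutual_information(trial, target) - current_mi
            if gain > best_gain + 1e-12:
                best_name, best_gain, best_joint = name, gain, trial
        if best_name is None:
            break  # no remaining signal adds information about the target
        selected.append(best_name)
        joint = best_joint
        current_mi += best_gain
    return selected, current_mi


def pagerank(graph, damping=0.85, iters=100, tol=1e-10):
    """Plain (unextended) PageRank over a dependency graph.

    graph: dict mapping each node to a list of successors. Nodes without
    successors are treated as dangling and spread their rank uniformly.
    """
    nodes = set(graph) | {v for succs in graph.values() for v in succs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, succs in graph.items():
            for v in succs:
                new[v] += damping * rank[u] / len(succs)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Toy traces (hypothetical): the target is fully determined by (a, b).
    target = [0, 1, 2, 3, 0, 1, 2, 3]
    signals = {
        "a": [0, 0, 1, 1, 0, 0, 1, 1],
        "b": [0, 1, 0, 1, 0, 1, 0, 1],
        "c": [0] * 8,  # constant signal: carries no information
    }
    print(greedy_mi_selection(signals, target, budget=2))  # (['a', 'b'], 2.0)

    # Hypothetical edge convention: u -> v means "u depends on v".
    deps = {"x": ["z"], "y": ["z"], "z": []}
    ranks = pagerank(deps)
    print(max(ranks, key=ranks.get))  # z
```

Greedy selection is a common heuristic here because scoring every subset of candidate signals is exponential in the number of candidates. Its known weakness is that it can miss signals that are informative only jointly, for example two signals whose XOR determines the target, since each one alone has zero mutual information with it.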

  1. Debjit Pal, Abhishek Sharma, Sandip Ray, Flavio M. de Paula, and Shobha Vasudevan. Application level hardware tracing for scaling post-silicon debug. Design Automation Conference (DAC) 2018: 92:1-92:6. Best paper nomination.
  2. Sai Ma, Debjit Pal, Rui Jiang, Sandip Ray, and Shobha Vasudevan. Can't see the forest for the trees: State restoration's limitations in selecting signals for post-silicon debug. International Conference on Computer Aided Design (ICCAD) 2015: 1-8. Best paper nomination.
  3. Lingyi Liu, Xuanyu Zhong, Xiaotao Chen, and Shobha Vasudevan. Diagnosing Root Causes of System Level Performance Violations. International Conference on Computer Aided Design (ICCAD) 2013: 295-302.
  4. Debjit Pal and Shobha Vasudevan. Emphasizing Functional Relevance Over State Restoration in Post-silicon Signal Tracing. IEEE Transactions on CAD of Integrated Circuits and Systems (IEEE TCAD). Accepted, to appear.
  5. Jayanand Asok Kumar, Seyed Nematollah Ahmadyan, and Shobha Vasudevan. Efficient statistical model checking of hardware circuits with multiple failure regions. IEEE Transactions on CAD of Integrated Circuits and Systems (IEEE TCAD) 33(6): 945-958 (2014).
  6. Debjit Pal and Shobha Vasudevan. Symptomatic Bug Localization for Functional Debug of Hardware Designs. International Conference on VLSI Design (VLSI Design) 2016: 517-522.
Figures: Debug and root causing at system level; Pruned out potential root causes; Scalable and efficient signal selection; Correlation between behavioral coverage and traditional metric.