Reliability

Our recent reliability research focuses on highly autonomous vehicle environments. As vehicles become more automated, their computing components play an increasingly significant role. The reliability of these components is paramount, as it can be the difference between life and death. We plan to address the automotive reliability problem in phases: hardware, embedded systems and machine learning components. In hardware, microcontrollers are used in environments with a high likelihood of electrostatic discharge. The effects of such instantaneous discharge on computing system reliability were hitherto unknown.

We have demonstrated, for the first time and using real measurements, that circuit-level electrostatic discharge (ESD) poses reliability hazards to high-level computation. Until this study, the understanding of soft error sources was limited to high-energy particles from cosmic radiation. Over the past few decades, reliability research has focused on that class of transient errors. We have found that the new class of errors we uncovered is not only much more likely to occur than radiation-based soft errors, but also has more widespread and longer-lasting effects. The computing and embedded systems reliability communities need to focus on designing for, protecting against and recovering from this serious class of errors.

Our work on ageing-related early forewarning analysis was developed in collaboration with Texas Instruments (TI) and won a silver medal in the ACM Student Research Competition (SRC). It was later incorporated into TI processes as well.

We fabricated a real microcontroller test chip typical of automotive systems and injected ESD errors with a physical ESD gun. We then studied the effects on programs running on the microcontroller. Our experiments and measurements reveal that this class of errors can cause hazards at the higher computation levels. Their effects are also unlike those of radiation-based soft errors, and much more serious. A radiation-based soft error is equivalent to a single bit flip that self-corrects. ESD, on the other hand, can cause widespread corruption across many registers, and the system may not recover for a long time.
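The contrast between the two fault classes can be illustrated with a toy software fault model. This is only a minimal sketch and not the measurement setup described above: the register file size (16 registers of 16 bits, as on an MSP430-class core), the function names, and the `fraction` parameter are all assumptions made for illustration. It models only the spatial extent of the corruption, not how long the corruption lasts.

```python
import random

# Assumptions for illustration: a 16 x 16-bit register file.
WORD_BITS = 16
NUM_REGS = 16


def flip_single_bit(regs, rng):
    """Radiation-style fault model: flip exactly one bit in one register."""
    out = list(regs)
    r = rng.randrange(len(out))
    b = rng.randrange(WORD_BITS)
    out[r] ^= 1 << b
    return out


def corrupt_many(regs, rng, fraction=0.5):
    """Hypothetical ESD-style fault model: corrupt a random subset of registers.

    Each selected register is XORed with a nonzero mask, so it is
    guaranteed to differ from its original value.
    """
    out = list(regs)
    k = max(2, int(len(out) * fraction))
    for r in rng.sample(range(len(out)), k):
        out[r] ^= rng.randrange(1, 1 << WORD_BITS)
    return out


def hamming_distance(a, b):
    """Total number of differing bits between two register files."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def corrupted_registers(a, b):
    """Number of registers whose values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    rng = random.Random(0)
    golden = [0] * NUM_REGS
    seu = flip_single_bit(golden, rng)
    esd = corrupt_many(golden, rng)
    print("single bit flip:", corrupted_registers(golden, seu), "register(s) affected")
    print("ESD-style fault:", corrupted_registers(golden, esd), "register(s) affected")
```

Comparing a faulty register file against a fault-free ("golden") copy in this way gives a simple measure of how far a fault has spread, which is the property that separates the two classes in the description above.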

We also developed the first method to study the effects of ageing due to negative bias temperature instability (NBTI) in a chip early in the design cycle. The idea is to bake recovery mechanisms into the design from the start.

  1. Keven Feng, Sandeep Vora, Elyse Rosenbaum and Shobha Vasudevan. Guilty as Charged: Computational Reliability Threats Posed by Electrostatic Discharge Induced Soft Errors. Design Automation and Test in Europe (DATE 2019). To appear.
  2. Sandeep Vora, Rui Jiang, Shobha Vasudevan and Elyse Rosenbaum, Application Level Investigation of System-Level ESD-Induced Soft Failures, 38th Annual EOS/ESD Symposium 2016: 30-35.
  3. Sandeep Vora, Rui Jiang, Prajwal Vijayraj, Keven Feng, Yang Xiu, Shobha Vasudevan, Elyse Rosenbaum. Hardware and Software Combined Detection of System Level ESD Induced Soft Failures. Annual EOS/ESD symposium 2018. To appear.
  4. Jayanand Asok Kumar, Kenneth M Butler, Heesoo Kim and Shobha Vasudevan, Early prediction of NBTI effects using RTL source code analysis, Design Automation Conference (DAC) 2012: 808-813
  5. Sankar Gurumurthy, Shobha Vasudevan and J. A. Abraham. Automated Mapping of Pre-Computed Module-Level Test Sequences to Processor Instructions, International Test Conference (ITC) 2005:10-20
  6. Sankar Gurumurthy, Shobha Vasudevan and J. A. Abraham. Automated functional propagation of module level test responses, International Test Conference (ITC) 2006: 1-9
  7. Viraj Athavale, Jayanand Asok Kumar and Shobha Vasudevan. A Scalable Approach for Throughput Estimation of Timing Speculation Designs, MWSCAS 201
Our test chip OpenMSP430 micro-controller
Test board and setup