# In-Field Testing Using MISR

Research code for investigations of Silent Data Corruptions (SDCs) caused by hardware faults.

## Quick Start

This project has submodules. To ensure everything is up-to-date, run the following after `git clone`, `git pull` or `git checkout`:

```
git submodule init
git submodule sync
git submodule update
```

This project manages reproducible programming environments with:

- [uv](https://docs.astral.sh/uv/) for managing Python environments.
- [nix](https://nixos.org) for managing non-Python tools and benchmark designs.

Follow [this guide](https://librelane.readthedocs.io/en/stable/installation/nix_installation/index.html) or [this guide](https://github.com/fossi-foundation/nix-eda/blob/main/docs/installation.md) to set up the [nix-eda](https://github.com/fossi-foundation/nix-eda/tree/main) binary cache to avoid re-building EDA-related tools.

## Usage

To access non-Python tools such as `iverilog`, run `nix develop` before any of the commands below to enter the appropriate shell environment.
Commands that only rely on Python tools also work outside a `nix develop` shell if [uv](https://docs.astral.sh/uv/) is installed on the base system.

### JPEG

Compile the JPEG decoder core using `iverilog` and run an RTL simulation of it using `vvp`:

```
make
uv run jpeg_core_tb_run_plasma.py
```

See `uv run jpeg_core_tb_run_plasma.py --help` for more options.

### PicoRV32

Run picorv32's built-in testbenches (generate `picorv32/testbench.vcd`) with one of these commands:

```
make test_vcd
make test_ez_vcd
```

Import the generated VCD with kyupy and convert it to a pattern file for later fault simulation:

```
uv run picorv32_vcd_import.py picorv32/testbench.vcd patterns.npy
```

See `uv run picorv32_vcd_import.py --help` for more options.
### Other Benchmark Circuits

Load synthesized circuits and display statistics (example code):

```
uv run load_sky130_circuits.py
```

This script demonstrates how to obtain synthesized netlists via nix derivations [published in this GitHub repository](https://github.com/s-holst/benchmark-circuits).
These circuits, along with layout and timings, are built on demand using the [LibreLane](https://fossi-foundation.org/librelane/) classic flow if they are not yet available in the local nix store.

To access the full design data (netlist, timing, layout, ...), call one of these:

```
nix build github:s-holst/benchmark-circuits#picorv32-sky130
nix build github:s-holst/benchmark-circuits#jpeg_core-sky130
```

## Transient Fault Simulation

Goal: Classify fault effects into: masked, non-SDC, SDC.

- *Transient fault:* A single line-flip fault that is only active for a single clock cycle.
- *Single line-flip fault:* The logic value on a single signal in the circuit gets inverted. This is equivalent to a stuck-at 0(1) if the fault-free value is 1(0).
- *Trace length $l$:* Number of clock cycles considered in simulation.
- *Fault sites $f$:* Number of distinct fault locations considered in simulation.
- The number of distinct faults (size of the fault set) is $l\cdot f$.

Approach:

- PPSFP (parallel-pattern single-fault propagation) for the fault injection cycle. This generates $l\cdot f$ responses (= system states right after fault injection).
- System states are simulated for the next clock cycles as necessary (fault-free propagation of erroneous states).
- System states are classified into:
  - *error-free:* the state is the same as in fault-free operation (the fault effect disappeared)
  - *erroneous non-SDC:* the surrounding system (testbench) detected the error (criteria: trap signal, OOB memory access, ...)
  - *erroneous potential-SDC:* the state differs from fault-free operation, but remains undetected by the surrounding system (testbench).
- Once a system state becomes *error-free*: Stop further simulation, original fault is *masked*.
- Once a system state becomes *erroneous non-SDC*: Stop further simulation, original fault is *non-SDC*.
- As long as the system state remains *erroneous potential-SDC*, keep simulating until the end of the trace.
- If the system state is *erroneous potential-SDC* at the end of the trace: original fault is *SDC*.

Optimizations:

- FFR (fanout-free region): Only explicitly simulate FFR stems, reducing the number of responses to $l\cdot \#FFR$.
  - Independent FFRs could be simulated together, but system states have to be reconstructed for checking.
- Distributed computing: Partition over the fault universe, with worker tasks for erroneous potential-SDC propagation.

Some initial code is in `sim_transient_faults.py`. Run it with:

```
uv run sim_transient_faults.py picorv32-sky130 patterns.npy
```
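The classification rules from the Approach list above can be illustrated on a toy state machine. The following is a minimal sketch and not the code in `sim_transient_faults.py`: it injects each fault serially with a plain bit flip instead of PPSFP, treats the bits of a 4-bit state register as fault sites, and uses a hypothetical next-state function `step` and a hypothetical detection check `detected` that stand in for the circuit and the testbench.

```python
from collections import Counter
from enum import Enum
from itertools import product


class Outcome(Enum):
    MASKED = "masked"
    NON_SDC = "non-SDC"
    SDC = "SDC"


def step(state: int) -> int:
    """Hypothetical next-state function: shift the low 3 bits right by one."""
    return (state & 0b0111) >> 1


def detected(state: int) -> bool:
    """Hypothetical testbench check: bit 3 acts as a trap signal."""
    return bool(state & 0b1000)


def fault_free_trace(initial: int, length: int) -> list[int]:
    """Fault-free system state for each of the `length` clock cycles."""
    trace = [initial]
    for _ in range(length - 1):
        trace.append(step(trace[-1]))
    return trace


def classify(cycle: int, site: int, golden: list[int]) -> Outcome:
    """Inject a line flip at (cycle, site) and propagate it fault-free."""
    state = golden[cycle] ^ (1 << site)  # fault is active in this cycle only
    for t in range(cycle, len(golden)):
        if state == golden[t]:
            return Outcome.MASKED   # error-free: stop, fault is masked
        if detected(state):
            return Outcome.NON_SDC  # erroneous non-SDC: stop
        if t + 1 < len(golden):
            state = step(state)     # still potential-SDC: keep simulating
    return Outcome.SDC              # potential-SDC at the end of the trace


TRACE_LENGTH = 4  # l
FAULT_SITES = 4   # f (toy assumption: one fault site per state bit)

golden = fault_free_trace(0b0000, TRACE_LENGTH)
results = {
    (cycle, site): classify(cycle, site, golden)
    for cycle, site in product(range(TRACE_LENGTH), range(FAULT_SITES))
}
summary = Counter(results.values())
print(len(results), {outcome.value: n for outcome, n in summary.items()})
```

With $l = 4$ and $f = 4$ this enumerates $l\cdot f = 16$ faults and prints `16 {'masked': 6, 'non-SDC': 4, 'SDC': 6}`. Faults injected late in the trace tend to end up as *SDC* in this toy model simply because too few cycles remain for the error to be flushed out or detected.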