In-Field Testing Using MISR

Research code for investigations of Silent Data Corruptions (SDCs) caused by hardware faults.

Quick Start

This project has submodules. To ensure everything is up-to-date, run the following after git clone, git pull or git checkout:

git submodule init
git submodule sync
git submodule update

This project manages reproducible programming environments with:

  • uv for managing Python environments.
  • nix for managing non-Python tools and benchmark designs. Follow this guide or this guide to set up the nix-eda binary cache and avoid rebuilding EDA-related tools.

Usage

To access non-Python tools such as iverilog, run nix develop before any of the commands below to enter the appropriate shell environment. Commands that rely only on Python tools also work outside a nix develop shell if uv is installed on the base system.

JPEG

Compile the JPEG decoder core using iverilog and run an RTL simulation of it using vvp:

make
uv run jpeg_core_tb_run_plasma.py

See uv run jpeg_core_tb_run_plasma.py --help for more options.

PicoRV32

Run picorv32's built-in testbenches, which generate picorv32/testbench.vcd, with one of these commands:

make test_vcd
make test_ez_vcd

Import the generated VCD with kyupy and convert it to a pattern file for later fault simulation:

uv run picorv32_vcd_import.py picorv32/testbench.vcd patterns.npy

See uv run picorv32_vcd_import.py --help for more options.

Other Benchmark Circuits

Load synthesized circuits and display statistics (example code):

uv run load_sky130_circuits.py

This script demonstrates how to obtain synthesized netlists via nix derivations published in this GitHub repository. If the circuits are not yet available in the local nix store, they are built on demand, along with layout and timing data, using the LibreLane classic flow.

To access the full design data (netlist, timing, layout, ...), call one of these:

nix build github:s-holst/benchmark-circuits#picorv32-sky130
nix build github:s-holst/benchmark-circuits#jpeg_core-sky130

Transient Fault Simulation

Goal: Classify the effect of each fault as masked, non-SDC, or SDC.

  • Transient fault: A single line-flip fault that is active for only a single clock cycle.
  • Single line-flip fault: The logic value on a single signal in the circuit gets inverted. This is equivalent to a stuck-at-0 fault if the fault-free value is 1, and to a stuck-at-1 fault if the fault-free value is 0.
  • Trace length $l$: Number of clock cycles considered in simulation.
  • Fault sites $f$: Number of distinct fault locations considered in simulation.
  • The number of distinct faults (size of the fault set) is $l\cdot f$.
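
The definitions above can be illustrated with a minimal Python sketch. The function names and the signal names are hypothetical and not taken from sim_transient_faults.py; the sketch only shows that one transient fault exists per (cycle, site) pair and that a line flip mimics the opposite stuck-at value.

```python
from itertools import product

def fault_universe(trace_length, fault_sites):
    """Return every (cycle, site) pair, i.e. one transient fault per pair."""
    return list(product(range(trace_length), fault_sites))

def equivalent_stuck_at(fault_free_value):
    """Stuck-at value that a line flip mimics for a given fault-free bit."""
    return fault_free_value ^ 1

sites = ["n1", "n2", "n3"]          # hypothetical signal names, f = 3
faults = fault_universe(4, sites)   # trace length l = 4
print(len(faults))                  # l * f = 12
print(equivalent_stuck_at(1))       # fault-free 1 behaves like stuck-at 0
```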

Approach:

  • PPSFP (parallel-pattern single-fault propagation) simulates the fault injection cycle. It generates $l\cdot f$ responses (the system states right after fault injection).
  • Each system state is simulated for subsequent clock cycles as necessary (fault-free propagation of erroneous states).
  • System states are classified into:
    • error-free: state is the same as in fault-free operation (fault effect disappeared)
    • erroneous non-SDC: surrounding system (testbench) detected the error (criteria: trap signal, out-of-bounds (OOB) memory access, ...)
    • erroneous potential-SDC: state differs from fault-free operation, but remains undetected by surrounding system (testbench).
  • Once a system state becomes error-free: Stop further simulation, original fault is masked.
  • Once a system state becomes erroneous non-SDC: Stop further simulation, original fault is non-SDC.
  • As long as a system state remains erroneous potential-SDC, keep simulating until the end of the trace.
  • If a system state is erroneous potential-SDC at the end of the trace: original fault is SDC.
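
The classification rules above can be sketched as a small loop. This is an illustration under assumptions, not the code in sim_transient_faults.py: the state representation, the step and detected callbacks, and the toy shift-register system are all hypothetical. The sketch checks for an error-free state before checking for detection, which is one possible ordering.

```python
from enum import Enum

class FaultClass(Enum):
    MASKED = "masked"
    NON_SDC = "non-SDC"
    SDC = "SDC"

def classify_fault(faulty_state, inject_cycle, golden_states, step, detected):
    """Classify one transient fault by propagating its erroneous state.

    faulty_state:  system state right after fault injection (hypothetical format)
    golden_states: golden_states[c] is the fault-free state in cycle c
    step:          fault-free next-state function, step(state, cycle) -> state
    detected:      True if the surrounding system notices the error
    """
    state, cycle = faulty_state, inject_cycle
    last_cycle = len(golden_states) - 1
    while True:
        if state == golden_states[cycle]:
            return FaultClass.MASKED      # fault effect disappeared
        if detected(state, cycle):
            return FaultClass.NON_SDC     # e.g. trap signal raised
        if cycle == last_cycle:
            return FaultClass.SDC         # still erroneous and undetected
        cycle += 1
        state = step(state, cycle)        # fault-free propagation

# Toy system: a 4-bit register that shifts right every cycle.
# A set top bit stands in for a trap signal.
step = lambda state, cycle: state >> 1
detected = lambda state, cycle: state >= 8
golden = [4, 2, 1, 0, 0]

print(classify_fault(5, 0, golden, step, detected))   # bit 0 flipped in cycle 0
print(classify_fault(12, 0, golden, step, detected))  # bit 3 flipped in cycle 0
print(classify_fault(2, 3, golden, step, detected))   # bit 1 flipped in cycle 3
```

In this toy system the first fault is shifted out after one cycle (masked), the second sets the trap bit immediately (non-SDC), and the third is still present and undetected at the end of the trace (SDC).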

Optimizations:

  • FFR (fanout-free region): Only explicitly simulate FFR stems, reducing the number of responses to $l\cdot \#\mathrm{FFR}$.
  • Independent FFRs could be simulated together, but system states have to be reconstructed for checking.
  • Distributed computing: Partition the fault universe and use worker tasks for erroneous potential-SDC propagation.
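
To illustrate the FFR optimization, the following sketch maps every signal of a toy netlist to the stem of its fanout-free region. It assumes a hypothetical netlist format (a dict from gate output to gate inputs) that is unrelated to kyupy's data model, and it treats a signal as a stem when it is directly observed or its fanout is not exactly one.

```python
from collections import defaultdict

def ffr_stems(gates, outputs):
    """Map each signal to the stem of its fanout-free region (FFR).

    gates:   dict from a gate's output signal to its list of input signals
             (hypothetical toy format)
    outputs: set of directly observed signals (primary outputs, flip-flop inputs)
    """
    # Collect the consumers (fanout) of every signal.
    consumers = defaultdict(list)
    for out, ins in gates.items():
        for sig in ins:
            consumers[sig].append(out)
    signals = set(gates) | set(consumers)

    def stem_of(sig):
        # Follow the unique consumer until reaching an observed signal
        # or a signal whose fanout is not exactly one.
        while sig not in outputs and len(consumers[sig]) == 1:
            sig = consumers[sig][0]
        return sig

    return {sig: stem_of(sig) for sig in sorted(signals)}

# Toy circuit: n2 fans out to both outputs, so it is an FFR stem.
gates = {
    "n1": ["a", "b"],
    "n2": ["n1", "c"],
    "y1": ["n2", "d"],
    "y2": ["n2", "e"],
}
stem = ffr_stems(gates, outputs={"y1", "y2"})
num_sites = len(stem)                # f = 9 fault sites
num_ffrs = len(set(stem.values()))   # 3 FFR stems: n2, y1, y2
print(num_sites, num_ffrs)
```

For a trace of length $l$, this toy circuit would need $l\cdot 3$ explicitly simulated responses instead of $l\cdot 9$.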

Some initial code is in sim_transient_faults.py. Run it with:

uv run sim_transient_faults.py picorv32-sky130 patterns.npy