# Loading and Exploring Gate-Level Circuits

This example parses the bench data format to build simple gate-level circuits.

In [1]:
from kyupy import bench

# load a file
b01 = bench.load('tests/b01.bench')

# ... or specify the circuit as a string
mycircuit = bench.parse('input(a,b) output(o1,o2,o3) x=buf(a) o1=not(x) o2=buf(x) o3=buf(x)')

Circuits are objects of the class `Circuit`.

In [2]:
b01



In [3]:
mycircuit



Circuits are containers for two types of elements: nodes and lines.
* A `Node` is a named entity in a circuit (e.g. a gate, a standard cell, a named signal, or a fan-out point) that has connections to other nodes.
* A `Line` is a directional 1:1 connection between two Nodes.

Use the `dump()` method to get a string representation of all nodes and their connections.

In [4]:
print(mycircuit.dump())

None(0,1,2,3,4)
0:__fork__"a" >1
1:__fork__"b" 
2:__fork__"o1" <2 
3:__fork__"o2" <4 
4:__fork__"o3" <6 
5:buf"x" <1 >0
6:__fork__"x" <0 >3 >5 >7
7:not"o1" <3 >2
8:buf"o2" <5 >4
9:buf"o3" <7 >6


The first line of the dump starts with the circuit name ("None" for `mycircuit`), followed by the node-IDs of all the ports (inputs and outputs) of the circuit.

Each of the following lines describes one node.
Each node in the circuit has a unique ID, a type, a name, and line-connections. This information is given on each line in that order.

A line in the circuit has a unique ID, a driver node and a receiver node. The connections in the dump show the direction (">" for output, "<" for input) and the line-ID. For example in `mycircuit`: Node-0 has one output connected to Line-1, and this Line-1 is connected to the input of Node-5.
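
The line format described above can be checked with a few lines of standard-library Python. This is only a sketch based on the dump shown here: the helper name `parse_dump_line` is hypothetical and not part of kyupy, and it assumes that every connection token is either `<N` or `>N`.

```python
import re

# Assumed layout of one node line, taken from the dump above:
#   <id>:<kind>"<name>" followed by "<N" (input line) and ">N" (output line) tokens.
NODE_RE = re.compile(r'^(\d+):([^"]+)"([^"]*)"(.*)$')

def parse_dump_line(text):
    """Split one node line into (index, kind, name, ins, outs). Hypothetical helper."""
    match = NODE_RE.match(text.strip())
    if match is None:
        raise ValueError(f'not a node line: {text!r}')
    index, kind, name, rest = match.groups()
    tokens = rest.split()
    ins = [int(tok[1:]) for tok in tokens if tok.startswith('<')]
    outs = [int(tok[1:]) for tok in tokens if tok.startswith('>')]
    return int(index), kind, name, ins, outs

print(parse_dump_line('6:__fork__"x" <0 >3 >5 >7'))
# (6, '__fork__', 'x', [0], [3, 5, 7])
```

Working on the `Circuit` object directly is preferable in practice; the parser is only meant to make the ordering of ID, type, name, and line connections concrete.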

The `interface` is the list of nodes forming the ports (inputs and outputs):

In [5]:
mycircuit.interface

[0:__fork__"a" >1,
 1:__fork__"b" ,
 2:__fork__"o1" <2 ,
 3:__fork__"o2" <4 ,
 4:__fork__"o3" <6 ]

## Nodes

There are two types of nodes: __forks__ and __cells__.

Forks have the special type `__fork__` while cells can be of various types (`buf`, `not`, `and`, `nor`, etc.).
Forks are used to label signals with names and to connect one cell to multiple other cells (fan-out).
The names among all forks and among all cells within a circuit are unique.
Thus, a fork and a cell are allowed to share the same name.

Nodes in circuits can be accessed by ID or by name.

In [6]:
mycircuit.nodes[7]

7:not"o1" <3 >2

In [7]:
mycircuit.forks['x']

6:__fork__"x" <0 >3 >5 >7

In [8]:
mycircuit.cells['x']

5:buf"x" <1 >0

Nodes have an `index` (the node ID), a `kind` (the type), a `name`, as well as `ins` (input pins) and `outs` (output pins).

In [9]:
n = mycircuit.nodes[6]
n.index, n.kind, n.name, n.ins, n.outs

(6, '__fork__', 'x', [0], [3, 5, 7])

The inputs and outputs of a node are lists containing `Line` objects.

In [10]:
type(n.ins[0])

kyupy.circuit.Line

## Lines

A line is a directional connection between one driving node (`driver`) and one reading node (`reader`).

A line also knows which node pins it is connected to: `driver_pin`, `reader_pin`.

In [11]:
l = mycircuit.nodes[6].outs[1]
l.index, l.driver, l.reader, l.driver_pin, l.reader_pin

(5, 6:__fork__"x" <0 >3 >5 >7, 8:buf"o2" <5 >4, 1, 0)

## Basic Analysis Examples
### Cell type statistics

In [12]:
from collections import defaultdict

counts = defaultdict(int)

for n in b01.cells.values():
    counts[n.kind] += 1

print(counts)

defaultdict(<class 'int'>, {'DFF': 5, 'AND': 1, 'NAND': 28, 'OR': 1, 'NOT': 10})
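
The same tally can be written with `collections.Counter`, which also sorts by frequency. The sketch below runs without kyupy: the list `kinds` is a made-up stand-in for `[n.kind for n in b01.cells.values()]`.

```python
from collections import Counter

# Made-up stand-in for [n.kind for n in b01.cells.values()].
kinds = ['NAND', 'NOT', 'NAND', 'DFF', 'NOT', 'NAND']

counts = Counter(kinds)
print(counts.most_common())
# [('NAND', 3), ('NOT', 2), ('DFF', 1)]
```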


### Tracing a scan chain

In [13]:
from kyupy import verilog

b14 = verilog.load('tests/b14.v.gz')
b14



In [14]:
chain = []
cell = b14.cells['test_so000']
chain.append(cell)
while len(cell.ins) > 0:
    # scan flip-flops: follow input pin 2; all other nodes: follow input pin 0
    cell = cell.ins[2 if 'SDFF' in cell.kind else 0].driver
    if '__fork__' not in cell.kind:
        chain.append(cell)

print('chain length', len(chain))
for c in chain[:10]:
    print(c.kind, c.name)
print('...')
for c in chain[-10:]:
    print(c.kind, c.name)

chain length 287
output test_so000
NBUFFX8_RVT HFSBUF_36_76
SDFFARX1_RVT wr_reg
INVX4_RVT HFSINV_691_254
INVX0_RVT HFSINV_2682_255
SDFFARX1_RVT state_reg
NBUFFX2_RVT ZBUF_55_inst_860
SDFFARX1_RVT reg3_reg_28_
SDFFARX1_RVT reg3_reg_27_
SDFFARX1_RVT reg3_reg_26_
...
NBUFFX2_RVT ZBUF_1656_inst_2160
SDFFARX1_RVT IR_reg_3_
NBUFFX2_RVT ZBUF_85_inst_865
SDFFARX1_RVT IR_reg_2_
SDFFARX1_RVT IR_reg_1_
SDFFARX1_RVT IR_reg_0_
NBUFFX2_RVT ZBUF_17_inst_905
NBUFFX4_RVT ZBUF_275_inst_906
SDFFARX1_RVT B_reg
input test_si000
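
The backward walk can be tried on a toy netlist without loading any files. This sketch uses plain dictionaries instead of kyupy objects. The netlist, the names, and the helper `trace_chain` are all made up. As in the cell above, it assumes that the scan input of an `SDFF` cell is input pin 2.

```python
# name -> (kind, driver names in pin order); a made-up two-flop scan chain.
# Assumption carried over from the cell above: pin 2 of an SDFF is its scan input.
netlist = {
    'si':   ('input',  []),
    'ff0':  ('SDFF',   ['d0', 'clk', 'si']),
    'buf0': ('BUF',    ['ff0']),
    'ff1':  ('SDFF',   ['d1', 'clk', 'buf0']),
    'so':   ('output', ['ff1']),
}

def trace_chain(netlist, start):
    """Walk from the scan-out port back to the scan-in port. Hypothetical helper."""
    chain = [start]
    kind, ins = netlist[start]
    while ins:
        name = ins[2 if 'SDFF' in kind else 0]
        chain.append(name)
        kind, ins = netlist[name]
    return chain

print(trace_chain(netlist, 'so'))
# ['so', 'ff1', 'buf0', 'ff0', 'si']
```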


### Determining Logic Depth of Nodes

In [15]:
from kyupy import verilog

b14 = verilog.load('tests/b14.v.gz')
b14



Calculate logic level (logic depth, distance from inputs or scan flip-flops) for each node in the circuit.
Inputs and flip-flops themselves are level 0, **cells** driven by just inputs and flip-flops are level 1, and so on.
**Fork** nodes have the same level as their driver, because they do not increase the logic depth.

In [16]:
import numpy as np

levels = np.zeros(len(b14.nodes), dtype='uint16') # store level for each node.

for cell in b14.topological_order():
    if 'DFF' in cell.kind or 'input' == cell.kind:
        levels[cell.index] = 0
    elif '__fork__' == cell.kind:
        levels[cell.index] = levels[cell.ins[0].driver.index]  # forks only have exactly one driver
    else:
        levels[cell.index] = max([levels[line.driver.index] for line in cell.ins]) + 1
 
print(f'Maximum logic depth: {np.max(levels)}')

Maximum logic depth: 112
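
To check the level rules by hand, the same computation can be run on a toy netlist. This sketch does not use kyupy: the dictionary `nodes` and its node names are made up (loosely modeled on `mycircuit`), and the topological order comes from `graphlib.TopologicalSorter` in the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# name -> (kind, driver names); a made-up netlist, not a kyupy Circuit.
nodes = {
    'a':       ('input',    []),
    'x_cell':  ('buf',      ['a']),
    'x_fork':  ('__fork__', ['x_cell']),
    'o1_cell': ('not',      ['x_fork']),
    'o2_cell': ('buf',      ['x_fork']),
}

# TopologicalSorter expects a mapping from each node to its predecessors.
order = TopologicalSorter({n: drivers for n, (_, drivers) in nodes.items()}).static_order()

levels = {}
for name in order:
    kind, drivers = nodes[name]
    if kind == 'input' or 'DFF' in kind:
        levels[name] = 0                                    # inputs and flip-flops
    elif kind == '__fork__':
        levels[name] = levels[drivers[0]]                   # forks copy their driver's level
    else:
        levels[name] = max(levels[d] for d in drivers) + 1  # cells add one level

print(levels['x_fork'], levels['o1_cell'])
# 1 2
```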


List nodes with the highest depth and which nodes they are driving.

In [17]:
nodes_by_depth = np.argsort(levels)[::-1]

for n_idx in nodes_by_depth[:20]:
    n = b14.nodes[n_idx]
    readers = ', '.join([f'{l.reader.kind:12s} {l.reader.name:14s}' for l in n.outs])
    print(f'depth: {levels[n_idx]} node: {n.kind:12s} {n.name:6s} driving: {readers}')

depth: 112 node: __fork__ n2692 driving: SDFFARX1_RVT reg1_reg_29_ 
depth: 112 node: NAND2X0_RVT U465 driving: __fork__ n2692 
depth: 112 node: NAND2X0_RVT U562 driving: __fork__ n2724 
depth: 112 node: __fork__ n2724 driving: SDFFARX1_RVT reg0_reg_29_ 
depth: 112 node: __fork__ n2608 driving: SDFFARX1_RVT B_reg 
depth: 112 node: NAND2X0_RVT U170 driving: __fork__ n2608 
depth: 111 node: NAND2X0_RVT U5550 driving: __fork__ n2693 
depth: 111 node: __fork__ n2660 driving: SDFFARX1_RVT reg2_reg_29_ 
depth: 111 node: AND2X2_RVT U5560 driving: __fork__ n2660 
depth: 111 node: __fork__ n2725 driving: SDFFARX1_RVT reg0_reg_28_ 
depth: 111 node: __fork__ n2693 driving: SDFFARX1_RVT reg1_reg_28_ 
depth: 111 node: __fork__ n362 driving: NAND2X0_RVT U170 
depth: 111 node: NAND2X0_RVT U173 driving: __fork__ n362 
depth: 111 node: __fork__ n600 driving: NAND2X0_RVT U562 
depth: 111 node: NAND2X0_RVT U563 driving: __fork__ n600 
depth: 111 node: NAND2X0_RVT U565 driving: __fork__ n2725 
depth: 111 n

# Working With Test Data and Logic Simulation

Load a stuck-at fault test pattern set and expected fault-free responses from a STIL file.

In [18]:
from kyupy import verilog, stil
from kyupy.logic import MVArray, BPArray
from kyupy.logic_sim import LogicSim

b14 = verilog.load('tests/b14.v.gz')
s = stil.load('tests/b14.stuck.stil.gz')
stuck_tests = s.tests(b14)
stuck_responses = s.responses(b14)

Tests and responses are instances of `MVArray`. Its `length` is the number of test vectors stored, and its `width` is the number of values in a vector. By default, the STIL parser returns 8-valued test vectors (`m=8`).

In [19]:
stuck_tests



The internal storage (an `ndarray` of `uint8`) is accessible via `data`. The first axis is the width, and the last axis goes along the test set.

In [20]:
stuck_tests.data.shape

(306, 1081)

The subscript accessor returns a string representation of the given test vector number. Possible values are '0', '1', '-', 'X', 'R', 'F', 'P', and 'N'.

In [21]:
stuck_tests[1]

'P0--------------------11011111011001100111010101011101----------------------------------00-10111011010110011101110010111010111011101100010000110101111111011010101001010101010101010101001010110101001010101010101010110100000111111111111111011010100100101010010010101101010101001010100111010001010010000011100'

In [22]:
stuck_responses[1]

'--10000010010100010111--------------------------------0101010010101010110101001001010100--011111110011011111000111010101010111011101100010000110101111111011010101001010101010101010101001010110101001010101010101010110100000111111111111111011010100100101010010010101101010101001010101000111111111111111011101'

The order of values in the vectors corresponds to the circuit's interface followed by the scan flip-flops as they appear in `b14.cells`.
The test data can be used directly in the simulators as they use the same ordering convention.

The logic simulator uses bit-parallel storage of logic values, but our loaded test data uses one `uint8` per logic value.
To convert the storage layout, we instantiate a `BPArray` for the input stimuli.
The storage layout is more compact, but individual values cannot be easily accessed anymore.

In [23]:
stuck_tests_bp = BPArray(stuck_tests)
stuck_tests_bp



In [24]:
stuck_tests_bp.data.shape

(306, 3, 136)
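
The shape above is consistent with three bit planes per signal (enough for 8 distinct values) and 8 patterns per byte, since 1081 patterns round up to 136 bytes. The sketch below illustrates that general idea in plain Python. It is not kyupy's implementation: the helpers `pack_bitplanes` and `unpack_bitplanes`, the bit order, and the numeric codes are assumptions chosen for illustration.

```python
def pack_bitplanes(codes, planes=3):
    """Pack small integer codes into `planes` byte arrays, 8 codes per byte.

    Illustrative only; bit order and code assignment are assumptions.
    """
    nbytes = (len(codes) + 7) // 8
    packed = [bytearray(nbytes) for _ in range(planes)]
    for i, code in enumerate(codes):
        for p in range(planes):
            if (code >> p) & 1:
                packed[p][i // 8] |= 1 << (i % 8)
    return packed

def unpack_bitplanes(packed, length):
    """Inverse of pack_bitplanes: rebuild the list of integer codes."""
    return [sum(((plane[i // 8] >> (i % 8)) & 1) << p for p, plane in enumerate(packed))
            for i in range(length)]

codes = [0, 3, 5, 7, 1, 2, 6, 4, 3]     # nine arbitrary 3-bit codes
packed = pack_bitplanes(codes)
print(len(packed), len(packed[0]))       # 3 2
print(unpack_bitplanes(packed, len(codes)) == codes)  # True
print((1081 + 7) // 8)                   # 136
```

Reading a single value requires touching one bit in each plane, which is why individual values are less convenient to access in this layout.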

The following code performs an 8-valued logic simulation and stores the results in a new instance of `BPArray`.
The packed array is unpacked into an `MVArray` for value access.

In [25]:
responses_bp = BPArray((stuck_tests_bp.width, len(stuck_tests_bp)))
simulator = LogicSim(b14, sims=len(stuck_tests_bp))
simulator.assign(stuck_tests_bp)
simulator.propagate()
simulator.capture(responses_bp)
responses = MVArray(responses_bp)

In [26]:
responses[1]

'--10000010010100010111--------------------------------0101010010101010110101001001010100--011111110011011111000111010101010111011101100010000110101111111011010101001010101010101010101001010110101001010101010101010110100000111111111111111011010100100101010010010101101010101001010101000111111111111111011101'

Compare simulation results to expected fault-free responses loaded from STIL. The first test fails, because it is a flush test while simulation implicitly assumes a standard test with a capture clock.

In [27]:
matches = 0
for i in range(len(responses)):
    if responses[i] == stuck_responses[i]:
        matches += 1
    else:
        print(f'mismatch for test pattern {i}')
print(f'{matches} of {len(responses)} responses matched with simulator')

mismatch for test pattern 0
1080 of 1081 responses matched with simulator


Transition faults require test vector pairs for testing. These pairs are generated by `tests_loc`, assuming a launch-on-capture scheme (two functional clock cycles after scan-in).

In [28]:
s = stil.load('tests/b14.transition.stil.gz')
trans_tests = s.tests_loc(b14)
trans_responses = s.responses(b14)

In [29]:
trans_tests



Possible values in the string representation are: '0', '1', '-', 'X', 'R' (rising transition), 'F' (falling transition), 'P' (positive pulse(s), 010), 'N' (negative pulse(s), 101).

In [30]:
trans_tests[1]

'00--------------------RRRRRRFRRRRRRRRRRRFFRFRRRRRRRRRR----------------------------------00-00000001110100011111011010000000000000000011001001100101111110101110110001000100010100110111111101101000000111110011100010111000111R1111111111111111111111110001100100000110100000111010101110RFF00F000F0F00F00000FF01F'

We validate these patterns with an 8-valued logic simulation.

In [31]:
trans_tests_bp = BPArray(trans_tests)
responses_bp = BPArray((trans_tests_bp.width, len(trans_tests_bp)))
simulator = LogicSim(b14, sims=len(trans_tests_bp))
simulator.assign(trans_tests_bp)
simulator.propagate()
simulator.capture(responses_bp)
responses = MVArray(responses_bp)

In [32]:
responses[1]

'--F00000F00F0F000F00FF--------------------------------01110101011100000101100000100110R0--0RRRRRRRNNNRNRPRNNNNNRFFRFRRRRRRR000000000011001001100101111110101110110001000100010100110111111101101000000111110011100010111000NNNNNNNNNNNNNNNNNNNNNNNNNNNNP0011001000001101000001110101011101RRRRRRRRRRRRRRRRRRRRP01R'

The responses loaded from STIL only contain the final logic values. Use simple character replacements to reduce the simulated responses to their final values before comparing. The first test is again a flush test.

In [33]:
matches = 0
for i in range(len(responses)):
    if trans_responses[i] == responses[i].replace('P','0').replace('N','1').replace('R','1').replace('F','0'):
        matches += 1
    else:
        print(f'mismatch for test pattern {i}')
print(f'{matches} of {len(responses)} responses matched with simulator')

mismatch for test pattern 0
1391 of 1392 responses matched with simulator
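
The chain of `replace` calls maps each transition symbol to its final logic value. An equivalent and slightly tidier form uses a translation table built with `str.maketrans`. The helper name `final_values` and the sample string are made up for this sketch.

```python
# Final logic value of each transition symbol: P -> 0, N -> 1, R -> 1, F -> 0.
FINAL_VALUE = str.maketrans('PNRF', '0110')

def final_values(response):
    """Reduce an 8-valued response string to its final logic values. Hypothetical helper."""
    return response.translate(FINAL_VALUE)

sample = '--F00R1PN0'             # made-up response string
print(final_values(sample))
# --00011010
```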


# Working With Delay Information and Timing Simulation

Delay data for gates and interconnect can be loaded from SDF files. In kyupy's timing simulators, delays are associated with the lines between nodes, not with the nodes themselves. Each line in the circuit has a rising delay, a falling delay, a negative pulse threshold, and a positive pulse threshold. 

In [34]:
from kyupy import sdf
from kyupy.saed import pin_index

df = sdf.load('tests/b14.sdf.gz')
lt = df.annotation(b14, pin_index, dataset=0, interconnect=False)

The returned delay information is an `ndarray` with a set of delay values for each line in the circuit.

In [35]:
lt.shape

(46891, 2, 2)

Number of non-0 values loaded:

In [36]:
(lt != 0).sum()

119676

The available timing simulators are `WaveSim` and `WaveSimCuda`.
They work similarly to `LogicSim` in that they evaluate all cells in topological order.
Instead of propagating a logic value, however, they propagate waveforms.

`WaveSim` uses the numba just-in-time compiler for acceleration on CPU.
It falls back to pure Python if numba is not available. `WaveSimCuda` uses numba for GPU acceleration.
If no CUDA card is available, it falls back to pure Python (not jit-compiled for CPU!).
Pure Python is too slow for most purposes.

Both simulators operate in a data-parallel fashion.
The following instantiates a new engine for 32 independent timing simulations, in which each signal line in the circuit can carry at most 16 transitions. All simulators share the same circuit and the same line delay specification.

In [37]:
from kyupy.wave_sim import WaveSimCuda, TMAX
import numpy as np

wsim = WaveSimCuda(b14, lt, sims=32, wavecaps=16)

The engine allocates several memory regions, with the waveforms usually being the largest.

In [38]:
def print_mem(name, arr):
    print(f'{name}: {arr.size * arr.itemsize / 1024:.1f} kiB')
 
print_mem('Waveforms ', wsim.state)
print_mem('State Allocation Table ', wsim.sat)
print_mem('Circuit Timing ', wsim.timing)
print_mem('Circuit Netlist ', wsim.ops)
print_mem('Capture Data ', wsim.cdata)
print_mem('Test Stimuli Data ', wsim.tdata)

Waveforms : 93908.5 kiB
State Allocation Table : 1113.4 kiB
Circuit Timing : 1484.5 kiB
Circuit Netlist : 732.7 kiB
Capture Data : 267.8 kiB
Test Stimuli Data : 3.6 kiB


This is a typical simulation loop where the number of patterns is larger than the number of simulators available.
We simulate `trans_tests_bp`.
The timing simulator accepts 8-valued `BPArray`s, but it will return response (capture) data in a different format.

In [39]:
sims = 128 # len(trans_tests_bp) # Feel free to simulate all tests if CUDA is set up correctly.

cdata = np.zeros((len(wsim.interface), sims, 7)) # space to store all capture data

for offset in range(0, sims, wsim.sims):
    wsim.assign(trans_tests_bp, offset=offset)
    wsim.propagate(sims=sims-offset)
    wsim.capture(time=2.5, cdata=cdata, offset=offset)  # capture at time 2.5
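
The offsets visited by such a loop can be inspected without a simulator. The generator below is a plain-Python sketch of the batching arithmetic. The helper name `batches` is hypothetical, and it makes no claim about how `propagate` treats a `sims` value larger than the engine size.

```python
def batches(total, batch_size):
    """Yield (offset, count) pairs covering `total` items in chunks of `batch_size`.

    Hypothetical helper; only illustrates the loop arithmetic.
    """
    for offset in range(0, total, batch_size):
        yield offset, min(batch_size, total - offset)

print(list(batches(128, 32)))
# [(0, 32), (32, 32), (64, 32), (96, 32)]
print(list(batches(1392, 32))[-1])
# (1376, 16)
```

With all 1392 transition tests and 32 parallel simulations, the last batch is only half full.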

For each PI, PO, and scan flip-flop (axis 0) and each test (axis 1), the capture data contains seven values:
0. Probability of capturing a 1 at the given capture time (same as next value, if no standard deviation given).
1. A capture value decided by random sampling according to above probability.
2. The final value (assume a very late capture time).
3. True, if there was a premature capture (capture error), i.e. final value is different from captured value.
4. Earliest arrival time. The time at which the output transitioned from its initial value.
5. Latest stabilization time. The time at which the output transitioned to its final value.
6. Overflow indicator. If non-zero, some signals in the input cone of this output had more transitions than specified in `wavecaps`. Some transitions have been discarded, but the final values in the waveforms are still valid.
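
Bare indices such as `5` or `6` are easy to mix up, so it can help to name them. The enumeration below is a hypothetical convenience and not part of kyupy. Its member names are chosen for this sketch and follow the list above. The sample record is made up.

```python
from enum import IntEnum

class Capture(IntEnum):
    """Hypothetical names for the seven capture values listed above."""
    PROBABILITY = 0
    SAMPLED = 1
    FINAL = 2
    PREMATURE = 3
    EARLIEST_ARRIVAL = 4
    LATEST_STABILIZATION = 5
    OVERFLOW = 6

# Made-up capture record for one output and one test.
record = [1.0, 1.0, 1.0, 0.0, 0.8, 1.9, 0.0]
print(record[Capture.LATEST_STABILIZATION])
# 1.9
```

Because `IntEnum` members behave like integers, they can also be used as NumPy indices, for example `cdata[..., Capture.OVERFLOW]`.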

In [40]:
cdata.shape

(306, 128, 7)

For validating against known logic values, take `cdata[...,1]`.

In [41]:
matches = 0

for i in range(cdata.shape[1]):
    response = ''.join('1' if x > 0.5 else '0' for x in cdata[..., i, 1])
    if trans_responses[i].replace('-','0') == response:
        matches += 1
    else:
        print(f'mismatch for test pattern {i}')
print(f'{matches} of {cdata.shape[1]} responses matched with simulator')

mismatch for test pattern 0
127 of 128 responses matched with simulator


The circuit delay is the maximum among all latest stabilization times:

In [42]:
cdata[...,5].max()

2.0610005855560303

Check for overflows. If too many of them occur, increase `wavecaps` during engine instantiation:

In [43]:
cdata[...,6].sum()

0.0

Check for capture failures:

In [44]:
cdata[...,3].sum()

0.0

# CUDA Support Notes

Try this code to check if CUDA is set up correctly.

If there is an error related to `nvvm`, you probably need to set up some environment variables:
```
%env LD_LIBRARY_PATH=/usr/local/cuda/lib64
%env CUDA_HOME=/usr/local/cuda
```
If problems persist, refer to the documentation for numba and CUDA.

In [45]:
from numba import cuda

cuda.detect()

Found 1 CUDA devices
id 0 b'TITAN V' [SUPPORTED]
 compute capability: 7.0
 pci device id: 0
 pci bus id: 2
Summary:
	1/1 devices are supported


True