# KyuPy Introduction

Working with KyuPy's basic data structures.

## Gate-Level Circuits

KyuPy has parsers for:

* The [ISCAS'89 Benchmark Format](https://www.researchgate.net/profile/Franc-Brglez/publication/224723140_Combination_profiles_of_sequential_benchmark_circuits) ".bench"
* Non-hierarchical gate-level verilog

Files can be loaded using `load(file)`; strings can be parsed using `parse(text)`.

In [1]:
from kyupy import bench, verilog

# load a file
b14 = verilog.load('../tests/b14.v.gz')

# ... or specify the circuit as a string
adder = bench.parse('''
INPUT(a, b)
OUTPUT(s)
cin = DFF(cout)
axb = XOR(a, b)
s = XOR(axb, cin)
aab = AND(a, b)
axbacin = AND(axb, cin)
cout = OR(aab, axbacin)
''', name='adder')

# 0000000.000 W Numba unavailable. Falling back to pure Python.


They return KyuPy's intermediate representation of the circuit graph (objects of class `Circuit`):

In [2]:
b14

{name: "b14", cells: 15873, forks: 15842, lines: 46891, io_nodes: 91}

In [3]:
adder

{name: "adder", cells: 6, forks: 8, lines: 17, io_nodes: 3}

As these summaries show, circuits contain `cells`, `forks`, `lines`, and `io_nodes`.

### Cells and Forks

Let's explore cells and forks for the adder circuit.

There are dictionary-mappings from names to these objects:

In [4]:
adder.cells

{'cin': 4:DFF"cin" <1 >0,
 'axb': 6:XOR"axb" <3 <4 >2,
 's': 8:XOR"s" <6 <7 >5,
 'aab': 9:AND"aab" <9 <10 >8,
 'axbacin': 11:AND"axbacin" <12 <13 >11,
 'cout': 13:OR"cout" <15 <16 >14}

In [5]:
adder.forks

{'a': 0:__fork__"a"  >3 >9,
 'b': 1:__fork__"b"  >4 >10,
 's': 2:__fork__"s" <5 ,
 'cout': 3:__fork__"cout" <14 >1,
 'cin': 5:__fork__"cin" <0 >7 >13,
 'axb': 7:__fork__"axb" <2 >6 >12,
 'aab': 10:__fork__"aab" <8 >15,
 'axbacin': 12:__fork__"axbacin" <11 >16}

(For bench-files, the names of gates equal the names of the signals they produce. In verilog files, the names can be different.)

In [6]:
adder.cells['axb']

6:XOR"axb" <3 <4 >2

In [7]:
adder.forks['axb']

7:__fork__"axb" <2 >6 >12

Cells and forks are instances of class `Node`, which represents *things* that are connected to one or more other *things* in the circuit.

* A **cell** represents a gate or a standard cell.
* A **fork** represents a named signal or a fan-out point (connecting the output of one cell to multiple other cells or forks).

`Node`-objects have an `index`, a `kind`, and a `name`.

In [8]:
adder.cells['axb'].index, adder.cells['axb'].kind, adder.cells['axb'].name

(6, 'XOR', 'axb')

*Forks* are `Node`-objects of the special kind `__fork__`.

*Cells* are `Node`-objects of any other kind. A kind is just a string and can be anything.

The namespaces of *forks* and *cells* are separate:
* A *cell* and a *fork* **can** have the same name.
* Two *cells* or two *forks* **cannot** have the same name.

In [9]:
adder.forks['axb'].index, adder.forks['axb'].kind, adder.forks['axb'].name

(7, '__fork__', 'axb')

The `index` of a *node* in a circuit is unique and consecutive.

*Forks* and *cells* share this index space, so a fork never has the same index as a cell.

Nodes can be accessed by their index using the `nodes` list:

In [10]:
adder.nodes

[0:__fork__"a"  >3 >9,
 1:__fork__"b"  >4 >10,
 2:__fork__"s" <5 ,
 3:__fork__"cout" <14 >1,
 4:DFF"cin" <1 >0,
 5:__fork__"cin" <0 >7 >13,
 6:XOR"axb" <3 <4 >2,
 7:__fork__"axb" <2 >6 >12,
 8:XOR"s" <6 <7 >5,
 9:AND"aab" <9 <10 >8,
 10:__fork__"aab" <8 >15,
 11:AND"axbacin" <12 <13 >11,
 12:__fork__"axbacin" <11 >16,
 13:OR"cout" <15 <16 >14]

In [11]:
adder.nodes[6], adder.nodes[7]

(6:XOR"axb" <3 <4 >2, 7:__fork__"axb" <2 >6 >12)

### Lines

A `Line` is a directional 1:1 connection between two Nodes.

A line has a circuit-unique and consecutive `index` just like nodes.

Line and node indices are different!

There is a `lines` list. If a line is printed, it just outputs its index:

In [12]:
adder.lines

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

A line has one `driver`-node and one `reader`-node:

In [13]:
adder.lines[2].driver, adder.lines[2].reader

(6:XOR"axb" <3 <4 >2, 7:__fork__"axb" <2 >6 >12)

Nodes show their connections to the lines with direction ("<" for input, ">" for output) and the line index.

In the example above, line 2 connects the output of cell "axb" to the input of fork "axb".

The input connections and output connections of a node are ordered lists of lines called `ins` and `outs`:

In [14]:
adder.cells['axb'].ins, adder.cells['axb'].outs

([3, 4], [2])

A line also stores its positions in the connection lists in `driver_pin` and `reader_pin`:

In [15]:
adder.lines[2].driver_pin, adder.lines[2].reader_pin, adder.lines[4].reader_pin

(0, 0, 1)
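
The relationship between `ins`, `outs`, `driver_pin`, and `reader_pin` can be illustrated with a small stand-alone model. This is only a sketch: the `Node` and `Line` classes and the `connect` helper below are hypothetical stand-ins written for this example, not KyuPy's own classes. It rebuilds lines 2, 3, and 4 of the adder and reproduces the pin positions shown above.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:  # hypothetical stand-in, not kyupy's Node
    index: int
    kind: str
    name: str
    ins: list = field(default_factory=list)   # lines feeding this node
    outs: list = field(default_factory=list)  # lines driven by this node

@dataclass(eq=False)
class Line:  # hypothetical stand-in, not kyupy's Line
    index: int
    driver: Node
    reader: Node
    driver_pin: int
    reader_pin: int

def connect(index, driver, reader):
    """Append a new line to both connection lists and record its positions."""
    line = Line(index, driver, reader, len(driver.outs), len(reader.ins))
    driver.outs.append(line)
    reader.ins.append(line)
    return line

a = Node(0, '__fork__', 'a')
b = Node(1, '__fork__', 'b')
axb_cell = Node(6, 'XOR', 'axb')
axb_fork = Node(7, '__fork__', 'axb')

l2 = connect(2, axb_cell, axb_fork)  # cell output -> fork input
l3 = connect(3, a, axb_cell)         # first XOR input
l4 = connect(4, b, axb_cell)         # second XOR input

pins = (l2.driver_pin, l2.reader_pin, l4.reader_pin)
print(pins)  # (0, 0, 1)
```

In this model, `line.driver.outs[line.driver_pin]` and `line.reader.ins[line.reader_pin]` both lead back to the line itself, which is the bookkeeping the pin attributes describe.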

### IO_Nodes

Any node in the circuit can be designated as a primary input or primary output by adding it to the `io_nodes` list:

In [16]:
adder.io_nodes

[0:__fork__"a"  >3 >9, 1:__fork__"b"  >4 >10, 2:__fork__"s" <5 ]

Typically, io_nodes have either only output connections (acting as primary inputs) or only input connections (acting as primary outputs).

Inputs and outputs appear in the order they were defined in the loaded file. Inputs and outputs are often interspersed.

A related list is `s_nodes`. It starts with the io_nodes, followed by all sequential elements (flip-flops, latches).

In [17]:
adder.s_nodes

[0:__fork__"a"  >3 >9,
 1:__fork__"b"  >4 >10,
 2:__fork__"s" <5 ,
 4:DFF"cin" <1 >0]

### Basic Circuit Navigation

A circuit can be traversed easily using the properties of `Circuit`, `Node`, and `Line`.

In [18]:
adder.io_nodes[0].outs[0].reader

6:XOR"axb" <3 <4 >2

In [19]:
for line in adder.io_nodes[0].outs:
    print(line.reader)

6:XOR"axb" <3 <4 >2
9:AND"aab" <9 <10 >8


In [20]:
adder.cells['cin'].ins[0].driver.name

'cout'

Let's continue with `b14` loaded before. It has 91 io_nodes:

In [21]:
b14, b14.io_nodes[:20]

({name: "b14", cells: 15873, forks: 15842, lines: 46891, io_nodes: 91},
 [31587:input"clock"  >15805,
  31589:input"reset"  >15806,
  31591:output"addr[19]" <46836 ,
  31592:output"addr[18]" <46837 ,
  31593:output"addr[17]" <46838 ,
  31594:output"addr[16]" <46839 ,
  31595:output"addr[15]" <46840 ,
  31596:output"addr[14]" <46841 ,
  31597:output"addr[13]" <46842 ,
  31598:output"addr[12]" <46843 ,
  31599:output"addr[11]" <46844 ,
  31600:output"addr[10]" <46845 ,
  31601:output"addr[9]" <46846 ,
  31602:output"addr[8]" <46847 ,
  31603:output"addr[7]" <46848 ,
  31604:output"addr[6]" <46849 ,
  31605:output"addr[5]" <46850 ,
  31606:output"addr[4]" <46851 ,
  31607:output"addr[3]" <46852 ,
  31608:output"addr[2]" <46853 ])

and even more sequential nodes:

In [22]:
len(b14.s_nodes)

306

The `io_locs(prefix)` and `s_locs(prefix)` methods return the locations of signals, busses and registers in `io_nodes` and `s_nodes`:

In [23]:
b14.io_locs('reset')

1

In [24]:
b14.io_locs('addr')

[21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2]

In [25]:
[b14.s_nodes[i] for i in b14.s_locs('IR_reg')]

[1130:SDFFARX1_RVT"IR_reg_0_" <16917 <16919 <16918 <16920 <16921 >566 >567,
 1202:SDFFARX1_RVT"IR_reg_1_" <17052 <17054 <17053 <17055 <17056 >611 >612,
 1124:SDFFARX1_RVT"IR_reg_2_" <16907 <16909 <16908 <16910 <16911 >562 >563,
 1127:SDFFARX1_RVT"IR_reg_3_" <16912 <16914 <16913 <16915 <16916 >564 >565,
 1199:SDFFARX1_RVT"IR_reg_4_" <17047 <17049 <17048 <17050 <17051 >609 >610,
 1196:SDFFARX1_RVT"IR_reg_5_" <17042 <17044 <17043 <17045 <17046 >607 >608,
 1193:SDFFARX1_RVT"IR_reg_6_" <17037 <17039 <17038 <17040 <17041 >605 >606,
 1190:SDFFARX1_RVT"IR_reg_7_" <17032 <17034 <17033 <17035 <17036 >603 >604,
 1187:SDFFARX1_RVT"IR_reg_8_" <17027 <17029 <17028 <17030 <17031 >601 >602,
 1184:SDFFARX1_RVT"IR_reg_9_" <17022 <17024 <17023 <17025 <17026 >599 >600,
 1181:SDFFARX1_RVT"IR_reg_10_" <17017 <17019 <17018 <17020 <17021 >597 >598,
 1178:SDFFARX1_RVT"IR_reg_11_" <17012 <17014 <17013 <17015 <17016 >595 >596,
 1175:SDFFARX1_RVT"IR_reg_12_" <17007 <17009 <17008 <17010 <17011 >593 >594,
 1172:SDF

**Example: Tracing out a scan chain.**

We start at the output of the scan chain "test_so000", then go backwards through the circuit.

When we encounter a scan cell "SDFF", we continue with the "SI" pin, which has index 2.

In [26]:
chain = [cell := b14.cells['test_so000']]
while len(cell.ins) > 0:
    chain.append(cell := cell.ins[2 if cell.kind.startswith('SDFF') else 0].driver)
        
print(f'length (with forks): {len(chain)}')
print(f'length (without forks): {len(list(filter(lambda n: n.kind != "__fork__", chain)))}')
print(f'length only SDFF: {len(list(filter(lambda n: n.kind.startswith("SDFF"), chain)))}')

names = [f'{c.kind}"{c.name}"' for c in chain]
print(' '.join(names[:10]) + ' ... ' + ' '.join(names[-10:]))

length (with forks): 573
length (without forks): 287
length only SDFF: 215
output"test_so000" __fork__"test_so000" NBUFFX8_RVT"HFSBUF_36_76" __fork__"aps_rename_215_" SDFFARX1_RVT"wr_reg" __fork__"HFSNET_169" INVX4_RVT"HFSINV_691_254" __fork__"HFSNET_170" INVX0_RVT"HFSINV_2682_255" __fork__"state" ... __fork__"IR[0]" SDFFARX1_RVT"IR_reg_0_" __fork__"ZBUF_17_16" NBUFFX2_RVT"ZBUF_17_inst_905" __fork__"ZBUF_275_16" NBUFFX4_RVT"ZBUF_275_inst_906" __fork__"B" SDFFARX1_RVT"B_reg" __fork__"test_si000" input"test_si000"


There is a generator for **traversing the circuit in topological order**.

The following loop prints all nodes:
* starting with primary inputs (nodes that don't have any input connections) and sequential elements,
* and continuing with nodes whose inputs are connected only to already printed nodes.

In [27]:
for n in adder.topological_order():
    print(n)

0:__fork__"a"  >3 >9
1:__fork__"b"  >4 >10
4:DFF"cin" <1 >0
6:XOR"axb" <3 <4 >2
9:AND"aab" <9 <10 >8
5:__fork__"cin" <0 >7 >13
7:__fork__"axb" <2 >6 >12
10:__fork__"aab" <8 >15
8:XOR"s" <6 <7 >5
11:AND"axbacin" <12 <13 >11
2:__fork__"s" <5 
12:__fork__"axbacin" <11 >16
13:OR"cout" <15 <16 >14
3:__fork__"cout" <14 >1
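
The two rules above describe a classic worklist traversal. The following is a minimal pure-Python sketch of that idea on the adder graph, with node indices transcribed from the listings above. The function and the `DRIVERS`/`SEQUENTIAL` tables are illustrative and are not taken from KyuPy's implementation, which may order nodes differently in general.

```python
from collections import deque

# Drivers of each adder node, keyed by node index.
DRIVERS = {0: [], 1: [], 2: [8], 3: [13], 4: [3], 5: [4], 6: [0, 1], 7: [6],
           8: [7, 5], 9: [0, 1], 10: [9], 11: [7, 5], 12: [11], 13: [10, 12]}
SEQUENTIAL = {4}  # DFF "cin": treated as a starting point, its input is ignored

def topological_order(drivers, sequential):
    # Invert the driver table to find the readers of each node.
    readers = {n: [] for n in drivers}
    for node, sources in drivers.items():
        for source in sources:
            readers[source].append(node)
    # Number of inputs that have not been visited yet.
    pending = {n: 0 if n in sequential else len(srcs) for n, srcs in drivers.items()}
    ready = deque(n for n in sorted(drivers) if pending[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for reader in readers[node]:
            if reader in sequential:
                continue  # sequential elements were already emitted at the start
            pending[reader] -= 1
            if pending[reader] == 0:
                ready.append(reader)
    return order

order = topological_order(DRIVERS, SEQUENTIAL)
print(order)  # [0, 1, 4, 6, 9, 5, 7, 10, 8, 11, 2, 12, 13, 3]
```

For this small graph the sketch visits the nodes in the same sequence as the output above, but the only property that matters is that every node appears after all of its drivers.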


**Example: Determining logic level (distance from inputs or sequential elements) of nodes.**

Inputs and flip-flops themselves are level 0, *cells* driven by just inputs and flip-flops are level 1, and so on.
*Fork* nodes have the same level as their driver, because they do not increase the logic depth.

In [28]:
import numpy as np

levels = np.zeros(len(b14.nodes), dtype=np.uint32)  # store level for each node.

for n in b14.topological_order():
    if 'DFF' in n.kind or len(n.ins) == 0:
        levels[n] = 0
    elif n.kind == '__fork__':
        levels[n] = levels[n.ins[0].driver]  # forks only have exactly one driver
    else:
        levels[n] = max([levels[line.driver] for line in n.ins]) + 1
        
print(f'Maximum logic depth: {np.max(levels)}')

Maximum logic depth: 112
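
To see the level rules at work on something small enough to check by hand, the same computation can be applied to the adder. This sketch does not use KyuPy: the `DRIVERS` and `KINDS` tables and the visiting order are transcribed from the adder listings earlier in this document.

```python
# Adder graph keyed by node index (transcribed from the listings above).
DRIVERS = {0: [], 1: [], 2: [8], 3: [13], 4: [3], 5: [4], 6: [0, 1], 7: [6],
           8: [7, 5], 9: [0, 1], 10: [9], 11: [7, 5], 12: [11], 13: [10, 12]}
KINDS = {0: '__fork__', 1: '__fork__', 2: '__fork__', 3: '__fork__', 4: 'DFF',
         5: '__fork__', 6: 'XOR', 7: '__fork__', 8: 'XOR', 9: 'AND',
         10: '__fork__', 11: 'AND', 12: '__fork__', 13: 'OR'}
# Topological order as printed for the adder.
ORDER = [0, 1, 4, 6, 9, 5, 7, 10, 8, 11, 2, 12, 13, 3]

levels = {}
for n in ORDER:
    if 'DFF' in KINDS[n] or not DRIVERS[n]:
        levels[n] = 0                          # inputs and flip-flops
    elif KINDS[n] == '__fork__':
        levels[n] = levels[DRIVERS[n][0]]      # forks inherit their driver's level
    else:
        levels[n] = max(levels[d] for d in DRIVERS[n]) + 1

print(max(levels.values()))  # 3
```

The deepest path is `a`/`b` → XOR `axb` → AND `axbacin` → OR `cout`, which gives a depth of 3.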


List nodes with the highest depth and which nodes they are driving.

In [29]:
nodes_by_depth = np.argsort(levels)[::-1]

for n_idx in nodes_by_depth[:20]:
    n = b14.nodes[n_idx]
    readers = ', '.join([f'{l.reader.kind:12s} {l.reader.name:14s}' for l in n.outs])
    print(f'depth: {levels[n_idx]} node: {n.kind:12s} {n.name:6s} driving: {readers}')

depth: 112 node: __fork__     n2692  driving: SDFFARX1_RVT reg1_reg_29_  
depth: 112 node: NAND2X0_RVT  U465   driving: __fork__     n2692         
depth: 112 node: NAND2X0_RVT  U562   driving: __fork__     n2724         
depth: 112 node: __fork__     n2724  driving: SDFFARX1_RVT reg0_reg_29_  
depth: 112 node: __fork__     n2608  driving: SDFFARX1_RVT B_reg         
depth: 112 node: NAND2X0_RVT  U170   driving: __fork__     n2608         
depth: 111 node: NAND2X0_RVT  U5550  driving: __fork__     n2693         
depth: 111 node: __fork__     n2660  driving: SDFFARX1_RVT reg2_reg_29_  
depth: 111 node: AND2X2_RVT   U5560  driving: __fork__     n2660         
depth: 111 node: __fork__     n2725  driving: SDFFARX1_RVT reg0_reg_28_  
depth: 111 node: __fork__     n2693  driving: SDFFARX1_RVT reg1_reg_28_  
depth: 111 node: __fork__     n362   driving: NAND2X0_RVT  U170          
depth: 111 node: NAND2X0_RVT  U173   driving: __fork__     n362          
depth: 111 node: __fork__     n600   d

## Working With Logic Values

Sequential states of circuits, signals, and test patterns contain logic values.

KyuPy provides some useful tools to deal with 2-valued, 4-valued, and 8-valued logic data.

All logic values are stored in numpy arrays of dtype `np.uint8`.

There are two storage formats:
* `mv` (for "multi-valued"): Each logic value is stored as uint8
* `bp` (for "bit-parallel"): Groups of 8 logic values are stored as three uint8

### `mv` Arrays

Suppose we want to simulate the adder circuit with 2 inputs, 1 output and 1 flip-flop.

In [30]:
adder.s_nodes

[0:__fork__"a"  >3 >9,
 1:__fork__"b"  >4 >10,
 2:__fork__"s" <5 ,
 4:DFF"cin" <1 >0]

We can construct a set of vectors using the `mvarray` helper function.

Each vector has 4 elements, one for each entry in `s_nodes` (io_nodes and sequential elements).

This would be an exhaustive vector set (the output in `s_nodes` remains unassigned ("-")):

In [31]:
from kyupy.logic import mvarray

inputs = mvarray('00-0', '10-0', '01-0', '11-0', '00-1', '10-1', '01-1', '11-1')
inputs

array([[0, 3, 0, 3, 0, 3, 0, 3],
       [0, 0, 3, 3, 0, 0, 3, 3],
       [2, 2, 2, 2, 2, 2, 2, 2],
       [0, 0, 0, 0, 3, 3, 3, 3]], dtype=uint8)

The numeric values in this array are defined in `kyupy.logic`.

The **last** axis is always the number of vectors. It may be unintuitive at first, but it is more convenient for data-parallel simulations.

The **second-to-last** axis corresponds to `s_nodes`. I.e., the first row is for input 'a', the second row for input 'b', and so on.

In [32]:
inputs.shape

(4, 8)

Get a string representation of a vector set. Possible values are '0', '1', '-', 'X', 'R', 'F', 'P', and 'N'.

In [33]:
from kyupy.logic import mv_str

print(mv_str(inputs))

00-0
10-0
01-0
11-0
00-1
10-1
01-1
11-1
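
The layout can be reproduced without KyuPy to make the axis convention concrete. The sketch below assumes that the numeric code of each character is its position in the string `'0X-1PRFN'`, which is the order used in the `np.choose` comment later in this document. The `encode` and `decode` helpers are hypothetical and only mimic what `mvarray` and `mv_str` show above.

```python
CODES = '0X-1PRFN'  # assumed: position in this string = numeric code

def encode(*vectors):
    """Rows are signals (s_nodes positions), columns are vectors."""
    return [[CODES.index(ch) for ch in column] for column in zip(*vectors)]

def decode(rows):
    """Turn the rows back into one string per vector."""
    return [''.join(CODES[v] for v in vector) for vector in zip(*rows)]

vectors = ['00-0', '10-0', '01-0', '11-0', '00-1', '10-1', '01-1', '11-1']
rows = encode(*vectors)
for row in rows:
    print(row)
# [0, 3, 0, 3, 0, 3, 0, 3]
# [0, 0, 3, 3, 0, 0, 3, 3]
# [2, 2, 2, 2, 2, 2, 2, 2]
# [0, 0, 0, 0, 3, 3, 3, 3]
```

Each input string becomes a column, which is why the number of vectors ends up on the last axis.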


Load a stuck-at fault test pattern set and expected fault-free responses from a STIL file. It contains 1081 test vectors.

In [34]:
from kyupy import stil

s = stil.load('../tests/b14.stuck.stil.gz')
stuck_tests = s.tests(b14)
stuck_responses = s.responses(b14)

In [35]:
len(b14.s_nodes)

306

In [36]:
stuck_tests.shape

(306, 1081)

In [37]:
stuck_responses.shape

(306, 1081)

In [38]:
print(mv_str(stuck_tests[:,:5]))

00--------------------00101011101101111011100010101100----------------------------------00-11110011001100110001100110011000100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110110011001100110011001100110011001001100110011001100111000
P0--------------------11011111011001100111010101011101----------------------------------00-10111011010110011101110010111010111011101100010000110101111111011010101001010101010101010101001010110101001010101010101010110100000111111111111111011010100100101010010010101101010101001010100111010001010010000011100
P0--------------------00001101000101011100111111111111----------------------------------00-10000001000000010100000000000000110110110111111010101000101100101110001111101001110110100000110101001000100101000101010101001000011110110111111111000001111000010000101100010000100010100100011111101010001101000100011
P0--------------------11011111111110101011001111101111-------------------------

In [39]:
print(mv_str(stuck_responses[:,:5]))

--11001100110011001100--------------------------------0011001100110011001100110011001110--011110011001100110001100110011000100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110110011001100110011001100110011001001100110011001100111000
--10000010010100010111--------------------------------0101010010101010110101001001010100--011111110011011111000111010101010111011101100010000110101111111011010101001010101010101010101001010110101001010101010101010110100000111111111111111011010100100101010010010101101010101001010101000111111111111111011101
--01000101100010101111--------------------------------1000100101000100001000110100001010--001000001111111101000000000000000110110110111111010101000101100101110001111101001110110100000110101001000100101000101010101001000011001110111111111000001111000010000101100010000100010100100011111000111110100111000010
--11001010001111010110--------------------------------1010101000010001000111001

The order of values in the vectors corresponds to the circuit's `s_nodes`.
The test data can be used directly in the simulators as they use the same ordering convention.

`stuck_tests` has values for all primary inputs and scan flip-flops, `stuck_responses` contains the expected values for all primary outputs and scan flip-flops.

Since this is a static test, only '0' and '1' are used with the exception of the clock input, which has a positive pulse 'P'.

A transition fault test is a dynamic test that also contains 'R' for rising transition and 'F' for falling transition:

In [40]:
s = stil.load('../tests/b14.transition.stil.gz')
transition_tests = s.tests_loc(b14)
transition_responses = s.responses(b14)

In [41]:
print(mv_str(transition_tests[:,:5]))

XX--------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX----------------------------------XX-11110011001100110001100110011000100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110110011001100110011001100110011001001100110011001100111000
00--------------------RRRRRRFRRRRRRRRRRRFFRFRRRRRRRRRR----------------------------------00-00000001110100011111011010000000000000000011001001100101111110101110110001000100010100110111111101101000000111110011100010111000111R1111111111111111111111110001100100000110100000111010101110RFF00F000F0F00F00000FF01F
00--------------------11R111110R0RR0R1110R01R1R001FRRR----------------------------------0F-RR0R00000000RR11R0RRR000R0R000R0010100010001011000111001000010001010111010101010100000000100001011100100001100011110110100000010011000011111100010111100010010111110100011100100011010000010111F11F0F01RRR110F0R01R011R
00--------------------RRFRRFR100FR10R010F10FR1111F111R-------------------------

In [42]:
print(mv_str(transition_responses[:,:5]))

--11001100110011001100--------------------------------0011001100110011001100110011001110--011110011001100110001100110011000100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110110011001100110011001100110011001001100110011001100111000
--00000000000000000000--------------------------------0111010101110000010110000010011010--011111111111110111111100101111111000000000011001001100101111110101110110001000100010100110111111101101000000111110011100010111000111111111111111111111111111100011001000001101000001110101011101111111111111111111110011
--11010001111110000110--------------------------------1101000001011000100111000101111110--000010111111100000100011101011100010100010001011000111001000010001010111010101010100000000100001011100100001100011110110100000010011100011111100010111100010010111110100011100100011010000010110011000011111100010111110
--11111101011011010010--------------------------------1001101000001001000101001

### `bp` Arrays

The logic simulator uses bit-parallel storage of logic values, but our loaded test data uses one `uint8` per logic value.

Use `mv_to_bp` to convert mv data to the bit-parallel storage layout.
Bit-parallel storage is more compact, but individual values cannot be easily accessed anymore.

In [43]:
from kyupy.logic import mv_to_bp, bp_to_mv

stuck_tests_bp = mv_to_bp(stuck_tests)

In [44]:
stuck_tests_bp.data.shape

(306, 3, 136)

Instead of 1081 bytes per s_node, bit-parallel storage only uses 3*136=408 bytes.

The reverse operation is `bp_to_mv`. Note that the number of vectors may be rounded up to the next multiple of 8:

In [45]:
bp_to_mv(stuck_tests_bp).shape

(306, 1088)
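
The size arithmetic follows from packing eight 3-bit codes into three bytes. The sketch below shows one way such a packing could work, with one byte per bit plane. The exact bit assignment inside KyuPy's bit-parallel arrays is not described here, so `pack8` and `unpack8` are illustrative assumptions rather than the library's layout.

```python
def pack8(codes):
    """Pack up to 8 three-bit codes into 3 bytes, one byte per bit plane."""
    planes = [0, 0, 0]
    for pos, code in enumerate(codes):
        for bit in range(3):
            planes[bit] |= ((code >> bit) & 1) << pos
    return planes

def unpack8(planes):
    """Recover the 8 codes stored in three bit-plane bytes."""
    return [sum(((planes[bit] >> pos) & 1) << bit for bit in range(3))
            for pos in range(8)]

codes = [0, 3, 2, 1, 5, 6, 4, 7]
assert unpack8(pack8(codes)) == codes

n_vectors = 1081
n_groups = -(-n_vectors // 8)   # ceiling division: groups of 8 vectors
print(n_groups, 3 * n_groups, 8 * n_groups)  # 136 408 1088
```

The last line reproduces the numbers above: 136 groups, 408 bytes per s_node, and 1088 vectors after converting back.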

## Logic Simulation

The following code performs an 8-valued logic simulation on all 1081 vectors for one clock cycle.

In [46]:
from kyupy.logic_sim import LogicSim

sim = LogicSim(b14, sims=stuck_tests.shape[-1])  # 1081 simulations in parallel
sim.s[0] = stuck_tests_bp
sim.s_to_c()
sim.c_prop()
sim.c_to_s()
sim_responses = bp_to_mv(sim.s[1])[...,:stuck_tests.shape[-1]]  # trim from 1088 -> 1081

In [47]:
print(mv_str(sim_responses[:,:5]))

--11001100110011001100--------------------------------0011001100110011001100110011001110--000110110110111010111011100010100100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110110011001100110011001100110011001100110011001100110011001
--10000010010100010111--------------------------------0101010010101010110101001001010100--011111110011011111000111010101010111011101100010000110101111111011010101001010101010101010101001010110101001010101010101010110100000111111111111111011010100100101010010010101101010101001010101000111111111111111011101
--01000101100010101111--------------------------------1000100101000100001000110100001010--001000001111111101000000000000000110110110111111010101000101100101110001111101001110110100000110101001000100101000101010101001000011001110111111111000001111000010000101100010000100010100100011111000111110100111000010
--11001010001111010110--------------------------------1010101000010001000111001

Compare simulation results to expected fault-free responses loaded from STIL.

The first test fails, because it is a flush test while simulation implicitly assumes a standard test with a capture clock.

The remaining 1080 responses are identical.

In [48]:
np.sum(np.min(sim_responses == stuck_responses, axis=0))

1080
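
Taking the minimum of a boolean array along axis 0 yields `True` only for columns in which every row agrees, so the sum counts fully matching vectors. A plain-Python sketch of the same counting, using a hypothetical helper and made-up toy data:

```python
def count_matching_vectors(got, expected):
    """Count columns (vectors) in which every row (signal) agrees."""
    return sum(col_a == col_b
               for col_a, col_b in zip(zip(*got), zip(*expected)))

# Toy data: 2 signals, 3 vectors; only the last vector differs.
got      = [[0, 3, 0],
            [3, 3, 0]]
expected = [[0, 3, 3],
            [3, 3, 0]]

matches = count_matching_vectors(got, expected)
print(matches)  # 2
```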

Same simulation for the transition-fault test set:

In [49]:
sim = LogicSim(b14, sims=transition_tests.shape[-1])  # 1392 simulations in parallel
sim.s[0] = mv_to_bp(transition_tests)
sim.s_to_c()
sim.c_prop()
sim.c_to_s()
sim_responses = bp_to_mv(sim.s[1])[...,:transition_tests.shape[-1]]  # trim to 1392

In [50]:
print(mv_str(sim_responses[:,:5]))

--11001100110011001100--------------------------------0011001100110011001100110011001110--0XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110110011001100110011001100110011001100110011001100110011001
--F00000F00F0F000F00FF--------------------------------01110101011100000101100000100110R0--0RRRRRRRNNNRNRPRNNNNNRFFRFRRRRRRR000000000011001001100101111110101110110001000100010100110111111101101000000111110011100010111000NNNNNNNNNNNNNNNNNNNNNNNNNNNNP0011001000001101000001110101011101RRRRRRRRRRRRRRRRRRRRP01R
--R10R0F011RRR10F0F11F--------------------------------1101000001011000100111000101111110--0FFPNPRRRRRRRFFFFFRFFFRRRFRFRRRFPPNPNPPPNPPPNPNNPPPNNNPPNPPPPNPPPNPNPNNNPNPNPNPNPNPPPPPPPPNPPPPNPNNNPPNPPPPNNPPPNNNNPNNPNPPPPPPNPPNNRPPPNNNNNNPPPNPNNNNPPPNPPNPNNNNNPNPPPNNNPPNPPPNNPNP0P00N0NNFPNNPPPPNNNNNNPPPNPNNRNNF
--RRRR1RFR0RRF1R0R0FR0--------------------------------1001101000001001000101001

The simulator responses contain 'R' for rising transition, 'F' for falling transition, 'P' for possible positive pulse(s) (010) and 'N' for possible negative pulse(s) (101).

We need to map each of these cases to the final logic values before we can compare:

In [51]:
sim_responses_final = np.choose(sim_responses, mvarray('0X-10101'))  # '0X-1PRFN'
print(mv_str(sim_responses_final[:,:5]))

--11001100110011001100--------------------------------0011001100110011001100110011001110--0XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110011001100110110011001100110011001100110011001100110011001100110011001
--00000000000000000000--------------------------------0111010101110000010110000010011010--011111111111110111111100101111111000000000011001001100101111110101110110001000100010100110111111101101000000111110011100010111000111111111111111111111111111100011001000001101000001110101011101111111111111111111110011
--11010001111110000110--------------------------------1101000001011000100111000101111110--000010111111100000100011101011100010100010001011000111001000010001010111010101010100000000100001011100100001100011110110100000010011100011111100010111100010010111110100011100100011010000010110011000011111100010111110
--11111101011011010010--------------------------------1001101000001001000101001
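
The same mapping can be checked on strings: 'R' and 'N' end at 1, 'F' and 'P' end at 0, and all other symbols stay as they are. This sketch uses only the standard library; `final_values` is a hypothetical helper, and the sample is the start of the second simulated response printed above.

```python
# Final value of each transition symbol: P (010) -> 0, R -> 1, F -> 0, N (101) -> 1.
FINAL = str.maketrans('PRFN', '0101')

def final_values(vector):
    """Replace each transition symbol by the value the signal settles to."""
    return vector.translate(FINAL)

sample = '--F00000F00F0F000F00FF'
print(final_values(sample))  # --00000000000000000000
```

The result agrees with the start of the second line of `sim_responses_final`.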

Again, the first test is a flush test, so we expect 1391 matches.

In [52]:
np.sum(np.min(sim_responses_final == transition_responses, axis=0))

1391

## Working With Delay Information and Timing Simulation

Delay data for gates and interconnect can be loaded from SDF files. In KyuPy's timing simulators, delays are associated with the lines between nodes, not with the nodes themselves. Each line in the circuit has a rising delay, a falling delay, a negative pulse threshold, and a positive pulse threshold.

In [53]:
from kyupy import sdf

df = sdf.load('../tests/b14.sdf.gz')
lt = df.annotation(b14, dataset=0, interconnect=False)

The returned delay information is an `ndarray` with a set of delay values for each line in the circuit.

In [54]:
lt.shape

(46891, 2, 2)

Number of non-0 values loaded:

In [55]:
(lt != 0).sum()

120628

The available timing simulators are `WaveSim` and `WaveSimCuda`.
They work similarly to `LogicSim` in that they evaluate all cells in topological order.
Instead of propagating a logic value, however, they propagate waveforms.

`WaveSim` uses the numba just-in-time compiler for acceleration on CPU.
It falls back to pure Python if numba is not available. `WaveSimCuda` uses numba for GPU acceleration.
If no CUDA card is available, it will fall back to pure Python (not jit-compiled for CPU!).
Pure Python is too slow for most purposes.

Both simulators operate data-parallel.
The following instantiates a new engine for 32 independent timing simulations, in which each signal line in the circuit can carry at most 16 transitions. All simulators share the same circuit and the same line delay specification.

In [56]:
from kyupy.wave_sim import WaveSimCuda, TMAX
import numpy as np

wsim = WaveSimCuda(b14, lt, sims=32, c_caps=16)

These are the memory regions allocated by the simulator, with waveforms usually being the largest.

In [57]:
def print_mem(name, arr):
    print(f'{name}: {arr.nbytes / 1024:.1f} kiB')
    
print_mem('Waveforms              ', wsim.c)
print_mem('State Allocation Table ', wsim.vat)
print_mem('Circuit Timing         ', wsim.timing)
print_mem('Circuit Netlist        ', wsim.ops)
print_mem('Sequential State       ', wsim.s)

Waveforms              : 93908.5 kiB
State Allocation Table : 1113.4 kiB
Circuit Timing         : 1484.5 kiB
Circuit Netlist        : 1099.0 kiB
Sequential State       : 420.8 kiB


This is a typical simulation loop where the number of patterns is larger than the number of simulators available.
We simulate `trans_tests_bp`, the transition tests in bit-parallel (`bp`) form.
The timing simulator accepts 8-valued `BPArray`s, but it returns response (capture) data in a different format.

In [39]:
sims = 128  # trans_tests.shape[-1]  # Feel free to simulate all tests if CUDA is set up correctly.

cdata = np.zeros((len(wsim.interface), sims, 7))  # space to store all capture data

for offset in range(0, sims, wsim.sims):
    wsim.assign(trans_tests_bp, offset=offset)
    wsim.propagate(sims=sims-offset)
    wsim.capture(time=2.5, cdata=cdata, offset=offset)  # capture at time 2.5
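
The offsets visited by this loop are easy to check in isolation. The sketch below uses a hypothetical `batches` helper and assumes that a final, partially filled batch should be clamped to the number of remaining patterns. The loop above passes the remaining count as `sims-offset` and leaves any clamping to the simulator.

```python
def batches(total, width):
    """Yield (offset, count) pairs covering `total` patterns in chunks of `width`."""
    for offset in range(0, total, width):
        yield offset, min(width, total - offset)

print(list(batches(128, 32)))  # [(0, 32), (32, 32), (64, 32), (96, 32)]
print(list(batches(100, 32)))  # [(0, 32), (32, 32), (64, 32), (96, 4)]
```

With 128 patterns and 32 simulators, the loop body runs four times with full batches.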

For each PI, PO, and scan flip-flop (axis 0) and each test (axis 1), the capture data contains seven values:

0. Probability of capturing a 1 at the given capture time (same as next value, if no standard deviation given).
1. A capture value decided by random sampling according to above probability.
2. The final value (assume a very late capture time).
3. True, if there was a premature capture (capture error), i.e. final value is different from captured value.
4. Earliest arrival time. The time at which the output transitioned from its initial value.
5. Latest stabilization time. The time at which the output transitioned to its final value.
6. Overflow indicator. If non-zero, some signals in the input cone of this output had more transitions than specified in `c_caps`. Some transitions have been discarded, but the final values in the waveforms are still valid.
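
Indexing the last axis with bare numbers is easy to get wrong, so it can help to name the seven positions. The enum below is a hypothetical convenience for readers of this document. KyuPy does not necessarily provide such names, and the `record` values are made up for illustration.

```python
from enum import IntEnum

class Capture(IntEnum):  # hypothetical names for the seven capture values
    PROBABILITY = 0
    SAMPLED = 1
    FINAL = 2
    CAPTURE_ERROR = 3
    EARLIEST_ARRIVAL = 4
    LATEST_STABILIZATION = 5
    OVERFLOW = 6

# Made-up capture record for one output of one test.
record = [1.0, 1.0, 1.0, 0.0, 0.8, 2.1, 0.0]

print(record[Capture.LATEST_STABILIZATION])  # 2.1
print(bool(record[Capture.OVERFLOW]))        # False
```

Because `IntEnum` members behave like integers, the same names would also work as indices into the last axis of an array such as `cdata`.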

In [40]:
cdata.shape

(306, 128, 7)

For validating against known logic values, take `cdata[...,1]`.

In [41]:
matches = 0

for i in range(cdata.shape[1]):
    response = ''.join('1' if x > 0.5 else '0' for x in cdata[..., i, 1])
    if trans_responses[i].replace('-','0') == response:
        matches += 1
    else:
        print(f'mismatch for test pattern {i}')
print(f'{matches} of {cdata.shape[1]} responses matched with simulator')

mismatch for test pattern 0
127 of 128 responses matched with simulator


The circuit delay is the maximum among all latest stabilization times:

In [42]:
cdata[...,5].max()

2.17240047454834

Check for overflows. If too many of them occur, increase `c_caps` during engine instantiation:

In [43]:
cdata[...,6].sum()

2.0

Check for capture failures:

In [44]:
cdata[...,3].sum()

0.0