RTL-Arrow

Hardware-to-Cloud Bridge

Calvin Deutschbein, Jimmy Ostler

Motivation

  • Stack Overflow Annual Developer Survey, 2025

Problem Statement

  • The end of MOSFET scaling leads to dedicated silicon for most tasks:
    • Designs are more complex, and
    • Automated design is more powerful.
  • An explosion of hardware complexity drove major advances in Hardware Design Languages (HDLs).
  • Post-2010 advances in programming language theory are essentially unused in hardware design.

Outline

  1. Background
  2. Demonstration
  3. Insight
  4. Case Study
  5. Performance

Background

State of Play

  • Major advances in open source hardware.
    • RISC-V open source CPUs
    • OpenTitan/Caliptra open source root of trust (RoT)
  • Major advances in inference engines, data encoding, cloud
    • ASF Arrow and the Polars LazyFrame
  • We bridge the gap:
    • IEEE 1364 hardware traces to industry-standard dataframes
    • Open-source, GPLv3, indexed crate in Rust

Terms

HDL
Hardware design language - code defines silicon.
Testbench
Verilog code that provides inputs to a design for simulation.
Simulator
An HDL+Testbench compiler for pre-silicon validation.
Trace
A record of signal values over simulated time, often stored as a “value change dump” (VCD) that records only changes.
Dataframe
Two-dimensional data structure for discrete observations.

Demonstration

HDL

  • Specify hardware design.
  • We create a counter:
    • A value increases by one each clock tick, unless…
    • The device is still resetting (turning on).
count.v
module counter(out, clk, rst);
  output reg [3:0] out;
  input            clk;
  input            rst;

  always @(posedge clk)
      out <= rst ? out + 1 : 0;

endmodule

Testbench

  • Set rst from 0 to 1 at time \(t=5\)
  • Toggle clk between 0 and 1 every time unit
test.v
module test;
  reg        rst = 0;
  reg        clk = 0;
  wire [3:0] val;
  
  initial begin
     $dumpfile("test.vcd");
     $dumpvars(0,test);
     # 5 rst = 1;
     # 5 $finish;
  end
  
  always #1 clk = !clk;
  
  counter c1 (val, clk, rst);
  
endmodule
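Before running a simulator, it can help to predict what the testbench should produce. The following is a minimal sketch in Rust, not part of the presented toolchain. The names `step` and `simulate` are hypothetical, and the sketch assumes the simulator applies the rst change at time 5 before the clock edge at that same time.

```rust
/// One rising clock edge of the 4-bit counter in count.v:
/// `out <= rst ? out + 1 : 0`.
/// `None` models a register that has not been assigned yet.
fn step(out: Option<u8>, rst: bool) -> Option<u8> {
    if !rst {
        // rst low: the device is still resetting, so clear the counter.
        Some(0)
    } else {
        // rst high: increment modulo 16; an unknown value stays unknown.
        out.map(|v| v.wrapping_add(1) & 0xF)
    }
}

/// Value of `out` at each time 0..=last under the test.v stimulus
/// (hypothetical helper): clk starts at 0 and toggles every time unit,
/// so rising edges fall on odd times, and rst goes high at t = 5.
fn simulate(last: u32) -> Vec<Option<u8>> {
    let mut out = None;
    let mut trace = Vec::new();
    for t in 0..=last {
        let rst = t >= 5;
        if t % 2 == 1 {
            out = step(out, rst);
        }
        trace.push(out);
    }
    trace
}
```

Under these assumptions, `simulate(8)` yields None, 0, 0, 0, 0, 1, 1, 2, 2, which agrees with the out row in the Dataframe section once -1 is read as "unknown".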

Simulator

  • Simulation is a lot like compiling…
    • Create a program that simulates the test.
    • Run the program that simulates the test.
iverilog -o sim count.v test.v
vvp sim
  • Installing iverilog is non-trivial, so we provide a container with it preinstalled.
podman pull ghcr.io/hwcicd/myrtha

Trace

  • The trace then lists each value change by time.
test.vcd

Altogether

  • c1.clk (denoted ") changes to value 1 at time #3.
test.vcd
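In the IEEE 1364 VCD format, a line such as `#3` sets the current time, a scalar change is a value character followed directly by the signal's identifier code (for example `1"`), and a vector change is written as `b`, the binary digits, a space, and the identifier code. The following is a minimal sketch of reading such lines in Rust. It is not the crate's actual parser: `Change` and `parse_line` are hypothetical names, header commands and real-valued changes are skipped, and vectors wider than 64 bits are treated as unknown.

```rust
/// One line from the value-change section of a VCD file.
#[derive(Debug, PartialEq)]
enum Change {
    /// `#3`: simulation time advances to 3.
    Time(u64),
    /// `1"` or `b0101 !`: the signal with this identifier code takes a new
    /// value; `None` means the value contained x or z (unknown), or did
    /// not fit in 64 bits.
    Value { id: String, value: Option<u64> },
}

/// Parse a single value-change line. Returns `None` for lines this sketch
/// does not handle (header commands, real values, blank lines).
fn parse_line(line: &str) -> Option<Change> {
    let line = line.trim();
    if let Some(t) = line.strip_prefix('#') {
        return t.parse::<u64>().ok().map(Change::Time);
    }
    if let Some(rest) = line.strip_prefix('b').or_else(|| line.strip_prefix('B')) {
        // Vector change: binary digits, a space, then the identifier code.
        let (bits, id) = rest.split_once(' ')?;
        return Some(Change::Value {
            id: id.trim().to_string(),
            value: u64::from_str_radix(bits, 2).ok(),
        });
    }
    // Scalar change: one value character followed directly by the identifier.
    let mut chars = line.chars();
    let value = match chars.next()? {
        '0' => Some(0),
        '1' => Some(1),
        'x' | 'X' | 'z' | 'Z' => None,
        _ => return None,
    };
    let id = chars.as_str();
    if id.is_empty() {
        return None;
    }
    Some(Change::Value { id: id.to_string(), value })
}
```

With this sketch, the line `#3` followed by `1"` reads as "at time 3, the signal denoted `"` changes to 1".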

Dataframe

  • We can view this as a dataframe!
$ pip install vcd2df
$ python3 -c "import vcd2df as v; print(v.vcd2df('test.vcd'))"     
     #0  #1  #2  #3  #4  #5  #6  #7  #8
val  -1   0   0   0   0   1   1   2   2
clk   0   1   0   1   0   1   0   1   0
rst   0   0   0   0   0   1   1   1   1
out  -1   0   0   0   0   1   1   2   2
  • In the Python implementation, -1 denotes an uninitialized (not yet set) register.
  • Wait - isn’t that terrible?
    • Now we need an additional sign bit!

Insight

Our Insight

  • When first implemented, early DataFrame libraries like pandas often used float NaN to denote invalid entries.
  • However, new and better technologies exist:
    • Within cloud technologies, Arrow formats support nullable columns.
    • Within programming languages, Rust supports the Option type.
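The cost of a -1 sentinel can be made concrete with a small comparison. The following sketch is illustrative only: `encode_sentinel` and `describe` are hypothetical names, and the sentinel variant assumes signed 64-bit storage, which may differ from what the Python implementation actually uses.

```rust
/// Sentinel encoding (hypothetical): signed 64-bit storage with -1 reserved
/// for "unknown". Returns None when the value cannot be represented.
fn encode_sentinel(value: Option<u64>) -> Option<i64> {
    match value {
        None => Some(-1),
        // Values at or above 2^63 need the bit that now holds the sign.
        Some(v) if v <= i64::MAX as u64 => Some(v as i64),
        Some(_) => None,
    }
}

/// Option encoding: the full unsigned range stays available, and
/// "unknown" is a separate case rather than a reserved number.
fn describe(value: Option<u64>) -> String {
    match value {
        Some(v) => format!("{v}"),
        None => "null".to_string(),
    }
}
```

A 64-bit register holding all ones cannot be stored by `encode_sentinel`, while `describe` handles it like any other value.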

Arrow

All array types, with the exception of union types (more on these later), utilize a dedicated memory buffer, known as the validity (or “null”) bitmap, to encode the nullness or non-nullness of each value slot.

is_valid[j] -> bitmap[j / 8] & (1 << (j % 8))
  • An example:
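The bitmap rule can be exercised with a few lines of Rust. This is a sketch of the bit layout only, under the assumption that one plain `Vec<u8>` stands in for the buffer; a real Arrow buffer also has padding and alignment requirements that are ignored here. The names `validity_bitmap` and `is_valid` are hypothetical.

```rust
/// Pack the validity of each slot into a bitmap, one bit per slot,
/// least-significant bit first within each byte.
fn validity_bitmap(values: &[Option<u64>]) -> Vec<u8> {
    let mut bitmap = vec![0u8; (values.len() + 7) / 8];
    for (j, v) in values.iter().enumerate() {
        if v.is_some() {
            bitmap[j / 8] |= 1 << (j % 8);
        }
    }
    bitmap
}

/// The quoted rule: is_valid[j] -> bitmap[j / 8] & (1 << (j % 8)).
fn is_valid(bitmap: &[u8], j: usize) -> bool {
    (bitmap[j / 8] & (1 << (j % 8))) != 0
}
```

For nine slots where only the first is null, the bitmap is two bytes, 0xFE and 0x01: one bit of overhead per value rather than a reserved number.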

Rust

Type Option represents an optional value: every Option is either Some and contains a value, or None, and does not.

// Pattern match to retrieve the value
match result {
    // The operation was valid
    Some(x) => println!("Result: {x}"),
    // The operation was invalid
    None    => println!("Invalid operation!"),
}

Altogether

  • As intended with a VCD, we begin tracking state with all signals unknown, which we denote by None.
  • We then create a column, with an entry for each signal, to update.
let mut curr = BTreeMap::<String, Option<u64>>::new();
for key in names.keys() {
    curr.insert(key.clone(), None);
}
let mut times: Vec<Column> = vec![Column::new("Names".into(), names)];

Creating the DataFrame

  • Within each time point, we simply create a Polars column of all values:
let tmp: Vec<Option<u64>> = curr.values().cloned().collect();
times.push(Column::new(time.into(), tmp));
  • Then, when we have read every time point, collate into a DataFrame.
DataFrame::new(times)
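The snapshot-per-time-point idea can be tried without any dependencies. The sketch below uses only the standard library: plain `(String, Vec<Option<u64>>)` pairs stand in for Polars columns, and the `snapshots` function and its input shape are assumptions made for illustration, not the crate's API.

```rust
use std::collections::BTreeMap;

/// Build one column per time point from a list of (time, changes) steps.
/// Each column holds the current value of every signal in sorted name
/// order; signals never assigned remain `None`.
fn snapshots(
    names: &[&str],
    steps: &[(u64, Vec<(&str, Option<u64>)>)],
) -> Vec<(String, Vec<Option<u64>>)> {
    // All signals start unknown.
    let mut curr: BTreeMap<String, Option<u64>> =
        names.iter().map(|n| (n.to_string(), None)).collect();
    let mut columns = Vec::new();
    for (time, changes) in steps {
        for (name, value) in changes {
            curr.insert(name.to_string(), *value);
        }
        // Snapshot the state after applying this time point's changes.
        let col: Vec<Option<u64>> = curr.values().cloned().collect();
        columns.push((format!("#{time}"), col));
    }
    columns
}
```

Because a VCD records only changes, any signal not mentioned at a time point simply carries its previous value into the next column.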

Voilà!

┌───────┬──────┬─────┬─────┬───┬─────┬─────┬─────┬─────┐
│ Names ┆ #0   ┆ #1  ┆ #2  ┆ … ┆ #6  ┆ #7  ┆ #8  ┆ #9  │
│ ---   ┆ ---  ┆ --- ┆ --- ┆   ┆ --- ┆ --- ┆ --- ┆ --- │
│ str   ┆ u64  ┆ u64 ┆ u64 ┆   ┆ u64 ┆ u64 ┆ u64 ┆ u64 │
╞═══════╪══════╪═════╪═════╪═══╪═════╪═════╪═════╪═════╡
│ val   ┆ null ┆ 0   ┆ 0   ┆ … ┆ 1   ┆ 2   ┆ 2   ┆ 3   │
│ clk   ┆ 0    ┆ 1   ┆ 0   ┆ … ┆ 0   ┆ 1   ┆ 0   ┆ 1   │
│ rst   ┆ 0    ┆ 0   ┆ 0   ┆ … ┆ 1   ┆ 1   ┆ 1   ┆ 1   │
│ out   ┆ null ┆ 0   ┆ 0   ┆ … ┆ 1   ┆ 2   ┆ 2   ┆ 3   │
└───────┴──────┴─────┴─────┴───┴─────┴─────┴─────┴─────┘
  • All those -1’s that don’t make sense are now an Arrow null by way of a Rust None!

Case Study

Canonical Addresses

  • Prior work¹ found that canonical addresses were security-relevant.
    • Registers as wide as the design bit size (e.g. 32).
    • Values evenly divisible by the design bit size.
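// A value is canonical if it fits in `bits` bits and is a multiple of `bits`.
fn val_canonical(val: Option<u64>, bits: u64) -> bool {
    match val {
        Some(n) => n < (1 << bits) && n % bits == 0,
        // Uninitialized values are not ruled out.
        None => true,
    }
}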
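One edge case is worth noting: the shift `1 << bits` overflows when `bits` is 64, and `n % bits` divides by zero when `bits` is 0. The following is a defensive variant, offered as a sketch rather than as the authors' code. The name `val_canonical_checked`, the `u32` width parameter, and the choice to return false for a zero width are all assumptions.

```rust
/// Variant of `val_canonical` that avoids shift overflow when `bits` is 64
/// or more, and division by zero when `bits` is 0.
fn val_canonical_checked(val: Option<u64>, bits: u32) -> bool {
    match val {
        Some(n) => {
            // checked_shl returns None when the shift amount is >= 64,
            // in which case every u64 value fits.
            let fits = 1u64.checked_shl(bits).map_or(true, |limit| n < limit);
            fits && bits != 0 && n % u64::from(bits) == 0
        }
        // Unknown values are accepted, as in the original function.
        None => true,
    }
}
```

For a 32-bit design, the value 64 passes this per-value check, while 45 and any value of 2^32 or more do not.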

Examine a Design

  • We study the PicoRV32 design, a 32-bit RISC-V processor.
    • Simulate it yourself, or use the trace file we provide.
curl https://raw.githubusercontent.com/vcd2df/vcd_ex/refs/heads/main/pico.vcd -o pico.vcd

As a DataFrame

┌──────────────────────┬──────┬───────┬────────┬───┬────────────┬────────────┬────────────┬────────────┐
│ Names                ┆ #0   ┆ #5000 ┆ #10000 ┆ … ┆ #10980000  ┆ #10985000  ┆ #10990000  ┆ #10995000  │
│ ---                  ┆ ---  ┆ ---   ┆ ---    ┆   ┆ ---        ┆ ---        ┆ ---        ┆ ---        │
│ str                  ┆ u64  ┆ u64   ┆ u64    ┆   ┆ u64        ┆ u64        ┆ u64        ┆ u64        │
╞══════════════════════╪══════╪═══════╪════════╪═══╪════════════╪════════════╪════════════╪════════════╡
│ trap                 ┆ 0    ┆ 0     ┆ 0      ┆ … ┆ 0          ┆ 0          ┆ 0          ┆ 0          │
│ decoded_rs           ┆ null ┆ null  ┆ null   ┆ … ┆ null       ┆ null       ┆ null       ┆ null       │
│ mem_la_wdata         ┆ null ┆ null  ┆ null   ┆ … ┆ 45         ┆ 45         ┆ 45         ┆ 45         │
│ mem_wstrb            ┆ null ┆ null  ┆ null   ┆ … ┆ 0          ┆ 0          ┆ 15         ┆ 15         │
│ decoded_rs1          ┆ null ┆ null  ┆ null   ┆ … ┆ 31         ┆ 31         ┆ 31         ┆ 31         │
│ …                    ┆ …    ┆ …     ┆ …      ┆ … ┆ …          ┆ …          ┆ …          ┆ …          │
│ mem_do_wdata         ┆ 0    ┆ 0     ┆ 0      ┆ … ┆ 1          ┆ 1          ┆ 1          ┆ 1          │
│ decoded_imm_j        ┆ null ┆ null  ┆ null   ┆ … ┆ 4294967284 ┆ 4294967284 ┆ 4294967284 ┆ 4294967284 │
│ mem_la_firstword_reg ┆ 0    ┆ 0     ┆ 0      ┆ … ┆ 0          ┆ 0          ┆ 0          ┆ 0          │
│ decoded_rd           ┆ null ┆ null  ┆ null   ┆ … ┆ 0          ┆ 0          ┆ 0          ┆ 0          │
│ mem_la_secondword    ┆ 0    ┆ 0     ┆ 0      ┆ … ┆ 0          ┆ 0          ┆ 0          ┆ 0          │
└──────────────────────┴──────┴───────┴────────┴───┴────────────┴────────────┴────────────┴────────────┘

Results

  • There are no canonical address registers which are initialized!
  • With a single function and application!
    • While a negative research result, a positive process result.
┌──────────────────────┬───────────┐
│ Names                ┆ Canonical │
│ ---                  ┆ ---       │
│ str                  ┆ bool      │
╞══════════════════════╪═══════════╡
│ trap                 ┆ False     │
│ decoded_rs           ┆ False     │
│ mem_la_wdata         ┆ False     │
│ mem_wstrb            ┆ False     │
│ decoded_rs1          ┆ False     │
│ …                    ┆ False     │
│ mem_do_wdata         ┆ False     │
│ decoded_imm_j        ┆ False     │
│ mem_la_firstword_reg ┆ False     │
│ decoded_rd           ┆ False     │
│ mem_la_secondword    ┆ False     │
└──────────────────────┴───────────┘

Performance

Setup

  • We compared against Python on three metrics:
    • Build time
    • Container size
    • Runtime

To do so, we implemented a crate in Rust and performed a simple example, based on a use case involving a map of information flow detection across 181 value change dump files.

Architecture

Build Time

  • We created separate build and cluster containers
  • Python: installing the relevant runtime on a cluster container
  • Rust: installing the compiler toolchain on the build container, compiling, and copying an executable to the cluster container.
Python    Rust
74.5s     291.7s

Container Size

  • We created separate build and cluster containers
  • Python: Repurposed the build container.
  • Rust: We report the sizes separately as “Build” and “Cluster”
Python     Rust (Build)    Rust (Cluster)
1.76 GB    2.59 GB         0.78 GB

Run Time

  • Extract a security specification over a ~65 MB data set.
Python    Rust
28.5s     2.2s

We found this performance unexpectedly high. Having specifically worked in Python for this research direction for close to a decade, we have few remaining optimizations… By contrast, we… were concerned a lack of familiarity with the language may incur heavy costs due to unnecessary borrows or poor choice of types.
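The reported build and run times allow a rough break-even estimate. The arithmetic below uses only the figures above and assumes, as a simplification, that the build cost is paid once and that every run processes a workload comparable to the ~65 MB data set.

```latex
\[
\text{speedup} = \frac{28.5\,\text{s}}{2.2\,\text{s}} \approx 13\times,
\qquad
n_{\text{break-even}}
  = \frac{291.7\,\text{s} - 74.5\,\text{s}}{28.5\,\text{s} - 2.2\,\text{s}}
  = \frac{217.2}{26.3}
  \approx 8.3
\]
```

Under these assumptions, the longer Rust build is repaid after about nine runs.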

Summary

  • Leverage cloud and language advances for encoding.
  • No need for specialized tools (e.g. waveform viewers).
  • Package traces easily into Parquet to parallelize.
  • Reproducibility