Concepts

This page introduces the core concepts and abstractions used throughout FINN.

Intermediate Representation: QONNX and FINN-ONNX

FINN uses ONNX as an intermediate representation (IR) for neural networks. Almost every component inside FINN uses ONNX and its Python API.

See the Tutorials chapter for a Jupyter notebook that demonstrates working with ONNX models in FINN.

Note

FINN supports two specialized variants of ONNX, called QONNX and FINN-ONNX. Not every ONNX graph is supported by FINN, and not every FINN graph is supported by standard ONNX tools.

QONNX vs FINN-ONNX

QONNX represents quantization using explicit Quant and BipolarQuant nodes. This format is used for models exported from Brevitas and during the early stages of the FINN flow.

FINN-ONNX uses quantization annotations (via the quantization_annotation field in ONNX) to annotate tensors with their FINN DataType information. This is FINN’s internal representation during compilation.

See the QONNX repository for details on QONNX.

Custom Quantization Annotations

Standard ONNX does not support arbitrary-precision integer datatypes. FINN supports arbitrary integer quantization (e.g., 1-bit bipolar, 3-bit, 5-bit, up to 32-bit and beyond). To support this, FINN-ONNX uses quantization annotations to attach FINN DataType (qonnx.core.datatype.DataType) information to tensors.

Key principle: All tensors use single-precision floating point (float32) as the container datatype, even for 1-bit values. The FINN DataType annotation specifies the actual bit width and signedness. The FINN compiler flow produces packed representations for target hardware.
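
To make the split between container and annotation concrete, the following is a minimal sketch in plain Python. It does not use the qonnx DataType class; the helpers int_range and fits and the annotation dictionary are hypothetical stand-ins, assumed here only to show that a float-valued tensor can be checked against an annotated bit width and signedness.

```python
from typing import List, Tuple


def int_range(bit_width: int, signed: bool) -> Tuple[int, int]:
    """Return the (min, max) representable by an integer of the given width."""
    if signed:
        return -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    return 0, 2 ** bit_width - 1


def fits(values: List[float], bit_width: int, signed: bool) -> bool:
    """Check that float container values are whole numbers within range."""
    lo, hi = int_range(bit_width, signed)
    return all(v == int(v) and lo <= v <= hi for v in values)


# A tensor stored in a float container but annotated as a signed 3-bit integer.
tensor = [-4.0, -1.0, 0.0, 3.0]
annotation = {"bit_width": 3, "signed": True}  # hypothetical stand-in

print(int_range(3, signed=True))   # (-4, 3)
print(fits(tensor, **annotation))  # True
print(fits([4.0], **annotation))   # False: 4 does not fit in 3 signed bits
print(fits([0.5], **annotation))   # False: not an integer
```

The container values stay floats throughout; only the annotation says how many bits the hardware representation needs.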

Floating Point as Carrier Datatype

FINN uses floating point tensors as a carrier data type to represent integers. Floating point arithmetic can introduce rounding errors, e.g., (int_num * float_scale) / float_scale is not always equal to int_num.

When using the custom ONNX execution flow, FINN will attempt to sanitize rounding errors for integer tensors. See qonnx.util.basic.sanitize_quant_values for more information.

This behavior can be disabled (not recommended) by setting the environment variable SANITIZE_QUANT_TENSORS=0.
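
The rounding issue can be reproduced without FINN. The sketch below emulates float32 arithmetic in plain Python by rounding through the struct module after each step, counts how many integers no longer compare equal after scaling and unscaling, and then recovers them by rounding to the nearest integer. It is a simplified illustration of the idea, not the qonnx sanitization routine; the helpers f32 and roundtrip and the scale of 0.1 are assumptions chosen for the example.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def roundtrip(int_num: int, scale: float) -> float:
    """Compute (int_num * scale) / scale, rounding to float32 after each step."""
    s = f32(scale)
    return f32(f32(int_num * s) / s)


scale = 0.1
values = list(range(-128, 128))

# Integers whose scaled-and-unscaled value is no longer exactly the integer.
drifted = [i for i in values if roundtrip(i, scale) != i]

# Rounding to the nearest integer recovers the intended value, because the
# accumulated error is far smaller than 0.5.
sanitized = [round(roundtrip(i, scale)) for i in values]

print(f"{len(drifted)} of {len(values)} values drifted after the round trip")
print("all values recovered by rounding:", sanitized == values)
```

How many values drift depends on the scale, but the recovery step works for any scale whose round-trip error stays well below 0.5.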

Custom Operations (CustomOps)

FINN uses many custom operations (op_type in ONNX NodeProto) that are not defined in the ONNX operator schema. These custom nodes are marked with domain="finn.*" or domain="qonnx.*" in the protobuf to identify them as such.

Custom operations can represent:

  • Specific operations needed for low-bit networks (e.g., MultiThreshold, Bipolar quantization)

  • Operations specific to a particular hardware backend (e.g., MatrixVectorActivation, ConvolutionInputGenerator)

  • Graph organization nodes (e.g., StreamingDataflowPartition)
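
Because custom nodes are identified by their domain, they can be picked out of a graph with a simple prefix check. The sketch below assumes only that each node exposes op_type and domain attributes, as ONNX NodeProto does. The function name list_custom_nodes is hypothetical, and the model is a stand-in built from SimpleNamespace so that the example runs without FINN installed; the domain strings are illustrative.

```python
from types import SimpleNamespace


def list_custom_nodes(model):
    """Return (op_type, domain) for nodes in a FINN or QONNX custom domain."""
    return [
        (node.op_type, node.domain)
        for node in model.graph.node
        if node.domain.startswith(("finn.", "qonnx."))
    ]


# Hypothetical stand-in for a loaded model; domain strings are illustrative.
stand_in = SimpleNamespace(graph=SimpleNamespace(node=[
    SimpleNamespace(op_type="MultiThreshold", domain="qonnx.custom_op.general"),
    SimpleNamespace(op_type="Reshape", domain=""),
    SimpleNamespace(
        op_type="StreamingDataflowPartition",
        domain="finn.custom_op.fpgadataflow",
    ),
]))

for op_type, domain in list_custom_nodes(stand_in):
    print(f"{op_type}: {domain}")
```

Standard ONNX nodes such as Reshape carry an empty domain and are filtered out.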

See the CustomOps tutorial in Tutorials or the finn.custom_op module for details. For implementing new CustomOps, see the Implementation Guide.

Custom ONNX Execution Flow

To verify correct operation of FINN-ONNX graphs, FINN provides its own ONNX execution flow (finn.core.onnx_exec). This flow supports the standard set of ONNX operations as well as the custom FINN operations.

Warning

This execution flow is only meant for checking the correctness of models after applying transformations, and not for high performance inference.

ModelWrapper

FINN provides a ModelWrapper class (qonnx.core.modelwrapper.ModelWrapper) as a thin wrapper around ONNX to make it easier to analyze and manipulate ONNX graphs. This wrapper provides many helper functions, while still giving full access to the ONNX protobuf representation.

Creating a ModelWrapper

The ModelWrapper instance can be created from a .onnx file or by directly passing a ModelProto instance:

from qonnx.core.modelwrapper import ModelWrapper
model = ModelWrapper("model.onnx")

Accessing the Graph

Access the ONNX ModelProto:

modelproto = model.model

Access the graph:

graphproto = model.graph

Access the node list:

nodes = model.graph.node
first_node = nodes[0]
num_nodes = len(nodes)

Tensor Operations

List all tensor names:

tensor_list = model.get_all_tensor_names()

Find producer/consumer nodes:

# Find producer of third tensor
model.find_producer(tensor_list[2])

# Find consumer of third tensor
model.find_consumer(tensor_list[2])

If a tensor does not have a producer or consumer node (e.g., it’s a constant), None is returned.

Get/set tensor shape:

# Get tensor shape
shape = model.get_tensor_shape(tensor_list[2])

# Set tensor shape
tensor_shape = [1, 1, 28, 28]
model.set_tensor_shape(tensor_list[2], tensor_shape)

Optionally, the dtype (container datatype) can be specified as a third argument. By default it is set to TensorProto.FLOAT.

Get/set FINN DataType:

from qonnx.core.datatype import DataType

# Get FINN DataType
finn_dtype = model.get_tensor_datatype(tensor_list[2])

# Set FINN DataType
model.set_tensor_datatype(tensor_list[2], DataType["BIPOLAR"])

Get tensor initializer:

# Get initializer (returns None if no initializer exists)
initializer = model.get_initializer(tensor_list[2])

See qonnx.core.modelwrapper.ModelWrapper for the complete API.

Analysis Passes

An analysis pass traverses the graph structure and produces information about certain properties. It receives a ModelWrapper as input and returns a dictionary of extracted properties.

Purpose: Extract information without modifying the model (e.g., resource estimates, performance metrics, node counts).

Examples:

  • op_and_param_counts - Counts operations and parameters

  • exp_cycles_per_layer - Reports expected cycles per layer

  • res_estimation - Estimates FPGA resource usage

See the custom analysis pass notebook for a tutorial on writing analysis passes, and finn.analysis for existing implementations.
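
The shape of an analysis pass (model in, dictionary out, model untouched) can be sketched as follows. The function count_op_types is a hypothetical example rather than one of the passes shipped with FINN, and it is run here against a SimpleNamespace stand-in so that it executes without FINN installed. It assumes only that the model exposes graph.node entries with an op_type attribute.

```python
from collections import Counter
from types import SimpleNamespace


def count_op_types(model):
    """Analysis pass sketch: map each op_type to its number of occurrences."""
    return dict(Counter(node.op_type for node in model.graph.node))


# Hypothetical stand-in for a ModelWrapper with three nodes.
stand_in = SimpleNamespace(graph=SimpleNamespace(node=[
    SimpleNamespace(op_type="MatMul"),
    SimpleNamespace(op_type="MultiThreshold"),
    SimpleNamespace(op_type="MatMul"),
]))

print(count_op_types(stand_in))  # {'MatMul': 2, 'MultiThreshold': 1}
```

The function only reads from the model, which is the property that distinguishes an analysis pass from a transformation pass.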

Transformation Passes

A transformation pass changes (transforms) the given model. It receives a ModelWrapper as input and returns:

  1. The modified ModelWrapper

  2. A model_was_changed flag indicating whether the pass changed the model and should therefore be applied again

Purpose: Progressively lower the model from high-level operations to hardware-ready operators.

Examples:

  • InferShapes - Propagates tensor shapes through the graph

  • InferDataTypes - Propagates FINN DataTypes through the graph

  • ConvertToHWLayers - Converts high-level operations to hardware layers

  • SpecializeLayers - Selects HLS vs RTL backend for each layer

  • InsertFIFO - Inserts streaming FIFOs between layers

See the custom transformation pass notebook for a tutorial on writing transformation passes, and finn.transformation for existing implementations.
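
The role of the model_was_changed flag is easiest to see in a small fixed-point loop. The sketch below is not FINN code: the "model" is a plain list of op names, and remove_one_identity and apply_until_stable are hypothetical helpers. It assumes only the contract described above, namely that a pass returns the modified model together with a flag saying whether it should run again.

```python
def remove_one_identity(ops):
    """Transformation sketch: drop the first 'Identity' op, if any.

    Returns the modified op list and a model_was_changed flag.
    """
    if "Identity" in ops:
        i = ops.index("Identity")
        return ops[:i] + ops[i + 1:], True
    return ops, False


def apply_until_stable(ops, transformation):
    """Re-apply the transformation for as long as it reports a change."""
    changed = True
    while changed:
        ops, changed = transformation(ops)
    return ops


ops = ["Identity", "MatMul", "Identity", "MultiThreshold"]
print(apply_until_stable(ops, remove_one_identity))
# ['MatMul', 'MultiThreshold']
```

Writing a pass so that it handles one match per call and reports a change keeps the pass itself simple, since the caller takes care of repeating it until nothing is left to rewrite.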

Transformation Patterns

Transformations in FINN typically fall into these categories:

Graph rewriting

Replace subgraphs with equivalent but more optimized representations (e.g., AbsorbAddIntoMultiThreshold, MoveFlattenPastAffine)

Shape/datatype inference

Propagate tensor properties through the graph (e.g., InferShapes, InferDataTypes)

Hardware conversion

Convert high-level operations to hardware-specific implementations (e.g., ConvertToHWLayers, SpecializeLayers)

Optimization

Apply performance or resource optimizations (e.g., SetFolding, MinimizeAccumulatorWidth)

IP generation

Generate HDL code for hardware layers (e.g., PrepareIP, HLSSynthIP, CreateStitchedIP)

See Implementation Guide for guidance on implementing new transformation passes.

See Also