Concepts
This page introduces the core concepts and abstractions used throughout FINN.
Intermediate Representation: QONNX and FINN-ONNX
FINN uses ONNX as an intermediate representation (IR) for neural networks. Almost every component inside FINN uses ONNX and its Python API.
Key ONNX resources:
See the Tutorials chapter for a Jupyter notebook that demonstrates working with ONNX models in FINN.
Note
FINN supports two specialized variants of ONNX called QONNX and FINN-ONNX. Not all ONNX graphs are supported by FINN, and conversely, standard ONNX tools cannot necessarily process QONNX or FINN-ONNX graphs.
QONNX vs FINN-ONNX
QONNX represents quantization using explicit Quant and BipolarQuant nodes. This format is used for models exported from Brevitas and during the early stages of the FINN flow.
FINN-ONNX uses quantization annotations (via the quantization_annotation field in ONNX) to annotate tensors with their FINN DataType information. This is FINN’s internal representation during compilation.
See the QONNX repository for details on QONNX.
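As a rough illustration of what an explicit quantization node computes, the sketch below quantizes a single float onto an integer grid and scales it back. It is a simplified scalar stand-in written for this page, not the QONNX implementation: the function name quant_scalar and its parameters are assumptions, and options such as narrow ranges or alternative rounding modes are left out.

```python
def quant_scalar(x, scale, zero_point, bitwidth, signed=True):
    """Hypothetical scalar stand-in for a Quant node (not the QONNX code)."""
    if signed:
        lo, hi = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    else:
        lo, hi = 0, 2 ** bitwidth - 1
    # Map onto the integer grid, round (half to even), then clamp to the range.
    q = round(x / scale + zero_point)
    q = max(lo, min(hi, q))
    # Map back: the result is an integer multiple of scale, stored as a float.
    return (q - zero_point) * scale


# A value inside the 3-bit signed range, and one that gets clipped to its maximum.
inside = quant_scalar(0.3, scale=0.25, zero_point=0, bitwidth=3)
clipped = quant_scalar(5.0, scale=0.25, zero_point=0, bitwidth=3)
```

With a 3-bit signed range of [-4, 3] and a scale of 0.25, the first input lands on grid point 1 and the second is clamped to grid point 3.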
Custom Quantization Annotations
Standard ONNX does not support arbitrary-precision integer datatypes. FINN supports arbitrary integer quantization (e.g., 1-bit bipolar, 3-bit, 5-bit, up to 32-bit and beyond). To support this, FINN-ONNX uses quantization annotations to attach FINN DataType (qonnx.core.datatype.DataType) information to tensors.
Key principle: All tensors use single-precision floating point (float32) as the container datatype, even for 1-bit values. The FINN DataType annotation specifies the actual bit width and signedness. The FINN compiler flow produces packed representations for target hardware.
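The split between container datatype and annotation can be pictured with a small stand-in. The IntAnnotation class and the tensor name act0 below are hypothetical and exist only for this sketch; in FINN the annotation is a qonnx DataType stored in the ONNX quantization_annotation field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntAnnotation:
    """Hypothetical stand-in for an integer datatype annotation."""
    bitwidth: int
    signed: bool

    def min(self):
        return -(2 ** (self.bitwidth - 1)) if self.signed else 0

    def max(self):
        if self.signed:
            return 2 ** (self.bitwidth - 1) - 1
        return 2 ** self.bitwidth - 1

    def allowed(self, value):
        # A float container value is valid if it is integral and in range.
        return float(value).is_integer() and self.min() <= value <= self.max()


# Container values are plain floats; the annotation carries the real type.
tensor_values = [-4.0, 0.0, 3.0]
annotations = {"act0": IntAnnotation(bitwidth=3, signed=True)}
all_valid = all(annotations["act0"].allowed(v) for v in tensor_values)
```

The values stay ordinary floats throughout; only the annotation says that the tensor is meant to hold 3-bit signed integers.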
Floating Point as Carrier Datatype
FINN uses floating point tensors as a carrier data type to represent integers. Floating point arithmetic can introduce rounding errors, e.g., (int_num * float_scale) / float_scale is not always equal to int_num.
When using the custom ONNX execution flow, FINN will attempt to sanitize rounding errors for integer tensors. See qonnx.util.basic.sanitize_quant_values for more information.
This behavior can be disabled (not recommended) by setting the environment variable SANITIZE_QUANT_TENSORS=0.
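The rounding problem is easy to reproduce in plain Python. The sketch below uses Python's double-precision floats rather than float32, and the final rounding step only illustrates the idea of sanitizing an integer tensor; it is not the code of sanitize_quant_values.

```python
int_num = 3
float_scale = 0.1

# Scale and unscale: mathematically the identity, but not in floating point.
recovered = (int_num * float_scale) / float_scale
has_error = recovered != int_num

# Rounding to the nearest integer repairs the small error.
sanitized = float(round(recovered))
```

Here recovered ends up slightly above 3, so a direct equality check against the integer fails, while the rounded value matches again.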
Custom Operations (CustomOps)
FINN uses many custom operations (op_type in ONNX NodeProto) that are not defined in the ONNX operator schema. These custom nodes are marked with domain="finn.*" or domain="qonnx.*" in the protobuf to identify them as such.
Custom operations can represent:
Specific operations needed for low-bit networks (e.g., MultiThreshold, Bipolar quantization)
Operations specific to a particular hardware backend (e.g., MatrixVectorActivation, ConvolutionInputGenerator)
Graph organization nodes (e.g., StreamingDataflowPartition)
See the CustomOps tutorial in Tutorials or the finn.custom_op module for details. For implementing new CustomOps, see the Implementation Guide.
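A minimal sketch of how the domain field separates custom nodes from standard ONNX operators follows. The helper is_custom_op is hypothetical, and the domain strings are only example values following the finn.* and qonnx.* patterns.

```python
CUSTOM_DOMAIN_PREFIXES = ("finn.", "qonnx.")


def is_custom_op(domain):
    """Return True if a node's domain marks it as a FINN/QONNX custom op."""
    return domain.startswith(CUSTOM_DOMAIN_PREFIXES)


# Standard ONNX operators use the empty default domain.
domains = ["", "qonnx.custom_op.general", "finn.custom_op.fpgadataflow"]
flags = [is_custom_op(d) for d in domains]
```

On a real graph the same check could be applied to node.domain for each node in model.graph.node.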
Custom ONNX Execution Flow
To verify correct operation of FINN-ONNX graphs, FINN provides its own ONNX execution flow (finn.core.onnx_exec). This flow supports the standard set of ONNX operations as well as the custom FINN operations.
Warning
This execution flow is only meant for checking the correctness of models after applying transformations, and not for high performance inference.
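To give a feel for node-by-node execution, here is a toy interpreter in plain Python. It is a sketch under strong assumptions, not finn.core.onnx_exec: nodes are hypothetical (op_type, inputs, output) tuples already in topological order, tensors are scalars, and MultiThreshold is reduced to counting how many thresholds the input meets or exceeds.

```python
def multithreshold(x, thresholds):
    # Count how many thresholds the input meets or exceeds.
    return float(sum(1 for t in thresholds if x >= t))


OPS = {
    "Mul": lambda a, b: a * b,
    "Add": lambda a, b: a + b,
    "MultiThreshold": multithreshold,
}


def execute(nodes, input_dict):
    """Run nodes in order, storing every tensor in an execution context."""
    ctx = dict(input_dict)
    for op_type, inputs, output in nodes:
        ctx[output] = OPS[op_type](*(ctx[name] for name in inputs))
    return ctx


graph = [
    ("Mul", ["x", "w"], "y"),
    ("MultiThreshold", ["y", "thres"], "z"),
]
ctx = execute(graph, {"x": 3.0, "w": 2.0, "thres": [1.0, 5.0, 7.0]})
```

Because every intermediate tensor stays in the context, the outputs of a graph before and after a transformation can be compared value by value, which is the kind of correctness check this flow is meant for.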
ModelWrapper
FINN provides a ModelWrapper class (qonnx.core.modelwrapper.ModelWrapper) as a thin wrapper around ONNX to make it easier to analyze and manipulate ONNX graphs. This wrapper provides many helper functions, while still giving full access to the ONNX protobuf representation.
Creating a ModelWrapper
The ModelWrapper instance can be created from a .onnx file or by directly passing a ModelProto instance:
from qonnx.core.modelwrapper import ModelWrapper
model = ModelWrapper("model.onnx")
Accessing the Graph
Access the ONNX ModelProto:
modelproto = model.model
Access the graph:
graphproto = model.graph
Access the node list:
nodes = model.graph.node
first_node = nodes[0]
num_nodes = len(nodes)
Tensor Operations
List all tensor names:
tensor_list = model.get_all_tensor_names()
Find producer/consumer nodes:
# Find producer of third tensor
model.find_producer(tensor_list[2])
# Find consumer of third tensor
model.find_consumer(tensor_list[2])
If a tensor does not have a producer or consumer node (e.g., it’s a constant), None is returned.
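Conceptually, these lookups scan the node list for a node that lists the tensor among its outputs or inputs. The sketch below mimics that on stand-in node objects built with SimpleNamespace; it is an assumption about the general approach, not the ModelWrapper source, and the tensor names are made up.

```python
from types import SimpleNamespace


def find_producer(nodes, tensor_name):
    """Return the first node that writes tensor_name, or None."""
    for node in nodes:
        if tensor_name in node.output:
            return node
    return None


def find_consumer(nodes, tensor_name):
    """Return the first node that reads tensor_name, or None."""
    for node in nodes:
        if tensor_name in node.input:
            return node
    return None


nodes = [
    SimpleNamespace(op_type="MatMul", input=["in0", "w0"], output=["t0"]),
    SimpleNamespace(op_type="MultiThreshold", input=["t0", "th0"], output=["out0"]),
]
producer = find_producer(nodes, "t0")
no_producer = find_producer(nodes, "w0")  # w0 is a constant weight
consumer = find_consumer(nodes, "t0")
```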
Get/set tensor shape:
# Get tensor shape
shape = model.get_tensor_shape(tensor_list[2])
# Set tensor shape
tensor_shape = [1, 1, 28, 28]
model.set_tensor_shape(tensor_list[2], tensor_shape)
Optionally, the dtype (container datatype) can be specified as a third argument. By default it is set to TensorProto.FLOAT.
Get/set FINN DataType:
from qonnx.core.datatype import DataType
# Get FINN DataType
finn_dtype = model.get_tensor_datatype(tensor_list[2])
# Set FINN DataType
model.set_tensor_datatype(tensor_list[2], DataType["BIPOLAR"])
Get tensor initializer:
# Get initializer (returns None if no initializer exists)
initializer = model.get_initializer(tensor_list[2])
See qonnx.core.modelwrapper.ModelWrapper for the complete API.
Analysis Passes
An analysis pass traverses the graph structure and produces information about certain properties. It receives a ModelWrapper as input and returns a dictionary of extracted properties.
Purpose: Extract information without modifying the model (e.g., resource estimates, performance metrics, node counts).
Examples:
op_and_param_counts - Counts operations and parameters
exp_cycles_per_layer - Reports expected cycles per layer
res_estimation - Estimates FPGA resource usage
See the custom analysis pass notebook for a tutorial on writing analysis passes, and finn.analysis for existing implementations.
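The shape of an analysis pass, a function from model to dictionary, can be sketched as follows. The pass count_op_types is hypothetical, and toy_model is a stand-in built from SimpleNamespace objects so that the example runs without FINN installed; the function only relies on model.graph.node and each node's op_type.

```python
from collections import Counter
from types import SimpleNamespace


def count_op_types(model):
    """Hypothetical analysis pass: report how often each op_type occurs."""
    counts = Counter(node.op_type for node in model.graph.node)
    return dict(counts)


# Stand-in for a ModelWrapper; only graph.node is needed by the pass.
toy_model = SimpleNamespace(graph=SimpleNamespace(node=[
    SimpleNamespace(op_type="MatMul"),
    SimpleNamespace(op_type="MultiThreshold"),
    SimpleNamespace(op_type="MatMul"),
]))
report = count_op_types(toy_model)
```

The pass reads the graph and returns its findings without touching the model.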
Transformation Passes
A transformation pass changes (transforms) the given model. It receives a ModelWrapper as input and returns:
The modified ModelWrapper
A model_was_changed flag indicating if the transformation should be applied again
Purpose: Progressively lower the model from high-level operations to hardware-ready operators.
Examples:
InferShapes - Propagates tensor shapes through the graph
InferDataTypes - Propagates FINN DataTypes through the graph
ConvertToHWLayers - Converts high-level operations to hardware layers
SpecializeLayers - Selects HLS vs RTL backend for each layer
InsertFIFO - Inserts streaming FIFOs between layers
See the custom transformation pass notebook for a tutorial on writing transformation passes, and finn.transformation for existing implementations.
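The contract between a transformation pass and its caller (return the model plus a model_was_changed flag, and reapply while the flag is set) can be sketched on a toy model. Everything here is hypothetical: the model is just a list of op names, and RemoveOneIdentity and transform are illustrative names rather than FINN classes.

```python
class RemoveOneIdentity:
    """Hypothetical pass on a toy model (a list of op names)."""

    def apply(self, model):
        for i, op in enumerate(model):
            if op == "Identity":
                # Rewrite one occurrence, then ask to be run again.
                return model[:i] + model[i + 1:], True
        # Nothing left to rewrite: report that the model is unchanged.
        return model, False


def transform(model, transformation):
    """Reapply the pass until it reports that nothing changed."""
    model_was_changed = True
    while model_was_changed:
        model, model_was_changed = transformation.apply(model)
    return model


result = transform(["Identity", "MatMul", "Identity", "Add"], RemoveOneIdentity())
```

Returning after a single rewrite keeps the pass simple: it never continues iterating over a graph it has just modified, and the flag lets the caller drive it to a fixed point.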
Transformation Patterns
Transformations in FINN typically fall into these categories:
- Graph rewriting
Replace subgraphs with equivalent but more optimized representations (e.g., AbsorbAddIntoMultiThreshold, MoveFlattenPastAffine)
- Shape/datatype inference
Propagate tensor properties through the graph (e.g., InferShapes, InferDataTypes)
- Hardware conversion
Convert high-level operations to hardware-specific implementations (e.g., ConvertToHWLayers, SpecializeLayers)
- Optimization
Apply performance or resource optimizations (e.g., SetFolding, MinimizeAccumulatorWidth)
- IP generation
Generate HDL code for hardware layers (e.g., PrepareIP, HLSSynthIP, CreateStitchedIP)
See Implementation Guide for guidance on implementing new transformation passes.
See Also
Implementation Guide - Extending FINN with new operators and transformations
Tutorials - Jupyter notebooks with hands-on examples
qonnx.core.modelwrapper.ModelWrapper - ModelWrapper API documentation
finn.transformation - Transformation pass implementations
finn.analysis - Analysis pass implementations