MatMul Fusion Patterns¶
Overview¶
oneDNN supports both floating-point and quantized MatMul fusion patterns to optimize performance and reduce memory bandwidth requirements. This document describes the supported floating-point fusion patterns for MatMul. For quantized MatMul fusion patterns, refer to Quantized MatMul Fusion Patterns for more details.
Pattern Structure¶
oneDNN defines floating-point MatMul fusion patterns as follows. The blue nodes are required when defining a MatMul fusion pattern while the brown nodes are optional.

MatMul Operation : Performs matrix multiplication between the src and weights tensors. The bias tensor is optional. See the MatMul operation in the Graph API for more details.
Epilogue Subgraph : Optional and can include the following operations:
BiasAdd operation.
Binary and Unary operations: refer to the Note in Fusion Patterns.
Select operation.
Combination Rules:
BiasAdd : If present, must be the first op in the epilogue subgraph and can only appear once.
0 to 4 Binary or Unary operations are supported in the epilogue subgraph.
Select : If present, must follow binary/unary operations (if present) and can only appear once.
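The combination rules above can be illustrated with a plain NumPy sketch of the computation such a fused pattern performs (this is only a mathematical reference, not the oneDNN Graph API itself; the shapes and the particular Add/ReLU epilogue ops are illustrative assumptions):

```python
import numpy as np

# Hypothetical shapes for illustration only.
M, K, N = 4, 8, 16
rng = np.random.default_rng(0)
src = rng.standard_normal((M, K)).astype(np.float32)
weights = rng.standard_normal((K, N)).astype(np.float32)
bias = rng.standard_normal(N).astype(np.float32)

# Required MatMul; the optional bias is folded in here
# (equivalently, a BiasAdd would be the first epilogue op).
acc = src @ weights + bias

# Epilogue subgraph: 0 to 4 binary/unary ops, e.g. a binary Add
# followed by a unary ReLU.
other = rng.standard_normal((M, N)).astype(np.float32)
acc = acc + other          # binary op
acc = np.maximum(acc, 0)   # unary op (ReLU)

# Optional Select comes last: pick between two inputs elementwise.
cond = acc > 1.0
alternative = np.zeros_like(acc)
dst = np.where(cond, acc, alternative)
```

Running the whole chain as one fused partition avoids materializing the intermediate `acc` tensors in memory, which is the bandwidth saving the fusion targets.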
Data Types¶
oneDNN supports the following combinations of data types for src, weights, bias and dst:
| src | weights | bias | dst |
|-----|---------|------|-----|
| f32, bf16, f16 | f32, bf16, f16 | f32, bf16, f16 | f32, bf16, f16 |
The definitions of these data types and their support status on different CPU and GPU platforms follow the general description in the Data Types Guide.
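As a rough numerical illustration of the table, the sketch below runs the same matmul-with-bias in an all-f16 and an all-f32 configuration (NumPy has no bf16 type, so only f32 and f16 are shown; the shapes and values are assumptions for illustration):

```python
import numpy as np

# All tensors share one data type per row of the table.
src = np.ones((2, 3), dtype=np.float16)
weights = np.full((3, 4), 0.5, dtype=np.float16)
bias = np.zeros(4, dtype=np.float16)

# All-f16 combination: src, weights, bias, and dst are f16.
dst_f16 = src @ weights + bias

# All-f32 combination for comparison.
dst_f32 = (src.astype(np.float32) @ weights.astype(np.float32)
           + bias.astype(np.float32))
```

The lower-precision combinations trade a small amount of accuracy for reduced memory traffic, which compounds with the bandwidth savings from fusion.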
Example¶
oneDNN provides a CPU MatMul example and a GPU MatMul example demonstrating how to construct a typical floating-point MatMul pattern with the oneDNN Graph API on CPU and GPU.