MatMul Fusion Patterns#

Overview#

oneDNN supports both floating-point and quantized MatMul fusion patterns to optimize performance and reduce memory bandwidth requirements. This document describes the supported floating-point fusion patterns for MatMul. For the quantized patterns, refer to Quantized MatMul Fusion Patterns.

Pattern Structure#

oneDNN defines floating-point MatMul fusion patterns as follows. The blue nodes are required when defining a MatMul fusion pattern while the brown nodes are optional.

MatMul Operation: Performs matrix multiplication between the src and weights tensors. The bias tensor is optional. See the MatMul operation in the Graph API for more details.
Epilogue Subgraph: Optional and can include the following operations:
- BiasAdd operation.
- Binary and Unary operations: refer to the Note in Fusion Patterns.
- Select operation.
Combination Rules:
- BiasAdd: If present, must be the first op in the epilogue subgraph and can only appear once.
- Binary and Unary: 0 to 4 operations are supported in the epilogue subgraph.
- Select: If present, must follow the binary/unary operations (if any) and can only appear once.
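
The combination rules above can be read as a small ordering check on the epilogue. The following is a minimal sketch in plain Python, not oneDNN code: the function name `is_valid_epilogue` and the `BINARY_OR_UNARY` set are hypothetical, the op names in that set are only an illustrative sample (the authoritative list is the Note in Fusion Patterns), and the sketch assumes that "must follow binary/unary operations" means Select is the last op of the epilogue.

```python
# Illustrative sample of binary/unary op kinds (assumption; see the Note in
# Fusion Patterns for the ops that are actually supported).
BINARY_OR_UNARY = {"Add", "Multiply", "Maximum", "ReLU", "Sigmoid", "Tanh"}


def is_valid_epilogue(ops):
    """Return True if the op-kind sequence follows the combination rules.

    `ops` is a list of op-kind names in execution order, excluding MatMul.
    This is a hypothetical helper, not part of the oneDNN API.
    """
    rest = list(ops)

    # BiasAdd, if present, must be the first op and may appear only once.
    if rest and rest[0] == "BiasAdd":
        rest = rest[1:]

    # Select, if present, is assumed to be the last op and may appear only once.
    if rest and rest[-1] == "Select":
        rest = rest[:-1]

    # Everything left must be binary/unary ops, and there may be at most 4.
    # A second BiasAdd or Select left in `rest` fails the membership test.
    return len(rest) <= 4 and all(op in BINARY_OR_UNARY for op in rest)
```

For example, under these assumptions `is_valid_epilogue(["BiasAdd", "ReLU", "Select"])` is accepted, while `["ReLU", "BiasAdd"]` is rejected because BiasAdd is not first.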

Data Types#

oneDNN supports the following combinations of data types for src, weights, bias and dst:

src	weights	bias	dst
f32, bf16, f16	f32, bf16, f16	f32, bf16, f16	f32, bf16, f16

The definition of the data types and support status on different CPU and GPU platforms follow the general description in the Data Types Guide.

Example#

oneDNN provides a CPU MatMul example and a GPU MatMul example demonstrating how to construct a typical floating-point MatMul pattern with oneDNN Graph API on CPU and GPU.
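
As a library-independent reference for what one such pattern computes, the sketch below evaluates MatMul followed by BiasAdd and ReLU in plain Python on nested lists. It is not the oneDNN example and uses no oneDNN API: the function name `matmul_bias_relu` is hypothetical, the choice of ReLU as the single unary op is an assumption made for illustration, and the bias is assumed to be a 1D vector broadcast over rows.

```python
def matmul_bias_relu(src, weights, bias):
    """Reference for MatMul -> BiasAdd -> ReLU (hypothetical helper).

    src:     M x K matrix as a list of rows
    weights: K x N matrix as a list of rows
    bias:    length-N vector, added to every output row (assumption)
    """
    rows, inner, cols = len(src), len(weights), len(weights[0])
    dst = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # MatMul: dot product of row i of src and column j of weights.
            acc = sum(src[i][k] * weights[k][j] for k in range(inner))
            # BiasAdd: first op of the epilogue.
            acc += bias[j]
            # ReLU: a unary op applied without writing acc back to memory.
            out_row.append(max(acc, 0.0))
        dst.append(out_row)
    return dst
```

The intermediate value `acc` never leaves the inner loop, which mirrors the reason for fusing in the first place: the MatMul result is not written out and read back between the epilogue ops.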
