MatMul Fusion Patterns

Overview

oneDNN supports both floating-point and quantized MatMul fusion patterns to optimize performance and reduce memory bandwidth requirements. This document describes the supported floating-point fusion patterns for MatMul. For quantized MatMul fusion patterns, refer to Quantized MatMul Fusion Patterns for more details.

Pattern Structure

oneDNN defines floating-point MatMul fusion patterns as follows. The blue nodes are required when defining a MatMul fusion pattern, while the brown nodes are optional.

[Figure: MatMul fusion pattern]
  1. MatMul Operation : Performs matrix multiplication between the src and weights tensors. The bias tensor is optional. See the MatMul operation in the Graph API for more details.

  2. Epilogue Subgraph : Optional; it can include BiasAdd, Binary/Unary, and Select operations, combined according to the rules below (a construction sketch follows this list).

    Combination Rules:

    [Figure: epilogue subgraph]
    • BiasAdd : If present, must be the first op in the epilogue subgraph and can only appear once.

    • 0 to 4 Binary or Unary operations are supported in the epilogue subgraph.

    • Select : If present, must follow the Binary/Unary operations (if present) and can only appear once.
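
The following is a minimal C++ sketch of how such a pattern could be assembled with the oneDNN Graph API: a required MatMul followed by a BiasAdd and ReLU epilogue. The shapes, tensor IDs, and op names are illustrative, and the expectation that all three ops come back as one fused partition depends on the library's fusion policy.

```cpp
#include <vector>

#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;

int main() {
    using dt = logical_tensor::data_type;
    using lt = logical_tensor::layout_type;

    // Illustrative shapes: src [64, 256] x weights [256, 128] -> dst [64, 128].
    logical_tensor src {0, dt::f32, {64, 256}, lt::strided};
    logical_tensor wei {1, dt::f32, {256, 128}, lt::strided};
    logical_tensor bias {2, dt::f32, {128}, lt::strided};
    logical_tensor mm_out {3, dt::f32, {64, 128}, lt::strided};
    logical_tensor bias_out {4, dt::f32, {64, 128}, lt::strided};
    logical_tensor relu_out {5, dt::f32, {64, 128}, lt::strided};

    // Required MatMul op plus an epilogue of BiasAdd -> ReLU.
    op matmul {0, op::kind::MatMul, {src, wei}, {mm_out}, "matmul"};
    op bias_add {1, op::kind::BiasAdd, {mm_out, bias}, {bias_out}, "bias_add"};
    op relu {2, op::kind::ReLU, {bias_out}, {relu_out}, "relu"};

    // Add the ops to a graph and ask the library for partitions.
    graph g(dnnl::engine::kind::cpu);
    g.add_op(matmul);
    g.add_op(bias_add);
    g.add_op(relu);
    g.finalize();

    // With the default fusion policy, the three ops are expected to be
    // returned as a single fused partition.
    std::vector<partition> parts = g.get_partitions();
    return parts.size() == 1 ? 0 : 1;
}
```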

Data Types

oneDNN supports the following combinations of data types for src, weights, bias and dst:

src          | weights      | bias         | dst
f32,bf16,f16 | f32,bf16,f16 | f32,bf16,f16 | f32,bf16,f16

The definitions of these data types and their support status on different CPU and GPU platforms follow the general description in the Data Types Guide.
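
As a sketch of how the table maps to the API, the same pattern can be described in a different data type simply by changing the data type of each logical tensor; the shapes and IDs below are illustrative and mirror the f32 sketch above.

```cpp
#include "oneapi/dnnl/dnnl_graph.hpp"

using namespace dnnl::graph;
using dt = logical_tensor::data_type;
using lt = logical_tensor::layout_type;

// bf16 variants of the logical tensors from the f32 sketch above; only
// the data type changes, the ops and graph are built identically.
logical_tensor src_bf16 {0, dt::bf16, {64, 256}, lt::strided};
logical_tensor wei_bf16 {1, dt::bf16, {256, 128}, lt::strided};
logical_tensor bias_bf16 {2, dt::bf16, {128}, lt::strided};
logical_tensor dst_bf16 {3, dt::bf16, {64, 128}, lt::strided};
```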

Example

oneDNN provides a CPU MatMul example and a GPU MatMul example demonstrating how to construct a typical floating-point MatMul pattern with the oneDNN Graph API on CPU and GPU, respectively.