Softmax Fusion Patterns

Overview

oneDNN supports various SoftMax fusion patterns to optimize performance and reduce memory bandwidth requirements. This document describes the supported fusion patterns for SoftMax.

Pattern Structure

oneDNN defines floating-point SoftMax fusion patterns as follows. The blue nodes are required when defining a SoftMax fusion pattern while the brown nodes are optional.

SoftMax Operation: Performs the softmax function on the src tensor. See the SoftMax operation in the Graph API for more details.
F2F Conversion Subgraph: Converts the output tensor from one floating-point data type to another. It consists of a TypeCast operation.
Epilogue Subgraph: Optional and can include the following operations:
- Binary and Unary operations: refer to the Note in Fusion Patterns.
Combination Rules:
- With N=20, the epilogue subgraph supports 0 to 20 Binary or Unary operations.
F2Q Conversion Subgraph: Converts the output tensor from a floating-point data type to a quantized data type. It can be one of the following subgraphs. See the TypeCast and Quantize operations in the Graph API.
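To make the pattern described above concrete, the following is a minimal reference sketch in plain Python of what a fused SoftMax pattern computes: softmax, then an optional epilogue of up to 20 operations, then an optional quantization step. It does not use the oneDNN API. The names `fused_softmax_reference`, `quantize`, and `MAX_EPILOGUE_OPS` are hypothetical, and the quantization formula (round to nearest even of `x / scale`, plus a zero point, saturated to the target range) is an assumption made for illustration. The F2F TypeCast step is omitted because Python floats have a single precision.

```python
import math

# N=20 from the combination rule; the constant name is hypothetical.
MAX_EPILOGUE_OPS = 20


def softmax(row):
    """Numerically stable softmax over one row (the softmax axis)."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def quantize(values, scale, zero_point, lo, hi):
    """Assumed per-tensor affine quantization, saturated to [lo, hi].

    Python's round() rounds halves to the nearest even integer.
    """
    return [min(hi, max(lo, round(v / scale) + zero_point)) for v in values]


def fused_softmax_reference(row, epilogue=(), quant=None):
    """Softmax -> optional epilogue ops -> optional F2Q conversion.

    Each epilogue entry is a callable mapping a list to a list; a Binary
    operation is modeled as a callable that closes over its second operand.
    """
    if len(epilogue) > MAX_EPILOGUE_OPS:
        raise ValueError("at most 20 Binary or Unary epilogue operations")
    out = softmax(row)
    for op in epilogue:
        out = op(out)
    if quant is not None:
        out = quantize(out, **quant)
    return out


# Example: softmax, a Multiply-like binary epilogue, then conversion to u8.
double = lambda xs: [x * 2.0 for x in xs]
u8 = dict(scale=1 / 256, zero_point=0, lo=0, hi=255)
example = fused_softmax_reference([0.0, 0.0], epilogue=[double], quant=u8)
```

In the example, the softmax of two equal inputs is 0.5 for each element, the epilogue doubles it to 1.0, and the assumed quantization saturates 256 to the u8 maximum of 255. Running the steps in one function mirrors the motivation for fusion: the intermediate results never need to be written out as separate tensors.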

Data Types

oneDNN supports the following combinations of data types for src and dst:

src	dst
bf16,f16,f32	u8,s8,bf16,f16,f32

The definition of data types and their support status on different CPU and GPU platforms follow the general description in the Data Types Guide.
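The data type table above can be encoded as a small lookup. This is only a sketch: it assumes the table row means that any listed src type may be paired with any listed dst type, and the mapping from a dst type to a conversion subgraph (F2Q for u8 and s8, F2F when the floating-point types differ) is inferred from the pattern structure rather than stated by oneDNN. The function names are hypothetical, and platform support is not modeled.

```python
# Values copied from the src/dst table; names are hypothetical.
SUPPORTED_SRC = {"bf16", "f16", "f32"}
SUPPORTED_DST = {"u8", "s8", "bf16", "f16", "f32"}
QUANTIZED = {"u8", "s8"}


def is_supported_combination(src, dst):
    """True if (src, dst) appears in the table, read as a cross product."""
    return src in SUPPORTED_SRC and dst in SUPPORTED_DST


def conversion_kind(src, dst):
    """Guess which conversion subgraph a (src, dst) pair implies."""
    if not is_supported_combination(src, dst):
        raise ValueError(f"unsupported combination: {src} -> {dst}")
    if dst in QUANTIZED:
        return "F2Q"  # floating-point to quantized
    if dst != src:
        return "F2F"  # floating-point to another floating-point type
    return "none"     # same type in and out, no conversion needed
```

For instance, `conversion_kind("f32", "s8")` returns `"F2Q"` and `conversion_kind("f32", "bf16")` returns `"F2F"`, while a quantized src such as `"u8"` is rejected because the table lists only floating-point src types.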
