Softmax Fusion Patterns
Overview
oneDNN supports several SoftMax fusion patterns that improve performance and reduce memory bandwidth requirements. This document describes the supported fusion patterns for SoftMax.
Pattern Structure
oneDNN defines the floating-point SoftMax fusion patterns as follows. In the pattern diagram, blue nodes are required when defining a SoftMax fusion pattern, while brown nodes are optional.

- SoftMax Operation: Performs the softmax function on the src tensor. See the SoftMax operation in the Graph API for more details.
- F2F Conversion Subgraph: Converts the output tensor from one floating-point data type to another. It is constructed by a TypeCast operation.
- Epilogue Subgraph: Optional. It can include the following operations:
  - Binary and Unary operations: refer to the Note in Fusion Patterns.
  - Combination Rules: 0 to 4 Binary or Unary operations are supported in the epilogue subgraph.
- F2Q Conversion Subgraph: Converts the output tensor from floating-point to a quantized data type. It can be one of the following subgraphs. See the TypeCast and Quantize operations in the Graph API.
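To make the stages above concrete, here is a minimal, non-authoritative Python reference sketch of the arithmetic a fused pattern computes for one row: SoftMax, an optional epilogue of up to four element-wise ops, and the two conversion kinds (F2F to bf16, F2Q to u8/s8) as separate helpers. It does not use the oneDNN API. The helper names (softmax, apply_epilogue, f32_to_bf16, quantize) are hypothetical, Binary ops are modeled only with a scalar second operand, and it assumes Quantize computes round(src / scale + zp) with round-half-to-even and saturation to the target range.

```python
import math
import struct
from typing import Callable, List, Sequence

# Hypothetical reference sketch of a fused SoftMax pattern on one row.
# It mirrors the math only; it does not call oneDNN.

MAX_EPILOGUE_OPS = 4  # the epilogue supports 0 to 4 Binary or Unary ops


def softmax(src: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single axis (a 1-D row)."""
    m = max(src)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - m) for x in src]
    total = sum(exps)
    return [e / total for e in exps]


def apply_epilogue(x: Sequence[float],
                   ops: Sequence[Callable[[float], float]]) -> List[float]:
    """Apply element-wise epilogue ops in order, enforcing the 0-4 op limit."""
    if len(ops) > MAX_EPILOGUE_OPS:
        raise ValueError(
            f"at most {MAX_EPILOGUE_OPS} epilogue ops, got {len(ops)}")
    out = list(x)
    for op in ops:
        out = [op(v) for v in out]
    return out


def f32_to_bf16(v: float) -> float:
    """F2F sketch: round an f32 value to bf16 (nearest, ties to even).

    Assumes v is a finite value representable as f32.
    """
    bits = struct.unpack("<I", struct.pack("<f", v))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    bits &= 0xFFFF0000  # bf16 keeps the upper 16 bits of an f32
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def quantize(x: Sequence[float], scale: float, zp: int,
             dtype: str = "u8") -> List[int]:
    """F2Q sketch (assumed formula): q = round(v / scale + zp), saturated."""
    lo, hi = (0, 255) if dtype == "u8" else (-128, 127)
    # Python's built-in round() rounds halfway cases to the nearest even int.
    return [min(hi, max(lo, round(v / scale + zp))) for v in x]


# Usage: SoftMax -> one epilogue op (a scalar Multiply) -> F2F or F2Q.
probs = softmax([1.0, 2.0, 3.0])
scaled = apply_epilogue(probs, [lambda v: v * 2.0])
print([f32_to_bf16(v) for v in scaled])
print(quantize(probs, scale=1 / 256, zp=0))
```

Subtracting the row maximum before exponentiating does not change the result, because softmax is invariant to adding the same constant to every input; it only prevents overflow for large logits.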
Data Types
oneDNN supports the following combinations of data types for src and dst:
| src | dst |
|---|---|
| bf16, f16, f32 | u8, s8, bf16, f16, f32 |
The definition of data types and their support status on different CPU and GPU platforms follow the general description in the Data Types Guide.
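The data type table can be read as a simple lookup. The Python sketch below is hypothetical (FLOAT_TYPES, QUANT_TYPES, is_supported, and conversion_kind are illustrative names, not oneDNN API) and assumes every listed src type can pair with every listed dst type. As a simplification, it attributes a u8/s8 dst to the F2Q subgraph and any other change of floating-point type to the F2F subgraph; it does not model per-platform support.

```python
FLOAT_TYPES = {"bf16", "f16", "f32"}  # allowed src types (also float dst types)
QUANT_TYPES = {"u8", "s8"}            # quantized dst types


def is_supported(src: str, dst: str) -> bool:
    """Return True if the src/dst pair is covered by the data type table."""
    return src in FLOAT_TYPES and dst in (FLOAT_TYPES | QUANT_TYPES)


def conversion_kind(src: str, dst: str) -> str:
    """Hypothetical: name the conversion subgraph a src -> dst pair implies."""
    if not is_supported(src, dst):
        raise ValueError(f"unsupported combination: {src} -> {dst}")
    if dst in QUANT_TYPES:
        return "F2Q"
    if dst != src:
        return "F2F"  # simplification: any float-to-float type change
    return "none"


print(conversion_kind("f32", "u8"))
print(conversion_kind("f32", "bf16"))
```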