Softmax Fusion Patterns#

Overview#

oneDNN supports various SoftMax fusion patterns to optimize performance and reduce memory bandwidth requirements. This document describes the supported fusion patterns for SoftMax.

Pattern Structure#

oneDNN defines floating-point SoftMax fusion patterns as follows. The blue nodes are required when defining a SoftMax fusion pattern while the brown nodes are optional.

[Figure: SoftMax fusion pattern]
  1. SoftMax Operation: Performs the softmax function on the src tensor. See the SoftMax operation in the Graph API for more details.

  2. F2F Conversion Subgraph: Converts the output tensor from one floating-point data type to another. It consists of a single TypeCast operation.

    [Figure: F2F conversion subgraph]
  3. Epilogue Subgraph: Optional and can include Binary and Unary operations.

    Combination Rules:

    [Figure: epilogue subgraph]
    • 0 to 4 Binary or Unary operations are supported in the epilogue subgraph.

  4. F2Q Conversion Subgraph: Converts the output tensor from floating-point to a quantized data type. It can be one of the following subgraphs. See the TypeCast and Quantize operations in the Graph API.

    [Figure: F2Q conversion subgraph]
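
As a rough illustration of what such a fused pattern computes, the sketch below applies the stages in order to a single row of f32 values: softmax, an optional epilogue of element-wise operations, and a final quantization to u8. It is plain C++ and does not use the oneDNN Graph API. The names `softmax_row`, `apply_epilogue`, and `quantize_u8` are hypothetical, binary operations are modeled as closures over their second operand, and the quantization formula (round(x / scale) + zero point, saturated to [0, 255]) is an assumption about the affine scheme in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Numerically stable softmax over one row: subtract the row maximum
// before exponentiating, then normalize by the sum.
std::vector<float> softmax_row(const std::vector<float> &src) {
    assert(!src.empty());
    const float max_val = *std::max_element(src.begin(), src.end());
    std::vector<float> dst(src.size());
    float sum = 0.f;
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = std::exp(src[i] - max_val);
        sum += dst[i];
    }
    for (float &v : dst) v /= sum;
    return dst;
}

// Hypothetical stand-in for one epilogue operation (Unary, or Binary with
// the second operand captured in the closure).
using eltwise_fn = std::function<float(float)>;

// Applies the epilogue operations in order; the size check mirrors the
// "0 to 4 operations" combination rule.
std::vector<float> apply_epilogue(std::vector<float> x,
        const std::vector<eltwise_fn> &ops) {
    assert(ops.size() <= 4);
    for (const auto &op : ops)
        for (float &v : x) v = op(v);
    return x;
}

// Assumed affine quantization to u8: round(x / scale) + zero_point,
// saturated to the u8 range.
std::vector<std::uint8_t> quantize_u8(const std::vector<float> &x,
        float scale, int zero_point) {
    std::vector<std::uint8_t> q(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const long r = std::lround(x[i] / scale) + zero_point;
        q[i] = static_cast<std::uint8_t>(std::min(255L, std::max(0L, r)));
    }
    return q;
}
```

Without fusion, each stage would write its intermediate result to memory and read it back; a fused kernel can keep the intermediate values in registers or cache, which is where the bandwidth saving comes from.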

Data Types#

oneDNN supports the following combinations of data types for src and dst:

src               dst
bf16, f16, f32    u8, s8, bf16, f16, f32

The definition of data types and their support status on different CPU and GPU platforms follow the general description in the Data Types Guide.
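
The data type combinations and the epilogue limit described in this document can be captured in a small pre-check before a graph is built. The sketch below is standalone C++; the `dtype` enum and the `is_supported_softmax_fusion` helper are hypothetical and not part of oneDNN. It encodes only the rules listed here and says nothing about support on a particular platform.

```cpp
#include <cstddef>

// Hypothetical data type tags; these are not oneDNN's own enumerators.
enum class dtype { f32, f16, bf16, u8, s8 };

constexpr bool is_floating_point(dtype t) {
    return t == dtype::f32 || t == dtype::f16 || t == dtype::bf16;
}

// True when src/dst match the table above and the epilogue holds at most
// four Binary or Unary operations.
constexpr bool is_supported_softmax_fusion(
        dtype src, dtype dst, std::size_t num_epilogue_ops) {
    return is_floating_point(src)
            && (is_floating_point(dst) || dst == dtype::u8
                    || dst == dtype::s8)
            && num_epilogue_ops <= 4;
}
```

For example, `is_supported_softmax_fusion(dtype::bf16, dtype::u8, 2)` evaluates to true, while a u8 src or an epilogue of five operations is rejected.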