SoftMax
General
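The SoftMax operation applies the following formula to every element of the \(\src\) tensor (the variable names follow the standard Naming Conventions):
\[\dst_i = \frac{\exp(\src_i - \max_{k} \src_k)}{\sum_{j=1}^{C} \exp(\src_j - \max_{k} \src_k)}\]
where \(C\) is the size of the tensor along the axis dimension and the maximum is taken over the elements along that axis. Subtracting the maximum value along the axis improves numerical stability.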
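The computation described above (subtract the maximum along the axis, exponentiate, then normalize by the sum over the \(C\) elements) can be sketched in plain Python for a single 1-D slice. This is only an illustration, not the library's implementation; the helper name `softmax_1d` is hypothetical.

```python
import math

def softmax_1d(src):
    """Numerically stable softmax over one 1-D slice (hypothetical helper)."""
    m = max(src)                           # maximum value along the axis
    exps = [math.exp(x - m) for x in src]  # shifted exponentials, each <= 1
    total = sum(exps)                      # sum over the C elements of the slice
    return [e / total for e in exps]

# Large inputs would overflow math.exp() without the shift by the maximum.
dst = softmax_1d([1000.0, 1001.0, 1002.0])
```

Because only the differences to the maximum enter the formula, shifting every input by the same constant leaves the result unchanged.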
If the optional stats output is requested, it is defined as:
Operation attributes

| Attribute Name | Description | Value Type | Supported Values | Required or Optional |
|---|---|---|---|---|
| axis | Represents the axis from which the SoftMax is calculated. | s64 | Arbitrary s64 value ( | Optional |
| mode | Specifies the computation mode of SoftMax. | string | none, inf_as_zero | Optional |

When the mode attribute is not set or is set to none, the operation performs the standard SoftMax calculation. In this case, the operation produces NaN if all input elements along the axis dimension are -infinity. To prevent this, set mode to inf_as_zero so that the operation produces zeros for such -infinity inputs.
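The difference between the two modes can be illustrated with a small pure-Python sketch over one slice. It only models the behavior described above and is not the library's implementation; the function `softmax_1d` and its `mode` parameter are hypothetical names chosen to mirror the attribute.

```python
import math

def softmax_1d(src, mode="none"):
    """Softmax over one slice; `mode` mimics the attribute described above."""
    m = max(src)
    if m == -math.inf:
        # Every element along the axis is -infinity.
        if mode == "inf_as_zero":
            return [0.0] * len(src)
        # mode "none": (-inf) - (-inf) is NaN, so every output is NaN.
        return [math.nan] * len(src)
    exps = [math.exp(x - m) for x in src]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

all_masked = [-math.inf, -math.inf, -math.inf]
nan_row = softmax_1d(all_masked)                       # mode not set
zero_row = softmax_1d(all_masked, mode="inf_as_zero")  # zeros instead of NaN
partial = softmax_1d([0.0, -math.inf])                 # only some inputs masked
```

A slice that contains at least one finite value is unaffected by the mode: the -infinity entries simply contribute zero to the sum.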
Execution arguments
The inputs and outputs must be provided in the index order below when constructing an operation.
Inputs

| Index | Argument Name | Required or Optional |
|---|---|---|
| 0 | src | Required |

Outputs

| Index | Argument Name | Required or Optional |
|---|---|---|
| 0 | dst | Required |
| 1 | stats | Optional |

Supported data types
SoftMax operation supports the following data type combinations.

| Src | Dst | Stats |
|---|---|---|
| f32 | f32, bf16, f16 | f32 |
| bf16 | f32, bf16 | f32 |
| f16 | f32, f16 | f32 |

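The combinations in the table above can be encoded as a small lookup, for example to validate a configuration before building a graph. This is a sketch only; the names `SUPPORTED_DST`, `STATS_TYPE`, and `is_supported` are hypothetical and not part of the library.

```python
# Allowed dst types per src type, copied from the table above.
SUPPORTED_DST = {
    "f32": {"f32", "bf16", "f16"},
    "bf16": {"f32", "bf16"},
    "f16": {"f32", "f16"},
}
STATS_TYPE = "f32"  # the table lists f32 stats for every src type

def is_supported(src, dst, stats=None):
    """Return True if the (src, dst[, stats]) combination appears in the table."""
    if dst not in SUPPORTED_DST.get(src, set()):
        return False
    # stats is an optional output; check its type only when it is requested.
    return stats is None or stats == STATS_TYPE
```

For instance, `is_supported("bf16", "f16")` is rejected because the table pairs bf16 inputs only with f32 or bf16 outputs.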
Implementation Notes
SoftMax supports in-place operation, meaning that src can be used as both the input and the output (dst). In an in-place operation, the original src data is overwritten. This support is limited to cases where the data types of src and dst are identical. Use in-place operation whenever possible for better performance.
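The in-place semantics can be pictured with a plain-Python sketch in which one buffer serves as both src and dst. This only illustrates that the input values are overwritten; it is not how the library is invoked, and `softmax_inplace` is a hypothetical name.

```python
import math

def softmax_inplace(buf):
    """Overwrite `buf` (acting as both src and dst) with its softmax."""
    m = max(buf)
    total = 0.0
    for i, x in enumerate(buf):
        buf[i] = math.exp(x - m)  # the original src value is lost here
        total += buf[i]
    for i in range(len(buf)):
        buf[i] /= total
    return buf

data = [0.0, 0.0]
result = softmax_inplace(data)
```

After the call, `data` and `result` refer to the same list, so any caller that still needs the original inputs must copy them first.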