# LayerNormBackward

## General
LayerNormBackward computes the backward propagation of the LayerNorm operation.

The backward propagation computes \(\mathrm{diff\_src}(t, n, c)\), \(\mathrm{diff\_}\gamma(c)^*\), and \(\mathrm{diff\_}\beta(c)^*\) based on \(\mathrm{diff\_dst}(t, n, c)\), \(\mathrm{src}(t, n, c)\), \(\mu(t, n)\), \(\sigma^2(t, n)\), \(\gamma(c)^*\), and \(\beta(c)^*\).

The tensors marked with an asterisk are used only when the operation is configured to use \(\gamma(c)\) and \(\beta(c)\).
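This page does not spell out the gradient formulas. As a reference sketch only (assuming normalization over the channel axis \(c\) with \(C\) elements per \((t, n)\) group, and writing \(\hat{s}\) for the normalized input), the standard layer normalization gradients are:

\[
\hat{s}(t, n, c) = \frac{\mathrm{src}(t, n, c) - \mu(t, n)}{\sqrt{\sigma^2(t, n) + \epsilon}},
\]

\[
\mathrm{diff\_}\beta(c) = \sum_{t, n} \mathrm{diff\_dst}(t, n, c), \qquad
\mathrm{diff\_}\gamma(c) = \sum_{t, n} \mathrm{diff\_dst}(t, n, c) \, \hat{s}(t, n, c),
\]

and, writing \(d(t, n, c) = \mathrm{diff\_dst}(t, n, c) \, \gamma(c)\),

\[
\mathrm{diff\_src}(t, n, c) = \frac{1}{\sqrt{\sigma^2(t, n) + \epsilon}}
\left( d(t, n, c) - \frac{1}{C} \sum_{c'} d(t, n, c')
- \hat{s}(t, n, c) \cdot \frac{1}{C} \sum_{c'} d(t, n, c') \, \hat{s}(t, n, c') \right).
\]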
## Operation attributes

| Attribute Name | Description | Value Type | Supported Values | Required or Optional |
|---|---|---|---|---|
| `begin_norm_axis` | Indicates the axis at which layer normalization starts; normalization runs from `begin_norm_axis` to the last dimension. Negative values index from right to left. | s64 | [-r, r-1], where r = rank(src); -1 is the default | Optional |
| `use_affine` | When set to True, this module has learnable per-element affine parameters. | bool | `false`, `true` (default) | Optional |
| `epsilon` | The constant to improve numerical stability. | f32 | Arbitrary positive f32 value; 1e-5 (default) | Optional |
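A minimal sketch of setting these attributes through the oneDNN Graph C++ API follows; the op id, the verbose name, and the explicit use of the default values are illustrative choices, not requirements:

```cpp
#include "oneapi/dnnl/dnnl_graph.hpp"

#include <cstdint>

using namespace dnnl::graph;

// Create the backward op; the numeric id (0) and the verbose
// name "ln_bwd" are arbitrary illustrative choices.
op ln_bwd(0, op::kind::LayerNormBackward, "ln_bwd");

// All three attributes are optional; the values below are the defaults.
ln_bwd.set_attr<int64_t>(op::attr::begin_norm_axis, -1); // normalize over the last axis
ln_bwd.set_attr<bool>(op::attr::use_affine, true);       // gamma and beta will be supplied
ln_bwd.set_attr<float>(op::attr::epsilon, 1e-5f);        // numerical-stability constant
```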
## Execution arguments

The inputs and outputs must be provided in the index order below when constructing the operation.

### Inputs

| Index | Argument Name | Required or Optional |
|---|---|---|
| 0 | `src` | Required |
| 1 | `diff_dst` | Required |
| 2 | `mean` | Required |
| 3 | `variance` | Required |
| 4 | `gamma` | Optional |
| 5 | `beta` | Optional |
> **Note**: `gamma` is the scaling for the normalized value, and `beta` is the bias added to the scaled normalized value. Both are 1D tensors with the same span as src's channel axis, and both are required if the attribute `use_affine` is set to True.
### Outputs

| Index | Argument Name | Required or Optional |
|---|---|---|
| 0 | `diff_src` | Required |
| 1 | `diff_gamma` | Optional |
| 2 | `diff_beta` | Optional |
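Continuing the hypothetical sketch from the attributes section, the arguments might be wired up in exactly the index order given above; the 3D shape 2 x 8 x 16 (normalized over the last axis of size 16) is illustrative:

```cpp
using namespace dnnl::graph;
using dt = logical_tensor::data_type;
using lt = logical_tensor::layout_type;

// Illustrative shapes: src is 2 x 8 x 16, normalized over the last axis,
// so mean/variance are 2 x 8 and gamma/beta span the 16-element channel axis.
logical_tensor src      {1, dt::f32, {2, 8, 16}, lt::strided};
logical_tensor diff_dst {2, dt::f32, {2, 8, 16}, lt::strided};
logical_tensor mean     {3, dt::f32, {2, 8},     lt::strided};
logical_tensor variance {4, dt::f32, {2, 8},     lt::strided};
logical_tensor gamma    {5, dt::f32, {16},       lt::strided};
logical_tensor beta     {6, dt::f32, {16},       lt::strided};

logical_tensor diff_src   {7, dt::f32, {2, 8, 16}, lt::strided};
logical_tensor diff_gamma {8, dt::f32, {16},       lt::strided};
logical_tensor diff_beta  {9, dt::f32, {16},       lt::strided};

// Inputs and outputs are attached in the index order from the tables above.
ln_bwd.add_inputs({src, diff_dst, mean, variance, gamma, beta});
ln_bwd.add_outputs({diff_src, diff_gamma, diff_beta});

graph g(dnnl::engine::kind::cpu);
g.add_op(ln_bwd);
g.finalize();
```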
## Supported data types

The LayerNormBackward operation supports the following data type combinations.

| Src / Diff_dst / Diff_src | Gamma / Beta / Mean / Variance / Diff_gamma / Diff_beta |
|---|---|
| f32 | f32 |
| bf16 | f32, bf16 |
| f16 | f32 |
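For instance, taking the bf16 row of the table: the data tensors (src, diff_dst, diff_src) would be declared as bf16 while the statistics and affine parameters may stay in f32. A hypothetical variant of the earlier logical tensor declarations:

```cpp
using namespace dnnl::graph;
using dt = logical_tensor::data_type;
using lt = logical_tensor::layout_type;

// bf16 data tensors with f32 statistics, matching the bf16 row above.
logical_tensor src      {1, dt::bf16, {2, 8, 16}, lt::strided};
logical_tensor diff_dst {2, dt::bf16, {2, 8, 16}, lt::strided};
logical_tensor mean     {3, dt::f32,  {2, 8},     lt::strided};
logical_tensor variance {4, dt::f32,  {2, 8},     lt::strided};
```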
## Implementation Notes

LayerNormBackward supports in-place operation, meaning that `diff_dst` can serve as both an input and an output (`diff_src`). In an in-place operation, the original `diff_dst` data is overwritten. This support is limited to cases where the data types of `diff_dst` and `diff_src` are identical. Use in-place operations whenever possible for better performance.
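A sketch of what in-place execution could look like at runtime, assuming a compiled partition `cp` for a partition containing the op above and reusing the earlier f32 logical tensors; the buffer sharing between `diff_dst` and `diff_src` is the point of interest, and the surrounding tensor names are placeholders:

```cpp
#include <vector>

// `cp` is a dnnl::graph::compiled_partition for a partition that
// contains the LayerNormBackward op; engine/stream setup is standard.
dnnl::engine eng(dnnl::engine::kind::cpu, 0);
dnnl::stream strm(eng);

// One buffer backs both diff_dst (input) and diff_src (output);
// the original diff_dst contents are overwritten with the result.
std::vector<float> grad_buf(2 * 8 * 16);

dnnl::graph::tensor diff_dst_t(diff_dst, eng, grad_buf.data());
dnnl::graph::tensor diff_src_t(diff_src, eng, grad_buf.data());
// ... src_t, mean_t, variance_t, gamma_t, and beta_t wrap their
// own buffers in the same way ...

cp.execute(strm,
        {src_t, diff_dst_t, mean_t, variance_t, gamma_t, beta_t},
        {diff_src_t, diff_gamma_t, diff_beta_t});
strm.wait();
```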