GELU
General
The GELU operation applies the following formula to every element of the \(\src\) tensor (the variable names follow the standard Naming Conventions):
\[dst = 0.5 * src * (1.0 + erf(src / \sqrt{2}))\]
When the attribute mode is specified as gelu_tanh, a tanh-based approximation is used instead:
\[dst = 0.5 * src * (1.0 + \tanh[\sqrt{\frac{2}{\pi}} (src + 0.044715 * src^3)])\]
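The two formulas can be compared numerically with a small reference sketch. This is plain Python for illustration only, not the library kernel: the function name `gelu` and the list-based interface are hypothetical, and the label `gelu_erf` for the erf-based mode is an assumption made for this example (only `gelu_tanh` is named in the text above).

```python
import math


def gelu(src, mode="gelu_erf"):
    """Element-wise reference GELU over a list of floats (illustrative only)."""
    if mode == "gelu_erf":  # assumed label for the erf-based formula
        return [0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))) for x in src]
    if mode == "gelu_tanh":  # tanh-based approximation
        k = math.sqrt(2.0 / math.pi)
        return [0.5 * x * (1.0 + math.tanh(k * (x + 0.044715 * x ** 3)))
                for x in src]
    raise ValueError(f"unsupported mode: {mode!r}")


src = [-3.0, -1.0, 0.0, 1.0, 3.0]
exact = gelu(src)
approx = gelu(src, mode="gelu_tanh")

# Largest element-wise gap between the two modes on this sample.
max_abs_diff = max(abs(a - b) for a, b in zip(exact, approx))
```

On this sample the two modes agree to within about 1e-3 per element, so the approximation mainly trades a small accuracy loss for avoiding the erf evaluation.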
Operation attributes
Attribute Name | Description | Value Type | Supported Values | Required or Optional |
---|---|---|---|---|
mode | Specifies the computation mode of GELU. | string | gelu_erf (default), gelu_tanh | Optional |
Execution arguments
When constructing the operation, the inputs and outputs must be provided in the index order given below.
Inputs
Index | Argument Name | Required or Optional |
---|---|---|
0 | src | Required |
Outputs
Index | Argument Name | Required or Optional |
---|---|---|
0 | dst | Required |
Supported data types
The GELU operation supports the following data type combinations.
Src | Dst |
---|---|
f32 | f32 |
f16 | f16 |
bf16 | bf16 |