Quantize#
The Quantize operation converts an f32 tensor to a quantized (u8/s8) tensor. It supports both per-tensor and per-channel asymmetric linear quantization. The output data type is specified by the output tensor's data type. The rounding mode is library-implementation defined.
For per-tensor quantization:

\[dst_{i} = round(src_{i} / scale + zp)\]

For per-channel quantization, taking channel axis = 1 as an example:

\[dst_{\ldots,i,\ldots} = round(src_{\ldots,i,\ldots} / scale_{i} + zp_{i}),\quad i \in [0, ic-1]\]

where \(ic\) is the number of channels.
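The per-tensor formula can be sketched in plain Python. This is an illustrative reference only, not the oneDNN API: the spec leaves the rounding mode implementation defined (Python's `round()` uses round-half-to-even), and the sketch assumes values saturate to the destination type's range (u8 by default).

```python
def quantize_per_tensor(src, scale, zp, dst_min=0, dst_max=255):
    """Reference sketch: dst_i = round(src_i / scale + zp), saturated.

    src is a flat list of f32 values; dst_min/dst_max default to the
    u8 range. Names and saturation behavior are assumptions, not spec.
    """
    out = []
    for x in src:
        q = round(x / scale + zp)                  # half-to-even rounding
        out.append(max(dst_min, min(dst_max, q)))  # saturate to dst range
    return out

# Example: scale = 0.5, zp = 3; 200.0 saturates to the u8 maximum.
quantize_per_tensor([0.0, 0.5, 1.0, 200.0], 0.5, 3)  # -> [3, 4, 5, 255]
```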
Operation Attributes#
| Attribute Name | Description | Value Type | Supported Values | Required or Optional |
|---|---|---|---|---|
| qtype | Specifies which quantization type is used | string | `per_tensor` (default), `per_channel` | Optional |
| axis | Specifies the dimension on which per-channel quantization is applied | s64 | An s64 value in the range [-r, r-1] where r = rank(src); default is 1 | Optional |
| scales | Scaling factors applied to the src data | f32 | An f32 list (containing a single element if qtype is `per_tensor`) | Required |
| zps | Offset values that map to float zero | s64 | An s64 list (containing a single element if qtype is `per_tensor`) | Required |
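The interaction between `axis`, `scales`, and `zps` in per-channel mode can be sketched as follows, here for a 2-D src with channel axis = 1 and an s8 destination. This is an illustrative sketch under assumed names and saturation behavior, not the oneDNN API; `scales` and `zps` each carry one element per channel.

```python
def quantize_per_channel(src, scales, zps, dst_min=-128, dst_max=127):
    """Per-channel reference sketch for a 2-D [n][ic] nested list,
    channel axis = 1. scales/zps hold ic elements, one per channel.
    dst_min/dst_max default to the s8 range (an assumption here).
    """
    out = []
    for row in src:
        qrow = []
        for c, x in enumerate(row):
            q = round(x / scales[c] + zps[c])      # per-channel scale/zp
            qrow.append(max(dst_min, min(dst_max, q)))  # saturate to s8
        out.append(qrow)
    return out

# Two channels: channel 0 uses scale 0.5 / zp 0, channel 1 uses 2.0 / 1.
quantize_per_channel([[1.0, 2.0], [-1.0, 4.0]], [0.5, 2.0], [0, 1])
# -> [[2, 2], [-2, 3]]
```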
Execution Arguments#
The inputs and outputs must be provided according to the index order below when constructing an operation.
Inputs#
| Index | Argument Name | Required or Optional |
|---|---|---|
| 0 | `src` | Required |
Outputs#
| Index | Argument Name | Required or Optional |
|---|---|---|
| 0 | `dst` | Required |
Supported Data Types#
The Quantize operation supports the following data type combinations.

| Src | Dst |
|---|---|
| f32 | s8, u8 |
@note This operation exists to support int8 quantized models.