Configuration of data layouts
The oneapi::mkl::dft namespace qualifier is omitted below for conciseness.
The DFT interface provides the configuration parameters
config_param::FWD_STRIDES (resp. config_param::BWD_STRIDES) to define the data
layout locating entries of relevant data sequences in the forward (resp.
backward) domain. For batched transforms, i.e., if \(M > 1\) is configured by
setting config_param::NUMBER_OF_TRANSFORMS accordingly,
config_param::FWD_DISTANCE (resp. config_param::BWD_DISTANCE) completes the
description of the data layout by specifying the distance between successive
data sequences in the forward (resp. backward) domain.
Using the notations from the introduction and the superscript \(\text{fwd}\) (resp. \(\text{bwd}\)) for data sequences belonging to the forward (resp. backward) domain, for any \(m\) and multi-index \(\left(k_1, k_2, \ldots, k_d\right)\) within valid range, the corresponding entry \(\left(\cdot\right)^{m}_{k_{1}, k_{2}, \dots, k_d }\) (or the real or imaginary part thereof) of the relevant data sequence is located at index
\[s^{\text{xwd}}_0 + k_1 s^{\text{xwd}}_1 + k_2 s^{\text{xwd}}_2 + \dots + k_d s^{\text{xwd}}_d + m\, l^{\text{xwd}} \qquad (1)\]
of the corresponding data container (sycl::buffer object or device-accessible
USM allocation) provided to the compute function, the base data type of which
is (possibly implicitly re-interpreted) as documented in the table below. In
the index expression (1), \(\text{x} = \text{f}\) (resp. \(\text{x} = \text{b}\))
for entries of forward-domain (resp. backward-domain) data sequences and
\(s^{\text{xwd}}_j\), \(\forall j \in \lbrace 0, \ldots, d\rbrace\), represent the offset and generalized strides defining the locations of entries within each \(d\)-dimensional data sequence in the forward (resp. backward) domain if \(\text{x} = \text{f}\) (resp. if \(\text{x} = \text{b}\)), counted in number of elements of the relevant implicitly-assumed elementary data type;
\(l^{\text{xwd}}\) represents the distance between successive \(d\)-dimensional data sequences in the forward (resp. backward) domain if \(\text{x} = \text{f}\) (resp. if \(\text{x} = \text{b}\)), counted in number of elements of the relevant implicitly-assumed elementary data type.
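As an illustration of how index (1) is evaluated, the following is a minimal sketch in standard C++. The helper name layout_index and its argument conventions are hypothetical and not part of oneMKL; strides holds \(s_0, \ldots, s_d\) in that order and k holds \(k_1, \ldots, k_d\).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a oneMKL function): evaluates index (1) for one
// domain. strides = {s_0, s_1, ..., s_d}, k = {k_1, ..., k_d},
// distance = l, m = position of the data sequence within the batch.
inline std::int64_t layout_index(const std::vector<std::int64_t>& strides,
                                 std::int64_t distance,
                                 std::int64_t m,
                                 const std::vector<std::int64_t>& k) {
    assert(strides.size() == k.size() + 1);
    std::int64_t index = strides[0] + m * distance;  // offset plus batch term
    for (std::size_t j = 0; j < k.size(); ++j) {
        index += k[j] * strides[j + 1];  // k_{j+1} * s_{j+1}
    }
    return index;
}
```

For instance, with strides {0, 4, 1}, a distance of 12, m = 2 and k = (1, 3), the entry sits at element 0 + 1*4 + 3*1 + 2*12 = 31 of the data container.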
Note
All data sequences (or respective real and imaginary parts thereof if separately stored) must belong to the same block allocation, as a consequence of the generalized index (1).
Implicitly-assumed elementary data type
When reading or writing an element at index (1) of any user-provided data
container used at compute time, a descriptor object may re-interpret the base
data type of that data container into an implicitly-assumed elementary data
type. That implicitly-assumed data type depends on the object type, i.e., on
the specialization values used for the template parameters when instantiating
the descriptor class template, and, in case of complex descriptors, on the
configuration value set for its configuration parameter
config_param::COMPLEX_STORAGE. The table below lists the implicitly-assumed
data type in either domain (last 2 columns) based on the object type and its
configuration value for config_param::COMPLEX_STORAGE (first 2 columns).
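Object type | Configuration value for configuration parameter config_param::COMPLEX_STORAGE | Implicitly-assumed elementary data type in forward domain | Implicitly-assumed elementary data type in backward domain |
---|---|---|---|
descriptor<precision::SINGLE, domain::COMPLEX> | config_value::COMPLEX_COMPLEX | std::complex<float> | std::complex<float> |
descriptor<precision::DOUBLE, domain::COMPLEX> | config_value::COMPLEX_COMPLEX | std::complex<double> | std::complex<double> |
descriptor<precision::SINGLE, domain::COMPLEX> | config_value::REAL_REAL | float | float |
descriptor<precision::DOUBLE, domain::COMPLEX> | config_value::REAL_REAL | double | double |
descriptor<precision::SINGLE, domain::REAL> | irrelevant | float | std::complex<float> |
descriptor<precision::DOUBLE, domain::REAL> | irrelevant | double | std::complex<double> |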
Configuring data layouts for batched transforms
The value \(l^{\text{xwd}}\) in (1) above is communicated as a std::int64_t
configuration value, set for the configuration parameter
config_param::FWD_DISTANCE if \(\text{x} = \text{f}\) (resp.
config_param::BWD_DISTANCE if \(\text{x} = \text{b}\)). This value is
irrelevant for unbatched transforms, i.e., for descriptors set to handle a
number of transforms \(M\) equal to \(1\) (default behavior).
For batched transforms, the number \(M > 1\) of desired DFTs must be set
explicitly as a std::int64_t configuration value for the configuration
parameter config_param::NUMBER_OF_TRANSFORMS. In that case, the configuration
parameters config_param::FWD_DISTANCE and config_param::BWD_DISTANCE must also
be set explicitly, since their default configuration values of \(0\) would
break the consistency requirements for any \(M > 1\).
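One way to pick a distance that keeps successive data sequences from overlapping is to use the span of a single sequence. The sketch below is an illustration in standard C++, not part of oneMKL: the helper name min_batch_distance is hypothetical, it assumes strictly positive strides, and it expects the extents of the domain being described (for the backward domain of a real descriptor, the last extent would be \(\lfloor n_d / 2 \rfloor + 1\)). The result is a sufficient choice, not the only valid one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a oneMKL function): number of elements spanned by
// one d-dimensional data sequence, ignoring the offset s_0. Using a distance
// at least this large keeps batched sequences disjoint when all strides are
// positive. strides = {s_0, ..., s_d}, extents = {n_1, ..., n_d}.
inline std::int64_t min_batch_distance(const std::vector<std::int64_t>& strides,
                                       const std::vector<std::int64_t>& extents) {
    assert(strides.size() == extents.size() + 1);
    std::int64_t span = 1;  // the entry at k = (0, ..., 0)
    for (std::size_t j = 0; j < extents.size(); ++j) {
        assert(strides[j + 1] > 0 && extents[j] > 0);
        span += (extents[j] - 1) * strides[j + 1];  // farthest entry along j
    }
    return span;
}
```

The returned value would then be passed as the std::int64_t configuration value for config_param::FWD_DISTANCE or config_param::BWD_DISTANCE, alongside the batch size set for config_param::NUMBER_OF_TRANSFORMS. For in-place transforms, the additional consistency requirements on distances still apply on top of this lower bound.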
Configuring strides in forward and backward domains
The values \(s^{\text{xwd}}_0, s^{\text{xwd}}_1, \dots, s^{\text{xwd}}_d\) in
(1) above are communicated as elements, in that order, of a \((d+1)\)-long
std::vector<std::int64_t> configuration value, set for the configuration
parameter config_param::FWD_STRIDES if \(\text{x} = \text{f}\) (resp.
config_param::BWD_STRIDES if \(\text{x} = \text{b}\)). The element
\(s^{\text{xwd}}_0\) represents an absolute offset (or “displacement”) in the
data sets while the subsequent elements \(s^{\text{xwd}}_j\ (j > 0)\) are
generalized strides to be considered along dimensions
\(j \in \lbrace 1, \ldots, d\rbrace\).
The default values set for the forward and backward strides correspond to the data layout configurations for unbatched, in-place transforms using unit stride along the last dimension with no offset (and minimal padding in forward domain in case of real descriptors, aligning with the requirements for in-place transforms). In other words, the default values are \(s^{\text{fwd}}_0 = s^{\text{bwd}}_0 = 0\), \(s^{\text{fwd}}_d = s^{\text{bwd}}_d = 1\) and, for \(d\)-dimensional DFTs with \(d > 1\),
\(s^{\text{fwd}}_{d-1} = s^{\text{bwd}}_{d-1} = n_{d}\) for complex descriptors;
\(s^{\text{bwd}}_{d-1} = \lfloor \frac{n_{d}}{2} \rfloor + 1\), and \(s^{\text{fwd}}_{d-1} = 2 s^{\text{bwd}}_{d-1}\) for real descriptors;
if \(d > 2\), \(s^{\text{xwd}}_k = n_{k+1} s^{\text{xwd}}_{k+1}\) for \(k \in \lbrace 1, \ldots, d - 2\rbrace\) (for \(\text{x} = \text{f}\) and \(\text{x} = \text{b}\)).
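The default values listed above can be computed mechanically from the lengths \(n_1, \ldots, n_d\). The following is a minimal sketch in standard C++; the names DefaultStrides and default_strides are hypothetical and not part of oneMKL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types and helper (not oneMKL API): default forward and
// backward strides for lengths n = {n_1, ..., n_d}, following the rules above.
struct DefaultStrides {
    std::vector<std::int64_t> fwd;
    std::vector<std::int64_t> bwd;
};

inline DefaultStrides default_strides(const std::vector<std::int64_t>& n,
                                      bool real_descriptor) {
    assert(!n.empty());
    const std::size_t d = n.size();
    std::vector<std::int64_t> fwd(d + 1, 0), bwd(d + 1, 0);  // s_0 = 0
    fwd[d] = bwd[d] = 1;                                     // unit last stride
    if (d > 1) {
        if (real_descriptor) {
            bwd[d - 1] = n[d - 1] / 2 + 1;  // floor(n_d / 2) + 1
            fwd[d - 1] = 2 * bwd[d - 1];    // padded for in-place transforms
        } else {
            fwd[d - 1] = bwd[d - 1] = n[d - 1];  // n_d
        }
        // s_k = n_{k+1} * s_{k+1} for k = d-2 down to 1 (n_{k+1} is n[k]).
        for (std::size_t k = d - 2; k >= 1; --k) {
            fwd[k] = n[k] * fwd[k + 1];
            bwd[k] = n[k] * bwd[k + 1];
        }
    }
    return {fwd, bwd};
}
```

For lengths (3, 4, 6), this gives strides {0, 24, 6, 1} in both domains for a complex descriptor, and {0, 32, 8, 1} (forward) and {0, 16, 4, 1} (backward) for a real descriptor.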
General consistency requirements
In general, the distances and strides must be set so that every index value (1) corresponds to a unique entry of the data sequences under consideration. In other words, there must not be one index value as expressed in (1) that corresponds to two different \((d+1)\)-tuples \((m, k_{1}, k_{2}, \dots, k_d)\) that are both within the elementary range of indices considered by oneMKL.
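The uniqueness requirement can be checked by brute force for small problem sizes. The sketch below, in standard C++, enumerates every valid \((m, k_1, \ldots, k_d)\) tuple and reports whether two of them map to the same index (1). The helper name layout_is_injective is hypothetical; this is a debugging aid under the assumption that all extents are at least 1, not how oneMKL validates a configuration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not a oneMKL function): returns true if no two valid
// (m, k_1, ..., k_d) tuples share the same index (1).
// strides = {s_0, ..., s_d}, extents = {n_1, ..., n_d} of the domain checked.
inline bool layout_is_injective(const std::vector<std::int64_t>& strides,
                                const std::vector<std::int64_t>& extents,
                                std::int64_t distance,
                                std::int64_t num_transforms) {
    assert(strides.size() == extents.size() + 1);
    std::unordered_set<std::int64_t> seen;
    std::vector<std::int64_t> k(extents.size(), 0);
    for (std::int64_t m = 0; m < num_transforms; ++m) {
        std::fill(k.begin(), k.end(), 0);
        bool done = false;
        while (!done) {
            std::int64_t index = strides[0] + m * distance;
            for (std::size_t j = 0; j < k.size(); ++j) {
                index += k[j] * strides[j + 1];
            }
            if (!seen.insert(index).second) {
                return false;  // index already used by another tuple
            }
            // Advance the multi-index like an odometer, last dimension first.
            done = true;
            for (std::size_t j = k.size(); j > 0; --j) {
                if (++k[j - 1] < extents[j - 1]) {
                    done = false;
                    break;
                }
                k[j - 1] = 0;
            }
        }
    }
    return true;
}
```

With strides {0, 6, 1} and extents (4, 6), a batch of two transforms is valid with a distance of 24 but not with the default distance of 0, since both data sequences would then occupy the same elements.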
Additionally, for in-place transforms (configuration value
config_value::INPLACE associated with configuration parameter
config_param::PLACEMENT), the smallest stride value must be associated with
the same dimension in forward and backward domains and the data layouts must
abide by the following “consistency requirement”: the memory address(es) of
leading entry(ies) along the last dimension must be identical in forward and
backward domains. Specifically, considering any \((d+1)\)-tuple
\((m, k_{1}, k_{2}, \dots, k_{d-1}, 0)\) within valid range, the memory
address of the element of corresponding index value (1) in forward domain
(considering the implicitly assumed type in forward domain) must be identical
to the memory address of the element of corresponding index value (1) in
backward domain (considering the implicitly assumed type in backward domain).
Equivalently,
for complex descriptors, the offset, stride(s) (and distances, if relevant) must be equal in forward and backward domain;
for real descriptors, offsets and strides must satisfy \(s^{\text{fwd}}_{j} = 2 s^{\text{bwd}}_{j}\ \forall j \in \lbrace 0, \ldots, d - 1\rbrace\) (note that \(0 \leq j < d\)) and distances, if relevant, must satisfy \(l^{\text{fwd}} = 2 l^{\text{bwd}}\). Note that this leads to some data padding being required in forward domain if unit strides are used along the last dimension in forward and backward domains.
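The two equivalent conditions above lend themselves to a simple check. The following is a minimal sketch in standard C++ with a hypothetical helper name, inplace_layouts_match, that is not part of oneMKL. It covers only the offset, stride and distance relations; it does not verify that the smallest stride is associated with the same dimension in both domains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a oneMKL function): checks the in-place relations
// between forward and backward layouts. Stride vectors are {s_0, ..., s_d}.
// Complex descriptors: s_j equal for j = 0..d, and equal distances.
// Real descriptors: s^fwd_j = 2 * s^bwd_j for j = 0..d-1, l^fwd = 2 * l^bwd.
inline bool inplace_layouts_match(const std::vector<std::int64_t>& fwd_strides,
                                  const std::vector<std::int64_t>& bwd_strides,
                                  std::int64_t fwd_distance,
                                  std::int64_t bwd_distance,
                                  bool real_descriptor,
                                  bool batched) {
    assert(fwd_strides.size() == bwd_strides.size());
    assert(fwd_strides.size() >= 2);  // offset plus at least one dimension
    const std::size_t d = fwd_strides.size() - 1;
    const std::int64_t ratio = real_descriptor ? 2 : 1;
    // The last stride s_d is not constrained for real descriptors.
    const std::size_t last_checked = real_descriptor ? d - 1 : d;
    for (std::size_t j = 0; j <= last_checked; ++j) {
        if (fwd_strides[j] != ratio * bwd_strides[j]) {
            return false;
        }
    }
    // Distances are only relevant for batched transforms.
    return !batched || fwd_distance == ratio * bwd_distance;
}
```

For a real descriptor of lengths (4, 6), forward strides {0, 8, 1} with backward strides {0, 4, 1} pass this check, whereas the unpadded forward strides {0, 6, 1} do not.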
Configuring strides for input and output data [deprecated, not recommended]
Instead of specifying strides by domain, one may choose to specify the strides
for input and output data sequences. Let
\(s^{\text{x}}_{j}, \ j \in \lbrace 0, 1, \ldots, d\rbrace\) be the stride
values for input (resp. output) data sequences if \(\text{x} = \text{i}\)
(resp. \(\text{x} = \text{o}\)). Such
\(s^{\text{x}}_0, s^{\text{x}}_1, \dots, s^{\text{x}}_d\) values may be
communicated as elements, in that order, of a \((d+1)\)-long
std::vector<std::int64_t> configuration value, set for the (deprecated)
configuration parameter config_param::INPUT_STRIDES if \(\text{x} = \text{i}\)
(resp. config_param::OUTPUT_STRIDES if \(\text{x} = \text{o}\)).
The values of \(s^{\text{i}}_{j}\) and \(s^{\text{o}}_{j}\) are to be used and
considered by oneMKL if and only if
\(s^{\text{fwd}}_{j} = s^{\text{bwd}}_{j} = 0, \forall j \in \lbrace 0, 1, \ldots, d\rbrace\).
This will happen automatically if config_param::INPUT_STRIDES and
config_param::OUTPUT_STRIDES are set and config_param::FWD_STRIDES and
config_param::BWD_STRIDES are not (see note below).
In such a case, descriptor objects must consider the data layouts
corresponding to the two compute directions separately. As detailed above,
relevant data sequence entries are accessed as elements of data containers
(sycl::buffer objects or device-accessible USM allocations) provided to the
compute function, the base data type of which is (possibly implicitly
re-interpreted) as documented in the above table. If using input and output
strides, for any \(m\) and multi-index \(\left(k_1, k_2, \ldots, k_d\right)\)
within valid range, the index to be used when accessing a data sequence entry
(or part thereof) in forward domain is
\[s^{\text{x}}_0 + k_1 s^{\text{x}}_1 + k_2 s^{\text{x}}_2 + \dots + k_d s^{\text{x}}_d + m\, l^{\text{fwd}}\]
where \(\text{x} = \text{i}\) (resp. \(\text{x} = \text{o}\)) for forward (resp. backward) DFT(s). Similarly, the index to be used when accessing a data sequence entry (or part thereof) in backward domain is
\[s^{\text{x}}_0 + k_1 s^{\text{x}}_1 + k_2 s^{\text{x}}_2 + \dots + k_d s^{\text{x}}_d + m\, l^{\text{bwd}}\]
where \(\text{x} = \text{o}\) (resp. \(\text{x} = \text{i}\)) for forward (resp. backward) DFT(s).
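The direction dependence of input and output strides can be made explicit by mapping them back to per-domain strides. The following is a minimal sketch in standard C++; the names Direction, DomainStrides and resolve_domain_strides are hypothetical and not part of oneMKL.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types and helper (not oneMKL API): which stride vector applies
// to which domain depends on the compute direction.
enum class Direction { Forward, Backward };

struct DomainStrides {
    std::vector<std::int64_t> fwd;  // strides applied in the forward domain
    std::vector<std::int64_t> bwd;  // strides applied in the backward domain
};

inline DomainStrides resolve_domain_strides(
        const std::vector<std::int64_t>& input_strides,
        const std::vector<std::int64_t>& output_strides,
        Direction direction) {
    assert(input_strides.size() == output_strides.size());
    if (direction == Direction::Forward) {
        // Forward DFT: input is forward-domain data, output is backward-domain.
        return {input_strides, output_strides};
    }
    // Backward DFT: input is backward-domain data, output is forward-domain.
    return {output_strides, input_strides};
}
```

The same pair of input and output strides therefore describes two different pairs of domain layouts, one per direction, unless both vectors are identical.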
As a consequence, configuring descriptor objects using these deprecated
configuration parameters makes their configuration direction-dependent when
different stride values are used in forward and backward domains. Since the
intended compute direction is unknown to the object when committing it, every
direction that results in a consistent data layout in forward and backward
domains must be supported by successfully-committed descriptor objects.
Note
For descriptor objects with strides configured via these deprecated
configuration parameters, the consistency requirements may be satisfied for
only one of the two compute directions, i.e., for only one of the forward or
backward DFT(s). Such a configuration should not cause an exception to be
thrown by the descriptor’s commit member function but the behavior of oneMKL
is undefined if using that object for the compute direction that does not
align with the consistency requirements.
Note
Setting either of config_param::INPUT_STRIDES or config_param::OUTPUT_STRIDES
triggers any default or previously-set values for config_param::FWD_STRIDES
and config_param::BWD_STRIDES to reset to
std::vector<std::int64_t>(d+1, 0), and vice versa. This default behavior
prevents mixing either of config_param::INPUT_STRIDES or
config_param::OUTPUT_STRIDES with either of config_param::FWD_STRIDES or
config_param::BWD_STRIDES, which is not to be supported. If such a
configuration is attempted, an exception is to be thrown at commit time due to
invalid configuration, as the stride values that were implicitly reset
invalidate the consistency requirements for any non-trivial DFT.
If specifying the data layout strides using these deprecated configuration parameters and if the strides differ in forward and backward domain, the descriptor must be re-configured and re-committed for computing the DFT in the reverse direction as shown below.
namespace dft = oneapi::mkl::dft;
// ...
// Forward DFT: input is forward-domain data, output is backward-domain data.
desc.set_value(dft::config_param::INPUT_STRIDES, fwd_domain_strides);
desc.set_value(dft::config_param::OUTPUT_STRIDES, bwd_domain_strides);
desc.commit(queue);
compute_forward(desc, ...);
// ...
// Backward DFT: the roles swap, so re-configure and re-commit first.
desc.set_value(dft::config_param::INPUT_STRIDES, bwd_domain_strides);
desc.set_value(dft::config_param::OUTPUT_STRIDES, fwd_domain_strides);
desc.commit(queue);
compute_backward(desc, ...);
The config_param::INPUT_STRIDES and config_param::OUTPUT_STRIDES parameters
are deprecated. A warning message “{IN,OUT}PUT_STRIDES are deprecated: please
use {F,B}WD_STRIDES, instead.” is to be reported to applications using these
configuration parameters.