imatcopy_batch
Computes a group of in-place scaled matrix transpose or copy operations using general dense matrices.
Description
The imatcopy_batch routines perform a series of in-place scaled matrix
copies or transpositions. They are batched versions of imatcopy that
operate on groups of matrices, where each group contains matrices with
the same parameters.
Two APIs are available. In the strided API, the matrices in a batch are
a fixed distance apart in memory and all share the same parameters
(for example, matrix size). In the more flexible group API, each group
of matrices shares the same parameters, but the user may provide
multiple groups with different parameters. The group API argument
structure is better suited to USM pointers than to sycl::buffer
arguments, so it is specified only for USM inputs. The strided API
works with both USM and buffer memory.
                 strided API    group API
Buffer memory    supported      not supported
USM pointers     supported      supported
imatcopy_batch supports the following precisions:
T
float
double
std::complex<float>
std::complex<double>
imatcopy_batch (Buffer Version)
Description
The buffer version of imatcopy_batch supports only the strided API.
The operation for the strided API is defined as:
for i = 0 … batch_size – 1
    C is a matrix at offset i * stride in matrix_array_in_out
    C := alpha * op(C)
end for
where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha is a scalar,
C is a matrix to be transformed in place,
and C is m x n.
The matrix_array_in_out buffer contains all the input matrices. The stride
between matrices is given by the stride parameter. The total
number of matrices in matrix_array_in_out is given by the batch_size
parameter.
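The strided operation above can be illustrated with a small host-side reference. This is a minimal sketch, not the oneMKL implementation: the names imatcopy_batch_strided_ref and demo_strided_transpose are hypothetical, the layout is assumed to be column major, and only op(C) = C and op(C) = C^T are covered. The stride of 9 in the example is an assumption chosen as max(ld_in, ld_out) * max(m, n), which is deliberately conservative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference (not a oneMKL function): applies
// C := alpha * op(C) to batch_size column-major matrices stored `stride`
// elements apart. transposed == false means op(C) = C, true means op(C) = C^T.
template <typename T>
void imatcopy_batch_strided_ref(bool transposed, std::int64_t m, std::int64_t n,
                                T alpha, std::vector<T> &matrix_array_in_out,
                                std::int64_t ld_in, std::int64_t ld_out,
                                std::int64_t stride, std::int64_t batch_size) {
    assert(m >= 0 && n >= 0 && ld_in > 0 && ld_in >= m && ld_out > 0);
    assert(ld_out >= (transposed ? n : m));
    std::vector<T> tmp(static_cast<std::size_t>(m * n));
    for (std::int64_t i = 0; i < batch_size; ++i) {
        T *c = matrix_array_in_out.data() + i * stride;
        // Save the input first: input and output share storage, so writing
        // directly could overwrite elements that have not been read yet.
        for (std::int64_t col = 0; col < n; ++col)
            for (std::int64_t r = 0; r < m; ++r)
                tmp[static_cast<std::size_t>(r + col * m)] = c[r + col * ld_in];
        // Write alpha * op(C) back using the output leading dimension.
        for (std::int64_t col = 0; col < n; ++col)
            for (std::int64_t r = 0; r < m; ++r) {
                const std::int64_t dst =
                    transposed ? (col + r * ld_out) : (r + col * ld_out);
                c[dst] = alpha * tmp[static_cast<std::size_t>(r + col * m)];
            }
    }
}

// Two 2 x 3 matrices with ld_in = 2, transposed into 3 x 2 matrices with
// ld_out = 3 and scaled by 2. Each matrix slot is padded to stride = 9.
std::vector<double> demo_strided_transpose() {
    std::vector<double> data = {1, 2, 3, 4,  5,  6,  0, 0, 0,
                                7, 8, 9, 10, 11, 12, 0, 0, 0};
    imatcopy_batch_strided_ref(true, 2, 3, 2.0, data, 2, 3, 9, 2);
    return data;
}
```

With a SYCL queue and a sycl::buffer holding the same data, the call shown in the Syntax section with trans set to oneapi::mkl::transpose::trans, m = 2, n = 3, alpha = 2, ld_in = 2, ld_out = 3, stride = 9 and batch_size = 2 should describe the same operation.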
Strided API
Syntax
namespace oneapi::mkl::blas::column_major {
    void imatcopy_batch(sycl::queue &queue,
                        oneapi::mkl::transpose trans,
                        std::int64_t m,
                        std::int64_t n,
                        T alpha,
                        sycl::buffer<T, 1> &matrix_array_in_out,
                        std::int64_t ld_in,
                        std::int64_t ld_out,
                        std::int64_t stride,
                        std::int64_t batch_size);
}
namespace oneapi::mkl::blas::row_major {
    void imatcopy_batch(sycl::queue &queue,
                        oneapi::mkl::transpose trans,
                        std::int64_t m,
                        std::int64_t n,
                        T alpha,
                        sycl::buffer<T, 1> &matrix_array_in_out,
                        std::int64_t ld_in,
                        std::int64_t ld_out,
                        std::int64_t stride,
                        std::int64_t batch_size);
}
Input Parameters
- queue
 The queue where the routine should be executed.
- trans
 Specifies op(C), the transposition operation applied to the matrices C. See oneMKL defined datatypes for more details.
- m
 Number of rows of each matrix C on input. Must be at least zero.
- n
 Number of columns of each matrix C on input. Must be at least zero.
- alpha
 Scaling factor for the matrix transpositions or copies.
- matrix_array_in_out
 Buffer holding the input matrices C with size stride*batch_size.
- ld_in
 The leading dimension of the matrices C on input. It must be positive, and must be at least m if column major layout is used, and at least n if row major layout is used.
- ld_out
 The leading dimension of the matrices C on output. It must be positive.
                C not transposed               C transposed
 Column major   ld_out must be at least m.     ld_out must be at least n.
 Row major      ld_out must be at least n.     ld_out must be at least m.
- stride
 Stride between different C matrices.
                C not transposed                                  C transposed
 Column major   stride must be at least max(ld_in*m, ld_out*m).   stride must be at least max(ld_in*m, ld_out*n).
 Row major      stride must be at least max(ld_in*n, ld_out*n).   stride must be at least max(ld_in*n, ld_out*m).
- batch_size
 Specifies the number of matrix transposition or copy operations to perform.
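The leading-dimension rules above are easy to get wrong when switching between layouts, so it can help to compute the minimum values in one place. The following is a minimal sketch under stated assumptions: the layout enum and the functions min_ld_in, min_ld_out and conservative_stride are hypothetical helpers, not part of oneMKL. The stride helper does not reproduce the table entry by entry. It returns max(ld_in, ld_out) * max(m, n), which is at least as large as every bound listed for stride.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper type; oneMKL selects the layout through the
// column_major and row_major namespaces instead.
enum class layout { column_major, row_major };

// Smallest ld_in allowed: positive, at least m (column major) or n (row major).
inline std::int64_t min_ld_in(layout l, std::int64_t m, std::int64_t n) {
    return std::max<std::int64_t>(1, l == layout::column_major ? m : n);
}

// Smallest ld_out allowed: positive, and at least m for
// (column major, not transposed) or (row major, transposed), else at least n.
inline std::int64_t min_ld_out(layout l, bool transposed,
                               std::int64_t m, std::int64_t n) {
    const bool bound_is_m = (l == layout::column_major) != transposed;
    return std::max<std::int64_t>(1, bound_is_m ? m : n);
}

// A conservative stride (an assumption of this sketch): it is greater than
// or equal to ld_in*m, ld_in*n, ld_out*m and ld_out*n at the same time.
inline std::int64_t conservative_stride(std::int64_t m, std::int64_t n,
                                        std::int64_t ld_in, std::int64_t ld_out) {
    assert(m >= 0 && n >= 0 && ld_in > 0 && ld_out > 0);
    return std::max(ld_in, ld_out) * std::max(m, n);
}
```

A caller that wants tightly packed storage can use the table values directly; the conservative stride trades some memory for a single formula that holds for every layout and transposition setting.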
Output Parameters
- matrix_array_in_out
 Output buffer, overwritten by batch_size matrix copy or transposition operations of the form alpha*op(C).
Throws
This routine shall throw the following exceptions if the associated condition is detected. An implementation may throw additional implementation-specific exception(s) in case of error conditions not covered here.
imatcopy_batch (USM Version)
Description
The USM version of imatcopy_batch supports the group API and the strided API.
The operation for the group API is defined as:
idx = 0
for i = 0 … group_count – 1
    trans, m, n, alpha, ld_in, ld_out and group_size at position i in their respective arrays
    for j = 0 … group_size – 1
        C is a matrix at position idx in matrix_array_in_out
        C := alpha * op(C)
        idx := idx + 1
    end for
end for
The operation for the strided API is defined as:
for i = 0 … batch_size – 1
    C is a matrix at offset i * stride in matrix_array_in_out
    C := alpha * op(C)
end for
where:
op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,
alpha is a scalar,
C is a matrix to be transformed in place,
and C is m x n.
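For complex precisions the three choices of op(X) give different results, which a small example makes concrete. This is a minimal sketch rather than oneMKL code: op_kind is a hypothetical stand-in for oneapi::mkl::transpose, scaled_op is a hypothetical helper that works out of place on one tightly packed column-major matrix, and the input values are made up for illustration.

```cpp
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical stand-in for the three transposition options.
enum class op_kind { nontrans, trans, conjtrans };

// Returns alpha * op(C) for a single m x n column-major matrix stored
// without padding (leading dimension m). The result is packed the same way:
// m x n for nontrans, n x m for trans and conjtrans.
std::vector<cplx> scaled_op(op_kind op, std::int64_t m, std::int64_t n,
                            cplx alpha, const std::vector<cplx> &c) {
    std::vector<cplx> out(c.size());
    for (std::int64_t col = 0; col < n; ++col) {
        for (std::int64_t r = 0; r < m; ++r) {
            const cplx x = c[static_cast<std::size_t>(r + col * m)];
            if (op == op_kind::nontrans) {
                out[static_cast<std::size_t>(r + col * m)] = alpha * x;
            } else {
                // Element (r, col) moves to (col, r) of the n x m result;
                // conjtrans additionally conjugates it.
                const cplx y = (op == op_kind::conjtrans) ? std::conj(x) : x;
                out[static_cast<std::size_t>(col + r * n)] = alpha * y;
            }
        }
    }
    return out;
}

// 2 x 2 example: C = [[1+1i, 3i], [2, 4-1i]] stored column by column.
std::vector<cplx> demo_conjtrans() {
    const std::vector<cplx> c = {cplx(1.0, 1.0), cplx(2.0, 0.0),
                                 cplx(0.0, 3.0), cplx(4.0, -1.0)};
    return scaled_op(op_kind::conjtrans, 2, 2, cplx(2.0, 0.0), c);
}
```

For real precisions conjugation has no effect, so X^T and X^H coincide.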
For the group API, the matrices are given by arrays of pointers. C
represents a matrix stored at the address pointed to by matrix_array_in_out.
The number of entries in matrix_array_in_out is total_batch_count, given by:
    total_batch_count = group_size[0] + group_size[1] + … + group_size[group_count – 1]
For the strided API, the single array matrix_array_in_out contains all the
matrices to be transformed in place. The locations of the individual matrices
within the array are given by stride lengths, while the number of
matrices is given by the batch_size parameter.
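The group API traversal can be sketched on the host as follows. This is an illustration under stated assumptions, not the oneMKL implementation: total_batch_count_of, imatcopy_batch_group_ref and demo_group are hypothetical names, the layout is assumed to be column major, the element type is fixed to double, and std::vector stands in for the raw USM arrays. The pointer array is assumed to hold one entry per matrix, that is, the sum of all group sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Number of pointers expected in matrix_array_in_out: the sum of group sizes.
std::int64_t total_batch_count_of(const std::vector<std::int64_t> &group_size) {
    return std::accumulate(group_size.begin(), group_size.end(), std::int64_t{0});
}

// Hypothetical column-major reference for the group operation.
// trans_array[i] == true means op(C) = C^T for group i, false means op(C) = C.
void imatcopy_batch_group_ref(const std::vector<bool> &trans_array,
                              const std::vector<std::int64_t> &m_array,
                              const std::vector<std::int64_t> &n_array,
                              const std::vector<double> &alpha_array,
                              const std::vector<double *> &matrix_array_in_out,
                              const std::vector<std::int64_t> &ld_in_array,
                              const std::vector<std::int64_t> &ld_out_array,
                              const std::vector<std::int64_t> &group_size) {
    assert(static_cast<std::int64_t>(matrix_array_in_out.size()) ==
           total_batch_count_of(group_size));
    std::size_t idx = 0;  // running position in the pointer array
    for (std::size_t i = 0; i < group_size.size(); ++i) {
        const std::int64_t m = m_array[i], n = n_array[i];
        const std::int64_t ld_in = ld_in_array[i], ld_out = ld_out_array[i];
        std::vector<double> tmp(static_cast<std::size_t>(m * n));
        for (std::int64_t j = 0; j < group_size[i]; ++j, ++idx) {
            double *c = matrix_array_in_out[idx];
            // Copy the input out first because the update is in place.
            for (std::int64_t col = 0; col < n; ++col)
                for (std::int64_t r = 0; r < m; ++r)
                    tmp[static_cast<std::size_t>(r + col * m)] = c[r + col * ld_in];
            for (std::int64_t col = 0; col < n; ++col)
                for (std::int64_t r = 0; r < m; ++r) {
                    const std::int64_t dst =
                        trans_array[i] ? (col + r * ld_out) : (r + col * ld_out);
                    c[dst] = alpha_array[i] * tmp[static_cast<std::size_t>(r + col * m)];
                }
        }
    }
}

// Group 0: two 2 x 2 matrices, transposed, alpha = 1.
// Group 1: one 1 x 3 matrix, copied and scaled by 10.
std::vector<double> demo_group() {
    std::vector<double> a = {1, 2, 3, 4};
    std::vector<double> b = {5, 6, 7, 8};
    std::vector<double> c = {1, 2, 3};
    imatcopy_batch_group_ref({true, false}, {2, 1}, {2, 3}, {1.0, 10.0},
                             {a.data(), b.data(), c.data()},
                             {2, 1}, {2, 1}, {2, 1});
    std::vector<double> all(a);
    all.insert(all.end(), b.begin(), b.end());
    all.insert(all.end(), c.begin(), c.end());
    return all;
}
```

Note that the per-group arrays have one entry per group, while the pointer array has one entry per matrix, so the two are indexed by different counters.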
Group API
Syntax
namespace oneapi::mkl::blas::column_major {
    sycl::event imatcopy_batch(sycl::queue &queue,
                               const oneapi::mkl::transpose *trans_array,
                               const std::int64_t *m_array,
                               const std::int64_t *n_array,
                               const T *alpha_array,
                               T **matrix_array_in_out,
                               const std::int64_t *ld_in_array,
                               const std::int64_t *ld_out_array,
                               std::int64_t group_count,
                               const std::int64_t *group_size,
                               const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::mkl::blas::row_major {
    sycl::event imatcopy_batch(sycl::queue &queue,
                               const oneapi::mkl::transpose *trans_array,
                               const std::int64_t *m_array,
                               const std::int64_t *n_array,
                               const T *alpha_array,
                               T **matrix_array_in_out,
                               const std::int64_t *ld_in_array,
                               const std::int64_t *ld_out_array,
                               std::int64_t group_count,
                               const std::int64_t *group_size,
                               const std::vector<sycl::event> &dependencies = {});
}
Input Parameters
- queue
 The queue where the routine should be executed.
- trans_array
 Array of size group_count. Each element i in the array specifies op(C), the transposition operation applied to the matrices C.
- m_array
 Array of size group_count of number of rows of C on input. Each must be at least 0.
- n_array
 Array of size group_count of number of columns of C on input. Each must be at least 0.
- alpha_array
 Array of size group_count containing scaling factors for the matrix transpositions or copies.
- matrix_array_in_out
 Array of size total_batch_count, holding pointers to arrays used to store C matrices.
- ld_in_array
 Array of size group_count. The leading dimension of the input matrix C. If matrices are stored using column major layout, ld_in_array[i] must be at least m_array[i]. If matrices are stored using row major layout, ld_in_array[i] must be at least n_array[i]. Must be positive.
- ld_out_array
 Array of size group_count. The leading dimension of the output matrix C. Each entry ld_out_array[i] must be positive and at least:
  m_array[i] if column major layout is used and C is not transposed
  m_array[i] if row major layout is used and C is transposed
  n_array[i] otherwise
- group_count
 Number of groups. Must be at least 0.
- group_size
 Array of size group_count. The element group_size[i] is the number of matrices in the group i. Each element in group_size must be at least 0.
- dependencies
 List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
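The per-group constraints listed above can be checked before a call is submitted. The following is a minimal sketch for the column major layout only; group_params, valid_column_major_group and valid_column_major_groups are hypothetical names introduced for illustration, and the struct simply bundles entry i of each per-group array.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical bundle of the entries at position i of the per-group arrays.
struct group_params {
    bool transposed;          // whether op(C) transposes C
    std::int64_t m, n;        // rows and columns of C on input
    std::int64_t ld_in, ld_out;
    std::int64_t group_size;  // number of matrices in the group
};

// Checks the documented constraints for one group, column major layout.
bool valid_column_major_group(const group_params &g) {
    if (g.m < 0 || g.n < 0 || g.group_size < 0) return false;
    // ld_in must be positive and at least m.
    if (g.ld_in <= 0 || g.ld_in < g.m) return false;
    // ld_out must be positive and at least m (not transposed) or n (transposed).
    const std::int64_t min_ld_out = g.transposed ? g.n : g.m;
    return g.ld_out > 0 && g.ld_out >= min_ld_out;
}

// Checks every group; an empty list corresponds to group_count == 0.
bool valid_column_major_groups(const std::vector<group_params> &groups) {
    return std::all_of(groups.begin(), groups.end(), valid_column_major_group);
}
```

For the row major layout the roles of m and n in both leading-dimension checks are swapped, following the same parameter descriptions.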
Output Parameters
- matrix_array_in_out
 Output array of pointers to C matrices, overwritten by total_batch_count matrix transpose or copy operations of the form alpha*op(C).
Return Values
Output event to wait on to ensure computation is complete.
Strided API
Syntax
namespace oneapi::mkl::blas::column_major {
    sycl::event imatcopy_batch(sycl::queue &queue,
                               oneapi::mkl::transpose trans,
                               std::int64_t m,
                               std::int64_t n,
                               value_or_pointer<T> alpha,
                               T *matrix_array_in_out,
                               std::int64_t ld_in,
                               std::int64_t ld_out,
                               std::int64_t stride,
                               std::int64_t batch_size,
                               const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::mkl::blas::row_major {
    sycl::event imatcopy_batch(sycl::queue &queue,
                               oneapi::mkl::transpose trans,
                               std::int64_t m,
                               std::int64_t n,
                               value_or_pointer<T> alpha,
                               T *matrix_array_in_out,
                               std::int64_t ld_in,
                               std::int64_t ld_out,
                               std::int64_t stride,
                               std::int64_t batch_size,
                               const std::vector<sycl::event> &dependencies = {});
}
Input Parameters
- queue
 The queue where the routine should be executed.
- trans
 Specifies op(C), the transposition operation applied to the matrices C.
- m
 Number of rows for each matrix C on input. Must be at least 0.
- n
 Number of columns for each matrix C on input. Must be at least 0.
- alpha
 Scaling factor for the matrix transpose or copy operation. See Scalar Arguments in BLAS for more details.
- matrix_array_in_out
 Array holding the matrices C. Must have size at least stride*batch_size.
- ld_in
 Leading dimension of the C matrices on input. If matrices are stored using column major layout, ld_in must be at least m. If matrices are stored using row major layout, ld_in must be at least n. Must be positive.
- ld_out
 Leading dimension of the C matrices on output. If matrices are stored using column major layout, ld_out must be at least m if C is not transposed or n if C is transposed. If matrices are stored using row major layout, ld_out must be at least n if C is not transposed or at least m if C is transposed. Must be positive.
- stride
 Stride between different C matrices within matrix_array_in_out.
                C not transposed                                  C transposed
 Column major   stride must be at least max(ld_in*m, ld_out*m).   stride must be at least max(ld_in*m, ld_out*n).
 Row major      stride must be at least max(ld_in*n, ld_out*n).   stride must be at least max(ld_in*n, ld_out*m).
- batch_size
 Specifies the number of matrices to transpose or copy.
- dependencies
 List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters
- matrix_array_in_out
 Output array, overwritten by batch_size matrix transposition or copy operations of the form alpha*op(C).
Return Values
Output event to wait on to ensure computation is complete.
Throws
This routine shall throw the following exceptions if the associated condition is detected. An implementation may throw additional implementation-specific exception(s) in case of error conditions not covered here.
oneapi::mkl::unsupported_device