Advanced Topic: Other Kinds of Iteration Spaces#
The examples so far have used the class blocked_range<T>
to specify ranges.
This class is useful in many situations, but it does not fit every situation.
You can use oneAPI Threading Building Blocks (oneTBB) to define your own iteration space objects. The object
must specify how it can be split into subspaces by providing a basic splitting
constructor, an optional proportional splitting constructor, and two predicate
methods. If your class is called R
, the methods and constructors should be
as follows:
class R {
// True if range is empty
bool empty() const;
// True if range can be split into non-empty subranges
bool is_divisible() const;
// Splits r into subranges r and *this
R( R& r, split );
// (optional) Splits r into subranges r and *this in proportion p
R( R& r, proportional_split p );
...
};
The method empty
should return true if the range is empty. The
method is_divisible
should return true if the range can be split
into two non-empty subspaces, and such a split is worth the overhead.
The basic splitting constructor should take two arguments:
The first of type
R
The second of type oneapi::tbb::split
The second argument is not used; it serves only to distinguish the
constructor from an ordinary copy constructor. The basic splitting
constructor should attempt to split r
roughly into two halves, and
update r
to be the first half, and set the constructed object as the
second half.
Unlike the basic splitting constructor, the proportional splitting
constructor is optional and takes the second argument of type
oneapi::tbb::proportional_split
. The type has methods left
and right
that return the values of the proportion. These values should be used to
split r
accordingly, so that the updated r
corresponds to the
left part of the proportion, and the constructed object corresponds to
the right part.
Both splitting constructors should guarantee that the updated r
part
and the constructed object are not empty. The parallel algorithm
templates call the splitting constructors on r
only if
r.is_divisible
is true.
The iteration space does not have to be linear. Look at
oneapi/tbb/blocked_range2d.h
for an example of a range that is
two-dimensional. Its splitting constructor attempts to split the range
along its longest axis. When used with parallel_for
, it causes the
loop to be “recursively blocked” in a way that improves cache usage.
This nice cache behavior means that using parallel_for
over a
blocked_range2d<T>
can make a loop run faster than the sequential
equivalent, even on a single processor.
The blocked_range2d
allows you to use different value types for
its first dimension, rows, and the second one, columns.
That means you can combine indexes, pointers, and iterators into a joint
iteration space. Use the methods rows()
and cols()
to obtain
blocked_range
objects that represent the respective dimensions.
The blocked_range3d
class template extends this approach to 3D by adding
pages()
as the first dimension, followed by rows()
and cols()
.
The blocked_nd_range<T,N>
class template represents a blocked iteration
space of any dimensionality. Unlike the previously described 2D and 3D ranges,
blocked_nd_range
uses the same value type for all its axes, and its
constructor requires you to pass N instances of blocked_range<T>
instead of
individual boundary values. The change in the naming pattern reflects these
differences.
Example of a Multidimensional Iteration Space#
The example demonstrates calculation of a 3-dimensional filter over the pack of feature maps.
The convolution3d
function iterates over the output cells, assigning to
each cell the result of the kernel3d
function that combines the values
from a range in the feature maps.
To run the computation in parallel, tbb::parallel_for
is called with
tbb::blocked_nd_range<int,3>
as an argument. The body function processes
the received 3D subrange in nested loops, using the method dim
to get
the loop boundaries for each dimension.
#include "oneapi/tbb/blocked_nd_range.h"
#include "oneapi/tbb/parallel_for.h"
template<typename Features>
float kernel3d(const Features& feature_maps, int i, int j, int k,
int kernel_length, int kernel_width, int kernel_height) {
float result = 0.f;
for (int feature_i = i; feature_i < i + kernel_length; ++feature_i)
for (int feature_j = j; feature_j < j + kernel_width; ++feature_j)
for (int feature_k = k; feature_k < k + kernel_width; ++feature_k)
result += feature_maps[feature_i][feature_j][feature_k];
return result;
}
template<typename Features, typename Output>
void convolution3d(const Features& feature_maps, Output& out,
int out_length, int out_width, int out_heigth,
int kernel_length, int kernel_width, int kernel_height) {
using range_t = oneapi::tbb::blocked_nd_range<int, 3>;
oneapi::tbb::parallel_for(
range_t({0, out_length}, {0, out_width}, {0, out_heigth}),
[&](const range_t& out_range) {
auto out_x = out_range.dim(0);
auto out_y = out_range.dim(1);
auto out_z = out_range.dim(2);
for (int i = out_x.begin(); i < out_x.end(); ++i)
for (int j = out_y.begin(); j < out_y.end(); ++j)
for (int k = out_z.begin(); k < out_z.end(); ++k)
out[i][j][k] = kernel3d(feature_maps, i, j, k,
kernel_length, kernel_width, kernel_height);
}
);
}