Appendix B Mixing With Other Threading Packages
Correct Interoperability
You can use oneTBB with other threading packages. No additional effort is required.
Here is an example that parallelizes an outer loop with OpenMP and an inner loop with oneTBB.
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

int M, N;

struct InnerBody {
    int i;
    void operator()(tbb::blocked_range<int> const& r) const {
        for (auto j = r.begin(); j != r.end(); ++j) {
            // do the work for (i, j) element
        }
    }
};

void TBB_NestedInOpenMP() {
#pragma omp parallel
    {
#pragma omp for
        for (int i = 0; i < M; ++i) {
            // Each OpenMP thread runs a oneTBB parallel loop over j
            tbb::parallel_for(tbb::blocked_range<int>(0, N, 10), InnerBody{i});
        }
    }
}
The #pragma omp parallel directive instructs OpenMP to create a team of threads. Each thread executes the code block associated with the directive.

The #pragma omp for directive tells the compiler to distribute the iterations of the following loop among the threads of the existing team, enabling parallel execution of the loop body.
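For a single parallel loop, the two directives are often merged into the combined #pragma omp parallel for form. Below is a minimal sketch of the same outer loop using that form; it assumes the M, N, and InnerBody definitions above, and the function name TBB_NestedInOpenMP_Combined is hypothetical, chosen for illustration.

void TBB_NestedInOpenMP_Combined() {
    // Combined directive: creates the thread team and distributes
    // the outer-loop iterations in one step
#pragma omp parallel for
    for (int i = 0; i < M; ++i) {
        tbb::parallel_for(tbb::blocked_range<int>(0, N, 10), InnerBody{i});
    }
}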
Here is a similar example that uses POSIX* threads:
#include <cstdint>
#include <vector>

#include <pthread.h>

#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

int M, N;

struct InnerBody {
    int i;
    void operator()(tbb::blocked_range<int> const& r) const {
        for (auto j = r.begin(); j != r.end(); ++j) {
            // do the work for (i, j) element
        }
    }
};

void* OuterLoopIteration(void* args) {
    // Recover the outer loop index passed through the void* argument
    int i = static_cast<int>(reinterpret_cast<std::intptr_t>(args));
    tbb::parallel_for(tbb::blocked_range<int>(0, N, 10), InnerBody{i});
    return nullptr;
}

void TBB_NestedInPThreads() {
    std::vector<pthread_t> id(M);
    // Create a thread for each outer loop iteration
    for (int i = 0; i < M; ++i) {
        std::intptr_t arg = i;
        pthread_create(&id[i], nullptr, OuterLoopIteration, reinterpret_cast<void*>(arg));
    }
    // Wait for the outer loop threads to finish
    for (int i = 0; i < M; ++i)
        pthread_join(id[i], nullptr);
}
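The same nesting pattern carries over to C++ standard threads. The following is a minimal sketch under the same assumptions (the M, N, and InnerBody definitions above); the function name TBB_NestedInStdThread is hypothetical.

#include <thread>
#include <vector>

void TBB_NestedInStdThread() {
    std::vector<std::thread> threads;
    threads.reserve(M);
    // Launch one std::thread per outer loop iteration;
    // each thread runs a oneTBB parallel loop over j
    for (int i = 0; i < M; ++i) {
        threads.emplace_back([i] {
            tbb::parallel_for(tbb::blocked_range<int>(0, N, 10), InnerBody{i});
        });
    }
    // Wait for all outer loop threads to finish
    for (auto& t : threads) t.join();
}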
Avoid CPU Overutilization
While you can safely use oneTBB with other threading packages without affecting correctness, running a large number of threads from multiple thread pools concurrently can lead to oversubscription. This may significantly overutilize system resources and hurt performance.
Consider the previous example with nested parallelism, but with an OpenMP parallel region executed within the parallel loop:
#include <tbb/parallel_for.h>

int M, N;

void InnerBody(int i, int j) {
    // do the work for (i, j) element
}

void OpenMP_NestedInTBB() {
    tbb::parallel_for(0, M, [](int i) {
        // Each oneTBB worker thread opens its own OpenMP parallel region
#pragma omp parallel for
        for (int j = 0; j < N; ++j) {
            InnerBody(i, j);
        }
    });
}
Due to the semantics of the OpenMP parallel region, this composition of parallel runtimes may result in a quadratic number of simultaneously running threads: each of the P oneTBB worker threads starts its own OpenMP team of roughly P threads, so up to about P * P threads can be active at once. Such oversubscription can degrade performance.
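If coordinating the runtimes is not an option, one common partial mitigation is to cap the concurrency of one of the thread pools by hand. A minimal sketch using tbb::global_control follows; the cap of 4 and the function name OpenMP_NestedInTBB_Capped are arbitrary illustrations, and this only bounds, rather than eliminates, the multiplicative effect.

#include <tbb/global_control.h>
#include <tbb/parallel_for.h>

void OpenMP_NestedInTBB_Capped() {
    // Limit the oneTBB pool so that (oneTBB threads) x (OpenMP team size)
    // stays close to the number of hardware threads. The cap of 4 is an
    // arbitrary value for illustration.
    tbb::global_control gc(tbb::global_control::max_allowed_parallelism, 4);
    tbb::parallel_for(0, M, [](int i) {
#pragma omp parallel for
        for (int j = 0; j < N; ++j) {
            InnerBody(i, j);
        }
    });
}

The approach described next addresses the same problem at the runtime level instead.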
oneTBB solves this issue with Thread Composability Manager (TCM). It is an experimental CPU resource coordination layer that enables better cooperation between different threading runtimes.
By default, TCM is disabled. To enable it, set the TCM_ENABLE environment variable to 1. To make sure it works as intended, set the TCM_VERSION environment variable to 1 before running your application and check the output for lines starting with TCM:. The TCM: TCM_ENABLE 1 line confirms that Thread Composability Manager is active.
Example output:
TCM: VERSION 1.3.0
<...>
TCM: TCM_ENABLE 1
When used with the OpenMP implementation of the Intel(R) DPC++/C++ Compiler, TCM makes it possible to avoid simultaneous scheduling of excessive threads in scenarios similar to the one above.
Submit feedback or ask questions about Thread Composability Manager through oneTBB GitHub Issues or Discussions.
Note
Coordination on the use of CPU resources requires support for Thread Composability Manager. For optimal coordination, make sure that each threading package in your application integrates with TCM.