Appendix B Mixing With Other Threading Packages

Correct Interoperability

You can use oneTBB with other threading packages. No additional effort is required.

Here is an example that parallelizes an outer loop with OpenMP and an inner loop with oneTBB.

#include <oneapi/tbb/blocked_range.h>
#include <oneapi/tbb/parallel_for.h>

int M, N;

struct InnerBody {
    int i;
    explicit InnerBody(int i_) : i(i_) {}
    void operator()(tbb::blocked_range<int> const& r) const {
        for (auto j = r.begin(); j != r.end(); ++j) {
            // do the work for (i, j) element
        }
    }
};

void TBB_NestedInOpenMP() {
#pragma omp parallel
    {
#pragma omp for
        for(int i = 0; i < M; ++i) {
            tbb::parallel_for(tbb::blocked_range<int>(0, N, 10), InnerBody(i));
        }
    }
}

The #pragma omp parallel directive instructs OpenMP to create a team of threads. Each thread executes the code block associated with the directive.

The #pragma omp for directive tells the compiler to distribute the iterations of the following loop among the threads of the existing team, so the loop body executes in parallel.
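For reference, the inner loop can equivalently be passed to tbb::parallel_for as a lambda instead of a function object. A minimal sketch under the same assumptions as the example above (the TBB_NestedInOpenMP_Lambda name is illustrative):

#include <oneapi/tbb/blocked_range.h>
#include <oneapi/tbb/parallel_for.h>

int M, N;

void TBB_NestedInOpenMP_Lambda() {
#pragma omp parallel
    {
#pragma omp for
        for (int i = 0; i < M; ++i) {
            // Capture i by value; each iteration runs its own inner parallel loop
            tbb::parallel_for(tbb::blocked_range<int>(0, N, 10),
                [=](tbb::blocked_range<int> const& r) {
                    for (auto j = r.begin(); j != r.end(); ++j) {
                        // do the work for (i, j) element
                    }
                });
        }
    }
}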

Here is a similar example that uses POSIX* threads:

#include <pthread.h>

#include <cstdint>
#include <vector>

#include <oneapi/tbb/blocked_range.h>
#include <oneapi/tbb/parallel_for.h>

int M, N;

struct InnerBody {
    int i;
    explicit InnerBody(int i_) : i(i_) {}
    void operator()(tbb::blocked_range<int> const& r) const {
        for (auto j = r.begin(); j != r.end(); ++j) {
            // do the work for (i, j) element
        }
    }
};

void* OuterLoopIteration(void* args) {
    int i = static_cast<int>(reinterpret_cast<std::intptr_t>(args));
    tbb::parallel_for(tbb::blocked_range<int>(0, N, 10), InnerBody(i));
    return nullptr;
}

void TBB_NestedInPThreads() {
    std::vector<pthread_t> id(M);
    // Create a thread for each outer loop iteration
    for(int i = 0; i < M; ++i) {
        std::intptr_t arg = i;
        pthread_create(&id[i], NULL, OuterLoopIteration, reinterpret_cast<void*>(arg));
    }
    // Wait for the outer loop threads to finish
    for(int i = 0; i < M; ++i)
        pthread_join(id[i], NULL);
}

Avoid CPU Overutilization

While you can safely use oneTBB with other threading packages without affecting execution correctness, running a large number of threads from multiple thread pools concurrently can lead to oversubscription. This may significantly overutilize system resources and hurt execution performance.

Consider the previous example with the nesting inverted: an OpenMP parallel region executed within a oneTBB parallel loop:

#include <oneapi/tbb/parallel_for.h>

int M, N;

void InnerBody(int i, int j) {
    // do the work for (i, j) element
}

void OpenMP_NestedInTBB() {
    tbb::parallel_for(0, M, [&](int i) {
        #pragma omp parallel for
        for(int j = 0; j < N; ++j) {
            InnerBody(i, j);
        }
    });
}

Due to the semantics of the OpenMP parallel region, this composition of parallel runtimes may result in a quadratic number of simultaneously running threads: each oneTBB worker thread that executes an outer loop iteration starts its own OpenMP thread team for the inner loop. Such oversubscription can degrade performance.
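To see the scale of the problem, note that the worst case is roughly the product of the two pools' default sizes. Below is a minimal sketch, not part of the example above, that prints this bound; it assumes both runtimes are at their defaults and queries them with tbb::this_task_arena::max_concurrency() and omp_get_max_threads():

#include <cstdio>

#include <oneapi/tbb/task_arena.h>
#include <omp.h>

int main() {
    // Default concurrency of the oneTBB thread pool
    int tbb_threads = tbb::this_task_arena::max_concurrency();
    // Default size of an OpenMP thread team
    int omp_threads = omp_get_max_threads();
    // Each oneTBB worker may start its own OpenMP team, so up to
    // tbb_threads * omp_threads threads can run at the same time
    std::printf("worst case: %d x %d = %d threads\n",
                tbb_threads, omp_threads, tbb_threads * omp_threads);
    return 0;
}

On a machine with P cores, both queries typically return P, so the bound grows as P squared.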

oneTBB solves this issue with Thread Composability Manager (TCM), an experimental CPU resource coordination layer that enables better cooperation between different threading runtimes.

By default, TCM is disabled. To enable it, set the TCM_ENABLE environment variable to 1. To make sure it works as intended, set the TCM_VERSION environment variable to 1 before running your application and check the output for lines starting with TCM:. The line TCM: TCM_ENABLE 1 confirms that Thread Composability Manager is active.

Example output:

TCM: VERSION            1.3.0
<...>
TCM: TCM_ENABLE         1

When used with the OpenMP implementation of the Intel(R) DPC++/C++ Compiler, TCM helps avoid simultaneous scheduling of excessive threads in scenarios similar to the one above.

Submit feedback or ask questions about Thread Composability Manager through oneTBB GitHub Issues or Discussions.

Note

Coordination of CPU resources is possible only between runtimes that support Thread Composability Manager. For optimal coordination, make sure that each threading package in your application integrates with TCM.
