Round-Robin Policy

The dynamic selection API is an experimental feature in the oneAPI DPC++ Library (oneDPL) that selects an execution resource based on a chosen selection policy. There are several policies provided as part of the API. Policies encapsulate the logic and any associated state needed to make a selection.

The round-robin policy cycles through the set of resources at each selection. round_robin_policy is useful for offloading kernels of similar cost to devices of similar capabilities. In that case, assigning kernels to devices in round-robin order achieves good load balancing.
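As a rough illustration of the load-balancing claim, the following standard-C++ sketch assigns equal-cost jobs to equally capable devices in round-robin order. It is not oneDPL code, and the function name round_robin_counts is hypothetical. Under those assumptions, no device ever receives more than one job more than any other device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: assign num_jobs equal-cost jobs to num_devices devices
// in round-robin order and return how many jobs each device receives.
std::vector<std::size_t> round_robin_counts(std::size_t num_jobs, std::size_t num_devices) {
    assert(num_devices > 0);
    std::vector<std::size_t> counts(num_devices, 0);
    for (std::size_t job = 0; job < num_jobs; ++job) {
        ++counts[job % num_devices];  // job k goes to device k mod num_devices
    }
    return counts;
}
```

For example, round_robin_counts(10, 3) yields {4, 3, 3}. When kernel costs or device speeds differ, equal job counts no longer mean equal finishing times. That is why the policy fits the similar-cost, similar-device case.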

namespace oneapi::dpl::experimental {
template <typename ResourceType = sycl::queue, typename ResourceAdapter = oneapi::dpl::identity,
        typename Backend = default_backend<ResourceType, ResourceAdapter>>
  class round_robin_policy
    : public policy_base<round_robin_policy<ResourceType, ResourceAdapter, Backend>,
                         ResourceAdapter, Backend>
  {
    public:
      using resource_type = ResourceType;
      using backend_type = Backend;

      round_robin_policy(deferred_initialization_t);
      round_robin_policy();
      round_robin_policy(const std::vector<ResourceType>& u, ResourceAdapter adapter = {});

      // deferred initializer
      void initialize();
      void initialize(const std::vector<resource_type>& u);
      // other implementation defined functions...
  };

}

This policy can be used with all the dynamic selection free functions, as well as with policy traits.

Example

The following example demonstrates a simple approach that sends work to each queue in a set of queues and then waits for all devices to complete the work before repeating the process. A round_robin_policy is used to rotate through the available devices.

#include <oneapi/dpl/dynamic_selection>
#include <sycl/sycl.hpp>
#include <algorithm>
#include <iostream>
#include <vector>

namespace ex = oneapi::dpl::experimental;

// User-provided function that enqueues work on data v through handler h (defined elsewhere).
void f(sycl::handler& h, float* v);

int round_robin_example(std::vector<sycl::queue>& similar_devices,
                        std::vector<float*>& usm_data) {

  ex::round_robin_policy p{similar_devices}; // (1)

  auto num_devices = p.get_resources().size();
  auto num_arrays = usm_data.size();

  // (2)
  auto submission_group_size = std::min(num_arrays, num_devices);
  if (submission_group_size == 0)
    return 1; // no queues or no data: nothing to offload (also avoids an endless outer loop)

  std::cout << "Running with " << num_devices << " queues\n"
            << "             " << num_arrays  << " usm arrays\n"
            << "Will perform " << submission_group_size << " concurrent offloads\n";

  for (std::size_t i = 0; i < 100; i += submission_group_size) { // (3)
    for (std::size_t j = 0; j < submission_group_size; ++j) {  // (4)
      float* data = usm_data[j];
      ex::submit(p, [=](sycl::queue q) { // (5)
        return q.submit([=](sycl::handler& h) { // (6)
          f(h, data);
        });
      });
    }
    ex::wait(p.get_submission_group()); // (7)
  }
  return 0;
}

The key points in this example are:

  1. A round_robin_policy is constructed that rotates between the queues in similar_devices.

  2. The total number of concurrent offloads, submission_group_size, will be limited to the number of USM arrays or the number of queues, whichever is smaller.

  3. The outer i-loop runs from 0 to 99 in steps of submission_group_size. Each iteration offloads one group of submission_group_size submissions concurrently.

  4. The inner j-loop iterates over submission_group_size submissions.

  5. submit is used to select a queue and pass it to the user’s function, but does not block until the event returned by that function completes. This provides the opportunity for concurrency across the submissions.

  6. The user's function uses the selected queue to perform an asynchronous offload. It returns the SYCL event produced by the call to sycl::queue::submit. Returning an event is required for functions passed to submit and submit_and_wait.

  7. wait is called to block until all submission_group_size concurrent submissions in the current group have completed.
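The device that each submission lands on follows from the selection algorithm described in the next section. Each ex::submit call makes one selection, and the counter keeps advancing across submission groups. The sketch below is a hypothetical, standard-C++ model of that bookkeeping; device_trace is not a oneDPL function, and the counter is assumed to start at 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the example's loops: trace[g][j] is the device index
// chosen for the j-th submission of group g. Assumes one selection per submit
// and a counter (like next_context_) that starts at 0 and is never reset.
std::vector<std::vector<std::size_t>> device_trace(std::size_t num_groups,
                                                   std::size_t group_size,
                                                   std::size_t num_devices) {
    assert(num_devices > 0);
    std::vector<std::vector<std::size_t>> trace(num_groups);
    std::size_t next = 0;  // plays the role of next_context_
    for (std::size_t g = 0; g < num_groups; ++g) {
        for (std::size_t j = 0; j < group_size; ++j) {
            trace[g].push_back(next++ % num_devices);
        }
    }
    return trace;
}
```

Take 3 queues and 2 USM arrays. device_trace(2, 2, 3) gives {0, 1} for the first group and {2, 0} for the second, so usm_data[0] runs on queue 0 and then on queue 2. When the number of arrays equals the number of queues, every group visits each queue exactly once. Keep this rotation in mind if each array is meant to stay on a single device.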

Selection Algorithm

The selection algorithm for round_robin_policy rotates through the elements of the set of available resources. A simplified, expository implementation of the selection algorithm follows:

//not a public function, for exposition purposes only
template<typename ...Args>
selection_type round_robin_policy::select(Args&&...) {
  if (initialized_) {
    auto& r = resources_[next_context_++ % num_resources_];
    return selection_type{*this, r};
  } else {
    throw std::logic_error("select called before initialization");
  }
}

where resources_ is a container of resources, such as std::vector of sycl::queue, next_context_ is a counter that increments at each selection, and num_resources_ is the size of the resources_ vector.
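The same logic can be modeled in standard C++ alone. The class below is a hypothetical sketch, not part of oneDPL (toy_round_robin is an illustrative name). It returns the selected resource directly instead of a selection_type. The counter is a std::atomic so that concurrent callers each get a distinct value; this is an assumption of the sketch, because the expository code above does not say how next_context_ is synchronized.

```cpp
#include <atomic>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical std-only model of the expository select(); not the oneDPL implementation.
template <typename Resource>
class toy_round_robin {
public:
    toy_round_robin() = default;  // deferred: no resources yet
    explicit toy_round_robin(std::vector<Resource> r) { initialize(std::move(r)); }

    // Not thread-safe: call before any concurrent select().
    void initialize(std::vector<Resource> r) {
        resources_ = std::move(r);
        // An empty set is treated as uninitialized so select() never computes % 0.
        initialized_ = !resources_.empty();
    }

    const Resource& select() {
        if (!initialized_) {
            throw std::logic_error("select called before initialization");
        }
        // fetch_add hands each caller a distinct counter value, even across threads.
        std::size_t k = next_context_.fetch_add(1, std::memory_order_relaxed);
        return resources_[k % resources_.size()];
    }

private:
    std::vector<Resource> resources_;
    std::atomic<std::size_t> next_context_{0};
    bool initialized_ = false;
};
```

With resources {10, 20, 30}, successive calls to select() return 10, 20, 30, 10, and so on.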

Constructors

round_robin_policy provides three constructors.

| Signature | Description |
|---|---|
| round_robin_policy(deferred_initialization_t); | Defers initialization. An initialize function must be called prior to use. |
| round_robin_policy(); | Initialized to use the default set of resources. |
| round_robin_policy(const std::vector<ResourceType>& u, ResourceAdapter adapter = {}); | Overrides the default set of resources with an optional resource adapter. |

Deferred Initialization

A round_robin_policy that was constructed with deferred initialization must be initialized by calling one of its initialize member functions before it can be used to select or submit.

| Signature | Description |
|---|---|
| initialize(); | Initialize to use the default set of resources. |
| initialize(const std::vector<resource_type>& u); | Overrides the default set of resources. |
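Together, the constructors and initialize overloads describe a small state machine: a policy either already has its resources or is waiting for initialize. The no-argument constructor already means "use the default resources", so deferral presumably needs its own overload, selected by a tag argument. The sketch below mimics that pattern with hypothetical names (deferred_tag_t, default_resources, toy_policy); none of them are oneDPL types. Strings stand in for queues.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tag type, modeled on deferred_initialization_t.
struct deferred_tag_t {
    explicit deferred_tag_t() = default;
};
inline constexpr deferred_tag_t deferred_tag{};

// Hypothetical stand-in for the policy's default set of resources.
inline std::vector<std::string> default_resources() { return {"cpu", "gpu"}; }

class toy_policy {
public:
    // Initialized with the default set of resources.
    toy_policy() : resources_(default_resources()), initialized_(true) {}
    // Deferred: unusable until an initialize overload is called.
    explicit toy_policy(deferred_tag_t) {}
    // Initialized with caller-supplied resources.
    explicit toy_policy(std::vector<std::string> u)
        : resources_(std::move(u)), initialized_(true) {}

    void initialize() { initialize(default_resources()); }
    void initialize(std::vector<std::string> u) {
        resources_ = std::move(u);
        initialized_ = true;
    }

    bool is_initialized() const { return initialized_; }

    const std::vector<std::string>& get_resources() const {
        if (!initialized_) {
            throw std::logic_error("policy used before initialization");
        }
        return resources_;
    }

private:
    std::vector<std::string> resources_;
    bool initialized_ = false;
};
```

Usage follows the tables above: `toy_policy b{deferred_tag};` creates an unusable policy, and `b.initialize({"q0", "q1"});` makes it usable with two resources.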

Queries

A round_robin_policy has get_resources and get_submission_group member functions.

| Signature | Description |
|---|---|
| std::vector<resource_type> get_resources(); | Returns the set of resources the policy is selecting from. |
| auto get_submission_group(); | Returns an object that can be used to wait for all active submissions. |
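A submission group can be pictured as a record of the work started since the last wait. The following standard-C++ analogue is hypothetical and unrelated to the type oneDPL actually returns. It keeps one std::future per submission and blocks on all of them in wait(), roughly mirroring ex::wait(p.get_submission_group()) in the example.

```cpp
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical std-only analogue of a submission group.
class toy_submission_group {
public:
    // Start `work` (a callable returning void) asynchronously and track it.
    template <typename F>
    void submit(F&& work) {
        pending_.push_back(std::async(std::launch::async, std::forward<F>(work)));
    }

    // Block until every tracked task has finished, then stop tracking them.
    void wait() {
        for (auto& fut : pending_) {
            fut.wait();
        }
        pending_.clear();
    }

    // Number of tracked submissions (some may already have finished).
    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<std::future<void>> pending_;
};
```

In this model, submit() returns immediately, so several tasks can run at once, and wait() is the only point where the caller blocks. That matches the roles of ex::submit and ex::wait in the example.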