Auto-Tune Policy ################ The dynamic selection API is an experimental feature in the |onedpl_long| (|onedpl_short|) that selects an *execution resource* based on a chosen *selection policy*. There are several policies provided as part of the API. Policies encapsulate the logic and any associated state needed to make a selection. The auto-tune policy selects resources using runtime profiling. ``auto_tune_policy`` is useful for determining which resource performs best for a given kernel. The choice is made based on runtime performance history, so this policy is only useful for kernels that have stable performance. Initially, this policy acts like ``round_robin_policy``, rotating through each resource (one or more times). Then, once it has determined which resource is performing best, it uses that resource thereafter. Optionally, a resampling interval can be set to return to the profiling phase periodically. .. code:: cpp namespace oneapi::dpl::experimental { template , typename... KeyArgs> class auto_tune_policy : public policy_base, ResourceAdapter, Backend, execution_info::task_time_t> { public: using resource_type = ResourceType; using backend_type = Backend; auto_tune_policy(deferred_initialization_t); auto_tune_policy(uint64_t resample_interval_ms = 0); auto_tune_policy(const std::vector& u, ResourceAdapter adapter = {}, uint64_t resample_interval_ms = 0); // deferred initializer void initialize(uint64_t resample_interval_ms = 0); void initialize(const std::vector& u, uint64_t resample_interval_ms = 0); // other implementation defined functions... }; } This policy can be used with all the dynamic selection :doc:`free functions `, as well as with :ref:`policy traits `. Task Identification with KeyArgs ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The template parameter pack ``KeyArgs`` allows the policy to track performance for submitted jobs. By default (empty ``KeyArgs``), all invocations of the same function share performance history. When ``KeyArgs`` are specified, the policy uses both the function pointer and the specified arguments to create a unique key for tracking performance. .. Note:: The number of ``KeyArgs`` types must exactly match the number of extra arguments passed to the user function beyond the resource. This requirement is enforced at compile-time. For example, ``auto_tune_policy`` will track performance separately for each distinct ``std::size_t`` argument value, useful when performance varies with problem size. Here, one ``KeyArg`` type corresponds to one extra argument passed to the function after the ``sycl::queue``. Example ------- In the following example, an ``auto_tune_policy`` is used to dynamically select between two queues, a CPU queue and a GPU queue. .. code:: cpp #include #include #include namespace ex = oneapi::dpl::experimental; int main() { std::vector r { sycl::queue{sycl::cpu_selector_v}, sycl::queue{sycl::gpu_selector_v} }; const std::size_t N = 10000; std::vector av(N, 0.0); std::vector bv(N, 0.0); std::vector cv(N, 0.0); for (int i = 0; i < N; ++i) { av[i] = bv[i] = i; } ex::auto_tune_policy p{r}; // (1) { sycl::buffer a_b(av); sycl::buffer b_b(bv); sycl::buffer c_b(cv); for (int i = 0; i < 6; ++i) { ex::submit_and_wait(p, [&](sycl::queue q) { // (2) // (3) std::cout << (q.get_device().is_cpu() ? "using cpu\n" : "using gpu\n"); return q.submit([&](sycl::handler &h) { // (4) sycl::accessor a_a(a_b, h, sycl::read_only); sycl::accessor b_a(b_b, h, sycl::read_only); sycl::accessor c_a(c_b, h, sycl::read_write); h.parallel_for(N, [=](auto i) { c_a[i] = a_a[i] + b_a[i]; }); }); }); }; } for (int i = 0; i < N; ++i) { if (cv[i] != 2*i) { std::cout << "ERROR!\n"; } } std::cout << "Done.\n"; } The key points in this example are: #. An ``auto_tune_policy`` is constructed to select between the CPU and GPU. #. ``submit_and_wait`` is invoked with the policy as the first argument. The selected queue will be passed to the user-provided function. #. For clarity when run, the type of device is displayed. #. The queue is used in function to perform and asynchronous offload. The SYCL event returned from the call to ``submit`` is returned. Returning an event is required for functions passed to ``submit`` and ``submit_and_wait``. Selection Algorithm ------------------- The selection algorithm for ``auto_tune_policy`` uses runtime profiling to choose the best resource for the given function. A simplified, expository implementation of the selection algorithm follows: .. code:: cpp //not a public function, for exposition purposes only template selection_type auto_tune_policy::select(Function&& f, Args&&...args) { if (initialized_) { auto k = make_task_key(f, args...); auto tuner = get_tuner(k); auto offset = tuner->get_resource_to_profile(); if (offset == use_best) { return selection_type{*this, tuner->best_resource_, tuner}; } else { auto r = resources_[offset]; return selection{*this, r, tuner}; } } else { throw std::logic_error("selected called before initialization"); } } where ``make_task_key`` combines the inputs, including the function and its arguments, into a key that uniquely identifies the user function that is being profiled. ``tuner`` is the encapsulated logic for performing runtime profiling and choosing the best option for a given key. When the call to ``get_resource_to_profile()`` return ``use_best``, the tuner is not in the profiling phase, and so the previously determined best resource is used. Otherwise, the resource at index ``offset`` in the ``resources_`` vector is used and its resulting performance is profiled. When an ``auto_tune_policy`` is initialized with a non-zero resample interval, the policy will periodically return to the profiling phase base on the provided interval value. Constructors ------------ ``auto_tune_policy`` provides three constructors. .. list-table:: :widths: 50 50 :header-rows: 1 * - Signature - Description * - ``auto_tune_policy(deferred_initialization_t);`` - Defers initialization. An ``initialize`` function must be called prior to use. * - | ``auto_tune_policy(`` | ``uint64_t resample_interval_ms = 0);`` - Initialized to use the default set of resources. An optional resampling interval can be provided. * - | ``auto_tune_policy(`` | ``const std::vector& u,`` | ``ResourceAdapter adapter = {},`` | ``uint64_t resample_interval_ms = 0);`` - Overrides the default set of resources with an optional resource adapter. An optional resampling interval can be provided. .. Note:: When initializing the ``auto_tune_policy`` with SYCL queues, constructing the queues with the ``sycl::property::queue::enable_profiling`` property allows a more accurate determination of the best-performing device to be made. Deferred Initialization ----------------------- An ``auto_tune_policy`` that was constructed with deferred initialization must be initialized by calling one of its ``initialize`` member functions before it can be used to select or submit. .. list-table:: :widths: 50 50 :header-rows: 1 * - Signature - Description * - | ``initialize(`` | ``uint64_t resample_interval_ms = 0);`` - Initialize to use the default set of resources. An optional resampling interval can be provided. * - | ``initialize(`` | ``const std::vector& u,`` | ``uint64_t resample_interval_ms = 0);`` - Overrides the default set of resources. An optional resampling interval can be provided. .. Note:: When initializing the ``auto_tune_policy`` with SYCL queues, constructing the queues with the ``sycl::property::queue::enable_profiling`` property allows a more accurate determination of the best-performing device to be made. Queries ------- An ``auto_tune_policy`` has ``get_resources`` and ``get_submission_group`` member functions. .. list-table:: :widths: 50 50 :header-rows: 1 * - Signature - Description * - ``std::vector get_resources();`` - Returns the set of resources the policy is selecting from. * - ``auto get_submission_group();`` - Returns an object that can be used to wait for all active submissions. Reporting Requirements ---------------------- A ``auto_tune_policy`` requires the ``task_time`` reporting requirement. See the :ref:`Execution Information ` section for more information about reporting requirements.