.. index:: pair: page; Single op partition on GPU
.. _doxid-graph_sycl_single_op_partition_cpp:

Single op partition on GPU
==========================

This example demonstrates how to build a simple op graph and run it on a GPU.

Example code: :ref:`sycl_single_op_partition.cpp `

Key takeaways from this example:

* how to build a single-op partition quickly

* how to create an engine, allocator and stream

* how to compile a partition

* how to execute a compiled partition

Assumptions made in this example:

* Only the workflow is demonstrated; the results are not checked for correctness.

* Unsupported partitions must be handled by users themselves.

.. _doxid-graph_sycl_single_op_partition_cpp_1graph_sycl_single_op_partition_cpp_headers:

Public headers
~~~~~~~~~~~~~~

To start using oneDNN Graph, we must include the ``dnnl_graph.hpp`` header file in the application. This example also includes the SYCL interoperability headers ``dnnl_graph_sycl.hpp`` and ``dnnl_sycl.hpp``. All the C++ APIs reside in namespace ``:ref:`dnnl::graph ```.

.. ref-code-block:: cpp

    #include "oneapi/dnnl/dnnl_graph.hpp"
    #include "oneapi/dnnl/dnnl_graph_sycl.hpp"
    #include "oneapi/dnnl/dnnl_sycl.hpp"
    using namespace :ref:`dnnl::graph `;
    using namespace :ref:`sycl `;

    #include
    #include
    #include
    #include
    #include
    #include

    #include "example_utils.hpp"
    #include "graph_example_utils.hpp"

    using :ref:`data_type ` = :ref:`logical_tensor::data_type `;
    using :ref:`layout_type ` = :ref:`logical_tensor::layout_type `;
    using dim = :ref:`logical_tensor::dim `;
    using dims = :ref:`logical_tensor::dims `;

.. _doxid-graph_sycl_single_op_partition_cpp_1graph_sycl_single_op_partition_cpp_tutorial:

sycl_single_op_partition_tutorial() function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _doxid-graph_sycl_single_op_partition_cpp_1graph_sycl_single_op_partition_cpp_get_partition:

Build Graph and Get Partitions
------------------------------

In this section, we create a partition that contains the single op ``matmul`` directly, without building a graph and getting partitions from it.

First, create the ``MatMul`` op (:ref:`dnnl::graph::op `) and attach attributes to it, including ``transpose_a`` and ``transpose_b``.

.. ref-code-block:: cpp

    logical_tensor matmul_src0_desc {0, data_type::f32};
    logical_tensor matmul_src1_desc {1, data_type::f32};
    logical_tensor matmul_dst_desc {2, data_type::f32};
    op matmul(0, op::kind::MatMul, {matmul_src0_desc, matmul_src1_desc},
            {matmul_dst_desc}, "matmul");
    matmul.set_attr(op::attr::transpose_a, false);
    matmul.set_attr(op::attr::transpose_b, false);

.. _doxid-graph_sycl_single_op_partition_cpp_1graph_sycl_single_op_partition_cpp_compile:

Compile and Execute Partition
-----------------------------

In a real application, the caller (for example, a framework) provides device information at this stage. In this example, we use a self-defined device to simulate that behavior.

Create a :ref:`dnnl::graph::allocator ` with two user-defined callback functions, :ref:`dnnl_graph_sycl_allocate_f ` and :ref:`dnnl_graph_sycl_deallocate_f `.

.. ref-code-block:: cpp

    allocator alloc = :ref:`sycl_interop::make_allocator `(
            sycl_malloc_wrapper, sycl_free_wrapper);

Define the SYCL queue (code outside of oneDNN Graph):

.. ref-code-block:: cpp

    sycl::queue q = (ekind == engine::kind::gpu)
            ? sycl::queue(
                    sycl::gpu_selector_v, sycl::property::queue::in_order {})
            : sycl::queue(
                    sycl::cpu_selector_v, sycl::property::queue::in_order {});

Create a :ref:`dnnl::engine ` based on the SYCL device and context, and attach the user-defined :ref:`dnnl::graph::allocator ` to this engine.

.. ref-code-block:: cpp

    :ref:`dnnl::engine ` eng = :ref:`sycl_interop::make_engine_with_allocator `(
            q.get_device(), q.get_context(), alloc);

Create a :ref:`dnnl::stream ` on the given engine:

.. ref-code-block:: cpp

    :ref:`dnnl::stream ` strm = :ref:`dnnl::sycl_interop::make_stream `(eng, q);

Skip building a graph and getting partitions, and directly create the single-op partition for the engine kind selected above:

.. ref-code-block:: cpp

    partition part(matmul, ekind);

Compile the partition with the input and output logical tensors to generate a compiled partition.

.. ref-code-block:: cpp

    compiled_partition cp = part.compile(inputs, outputs, eng);

Execute the compiled partition on the specified stream.

.. ref-code-block:: cpp

    cp.execute(strm, inputs_ts, outputs_ts);
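
The example above demonstrates the workflow only and does not check the result. One way to add such a check is a small host-side reference. The sketch below is not part of oneDNN or of the example source: ``reference_matmul`` and ``all_close`` are hypothetical helpers, and the code assumes dense row-major f32 buffers that have already been copied back to the host after the stream has finished.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a oneDNN API): row-major f32 matmul on the host.
// a holds M x K values (stored as K x M when transpose_a is true).
// b holds K x N values (stored as N x K when transpose_b is true).
std::vector<float> reference_matmul(const std::vector<float> &a,
        const std::vector<float> &b, std::size_t M, std::size_t K,
        std::size_t N, bool transpose_a = false, bool transpose_b = false) {
    assert(a.size() == M * K && b.size() == K * N);
    std::vector<float> c(M * N, 0.f);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.f;
            for (std::size_t k = 0; k < K; ++k) {
                // Pick the element of the logical (untransposed) matrices.
                const float av = transpose_a ? a[k * M + m] : a[m * K + k];
                const float bv = transpose_b ? b[n * K + k] : b[k * N + n];
                acc += av * bv;
            }
            c[m * N + n] = acc;
        }
    }
    return c;
}

// Hypothetical helper: element-wise comparison with an absolute tolerance.
bool all_close(const std::vector<float> &x, const std::vector<float> &y,
        float atol = 1e-4f) {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::fabs(x[i] - y[i]) > atol) return false;
    }
    return true;
}
```

With both ``transpose_a`` and ``transpose_b`` set to ``false``, as in this example, the reference reduces to a plain ``M x K`` by ``K x N`` product. The tolerance is an assumption and would likely need to be loosened for large ``K``, because the accumulation order on the device can differ from the host loop.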