Compiler View

oneAPI Construction Kit provides a compiler suite that can consume kernels written using open standards including OpenCL C and SPIR-V. These compilers accept source code or IR provided by an application and compile it into an executable that is used by runtime APIs to execute work on custom processor architectures.

The ComputeMux Compiler is exposed as collection of C++ interfaces and a base implementation of those interfaces that are designed to be extended by ComputeMux targets to implement code generation for disparate hardware.

In summary, source code or IR is passed to one of the front-ends exposed by compiler::Module, optimization passes are run, then the target backend is called to link builtins and generate code to run on a ComputeMux device. Optionally, kernel compilation can be deferred as late as kernel enqueue time, using the compiler::Kernel interface.

The following diagram illustrates the order of operations that drives the compilation process:

digraph { node [fontname=Lato]; fontname=Lato compound=true style="filled" splines=false ranksep=0.9 subgraph cluster_module { label="Module" labeljust=l fillcolor="#f2daf2" color="#ccb8cc" penwidth=3 clc [ label="Module::compileOpenCLC" style="filled" shape=box fillcolor="#fff8a6" color="#d7d18c" penwidth=3] spirv [ label="Module::compileSPIRV" style="filled" shape=box fillcolor="#fff8a6" color="#d7d18c" penwidth=3] subgraph cluster_module_finalize { labeljust=c label="Module::finalize" fillcolor="#fff8a6" color="#d7d18c" penwidth=3 module_backend [ label="BaseModule::getLateTargetPasses" style="filled" shape=box fillcolor="#aaddff" color="#88aadd" penwidth=3] subgraph cluster_module_passes { label="Passes" style="dashed" color="#d7d18c" penwidth=3 labeljust=l passes_vecz [ label="Vecz" style="filled" shape=box fillcolor="#fff8a6" color="#d7d18c" penwidth=3] passes_builtins [ label="Builtins" style="filled" shape=box fillcolor="#fff8a6" color="#d7d18c" penwidth=3] passes_others [ label="..." style="filled" shape=box fillcolor="#fff8a6" color="#d7d18c" penwidth=3] } module_backend->passes_vecz [color="#6a7880" style=dashed penwidth=1] module_backend->passes_builtins [color="#6a7880" style=dashed penwidth=1] module_backend->passes_others [color="#6a7880" style=dashed penwidth=1] } module_create_binary [ label="Module::createBinary" style="filled" shape=box fillcolor="#aaddff" color="#88aadd" penwidth=3] subgraph cluster_module_get_kernel { labeljust=c label="Module::getKernel" fillcolor="#fff8a6" color="#d7d18c" penwidth=3 module_create_kernel [ label="Module::createKernel" style="filled,dashed" shape=box fillcolor="#aaddff" color="#88aadd" penwidth=3] } clc -> module_backend [lhead=cluster_module_finalize color="#6a7880" penwidth=3] spirv -> module_backend [lhead=cluster_module_finalize color="#6a7880" penwidth=3] passes_vecz -> module_create_binary [style=invis] passes_others -> module_create_kernel [style=invis] passes_builtins -> module_create_binary [ltail=cluster_module_finalize color="#6a7880" penwidth=3] passes_builtins -> module_create_kernel [ltail=cluster_module_finalize lhead=cluster_module_get_kernel color="#6a7880" penwidth=3] } subgraph cluster_kernel { labeljust=l label="Kernel" fillcolor="#f2daf2" color="#ccb8cc" style="filled,dashed" penwidth=3 kernel_create_specialized_kernel [ label="Kernel::createSpecializedKernel" style="filled,dashed" shape=box fillcolor="#aaddff" color="#88aadd" penwidth=3] } module_create_kernel -> kernel_create_specialized_kernel [color="#6a7880" penwidth=3] }

Compilation flow using compiler::Module and compiler::Kernel (dashed components are optional)

OpenCL-C Consumption

OpenCL C source code is passed to the compiler::Module::compileOpenCLC method, which is implemented by the oneAPI Construction Kit.

Clang, a component of LLVM, is used to parse the source code, generate LLVM IR and run initial optimization passes ready to pass to the backend. Clang fully supports the OpenCL C 1.1 and 1.2 standards. Clang experimentally supports the OpenCL C 3.0 standard, and is currently only tested to work with LLVM 10 after applying an internal patch.

oneAPI Construction Kit also provides the headers for OpenCL C builtin functions to Clang as a pre-compiled header, which later get linked as part of compiler::Module::finalize.

SPIR-V Consumption

SPIR-V binaries are passed to the compiler::Module::compileSPIRV method, which is implemented by the oneAPI Construction Kit.

SPIR-V is an intermediate representation for kernel programs which can be passed to an OpenCL driver that is at least version 2.1, or has the cl_khr_il_program extension enabled.

oneAPI Construction Kit comes with spirv-ll, a static library that implements translation from binary SPIR-V modules to an llvm::Module. spirv-ll fully supports SPIR-V 1.0 and a number of SPIR-V extensions:

  • SPV_KHR_no_integer_wrap_decoration

  • SPV_KHR_variable_pointers

  • SPV_KHR_16bit_storage

  • SPV_KHR_float_controls

  • SPV_KHR_storage_buffer_storage_class

  • SPV_KHR_vulkan_memory_model

  • SPV_codeplay_usm_generic_storage_class

compileSPIRV uses spirv-ll to convert the SPIR-V module into an llvm::Module, then runs a number of fixup passes and a generic optimization pipeline based on the one Clang normally runs during code generation.

Binaries

Once the source code has been consumed, compiler::Module::finalize is called to run any remaining backend specific passes, such as linking builtins and vectorization, then device specific passes are run. The LLVM module is now ready for either offline or deferred compilation.

Offline Compilation

Offline compilation is the process of generating a binary from a finalized module, ready to be passed to the muxCreateExecutable endpoint in the ComputeMux Runtime. This binary contains all the kernels contained within the program, and associated metadata. This is usually represented using the ELF binary format, but it’s up to the ComputeMux target to determine which format is used.

Offline binaries are generated using the compiler::Module::createBinary method. Unlike the compile* methods mentioned above, createBinary needs to be implemented by a ComputeMux target, and the binary returned by this function must be loadable by muxCreateExecutable.

Deferred Compilation

Deferred compilation is the process of deferring the finalization and code generation of a kernel as late as possible, to take advantage of additional information only available at runtime, such as the local size, global size, or descriptors. This feature is optional and is not required to be supported by a ComputeMux target.

Instead of generating a binary from a program using compiler::Module::createBinary, the function compiler::Module::createKernel is called instead. This creates a compiler::Kernel object, representing a single kernel function within the module, and is called once for each kernel function that is enqueued.

Before enqueuing a kernel, the method compiler::Kernel::createSpecializedKernel will be called with the execution options. At this point, the compiler::Kernel object can, for example, run optimization passes knowing the exact local size that the kernel will be executed with. compiler::Kernel::createSpecializedKernel must return a binary containing at least the kernel function represented by the compiler::Kernel object, and the binary must also be loadable by muxCreateExecutable.

More details about the deferred compilation flow can be found in the ComputeMux Compiler Specification.