Kernel Argument PackingΒΆ

In order to pass kernel arguments to the entry point function, the arguments must be packed (serialized) into a buffer and included in the Kernel Uniform Block. Serialization of kernel arguments can be delegated to the hal_argpack_t argument packer, an utility class included in the HAL interface. The CMP_REG_KARGS_INFO register must also be initialized with the offset into the KUB where the packed arguments are stored.

Since the starting address of the KUB is passed to the kernel entry point as the args kernel arguments parameter, we have to pack kernel arguments at the start of the KUB, before the work-group scheduling info. As a side-effect, CMP_REG_KARGS_INFO can be initialized by simply clearing the register since the kernel arguments offset into the KUB will always be zero.

This can be done by passing the kernels arguments to the allocateKUB function and updating the KUB generation code to use the hal_argpack_t class for packing arguments in the KUB buffer, before the work-group scheduling info is packed:

// refsi_hal.cpp
#include "arg_pack.h"   // Added
...
hal::hal_addr_t refsi_hal_device::allocateKUB(const hal::hal_arg_t *args,
                                              uint32_t num_args,
                                              const exec_state_t &exec,
                                              hal::hal_size_t &kub_desc,
                                              hal::hal_size_t &tsd_info) {
  ...

  std::vector<uint8_t> kub_data;

  // Pack arguments.
  hal::util::hal_argpack_t packer(64);
  if (!packer.build(args, num_args)) {
    return false;
  }
  kub_data.resize(packer.size());
  memcpy(kub_data.data(), packer.data(), packer.size());
  alignBuffer(kub_data, sizeof(uint64_t));

  // Pack work-group scheduling info into the KUB.
  ...
}

The kernel_exec operation also needs to be changed to use the updated allocateKUB function:

hal::hal_size_t kub_desc = 0;
hal::hal_size_t kargs_info = 0;   // Added
hal::hal_size_t tsd_info = 0;
hal::hal_addr_t kub_addr = allocateKUB(args, num_args, exec, kub_desc,
                                       tsd_info); // Updated
if (!kub_addr) {
  return false;
}

Finally, a command that writes to the CMP_REG_KARGS_INFO register needs to be added:

// Encode the command buffer.
refsi_command_buffer cb;
size_t num_instances = exec.wg.num_groups[0];
cb.addWRITE_REG64(CMP_REG_ENTRY_PT_FN, kernel_wrapper->symbol);
cb.addWRITE_REG64(CMP_REG_KUB_DESC, kub_desc);
cb.addWRITE_REG64(CMP_REG_KARGS_INFO, kargs_info);    // Added
cb.addWRITE_REG64(CMP_REG_TSD_INFO, tsd_info);
cb.addRUN_KERNEL_SLICE(/* num_harts */ 0, num_instances, 0);
cb.addFINISH();

Please note that the kargs_info value is set to zero instead of being computed based on the offset of the packed arguments into the KUB. Since the arguments are always packed at the beginning of the KUB, the offset is always zero.

At the end of this sub-step, all clik tests but two now pass:

Failed tests:
  blur
  matrix_multiply

Passed:            9 ( 81.8 %)
Failed:            2 ( 18.2 %)
Timeouts:          0 (  0.0 %)