RefSi Driver Interface

The RefSi virtual device can be controlled by the RefSi device driver, which mimics how a real accelerator would be exposed to the system. The driver has a C API and can be used to manage memory transfers between the host and simulated device as well as enqueue commands (work) onto the device.

The RefSi driver currently supports two families of accelerators, RefSi M and RefSi G. While there are significant differences in how kernels are executed on each kind of accelerator, the driver API is largely the same for both families. Functions that can only be used with one family are marked as such.

Driver start-up and teardown

The refsiInitialize function must be called before any other driver function. It allocates the global resources needed to manage the virtual RefSi device. These resources can and should be freed by calling the refsiTerminate function prior to the process unloading.

/// @brief Initialize the driver. No other driver function can be called prior
/// to calling this function.
REFSI_API refsi_result refsiInitialize();

/// @brief Terminate the driver. No driver function other than refsiInitialize
/// can be called after calling this function.
REFSI_API refsi_result refsiTerminate();

Once refsiInitialize has been called, the refsiOpenDevice function can be used to establish a connection to a virtual RefSi device. The device family is passed as an argument and a handle to the device is returned. Calling this function multiple times with the same family returns the same handle; multiple devices are not created.

The refsiShutdownDevice function stops a RefSi device, identified by its handle, and reclaims its resources. Calling refsiOpenDevice after that creates a new device.

/// @brief Represents the kind of RefSi device to control.
enum refsi_device_family {
  REFSI_DEFAULT = 0,
  REFSI_M = 1,
  REFSI_G = 2
};

/// @brief Open the device. This establishes a connection with the device and
/// ensures that it has been successfully started. Device memory functions, as
/// well as command buffer execution functions, can only be called after this
/// function has been called.
/// @param family Type of RefSi device to open a connection to.
REFSI_API refsi_device_t refsiOpenDevice(refsi_device_family family);

/// @brief Shut down the device. Any pointer returned by refsiGetMappedAddress
/// can no longer be used after this function is called.
REFSI_API refsi_result refsiShutdownDevice(refsi_device_t device);
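The call order described above can be sketched as follows. This is a minimal illustration, not the driver's implementation: so that it compiles on its own, it carries a stand-in for the driver header. The refsi_result values (refsi_success, refsi_failure), the layout of refsi_device, the use of a null handle to signal failure, and the stub bodies are all assumptions. The helper run_driver_lifecycle is hypothetical.

```cpp
// --- Stand-in for the driver header (assumptions, not the real definitions) ---
enum refsi_result { refsi_success = 0, refsi_failure = 1 };  // assumed values
enum refsi_device_family { REFSI_DEFAULT = 0, REFSI_M = 1, REFSI_G = 2 };
struct refsi_device { refsi_device_family family; };  // assumed; opaque in practice
typedef refsi_device *refsi_device_t;

// Stubs that imitate the documented behaviour with a single fake device.
static bool g_initialized = false;
static bool g_open = false;
static refsi_device g_device = {REFSI_DEFAULT};

refsi_result refsiInitialize() {
  g_initialized = true;
  return refsi_success;
}

refsi_result refsiTerminate() {
  g_open = false;
  g_initialized = false;
  return refsi_success;
}

refsi_device_t refsiOpenDevice(refsi_device_family family) {
  if (!g_initialized) return nullptr;  // assumed failure value
  if (!g_open) {
    g_device.family = family;
    g_open = true;
  }
  return &g_device;  // repeated calls return the same handle
}

refsi_result refsiShutdownDevice(refsi_device_t device) {
  if (device != &g_device || !g_open) return refsi_failure;
  g_open = false;
  return refsi_success;
}
// --- End of stand-in ---

// Hypothetical helper: runs start-up and teardown in the documented order.
// Returns true only if every step succeeded.
bool run_driver_lifecycle(refsi_device_family family) {
  if (refsiInitialize() != refsi_success) return false;
  bool ok = false;
  refsi_device_t device = refsiOpenDevice(family);
  if (device != nullptr) {
    // A second open with the same family should return the same handle.
    ok = (refsiOpenDevice(family) == device);
    // ... allocate device memory and enqueue work here ...
    ok = (refsiShutdownDevice(device) == refsi_success) && ok;
  }
  // Release the driver's global resources even if opening the device failed.
  ok = (refsiTerminate() == refsi_success) && ok;
  return ok;
}
```

With the real driver, the stand-in section would be replaced by the driver's own header, and only run_driver_lifecycle would remain.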

Device queries

Not unlike real platforms, RefSi virtual devices can have different configurations that can be characterized by a number of metrics such as number of cores and hardware threads, vector register size, ISA extensions, memory size and so on. The refsiQueryDeviceInfo function can be used to query a RefSi device for a number of these characteristics. It is highly recommended to use this function rather than hard-coding device metrics, since they can change through build-time configuration or newer RefSi driver releases.

/// @brief Provides information about the device.
typedef struct refsi_device_info {
  /// @brief Kind of device.
  refsi_device_family family;
  /// @brief Number of accelerator cores contained within the device.
  unsigned num_cores;
  /// @brief Number of hardware threads contained in each accelerator core.
  unsigned num_harts_per_core;
  /// @brief Number of entries in the device's memory map.
  unsigned num_memory_map_entries;
  /// @brief String that describes the ISA exposed by the cores.
  const char *core_isa;
  /// @brief Width of the cores' vector registers, in bits.
  unsigned core_vlen;
  /// @brief Maximum width of an element in a vector register, in bits.
  unsigned core_elen;
} refsi_device_info_t;

/// @brief Query information about the device.
/// @param device Device to query information for.
/// @param device_info To be filled with information about the device.
REFSI_API refsi_result refsiQueryDeviceInfo(refsi_device_t device,
                                            refsi_device_info_t *device_info);
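As an illustration of querying metrics instead of hard-coding them, the sketch below derives two values from refsi_device_info_t. It compiles on its own thanks to a stand-in for the driver header: the refsi_result values, the refsi_device_t typedef, and the numbers returned by the stub are assumptions made up for the example. The helpers total_hart_count and vector_lanes are hypothetical.

```cpp
// --- Stand-in for the driver header (assumptions, not the real definitions) ---
enum refsi_result { refsi_success = 0, refsi_failure = 1 };  // assumed values
enum refsi_device_family { REFSI_DEFAULT = 0, REFSI_M = 1, REFSI_G = 2 };
typedef struct refsi_device *refsi_device_t;  // assumed opaque handle

typedef struct refsi_device_info {
  refsi_device_family family;
  unsigned num_cores;
  unsigned num_harts_per_core;
  unsigned num_memory_map_entries;
  const char *core_isa;
  unsigned core_vlen;
  unsigned core_elen;
} refsi_device_info_t;

// Stub that reports made-up metrics; a real device may report anything else.
refsi_result refsiQueryDeviceInfo(refsi_device_t, refsi_device_info_t *device_info) {
  if (!device_info) return refsi_failure;
  refsi_device_info_t fake = {REFSI_M, 2, 4, 3, "placeholder-isa", 512, 64};
  *device_info = fake;
  return refsi_success;
}
// --- End of stand-in ---

// Hypothetical helper: total number of hardware threads, or 0 on failure.
unsigned total_hart_count(refsi_device_t device) {
  refsi_device_info_t info;
  if (refsiQueryDeviceInfo(device, &info) != refsi_success) return 0;
  return info.num_cores * info.num_harts_per_core;
}

// Hypothetical helper: how many elements of 'element_bits' bits fit in one
// vector register. Returns 0 if the query fails or the element is wider than
// the maximum element width reported by the device.
unsigned vector_lanes(refsi_device_t device, unsigned element_bits) {
  refsi_device_info_t info;
  if (refsiQueryDeviceInfo(device, &info) != refsi_success) return 0;
  if (element_bits == 0 || element_bits > info.core_elen) return 0;
  return info.core_vlen / element_bits;
}
```

Deriving such values at run time keeps host code working when the device configuration changes.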

Another aspect of RefSi configuration that can be queried through the RefSi driver API is the device’s memory map. The map is simply a list of entries, where each entry describes a kind of memory, its size and the memory address it is mapped at. This includes both ‘conventional’ memory kinds like DRAM and TCM as well as memory-like regions such as memory-mapped registers (e.g. DMA and performance counter registers). The number of entries in this map can be found in the num_memory_map_entries field of the struct filled in by refsiQueryDeviceInfo.

enum refsi_memory_map_kind {
  /// @brief The kind of memory for this memory map entry is unknown.
  UNKNOWN = 0,
  /// @brief Refers to the area of memory where the device's dedicated memory is
  /// mapped. DRAM is shared between all device cores.
  DRAM = 1,
  /// @brief Refers to the area of memory where the device's entire
  /// tightly-coupled instruction memory is mapped, for all cores.
  TCIM = 2,
  /// @brief Refers to the area of memory where the device's entire
  /// tightly-coupled data memory is mapped, for all cores.
  TCDM = 3,
  /// @brief Refers to the area of memory where each core's tightly-coupled data
  /// memory is mapped. This range has the same address for all cores, however
  /// each core will see different contents when accessing it.
  TCDM_PRIVATE = 4,
  /// @brief Refers to the area of memory where Kernel DMA registers are mapped
  /// for all hardware threads.
  KERNEL_DMA = 5,
  /// @brief Refers to the area of memory where a hardware thread's Kernel DMA
  /// registers are mapped. This range has the same address for all hardware
  /// threads, however each hart will see different contents when accessing it.
  KERNEL_DMA_PRIVATE = 6,
  /// @brief Refers to the area of memory where Performance Counter registers
  /// are mapped. This is divided into a per-hardware-thread area and a global
  /// area shared between all units in the RefSi device.
  PERF_COUNTERS = 7,
};

/// @brief Represents an entry in the device's memory map.
struct refsi_memory_map_entry {
  /// @brief Kind of memory this memory range refers to.
  refsi_memory_map_kind kind;
  /// @brief Starting address of the memory range in device memory.
  refsi_addr_t start_addr;
  /// @brief Size of the memory range in device memory, in bytes.
  size_t size;
};

/// @brief Query an entry in the device's memory map.
/// @param device Device to query memory map info for.
/// @param index Index of the entry to query.
/// @param entry To be filled with information about the memory map entry.
REFSI_API refsi_result refsiQueryDeviceMemoryMap(refsi_device_t device,
                                                 size_t index,
                                                 refsi_memory_map_entry *entry);
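A typical use of the memory map is to look up where a given kind of memory lives. The sketch below walks the map by index, using num_memory_map_entries as the bound. To compile on its own it includes a stand-in for the driver header: the refsi_result values, the refsi_addr_t and refsi_device_t typedefs, and the two-entry map with its addresses and sizes are assumptions invented for the example. The helper find_memory_region is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// --- Stand-in for the driver header (assumptions, not the real definitions) ---
enum refsi_result { refsi_success = 0, refsi_failure = 1 };  // assumed values
enum refsi_device_family { REFSI_DEFAULT = 0, REFSI_M = 1, REFSI_G = 2 };
enum refsi_memory_map_kind {
  UNKNOWN = 0, DRAM = 1, TCIM = 2, TCDM = 3, TCDM_PRIVATE = 4,
  KERNEL_DMA = 5, KERNEL_DMA_PRIVATE = 6, PERF_COUNTERS = 7,
};
typedef uint64_t refsi_addr_t;                // assumed width
typedef struct refsi_device *refsi_device_t;  // assumed opaque handle

typedef struct refsi_device_info {
  refsi_device_family family;
  unsigned num_cores;
  unsigned num_harts_per_core;
  unsigned num_memory_map_entries;
  const char *core_isa;
  unsigned core_vlen;
  unsigned core_elen;
} refsi_device_info_t;

struct refsi_memory_map_entry {
  refsi_memory_map_kind kind;
  refsi_addr_t start_addr;
  size_t size;
};

// Made-up memory map used by the stubs below.
static const refsi_memory_map_entry kFakeMap[] = {
    {TCDM, 0x10000000, 4u << 20},
    {DRAM, 0x40000000, 128u << 20},
};
static const size_t kFakeMapSize = sizeof(kFakeMap) / sizeof(kFakeMap[0]);

refsi_result refsiQueryDeviceInfo(refsi_device_t, refsi_device_info_t *device_info) {
  if (!device_info) return refsi_failure;
  refsi_device_info_t fake = {REFSI_M, 1, 4, (unsigned)kFakeMapSize,
                              "placeholder-isa", 512, 64};
  *device_info = fake;
  return refsi_success;
}

refsi_result refsiQueryDeviceMemoryMap(refsi_device_t, size_t index,
                                       refsi_memory_map_entry *entry) {
  if (!entry || index >= kFakeMapSize) return refsi_failure;
  *entry = kFakeMap[index];
  return refsi_success;
}
// --- End of stand-in ---

// Hypothetical helper: finds the first memory map entry of the given kind.
// Returns true and fills 'out' on success, false if no such entry exists.
bool find_memory_region(refsi_device_t device, refsi_memory_map_kind kind,
                        refsi_memory_map_entry *out) {
  refsi_device_info_t info;
  if (refsiQueryDeviceInfo(device, &info) != refsi_success) return false;
  for (size_t i = 0; i < info.num_memory_map_entries; i++) {
    refsi_memory_map_entry entry;
    if (refsiQueryDeviceMemoryMap(device, i, &entry) != refsi_success) continue;
    if (entry.kind == kind) {
      *out = entry;
      return true;
    }
  }
  return false;
}
```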

Device memory allocation

In order to run programs (kernels) on an accelerator, some kind of device memory is typically required. The refsiAllocDeviceMemory and refsiFreeDeviceMemory functions can be used to allocate and free memory for a particular RefSi device. The kind argument specifies where the memory should be allocated; however, only DRAM allocations are currently guaranteed to be supported by the device.

/// @brief Allocate device memory.
/// @param device Device to allocate memory on.
/// @param size Size of the memory range to allocate, in bytes.
/// @param alignment Minimum alignment for the returned physical address.
/// @param kind Kind of memory to allocate, e.g. DRAM, scratchpad.
REFSI_API refsi_addr_t refsiAllocDeviceMemory(refsi_device_t device,
                                              size_t size, size_t alignment,
                                              refsi_memory_map_kind kind);

/// @brief Free device memory allocated with refsiAllocDeviceMemory.
/// @param device Device to free memory from.
/// @param phys_addr Device address to free.
REFSI_API refsi_result refsiFreeDeviceMemory(refsi_device_t device,
                                             refsi_addr_t phys_addr);

Device memory access

Device memory allocated through refsiAllocDeviceMemory can be accessed from the host (i.e. using the RefSi driver interface) in one of two ways.

The first way is through memcpy-like functions that can either read from or write to device memory, refsiReadDeviceMemory and refsiWriteDeviceMemory. These functions are blocking and only return once the operation has been completed. There is also no need to manually control the device’s cache(s).

Note that the unit_id parameter can be used to target different kinds of memory. For example, different harts can access a particular area of TCDM at the same address while seeing different contents (i.e. there is one copy of this area for each hart). Passing a hart ID as unit_id allows for accessing hart-local storage for that specific hart.

/// @brief Read data from device memory.
/// @param device Device to read from.
/// @param dest Buffer to copy read data to.
/// @param phys_addr Device address that defines the start of the memory range
/// to read from.
/// @param size Size of the memory range to read, in bytes.
/// @param unit_id UnitID of the execution unit to use when making memory
/// requests. This is usually 'external' but hart IDs can also be used.
REFSI_API refsi_result refsiReadDeviceMemory(refsi_device_t device,
                                             uint8_t *dest,
                                             refsi_addr_t phys_addr,
                                             size_t size, uint32_t unit_id);

/// @brief Write data to device memory.
/// @param device Device to write to.
/// @param phys_addr Device address that defines the start of the memory range
/// to write to.
/// @param source Buffer that contains the data to write to device memory.
/// @param size Size of the memory range to write, in bytes.
/// @param unit_id UnitID of the execution unit to use when making memory
/// requests. This is usually 'external' but hart IDs can also be used.
REFSI_API refsi_result refsiWriteDeviceMemory(refsi_device_t device,
                                              refsi_addr_t phys_addr,
                                              const uint8_t *source,
                                              size_t size, uint32_t unit_id);
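The allocate, write, read back, free sequence can be sketched as below. The stand-in section lets the example compile without the driver: it models DRAM as a small host array with a bump allocator. The refsi_result values, the typedefs, the fake base address, the use of address 0 to signal a failed allocation, and the kUnitExternal placeholder for the 'external' unit ID are all assumptions. The helper copy_through_device is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// --- Stand-in for the driver header (assumptions, not the real definitions) ---
enum refsi_result { refsi_success = 0, refsi_failure = 1 };  // assumed values
enum refsi_memory_map_kind {
  UNKNOWN = 0, DRAM = 1, TCIM = 2, TCDM = 3, TCDM_PRIVATE = 4,
  KERNEL_DMA = 5, KERNEL_DMA_PRIVATE = 6, PERF_COUNTERS = 7,
};
typedef uint64_t refsi_addr_t;                // assumed width
typedef struct refsi_device *refsi_device_t;  // assumed opaque handle
static const uint32_t kUnitExternal = 0;      // placeholder for the 'external' unit ID

// Fake DRAM: a host array mapped at a made-up device address.
static const refsi_addr_t kFakeDramBase = 0x40000000;
static std::vector<uint8_t> g_fake_dram(4096);
static size_t g_fake_next = 0;

static bool fake_in_range(refsi_addr_t addr, size_t size) {
  return addr >= kFakeDramBase && size <= g_fake_dram.size() &&
         addr - kFakeDramBase <= g_fake_dram.size() - size;
}

refsi_addr_t refsiAllocDeviceMemory(refsi_device_t, size_t size, size_t alignment,
                                    refsi_memory_map_kind kind) {
  if (kind != DRAM || alignment == 0) return 0;
  size_t start = (g_fake_next + alignment - 1) / alignment * alignment;
  if (size > g_fake_dram.size() || start > g_fake_dram.size() - size) return 0;
  g_fake_next = start + size;
  return kFakeDramBase + start;
}

refsi_result refsiFreeDeviceMemory(refsi_device_t, refsi_addr_t phys_addr) {
  // The bump allocator never reuses memory; only validate the address.
  return fake_in_range(phys_addr, 0) ? refsi_success : refsi_failure;
}

refsi_result refsiReadDeviceMemory(refsi_device_t, uint8_t *dest, refsi_addr_t phys_addr,
                                   size_t size, uint32_t) {
  if (!fake_in_range(phys_addr, size)) return refsi_failure;
  std::memcpy(dest, g_fake_dram.data() + (phys_addr - kFakeDramBase), size);
  return refsi_success;
}

refsi_result refsiWriteDeviceMemory(refsi_device_t, refsi_addr_t phys_addr,
                                    const uint8_t *source, size_t size, uint32_t) {
  if (!fake_in_range(phys_addr, size)) return refsi_failure;
  std::memcpy(g_fake_dram.data() + (phys_addr - kFakeDramBase), source, size);
  return refsi_success;
}
// --- End of stand-in ---

// Hypothetical helper: copies 'size' bytes from src to dst via device DRAM.
bool copy_through_device(refsi_device_t device, const uint8_t *src, uint8_t *dst,
                         size_t size) {
  refsi_addr_t addr = refsiAllocDeviceMemory(device, size, 64, DRAM);
  if (addr == 0) return false;  // assumed failure value
  // Both calls block until the transfer has completed.
  bool ok = refsiWriteDeviceMemory(device, addr, src, size, kUnitExternal) == refsi_success &&
            refsiReadDeviceMemory(device, dst, addr, size, kUnitExternal) == refsi_success;
  // Free the allocation whether or not the transfers succeeded.
  ok = (refsiFreeDeviceMemory(device, addr) == refsi_success) && ok;
  return ok;
}
```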

The second way of accessing device memory from the host is through memory mapping. The RefSi driver maps all device memory on the host, so that a host pointer can be used to access it. The refsiGetMappedAddress function can be used to retrieve a pointer to a specific device memory address. With this pointer, data can be transparently copied to and from the device using functions like memcpy.

The refsiFlushDeviceMemory and refsiInvalidateDeviceMemory functions are meant for controlling the caches between the host and the device. The former should be called after writing through a pointer returned by refsiGetMappedAddress. The latter should be called before reading from mapped memory whenever the device may have written to it (e.g. by executing a kernel).

/// @brief Get a CPU-accessible pointer that maps to the given device address.
/// Device memory is first mapped when a connection to the device is established
/// and unmapped when the connection is closed.
/// @param device Device to retrieve a mapped pointer for.
/// @param phys_addr Device address to map to a virtual address (CPU pointer).
/// @param size Size of the memory range to access, in bytes.
REFSI_API void *refsiGetMappedAddress(refsi_device_t device,
                                      refsi_addr_t phys_addr, size_t size);

/// @brief Flush any changes to device memory from the CPU cache.
/// @param device Device to flush data changes to.
/// @param phys_addr Device address that defines the start of the memory range
/// to flush from the CPU cache.
/// @param size Size of the memory range to flush, in bytes.
REFSI_API refsi_result refsiFlushDeviceMemory(refsi_device_t device,
                                              refsi_addr_t phys_addr,
                                              size_t size);

/// @brief Invalidate any cached device data from the CPU cache.
/// @param device Device to invalidate cached data for.
/// @param phys_addr Device address that defines the start of the memory range
/// to invalidate from the CPU cache.
/// @param size Size of the memory range to invalidate, in bytes.
REFSI_API refsi_result refsiInvalidateDeviceMemory(refsi_device_t device,
                                                   refsi_addr_t phys_addr,
                                                   size_t size);
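The ordering rules for mapped access (write then flush, invalidate then read) can be sketched as follows. The stand-in section backs the mapping with a small host array so the example compiles on its own: the refsi_result values, the typedefs, the fake base address, and returning a null pointer for an unmappable range are assumptions. The helpers write_mapped and read_mapped are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// --- Stand-in for the driver header (assumptions, not the real definitions) ---
enum refsi_result { refsi_success = 0, refsi_failure = 1 };  // assumed values
typedef uint64_t refsi_addr_t;                // assumed width
typedef struct refsi_device *refsi_device_t;  // assumed opaque handle

// Fake device memory mapped at a made-up device address.
static const refsi_addr_t kFakeBase = 0x40000000;
static uint8_t g_fake_memory[256];

void *refsiGetMappedAddress(refsi_device_t, refsi_addr_t phys_addr, size_t size) {
  if (phys_addr < kFakeBase || size > sizeof(g_fake_memory) ||
      phys_addr - kFakeBase > sizeof(g_fake_memory) - size) {
    return nullptr;  // assumed failure value
  }
  return g_fake_memory + (phys_addr - kFakeBase);
}

// The fake memory has no cache in front of it, so these stubs do nothing.
refsi_result refsiFlushDeviceMemory(refsi_device_t, refsi_addr_t, size_t) {
  return refsi_success;
}
refsi_result refsiInvalidateDeviceMemory(refsi_device_t, refsi_addr_t, size_t) {
  return refsi_success;
}
// --- End of stand-in ---

// Hypothetical helper: copy host data into mapped device memory, then flush
// so that the device observes the new contents.
bool write_mapped(refsi_device_t device, refsi_addr_t addr, const void *src,
                  size_t size) {
  void *mapped = refsiGetMappedAddress(device, addr, size);
  if (!mapped) return false;
  std::memcpy(mapped, src, size);
  return refsiFlushDeviceMemory(device, addr, size) == refsi_success;
}

// Hypothetical helper: invalidate first so that stale cached data is
// discarded, then copy from mapped device memory into a host buffer.
bool read_mapped(refsi_device_t device, refsi_addr_t addr, void *dst, size_t size) {
  void *mapped = refsiGetMappedAddress(device, addr, size);
  if (!mapped) return false;
  if (refsiInvalidateDeviceMemory(device, addr, size) != refsi_success) return false;
  std::memcpy(dst, mapped, size);
  return true;
}
```

Wrapping the flush and invalidate calls in helpers like these makes it harder to forget them when switching from the blocking read and write functions to mapped access.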

Device execution (RefSi M1)

On RefSi M1, executing work is done through the command processor (CMP). The CMP executes command buffers, which are simply lists of commands that are stored in device memory.

Executing command buffers

A command buffer can be enqueued on the device from the host using the refsiExecuteCommandBuffer function by passing the device address and size of the command buffer. This function is asynchronous and may return before the commands have finished executing. refsiWaitForDeviceIdle can be used to block the current host thread until all previously enqueued commands have been executed.

/// @brief Asynchronously execute a series of commands on the device.
/// @param device Device to execute a command buffer on.
/// @param cb_addr Address of the command buffer in device memory.
/// @param size Size of the command buffer, in bytes.
REFSI_API refsi_result refsiExecuteCommandBuffer(refsi_device_t device,
                                                 refsi_addr_t cb_addr,
                                                 size_t size);

/// @brief Wait for all previously enqueued command buffers to be finished.
/// @param device Device to wait for.
REFSI_API void refsiWaitForDeviceIdle(refsi_device_t device);

Encoding command buffers

/// @brief Identifies a command that can be executed by the command processor.
enum refsi_cmp_command_id {
  CMP_NOP = 0,
  CMP_FINISH = 1,
  CMP_WRITE_REG64 = 2,
  CMP_LOAD_REG64 = 3,
  CMP_STORE_REG64 = 4,
  CMP_STORE_IMM64 = 5,
  CMP_COPY_MEM64 = 6,
  CMP_RUN_KERNEL_SLICE = 7,
  CMP_RUN_INSTANCES = 8,
  CMP_SYNC_CACHE = 9
};

/// @brief Encode a CMP command header.
/// @param opcode Command's opcode to encode.
/// @param chunk_count Command's chunk count to encode.
/// @param inline_chunk Command's inline chunk to encode.
REFSI_API uint64_t refsiEncodeCMPCommand(refsi_cmp_command_id opcode,
                                         uint32_t chunk_count,
                                         uint32_t inline_chunk);

/// @brief Try to decode a CMP command header.
/// @param header Command header to decode.
/// @param opcode Populated with the command's opcode.
/// @param chunk_count Populated with the command's chunk count.
/// @param inline_chunk Populated with the command's inline chunk.
REFSI_API refsi_result refsiDecodeCMPCommand(uint64_t header,
                                             refsi_cmp_command_id *opcode,
                                             uint32_t *chunk_count,
                                             uint32_t *inline_chunk);
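A host-side command buffer can be assembled as an array of 64-bit words before it is copied to device memory. The sketch below shows one way to do that with refsiEncodeCMPCommand. It rests on several assumptions that this section does not confirm: the refsi_result values, the bit layout used by the stub encoder and decoder (invented for this example), and the reading of chunk_count as the number of 64-bit words that follow the header. The helpers append_command and command_buffer_size_bytes are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// --- Stand-in for the driver header (assumptions, not the real definitions) ---
enum refsi_result { refsi_success = 0, refsi_failure = 1 };  // assumed values
enum refsi_cmp_command_id {
  CMP_NOP = 0, CMP_FINISH = 1, CMP_WRITE_REG64 = 2, CMP_LOAD_REG64 = 3,
  CMP_STORE_REG64 = 4, CMP_STORE_IMM64 = 5, CMP_COPY_MEM64 = 6,
  CMP_RUN_KERNEL_SLICE = 7, CMP_RUN_INSTANCES = 8, CMP_SYNC_CACHE = 9
};

// Stub codec with an INVENTED layout: opcode in bits 0-7, chunk count in
// bits 8-15, inline chunk in bits 32-63. The real encoding may differ.
uint64_t refsiEncodeCMPCommand(refsi_cmp_command_id opcode, uint32_t chunk_count,
                               uint32_t inline_chunk) {
  return (uint64_t)(opcode & 0xff) | ((uint64_t)(chunk_count & 0xff) << 8) |
         ((uint64_t)inline_chunk << 32);
}

refsi_result refsiDecodeCMPCommand(uint64_t header, refsi_cmp_command_id *opcode,
                                   uint32_t *chunk_count, uint32_t *inline_chunk) {
  uint32_t raw_opcode = (uint32_t)(header & 0xff);
  if (raw_opcode > CMP_SYNC_CACHE) return refsi_failure;
  *opcode = (refsi_cmp_command_id)raw_opcode;
  *chunk_count = (uint32_t)((header >> 8) & 0xff);
  *inline_chunk = (uint32_t)(header >> 32);
  return refsi_success;
}
// --- End of stand-in ---

// Hypothetical helper: appends one command to a host-side buffer, as a header
// word followed by its payload words (assumed meaning of chunk_count).
void append_command(std::vector<uint64_t> &buffer, refsi_cmp_command_id opcode,
                    uint32_t inline_chunk, const std::vector<uint64_t> &chunks) {
  buffer.push_back(
      refsiEncodeCMPCommand(opcode, (uint32_t)chunks.size(), inline_chunk));
  buffer.insert(buffer.end(), chunks.begin(), chunks.end());
}

// Hypothetical helper: size in bytes, as expected by the 'size' parameter of
// refsiExecuteCommandBuffer.
size_t command_buffer_size_bytes(const std::vector<uint64_t> &buffer) {
  return buffer.size() * sizeof(uint64_t);
}
```

Once assembled, the buffer would be copied into device memory (for example with refsiWriteDeviceMemory) and its device address and byte size passed to refsiExecuteCommandBuffer. Which chunks each command expects is not covered in this section.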

Device execution (RefSi G1)

Execution of kernels differs between RefSi G1 and RefSi M1. Unlike M1, G1 does not feature a command processor (CMP), so the refsiExecuteCommandBuffer function cannot be used to enqueue work onto the device. Instead, a kernel is executed using the refsiExecuteKernel function, which boots the RISC-V cores on the given number of harts. A RISC-V bootloader is typically used to invoke the kernel.

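/// @brief Synchronously execute a kernel on the device. Only supported on RefSi
/// G1 devices.
/// @param device Device to execute the kernel on.
/// @param entry_fn_addr Device address of the entry function to execute.
/// @param num_harts Number of harts to boot for the kernel.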
REFSI_API refsi_result refsiExecuteKernel(refsi_device_t device,
                                          refsi_addr_t entry_fn_addr,
                                          uint32_t num_harts);