ComputeMux Change Log
The change log contains, with most recent items at the beginning, changes made to the ComputeMux API. Versioning follows the semantic versioning scheme; major version increments signify incompatible API changes, minor version bumps denote the addition of new functionality which is backward compatible, and patch version increases mean backward compatible bug fixes have been applied.
Versions prior to 1.0.0 may contain breaking changes in minor versions as the API is still under development.
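The versioning rules above can be illustrated with a small sketch. It is not part of ComputeMux; the helper names `parse_version` and `may_break_api` are hypothetical and only encode the two rules stated in this introduction: a major bump may break the API, and before 1.0.0 a minor bump may as well.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def may_break_api(old, new):
    """Return True if moving from `old` to `new` may include breaking changes."""
    old_major, old_minor, _ = parse_version(old)
    new_major, new_minor, _ = parse_version(new)
    # A major version change always signals incompatible API changes.
    if new_major != old_major:
        return True
    # Before 1.0.0 the API is still under development, so a minor
    # version change may also contain breaking changes.
    if new_major == 0 and new_minor != old_minor:
        return True
    # Otherwise only backward compatible additions or fixes are expected.
    return False
```

Under these assumptions, moving from 0.68.0 to 0.68.1 is treated as safe, while moving from 0.68.1 to 0.69.0 should be reviewed against the entries below.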
0.81.0
Removed mux-degenerate-subgroups.
0.80.0
Added atomic capabilities.
0.79.0
Added sub-group shuffle builtins.
0.78.0
Added sub-group, work-group, and vector-group operation builtins.
0.77.0
The DMA builtins now permit any event type chosen by the target, as long as it is consistent across the module.
0.76.0
Added num_sub_group_sizes and sub_group_sizes to mux_device_info_s.
0.75.0
Snapshots have been removed.
0.74.0
Several compiler::BaseModule methods and fields have been reworked to make compilation of OpenCL less stateful.
The class-member lists of macro defs and OpenCL options have been removed, and are set up and torn down on the fly when compiling OpenCL C.
Several methods such as compiler::BaseModule::populatePPOpts are now const.
compiler::BaseModule::executeOpenCLAction has been removed.
0.73.0
Added the muxQueryMaxNumSubGroups entry point.
Removed mux_kernel_s::sub_group_size, which is now an implementation detail.
0.72.0
Added the __mux_mem_barrier, __mux_work_group_barrier, and __mux_sub_group_barrier builtins. They replace the older __mux_global_barrier, __mux_shared_local_barrier, and __mux_full_barrier builtins, which have been removed.
0.71.0
Added the mux_device_s::supports_generic_address_space field.
0.70.0
compiler::BaseModule::runBackendPasses and compiler::BaseModule::addLatePasses have been removed. Their functionality is covered by both the new compiler::BaseModule::getLateTargetPasses method and the pre-existing compiler::Module::createBinary.
0.69.0
Remove muxWait and move its functionality into muxTryWait when timeout is UINT64_MAX.
0.68.1
Require implementations of muxCloneCommandBuffer to deep copy ndrange kernel commands.
0.68.0
Add timeout parameter to muxTryWait.
0.67.0
Replace mux_error_command_buffer_not_ready with mux_fence_not_ready.
Replace mux_error_command_buffer_failure with mux_error_fence_failure.
Remove mux_error_command_buffer_wait_semaphore_failure.
0.66.0
Rename all builtin functions from __core to __mux.
0.65.0
compiler::BaseTarget now owns the builtins module optionally created in compiler::BaseTarget::init and initialized with the target as part of compiler::BaseTarget::initWithBuiltins.
0.64.0
Update muxDispatch to accept an optional mux_fence_t parameter.
Update muxTryWait to wait on a mux_fence_t rather than a mux_command_buffer_t.
Update muxWait to wait on a mux_fence_t rather than a mux_command_buffer_t.
0.63.0
Add muxCreateFence, muxDestroyFence and muxResetFence entry points.
0.62.0
Add mux_sync_point_s type, representing intra command-buffer synchronization points for ordering commands inside a command-buffer.
muxCommand* entry-points have been updated to return a sync-point, as well as taking a list of sync-points to wait on.
0.61.0
A new method getBuiltinCapabilities has been added to compiler::Info. Calling this function will return a bitfield of the builtin capabilities of the device, based on the mux device info.
0.60.0
Add the muxQueryWFVInfoForLocalSize entry point.
0.59.0
Add mux_fence_s type. There are currently no Mux entry points to create, wait on, query, reset or destroy mux_fence_s objects; these will be added in a future spec version.
0.58.0
BaseModule has an additional virtual method createPassMachinery(). This will provide a PassMachinery which can be used throughout the pipeline to handle state needed for the new pass manager interface.
0.57.0
Added the mux_device_s::supports_work_group_collectives field.
0.56.1
Extend valid usage description of muxUpdateDescriptors to include text on changing the size of POD descriptors being undefined behaviour.
0.56.0
Add the following entry points:
* compiler::Kernel::querySubGroupSizeForLocalSize
* compiler::Kernel::queryLocalSizeForSubGroupCount
* compiler::Kernel::queryMaxSubGroupCount
* muxQuerySubGroupSizeForLocalSize
* muxQueryLocalSizeForSubGroupCount
Remove the following entry point:
* compiler::Kernel::getSubGroupSize()
Add the following fields:
* mux_device_info_s::max_sub_group_count
* mux_kernel_s::max_sub_group_count
Remove the following fields:
* mux_device_info_s::max_num_sub_groups
* mux_kernel_s::sub_group_size
0.55.0
Add the __core_dma_read_3D builtin.
Add the __core_dma_write_3D builtin.
Modify __core_dma_read_2D and __core_dma_write_2D to handle source and destination strides.
0.54.0
cargo::optional<mux_device_t> device and mux_allocator_info_t allocator_info have been removed from compiler::Info::createTarget.
compiler::BaseKernel::createSpecializedKernel has been moved to compiler::Kernel::createSpecializedKernel.
compiler::Kernel::createMuxSpecializedKernel was an implementation detail of compiler::BaseKernel which has now been removed.
compiler::SpecializedKernel has been removed.
compiler::BaseTarget now loads the builtins module for the given builtin capabilities as part of compiler::BaseTarget::init. Compiler targets should implement compiler::BaseTarget::initWithBuiltins instead. Unlike init, initWithBuiltins does not need to delegate to compiler::BaseTarget first, as it’s a pure virtual function.
The notification callback passed to compiler::Target::init is now passed to compiler::Info::createTarget and is now of type compiler::NotifyCallbackFn. This should be passed along to compiler::BaseTarget’s constructor.
0.53.2
Change the user_function argument of muxCommandUserCallback to use the mux_command_user_callback_t type, rather than the function pointer type explicitly.
0.53.1
Remove note mandating that targets do their own validation of data and stride muxGetQueryPoolResults parameters.
0.53.0
Add the uint32_t mux_query_counter_s::hardware_counters field.
Add the uint32_t mux_device_info_s::max_hardware_counters field.
0.52.0
Rename member max_subgroup_size in mux_device_info_t to max_work_width.
Rename member function getDynamicSubgroupSize in compiler::Kernel to getDynamicWorkWidth.
0.51.0
Added the __core_get_max_sub_group_size() builtin.
0.50.0
Version bump to maintain parity with Core which has had the __core_get_num_sub_groups builtin added.
0.49.0
Version bump to maintain parity with Core which has had the __core_get_sub_group_id builtin added.
0.48.0
Add the size_t mux_kernel_s::sub_group_size field.
Add the cargo::expected<uint32_t, Result> compiler::Kernel::getSubGroupSize() method.
0.47.0
Add the uint32_t mux_device_info_s::max_num_sub_groups field.
Add the bool mux_device_info_s::sub_groups_support_ifp field.
0.46.0
Add member scalable_vector_support to compiler::Info to represent that the compiler supports generating scalable vector code.
Add member scalable_vectors to compiler::Options to indicate that the executable should be finalized with scalable vectors.
0.45.0
Version bump to maintain parity with Core which has had the __core_dma_read_2D and __core_dma_write_2D builtins added.
0.44.0
Initial release of the ComputeMux specification. The changelog for the Core specification has been duplicated here to preserve history.
Remove the corePushBarrier entry point, which was rendered obsolete when command groups were guaranteed to execute in order.
0.43.1
Add core_source_type_llvm_140 and core_source_capabilities_llvm_140 for supporting LLVM 14.
0.43.0
Add the coreCloneCommandGroup entry point.
Add the bool core_device_info_s::can_clone_command_groups field.
0.42.1
Relax thread-safety requirements of implementing coreFinalizeCommandGroup(), so that the entry-point is only thread-safe with respect to the same command-group handle rather than across all invocations.
0.42.0
Add the coreUpdateDescriptors entry point.
Add the bool core_device_info_s::descriptors_updatable field.
0.41.0
Add the coreFinalizeCommandGroup entry point.
0.40.3
Add core_source_type_llvm_130 and core_source_capabilities_llvm_130 for supporting LLVM version 13.0.0.
0.40.2
Add core_source_type_llvm_120 and core_source_capabilities_llvm_120 for supporting LLVM version 12.0.0.
0.40.1
Add the size_t __core_get_global_linear_id() builtin.
Add the size_t __core_get_local_linear_id() builtin.
Add the size_t __core_get_enqueued_local_size(uint) builtin.
0.40.0
Remove host_pointer argument from coreAllocateMemory.
Remove core_allocation_type_use_host from core_allocation_type_e.
Rename core_allocation_capabilities_e enums core_allocation_capabilities_alloc_host to core_allocation_capabilities_coherent_host and core_allocation_capabilities_use_host to core_allocation_capabilities_cached_host.
0.39.3
Require stricter device capability core_allocation_capabilities_alloc_host to support entry point coreCreateMemoryFromHost, as this implies the device architecture has cache coherent memory with host.
0.39.2
Forbid mapping already mapped memory objects with coreMapMemory.
Specify flushing cache coherent memory as a nop.
Require core_memory_property_host_visible as a property of memory objects mapped with coreMapMemory.
0.39.1
Add a valid use clarification for coreCreateSpecializedKernel.
0.39.0
Add alignment argument to coreAllocateMemory to specify the minimum alignment for the allocated memory.
Add handle member to core_memory_s to allow the host runtime a way to represent the underlying memory address.
Add entry point coreCreateMemoryFromHost to allow APIs to create a core_memory_t device visible object from pre-allocated host memory.
0.38.7
Rename the core_vectorization_order_e enum to core_work_item_order_e, and the enum values to match the work_item naming.
Rename the vec_order field of core_executable_options_t to work_item_order, to match the rename of -cl-wfv-order to -cl-wi-order.
Upgrade Guidance:
utils::createHandleBarriersPass() must now be passed a parameter of type enum core_work_item_order_e to specify the work item dimension priority.
0.38.6
Add core_vectorization_order_e enum type to represent vectorization priority order.
Add vec_order field to core_executable_options_t struct for supporting the -cl-wfv-order extension.
0.38.5
Add core_source_type_llvm_110 and core_source_capabilities_llvm_110 for supporting LLVM version 11.0.0.
0.38.4
Add documentation for maximum built-in kernel name length.
0.38.3
Add core_source_type_llvm_100 and core_source_capabilities_llvm_100 for supporting LLVM version 10.0.0.
0.38.2
Add __core_usefast() and __core_isembeddedprofile() functions as required builtins that core targets must replace.
Added core_floating_point_capabilities_full flag to core_floating_point_capabilities_e for IEEE-754 compliant representations.
0.38.1
Add flags to core_executable_flags_e to represent the various OpenCL math optimization build options, namely:
core_executable_flags_mad_enable
core_executable_flags_no_signed_zeroes
core_executable_flags_unsafe_math_optimizations
core_executable_flags_finite_math_only
0.38.0
Add compilation_options C string to core_device_info_s to hold custom build options provided by the device.
Add core_executable_options_t struct which encapsulates the core_executable_flags_e bitfield and a C string for the name and value of any device specific build options passed by the user.
Redefine core_executable_s struct to have a core_executable_options_t member rather than the core_executable_flags_e bitfield.
Redefine coreCreateBinaryFromSource() and coreCreateExecutable() to take a core_executable_options_t argument rather than a core_executable_flags_e bitfield.
0.37.1
Add core_executable_flags_prevec_loop and core_executable_flags_prevec_slp enum values to core_executable_flags_e for activation of “early vectorization” passes:
Loop Vectorization
SLP Vectorization
Load/Store Vectorization
0.37.0
Core now accepts 3D descriptions of memory in the corePush*Region entry points; these layouts are passed down to the implementation.
Reduce the overhead significantly.
Redefine core_buffer_region_info_s to describe a buffer in 1D, 2D or 3D. This design is based on OpenCL’s clEnqueue*BufferRect entry points.
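As a rough illustration of what a 1D, 2D or 3D buffer description means, the sketch below follows the offset arithmetic used by OpenCL’s clEnqueue*BufferRect entry points, on which this design is based. The fields of core_buffer_region_info_s are not listed in this change log, so the tuple layout and the function name `region_row_spans` are assumptions made for the example only.

```python
def region_row_spans(origin, region, row_pitch, slice_pitch):
    """Yield (byte_offset, byte_length) for each contiguous row of a region.

    origin: (x in bytes, y in rows, z in slices) of the region's first element.
    region: (width in bytes, height in rows, depth in slices).
    A 1D region uses height == depth == 1; a 2D region uses depth == 1.
    """
    origin_x, origin_y, origin_z = origin
    width, height, depth = region
    for z in range(depth):
        for y in range(height):
            # Same arithmetic as the OpenCL rect entry points:
            # offset = z * slice_pitch + y * row_pitch + x
            offset = (origin_z + z) * slice_pitch + (origin_y + y) * row_pitch + origin_x
            yield (offset, width)
```

Describing the whole region in one call lets an implementation see the full layout at once, rather than receiving one command per row.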
0.36.0
Add support for query counters, extending the mechanism for reporting performance statistics to the application by providing a configurable method for enabling a set of hardware counters alongside metadata which can be used by a profiling visualisation tool to describe the queried data.
Extend core_query_type_e to include core_query_type_counter.
Add coreGetSupportedQueryCounters() to enable applications to discover the full list of supported query counters.
Add core_query_counter_t used to describe how to enable and interpret a query counter.
Add core_query_counter_description_t used to provide human readable metadata about a query counter.
Extend coreCreateQueryPool to accept an array of core_query_counter_config_ts to select which query counters to enable and pass through additional target specific counter configuration if necessary.
Extend corePushBeginQuery/corePushEndQuery to accept a query_count in addition to a query_index; this allows multiple queries to be enabled at once.
Add core_query_counter_result_t used to return the result of a single query counter to the application using coreGetQueryPoolResults().
0.35.0
Add support for queries, a mechanism for targets to report performance statistics to the application.
The core_query_pool_t object is used to store the query results, coreCreateQueryPool() and coreDestroyQueryPool() define the object's lifecycle, and coreGetQueryPoolResults() is used to provide the results to the application.
The core_query_type_e enumeration defines a set of possible queries. Currently only core_query_type_duration is supported; it is intended to report the start and end timestamps of a command, and results are reported using the core_query_duration_result_t object.
The corePushBeginQuery() and corePushEndQuery() entry points define the range of commands for which a core_query_pool_t is to be used in a core_command_group_t; corePushResetQueryPool() is used to zero all query results in the specified range within the core_query_pool_t.
0.34.3
Remove unnecessary member vectorize from core_kernel_t.
0.34.2
Fix core.xml comment to state that CL_DEVICE_NAME is matched with core_device_info_s::device_name.
0.34.1
Added core_source_capabilities_e::core_source_capabilities_llvm_any bit mask to match any of the LLVM source capability bits.
0.34.0
Add support for custom buffer descriptors, this allows passing through arbitrary data from the user to the Core target in addition to the address space provided by the compiler frontend. This includes:
The custom_buffer_capabilities data member of core_device_info_s describing which custom buffer capabilities the Core target supports.
The core_custom_capabilities_e enumeration of custom buffer capabilities.
The core_descriptor_info_custom_buffer_s structure to describe the custom buffer to the Core target.
The core_descriptor_info_type_custom_buffer enumeration value to specify that a descriptor is a custom buffer.
0.33.1
Clarify that whitespace characters other than `` `` are not supported in built-in kernel declarations.
0.33.0
Unify snapshot descriptions to favor snapshot “stages” over snapshot “points”. Rename:
coreListSnapshotPoints to coreListSnapshotStages
coreSetSnapshotPoint to coreSetSnapshotStage
Specify that passing an invalid snapshot stage name to coreSetSnapshotStage must return core_error_malformed_parameter.
Remove core_snapshot_type_none to make it harder to set an invalid format.
Rename core_snapshot_type_e to core_snapshot_format_e to unify how the format information is called and used.
Introduce core_snapshot_format_default to unify how the format information is used.
Re-order the parameters of coreSetSnapshotStage, i.e., move the snapshot_format parameter before the snapshot_callback parameter.
0.32.3
Added built-in kernel usage section to the Core spec.md document.
0.32.2
Clarify syntax for built-in kernel declarations.
Clarify that build_flags have no effect on coreCreateExecutable when the source type is core_source_type_builtin_kernel.
0.32.1
Clarify that Core implementations of command groups must not access signal semaphores of completed command groups they depend on.
0.32.0
Add core_callback_info_t to support implementations providing detailed messages to users about API usage.
Change <client>CreateFinalizer to take a core_callback_info_t parameter to support provision of detailed messages about compilation.
Change <client>CreateCommandGroup to take a core_callback_info_t parameter to support provision of detailed messages about command execution.
0.31.4
Clarify the error return codes of coreCreateExecutable and coreCreateBinaryFromSource for unknown or invalid source_type arguments.
0.31.3
Clarify the valid usage of permitted actions in the user_function callback of coreDispatch.
Clarify when a command group passed to coreDispatch is considered complete.
0.31.2
Add allocator validity check to id.h and rename it to utils.h.
0.31.1
Weaken the requirement that host-side allocations must use the user provided allocator; they now should use it. This enables use of third-party libraries, like LLVM or the C standard library, which do not support user provided allocators, and should not affect existing target implementations.
0.31.0
Supersede generate_core_header with add_core_target. This also simplifies the mechanism by which targets register themselves and how they specify their capabilities, in addition to creating a CMake target to generate the core target header.
Add add_core_cross_compilers which simplifies the mechanism for registering a target's cross-compilers with the cross target.
0.30.0
Add requirement that commands in a command group must be executed in the order they were pushed onto the command group, making command groups in-order.
Add additional valid usage requirements for core_semaphore_t, defining when it can be reset and destroyed relating to the lifetime of a coreDispatch().
0.29.2
Changed builtin_kernel_names to builtin_kernel_declarations to better represent what information is contained.
0.29.1
Numerous clarifications and inconsistencies corrected in the specification and Doxygen comments of core.h.
0.29.0
Add core_device_type_compiler to core_device_type_e to represent a target which only implements the compilation entry points for use in compiling offline and cross-compiled kernels.
Change core_device_type_e enumerations to make them usable in a bitfield and add core_device_type_all for selecting all device types.
Change coreGetDeviceInfos to take a bitfield of core_device_type_e in order to selectively initialize only desired devices.
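The idea of selecting devices with a bitfield of device types can be sketched as follows. This is only an illustration: the enumerator names other than the compiler and "all" values, the numeric values, and the `select_devices` helper are hypothetical and are not taken from core.h.

```python
from enum import IntFlag


class DeviceType(IntFlag):
    # Hypothetical values: each device type occupies its own bit so that
    # several types can be combined in one bitfield.
    CPU = 0x1
    ACCELERATOR = 0x2
    COMPILER = 0x4
    # Stand-in for an "all device types" value: every bit above set.
    ALL = CPU | ACCELERATOR | COMPILER


def select_devices(infos, wanted):
    """Return the names of devices whose type is present in `wanted`.

    infos: iterable of (name, DeviceType) pairs.
    wanted: a bitfield built by OR-ing DeviceType values.
    """
    return [name for name, device_type in infos if device_type & wanted]
```

With such a bitfield a caller can ask only for, say, CPU and compiler-only targets, so that other devices are never initialized.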
0.28.4
Changed type of device member variable in core_finalizer_s from core_device_t to core_device_info_t.
0.28.3
Add core_source_type_llvm_80 and core_source_capabilities_llvm_80 for supporting LLVM version 8.0.0.
0.28.2
Add back in the removed id member from the core_device_s struct to fix compilation failures in coreSelect.h when multiple targets are registered.
0.28.1
Add support for builtin kernels to core.
Added core_source_type_unknown, core_source_type_builtin_kernel and core_source_capabilities_builtin_kernel to core_source_type_e and core_source_capabilities_e.
Added core_source_type_builtin_kernel as one of the supported types to coreCreateExecutable for creation of a core_executable with builtin kernels.
Reordered values in core_source_type_e and core_source_capabilities_e.
0.28.0
Changed coreCreateFinalizer and coreDestroyFinalizer entrypoints to take core_device_info_ts instead of core_device_ts.
Added a new type core_binary_t.
Removed coreGetBinary and replaced it with a new coreCreateBinaryFromExecutable entrypoint.
Added coreCreateBinaryFromSource entrypoint for offline/cross-compilation support.
Added a matching coreDestroyBinary to destroy binaries created by the above two functions.
0.27.0
Separate device enumeration from initialization by adding a new structure: core_device_info_t, and a new function: coreGetDeviceInfos.
coreCreateDevices hook API has changed - a new hook for coreGetDeviceInfos was added, which has an almost identical interface to the existing coreCreateDevices hook.
0.26.1
Add core_executable_flags_dma_never and core_executable_flags_vectorize_never enum values to core_executable_flags_e, so that the core implementations are informed of whether the user chose explicitly to enable/disable these optimizations, or if the default behavior is to be used when neither the never nor always flags are present.
0.26.0
Add member endianness to core_device_t to represent whether the device is big- or little-endian.
0.25.0
Change CMake to build only the required builtins based on target capabilities. Capabilities must be reported in a <target_name>_CAPABILITIES variable.
0.24.2
Change the CMake mechanism to generate <client> API headers; it is now possible to override the clang-format executable used during header generation.
0.24.1
Change references to command_buffer in Doxygen documentation and parameter variable names to command_group.
0.24.0
Add member dma_optimizable to core_device_t to represent that DMA optimizations can be performed for this device.
Add core_executable_flags_dma_always to core_executable_flags_e to represent that DMA optimizations must be performed.
0.23.0
Add a new command <client>ResetSemaphore() to reset a semaphore such that it has no previous signalled state.
0.22.5
Add member image2d_array_writes to core_device_t.
0.22.4
Add member integer_capabilities to core_device_t.
Add enum core_integer_capabilities_e.
0.22.3
Add member vectorizable to core_device_t to represent that vectorization can be performed for this device.
Add member vectorize to core_kernel_t.
Add core_executable_flags_vectorize_always to core_executable_flags_e to represent that vectorization must be performed.
0.22.2
Add core_executable_flags_denorms_may_be_zero to core_executable_flags_e to represent that denormal floats may be flushed to zero.
0.22.1
Added member local_memory_size to core_kernel_t.
0.22.0
Add a new command <client>PushBarrier() to enforce the execution order of commands within a command group.
0.21.0
Add a core_finalizer_t argument to <client>DestroyExecutable(), <client>DestroyKernel() and <client>DestroyScheduledKernel(). Note that <client>DestroySpecializedKernel() does not take a core_finalizer_t.
0.20.5
Add core_source_type_llvm_70 and core_source_capabilities_llvm_70 for supporting LLVM version 7.0.0.
0.20.4
Remove dead symbol references in Doxygen documentation.
0.20.3
Add allocation_size to core_device_s to represent the maximum size of a single memory allocation.
0.20.2
Add __core_get_work_dim(), __core_get_group_id(), __core_get_global_id(), __core_get_local_id(), __core_get_num_groups(), __core_get_global_size(), __core_get_local_size(), __core_get_global_offset(), __core_full_barrier(), __core_shared_local_barrier(), and __core_global_barrier(), required builtins that core targets must replace.
0.20.1
Add core_source_type_llvm_60 and core_source_capabilities_llvm_60 for supporting the latest version of LLVM.
0.20.0
Add <client>PushReadBufferRegions() to allow for multiple regions within a source buffer to be copied to a destination host pointer.
Add <client>WriteCopyBufferRegions() to allow for multiple regions within a host pointer to be copied to a destination buffer.
Add <client>PushCopyBufferRegions() to allow for multiple regions within a source buffer to be copied to a destination buffer.
Add core_buffer_regions_info_s as a helper struct to specify to the new entry points above what source offset, destination offset, and size to use for each region.
0.19.2
Add max_subgroup_size to core_device_s to represent the maximum subgroup size for kernels on a device, and dynamic_subgroup_size to core_scheduled_kernel_s to represent the actual subgroup size for that scheduled kernel.
0.19.1
Add core_source_type_llvm_50 to core_source_flags_e to allow input binaries to be from LLVM 5.0.
Add core_source_capabilities_llvm_50 to core_source_capabilities_e to allow input binaries to be from LLVM 5.0.
0.19.0
Add core_device_t argument to create entry points which were not already passed a device, making the API consistent across all create and destroy functions.
0.18.1
Add __core_dma_read_1d(), __core_dma_read_2d(), and __core_dma_wait() functions as builtins that core targets must replace if they use the automatic DMA.
0.18.0
Add core_allocator_info argument to all entry points which perform host allocations to support Vulkan style user allocator override.
Change order of entry points so that <client>Create<Object> is directly before <client>Destroy<Object>.
0.17.3
Add compute_units to core_device_s to let implementations pass information on how many compute units their device has.
0.17.2
Add device_priority to core_device_s. This is used to keep track of device priorities when returning default devices.
0.17.1
Add __core_isftz() function as a required builtin that core targets must replace.
0.17.0
Add support for multiple memory heaps.
Add supported_heaps bitfield to core_memory_requirements_s allowing the client target to state which heaps are supported for a specific buffer or image.
Change core_buffer_t to have a memory_requirements data member, replacing size and adding support for specifying alignment and supported_heaps.
Add heap argument to <client>AllocateMemory to specify the heap to allocate memory from.
0.16.0
Added native_vector_width and preferred_vector_width to core_device_t to let devices expose what vector width (in bytes) their hardware is, and what size of vectors they would prefer implementations give them.
0.15.0
Added preferred_local_size_x, preferred_local_size_y, and preferred_local_size_z to core_kernel_t to let implementations pass information on what would be a suitable local work group size to use for a given kernel.
0.14.0
Removed <client>PushTerminate() as it put a higher burden on client targets than was necessary.
0.13.0
Add <client>GetBinary() to retrieve the binary representation of a core_executable_t.
Add core_source_type_binary to core_source_flags_e to allow the input to be a binary for the given core target.
Add core_source_capabilities_binary to core_source_capabilities_e to allow a core target to advertise it can support creating executables from binaries.
Rename <client>CreateQueue() to <client>GetQueue() and change the function signature to take two extra parameters for the queue type and index. core_queue_t’s now belong to the device, and are queried from the device, rather than an arbitrary number of them being created (which simplifies the engineering effort required by our customers).
Add new enum core_queue_type_e to denote all possible types of queue we can support - at present this only contains core_queue_type_compute, but is available for extension later.
Add new field to core_device_t to query the number of queues of each core_queue_type_e a device supports.
Remove <client>DestroyQueue(), as queues are now implicitly destroyed when the device they were retrieved from is destroyed.
0.12.4
Fix bug in core::util::allocator::create where references were not correctly passed through to the constructor of the object being created.
0.12.3
Add core_source_type_llvm_40 to core_source_flags_e to allow input binaries to be from LLVM 4.0.
Add core_source_capabilities_llvm_40 to core_source_capabilities_e to allow input binaries to be from LLVM 4.0.
0.12.2
Add core_executable_flags_no_opt to core_executable_flags_e.
Change semantics of core_executable_flags_debug to mean built with debug info.
0.12.1
Add core_executable_flags_soft_math to core_executable_flags_e to force finalization to occur using software math builtins.
0.12.0
Add max_work_group_size_x, max_work_group_size_y and max_work_group_size_z to core_device_t.
0.11.1
Add CORE_NULL_ID preprocessor definition to be used by clients when initializing core_<object>_s::id.
0.11.0
Add ID types core_id_t, core_object_id_t, core_target_id_t.
Generate core_target_id_e enum in core/coreConfig.h from list of registered targets.
Add core_id_t id member to all objects created by clients.
Add missing core_device_t parameter to <client>ListSnapshotPoints.
Add core/util/id.h utility header for working with object IDs.
0.10.0
Added builtins_type, builtins, and builtins_length parameters to <client>CreateFinalizer() to pass the compute API's standard library to the core client target for linking. Client targets must now link in the builtin function definitions themselves to use our provided implementations. By moving the responsibility for linking to the client target, clients now have a mechanism to intercept any of the builtin functions with target specific optimizations, before linking in any remaining builtins that the client does not have optimized support for.
0.9.0
Remove no longer required page_size from core_device_t.
Renamed core_descriptor_info_shared_scratch_s to core_descriptor_info_shared_local_buffer_s to be more consistent with our naming.
Renamed core_descriptor_info_type_shared_scratch to core_descriptor_info_type_shared_local_buffer to be more consistent with our naming.
0.8.1
Add overload to core::allocator::alloc() which takes a non-template alignment parameter.
0.8.0
Add image3d_writes flag to core_device_s to signify support for writing to 3D images.
0.7.0
Add <client>FlushMappedMemoryToDevice() to synchronize device memory with data currently residing in host memory.
Add <client>FlushMappedMemoryFromDevice() to synchronize host memory with data currently residing in device memory.
Remove flags parameter from coreMapMemory(); use <client>FlushMappedMemoryToDevice() and <client>FlushMappedMemoryFromDevice() to perform flushing instead.
Remove core_mapping_type_e; coreMapMemory() and coreUnmapMemory() are no longer required to synchronize memory.
0.6.2
Remove max_instructions_issued_per_cycle from core_device_s as it is no longer a required (or useful) piece of functionality to require our customers to guesstimate.
0.6.1
Change core_source_type_e and core_source_capabilities_e to be the LLVM version of the bitcode module being passed in (which more correctly fits our usage).
LLVM bitcode modules being passed in with core_source_type_llvm_38 and core_source_type_llvm_39 must have the “unknown-unknown-unknown” target triple now.
0.6.0
Add function <client>ListSnapshotPoints to retrieve the list of compilation stages snapshots can be taken at in partner code.
Add function <client>SetSnapshotPoint to set a snapshot point in partner code.
Add enum core_snapshot_type_e to describe snapshot formats.
Add typedef core_snapshot_callback_t to describe the function prototype for the callback invoked when a snapshot point is hit.
0.5.0
Add struct core_semaphore_s representing a device semaphore object.
Add function <client>CreateSemaphore to create device semaphore objects.
Add function <client>DestroySemaphore to destroy device semaphore objects.
Add function <client>TryWait to try and wait on command groups.
Change <client>Dispatch to include two arrays of semaphores, one to wait on before beginning execution of the command group, and one to signal when the command group has completed executing.
Change <client>Dispatch to include a command group complete callback and user data.
Add core_error_command_group_failure to core_error_e enum to signal that a command group that was waited on failed.
Add core_error_command_group_wait_semaphore_failure to core_error_e enum to signal that a command group that was waiting on another command group via a semaphore failed because the other command group failed.
Add core_error_command_group_not_ready to core_error_e enum to signal that a command group that was waited on was not yet complete.
Add extra parameter to <client>PushFillImage to specify the size of the user memory being passed in as the color parameter.
Add function <client>PushTerminate to signal that a command group should terminate, and any semaphore in the chain of waits on it, should not execute.
Add function <client>ResetCommandGroup to reset a command group such that it has no previous commands enqueued within it.
0.4.0
Add struct core_image_s representing a device image object.
Add struct core_sampler_s representing a device sampler object.
Update struct core_device_s to contain the device's image capabilities.
Change enum core_memory_type_e into core_memory_property_e to describe the desired memory properties for an allocation; core_memory_type_e was too restrictive and did not allow implementation of CL_MEM_OBJECT_IMAGE1D_BUFFER.
Add struct core_memory_requirements_s to describe the device memory allocation requirements of a core_buffer_t or a core_image_t.
Add struct core_offset_3d_t to describe the offset into an image.
Add struct core_extent_3d_t to describe the region of an image.
Add enum core_image_type_e to describe the type of an image.
Add enum core_image_format_e to describe the format of an image.
Add enum values core_descriptor_info_type_image and core_descriptor_info_type_sampler to core_descriptor_info_type_e.
Add enum core_address_mode_e to describe sampler addressing modes.
Add enum core_filter_mode_e to describe sampler filter modes.
Change <client>AllocateMemory to accept a bitfield of core_memory_property_e.
Add function <client>CreateImage to create device image objects.
Add function <client>DestroyImage to destroy device image objects.
Add function <client>BindImageMemory to bind device memory to an image object.
Add function <client>GetSupportedImageFormats to query the device for supported image formats.
Add function <client>PushReadImage to read an image in a command group.
Add function <client>PushWriteImage to write an image in a command group.
Add function <client>PushFillImage to fill an image in a command group.
Add function <client>PushCopyImage to copy an image to another in a command group.
Add function <client>PushCopyImageToBuffer to copy an image to a buffer in a command group.
Add function <client>PushCopyBufferToImage to copy a buffer to an image in a command group.
0.3.1
Fixed core_memory_type_e - it should have been a bitfield.
Fixed core.h C compilation issue (enum types are called enum <type>).
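Both fixes concern how enums behave in C, which a small sketch can illustrate. The enum and its enumerators below are hypothetical stand-ins and are not taken from core.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enum; the real enumerators in core.h may differ. Each value
 * occupies a distinct bit so that values can be combined with bitwise OR. */
enum example_memory_type_e {
  example_memory_type_buffer = 0x1,
  example_memory_type_image = 0x2
};

/* In C (unlike C++) the bare tag name is not a type: it must be written as
 * `enum example_memory_type_e` unless a typedef such as this one exists. */
typedef enum example_memory_type_e example_memory_type_e;

/* A combination of flags is passed as a plain integer, because an OR-ed
 * value need not equal any single named enumerator. */
static int example_supports(uint32_t memory_type, example_memory_type_e flag) {
  return (memory_type & (uint32_t)flag) != 0;
}
```

With this layout, `example_memory_type_buffer | example_memory_type_image` describes an allocation usable for both buffers and images, which a non-bitfield enum could only express with a dedicated third enumerator.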
0.3.0
Added enum core_executable_flags_e for build flags.
Added build_flags field to executable representing compilation/linking options set for the module.
Added build_flags parameter to function <client>CreateExecutable.
0.2.0
Add handle core_memory_t to take sole ownership of device memory allocations in preparation for image support.
Add struct core_memory_s.
Add functions <client>AllocateMemory and <client>FreeMemory to handle device memory allocations.
Add function <client>BindBufferMemory to associate a device memory allocation with a buffer object. This also adds first class support to the API for clCreateSubBuffer.
Add enum core_memory_type_e used to specify if an allocation should support buffers, images, or both buffers and images. Add typedef to the definition to allow passing as a function parameter.
Combine core_buffer_mapping_type_e and core_buffer_unmapping_type_e and rename the enum to core_mapping_type_e. Add typedef to definition to allow passing as a function parameter.
Simplify function <client>CreateBuffer to remove allocation specific parameters.
Add core_device_t parameter to function <client>DestroyBuffer.
Remove functions <client>MapBuffer and <client>UnmapBuffer; this functionality now applies to core_memory_t allocations.
Add functions <client>MapMemory and <client>UnmapMemory replacing the buffer specific variety.
Remove member device from struct core_buffer_s; device is now passed to API functions instead.
0.1.3
Fix documentation for API function <client>CreateSpecializedKernel.
0.1.2
Removed CORE_DEVICE_KHRONOS_CODEPLAY_ID and CORE_DEVICE_KHRONOS_CODEPLAY_NAME as they are specific to the Codeplay backends.
Added enum core_floating_point_capabilities_e for floating point support.
Added half_capabilities to device for what half floating point mode is supported.
Added float_capabilities to device for what floating point mode is supported.
Added double_capabilities to device for what double floating point mode is supported.
Added enum core_shared_local_memory_type_e for local memory types.
Added shared_local_memory_type to device for the type of shared local memory the device supports.
Added shared_local_memory_size to device for the size of the shared local memory the device has.
0.1.1
Added enum core_cache_capabilities_e for read/write caching.
Added cache_capabilities field to device for what caching is supported.
Added cache_size field to device for the size of the cache supported.
Added cacheline_size field to device for the length of a line within the cache.
0.1.0
Replace <client>_hook with <client>CreateDevices, adding support for multiple devices per target.
0.0.0
Add version to XML schema and generated headers.
Add compile time check for matching versions of all registered targets.
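A compile time version check of this kind could be sketched as follows. This is only an illustration of the technique: the macro names, the version packing scheme, and the target names are all assumptions, and the generated headers may do this differently.

```c
#include <assert.h>

/* Hypothetical packing of major.minor.patch into one comparable integer. */
#define EXAMPLE_VERSION(major, minor, patch) \
  (((major) << 22) | ((minor) << 12) | (patch))

/* Hypothetical version of the API headers the build was generated from. */
#define EXAMPLE_API_VERSION EXAMPLE_VERSION(0, 0, 0)

/* Hypothetical versions that each registered target was written against. */
#define EXAMPLE_TARGET_A_VERSION EXAMPLE_VERSION(0, 0, 0)
#define EXAMPLE_TARGET_B_VERSION EXAMPLE_VERSION(0, 0, 0)

/* The build stops here if any target disagrees with the API version.
 * _Static_assert is standard from C11 onwards. */
_Static_assert(EXAMPLE_TARGET_A_VERSION == EXAMPLE_API_VERSION,
               "target A was built against a different API version");
_Static_assert(EXAMPLE_TARGET_B_VERSION == EXAMPLE_API_VERSION,
               "target B was built against a different API version");
```

Packing the three components into one integer keeps the check to a single comparison per target and also lets versions be ordered, for example when testing for a minimum supported version.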