Examples and Tutorials#
This page provides an overview of oneDNN examples organized by functionality and use case.
Functional API Examples#
The Functional API provides access to individual oneDNN primitives.
Fundamental Concepts and API Basics#
Example |
Description |
---|---|
This C++ API example demonstrates the basics of the oneDNN programming model. |
|
This example demonstrates memory format propagation, which is critical for the performance of deep learning applications. |
|
This C++ API example demonstrates programming flow when reordering memory between CPU and GPU engines. |
Interoperability with External Runtimes#
Example |
Description |
---|---|
This C++ API example demonstrates programming for Intel(R) Processor Graphics with SYCL extensions API in oneDNN. |
|
This C++ API example demonstrates programming for Intel(R) Processor Graphics with SYCL extensions API in oneDNN. |
|
This C++ API example demonstrates programming for Intel(R) Processor Graphics with OpenCL* extensions API in oneDNN. |
Matrix Multiplication with Different oneDNN Features#
Basic Operations:
Example |
Description |
---|---|
This C++ API example demonstrates how to create and execute a MatMul primitive. |
|
C++ API example demonstrating MatMul as a replacement for SGEMM functions. |
Quantization flavors:
Example |
Description |
---|---|
C++ API example demonstrating how to use f8_e5m2 and f8_e4m3 data types for MatMul with scaling for quantization. |
|
C++ API example demonstrating how to perform reduced-precision matrix-matrix multiplication using MatMul and how the accuracy of the result compares with floating-point computation. |
|
C++ API example demonstrating how one can use MatMul fused with ReLU in INT8 inference. |
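The idea behind an INT8 MatMul fused with ReLU can be sketched in plain C++ without oneDNN. This is only a reference illustration under stated assumptions: a single per-tensor scale for each input, symmetric quantization with no zero points, and row-major layouts. The function names `quantize_s8` and `matmul_s8_relu` are hypothetical and are not part of the oneDNN API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantize f32 values to s8 with one per-tensor scale (assumption):
// q = round(x / scale), saturated to [-128, 127].
std::vector<std::int8_t> quantize_s8(const std::vector<float> &x, float scale) {
    assert(scale > 0.f);
    std::vector<std::int8_t> q(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        float v = std::round(x[i] / scale);
        v = std::min(127.f, std::max(-128.f, v));
        q[i] = static_cast<std::int8_t>(v);
    }
    return q;
}

// C[M x N] = ReLU(scale_a * scale_b * (A_q[M x K] * B_q[K x N])), row-major.
// Products are accumulated in int32 and dequantized to f32 before ReLU.
std::vector<float> matmul_s8_relu(const std::vector<std::int8_t> &a,
        const std::vector<std::int8_t> &b, std::size_t M, std::size_t K,
        std::size_t N, float scale_a, float scale_b) {
    assert(a.size() == M * K && b.size() == K * N);
    std::vector<float> c(M * N);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            std::int32_t acc = 0;
            for (std::size_t k = 0; k < K; ++k)
                acc += static_cast<std::int32_t>(a[m * K + k])
                        * static_cast<std::int32_t>(b[k * N + n]);
            const float deq = scale_a * scale_b * static_cast<float>(acc);
            c[m * N + n] = std::max(0.f, deq); // fused ReLU
        }
    }
    return c;
}
```

In a library implementation the dequantization and ReLU would typically be applied as part of the same kernel that produces the accumulator, which is what "fused" refers to; the loop above mimics that by never materializing the pre-activation result.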
Advanced Usages:
Example |
Description |
---|---|
This C++ API example demonstrates matrix multiplication (C = alpha * A * B) with a scalar scaling factor residing on the host. |
|
This C++ API example demonstrates how to create and execute a MatMul primitive that uses a source tensor encoded with the COO sparse encoding. |
|
This C++ API example demonstrates how to create and execute a MatMul primitive that uses a source tensor encoded with the CSR sparse encoding. |
|
This C++ API example demonstrates how to create and execute a MatMul primitive that uses a weights tensor encoded with the packed sparse encoding. |
|
C++ API example demonstrating how one can use MatMul with compressed weights. |
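To make the CSR encoding and the scalar factor in C = alpha * A * B concrete, here is a minimal reference sketch in plain C++. It does not use oneDNN; the `CsrMatrix` struct and `csr_matmul` function are hypothetical names introduced for illustration, and the way oneDNN itself describes sparse memory is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for a CSR-encoded matrix (not a oneDNN type).
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<float> values;            // non-zero values, row by row
    std::vector<std::size_t> col_indices; // column of each non-zero value
    std::vector<std::size_t> row_ptr;     // rows + 1 offsets into values
};

// Reference C = alpha * A * B, where A is CSR-encoded and B is a dense
// row-major matrix of shape [A.cols x n]. Returns dense row-major C of
// shape [A.rows x n].
std::vector<float> csr_matmul(const CsrMatrix &a, const std::vector<float> &b,
        std::size_t n, float alpha) {
    assert(a.row_ptr.size() == a.rows + 1);
    assert(a.values.size() == a.col_indices.size());
    assert(b.size() == a.cols * n);
    std::vector<float> c(a.rows * n, 0.f);
    for (std::size_t i = 0; i < a.rows; ++i) {
        // Only the stored non-zeros of row i contribute to row i of C.
        for (std::size_t p = a.row_ptr[i]; p < a.row_ptr[i + 1]; ++p) {
            const float v = alpha * a.values[p];
            const std::size_t k = a.col_indices[p];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += v * b[k * n + j];
        }
    }
    return c;
}
```

A COO encoding would store an explicit row index per non-zero instead of the `row_ptr` offsets; the inner accumulation stays the same.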
Inference and Training#
Neural network implementations demonstrating inference and training workflows:
Type | Precision | Mode | Example | Description |
---|---|---|---|---|
CNN | f32 | Inference | | This C++ API example demonstrates how to build an AlexNet neural network topology for forward-pass inference. |
CNN | int8 | Inference | | This C++ API example demonstrates how to run AlexNet’s conv3 and relu3 with int8 data type. |
CNN | f32 | Training | | This C++ API example demonstrates how to implement training for an AlexNet model. |
CNN | bf16 | Training | | This C++ API example demonstrates how to implement training for an AlexNet model using the bfloat16 data type. |
RNN | f32 | Inference | | This C++ API example demonstrates how to build GNMT model inference. |
RNN | int8 | Inference | | This C++ API example demonstrates how to build GNMT model inference. |
RNN | f32 | Training | | This C++ API example demonstrates how to build GNMT model training. |
Recurrent Neural Networks#
Example |
Description |
---|---|
This C++ API example demonstrates how to create and execute a Vanilla RNN primitive in forward training propagation mode. |
|
This C++ API example demonstrates how to create and execute an LSTM RNN primitive in forward training propagation mode. |
|
This C++ API example demonstrates how to create and execute a Linear-Before-Reset GRU RNN primitive in forward training propagation mode. |
|
This C++ API example demonstrates how to create and execute an AUGRU RNN primitive in forward training propagation mode. |
Performance Analysis#
A few techniques for performance measurements:
Example |
Description |
---|---|
This C++ example runs a simple matrix multiplication (matmul) performance test using oneDNN. |
|
This example demonstrates the best practices for application performance optimizations with oneDNN. |
Individual Primitives#
Convolution Operations:
Example |
Description |
---|---|
This C++ API example demonstrates how to create and execute a Convolution primitive in forward propagation mode in two configurations: with and without groups. |
|
This C++ API example demonstrates how to create and execute a Deconvolution primitive in forward propagation mode. |
Linear Operations:
Example |
Description |
---|---|
This C++ API example demonstrates how to create and execute an Inner Product primitive. |
Pooling and Sampling:
Example |
Description |
---|---|
This C++ API example demonstrates how to create and execute a Pooling primitive in forward training propagation mode. |
|
This C++ API example demonstrates how to create and execute a Resampling primitive in forward training propagation mode. |
Normalization Primitives:
Example |
Description |
---|---|
This C++ API example demonstrates how to create and execute a Batch Normalization primitive in forward training propagation mode. |
|
This C++ API example demonstrates how to create and execute a Group Normalization primitive in forward training propagation mode. |
|
This C++ API example demonstrates how to create and execute a Layer Normalization primitive in forward propagation mode. |
|
This C++ API example demonstrates how to create and execute a Local Response Normalization primitive in forward training propagation mode. |
Activation Functions:
Example |
Description |
---|---|
This C++ API example demonstrates how to create and execute an Element-wise primitive in forward training propagation mode. |
|
This C++ API example demonstrates how to create and execute a PReLU primitive in forward training propagation mode. |
|
This C++ API example demonstrates how to create and execute a Softmax primitive in forward training propagation mode. |
Tensor Operations:
Example |
Description |
---|---|
This C++ API example demonstrates how to create and execute a Binary primitive. |
|
This example implements u8 batch normalization as a sequence of binary operations: binary_sub(src, mean), binary_div(tmp_dst, variance), binary_mul(tmp_dst, scale), binary_add(tmp_dst, shift). |
|
This C++ API example demonstrates how to create and execute a Concat primitive. |
|
This C++ API example demonstrates how to create and execute a Reduction primitive. |
|
This C++ API example demonstrates how to create and execute a Sum primitive. |
|
This C++ API example demonstrates how to create and execute a Shuffle primitive. |
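The batch normalization row above chains four binary operations: subtract, divide, multiply, add. The sketch below reproduces that chain in plain C++ to show the data flow. It is an illustration under assumptions, not the oneDNN example itself: it uses f32 instead of u8, assumes the channel dimension is innermost, and takes the divisor tensor as given (in a conventional batch normalization the divisor would be sqrt(variance + epsilon)). The names `binary` and `batch_norm_via_binary` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Apply `op` element-wise between `src` (channels innermost, an assumption)
// and a per-channel vector that is broadcast over all outer dimensions.
std::vector<float> binary(const std::vector<float> &src,
        const std::vector<float> &per_channel,
        const std::function<float(float, float)> &op) {
    const std::size_t channels = per_channel.size();
    assert(channels > 0 && src.size() % channels == 0);
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = op(src[i], per_channel[i % channels]);
    return dst;
}

// dst = ((src - mean) / divisor) * scale + shift, built from four binary ops
// in the same order as listed in the table row above.
std::vector<float> batch_norm_via_binary(const std::vector<float> &src,
        const std::vector<float> &mean, const std::vector<float> &divisor,
        const std::vector<float> &scale, const std::vector<float> &shift) {
    std::vector<float> tmp = binary(src, mean, std::minus<float>());
    tmp = binary(tmp, divisor, std::divides<float>());
    tmp = binary(tmp, scale, std::multiplies<float>());
    return binary(tmp, shift, std::plus<float>());
}
```

Each step materializes a temporary here for clarity; a primitive-based implementation could instead reuse one destination buffer across the four operations, as the `tmp_dst` naming in the description suggests.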
Memory Transformations:
Example |
Description |
---|---|
This C++ API example demonstrates how to create and execute a Reorder primitive. |
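As a rough picture of what a reorder between memory formats does, the following plain C++ sketch copies a 4D tensor from NCHW to NHWC layout. It is a reference loop only and does not use oneDNN; the function name `reorder_nchw_to_nhwc` is hypothetical, and a real Reorder primitive can also convert data types and handle blocked layouts, which are not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a dense f32 tensor from NCHW layout (width innermost) to NHWC layout
// (channels innermost). Logical element (n, c, h, w) keeps its value; only
// its position in memory changes.
std::vector<float> reorder_nchw_to_nhwc(const std::vector<float> &src,
        std::size_t N, std::size_t C, std::size_t H, std::size_t W) {
    assert(src.size() == N * C * H * W);
    std::vector<float> dst(src.size());
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t c = 0; c < C; ++c)
            for (std::size_t h = 0; h < H; ++h)
                for (std::size_t w = 0; w < W; ++w) {
                    const std::size_t src_off = ((n * C + c) * H + h) * W + w;
                    const std::size_t dst_off = ((n * H + h) * W + w) * C + c;
                    dst[dst_off] = src[src_off];
                }
    return dst;
}
```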
C API Examples#
Example |
Description |
---|---|
This C API example demonstrates programming flow when reordering memory between CPU and GPU engines. |
|
This C API example demonstrates how to build an AlexNet neural network topology for forward-pass inference. |
|
This C API example demonstrates how to implement training for an AlexNet model. The example implements a few layers from the AlexNet model. |
Graph API Examples#
The Graph API provides an interface for defining computational graphs with optimization and fusion capabilities.
Getting Started with Graph API#
Example |
Description |
---|---|
This is an example to demonstrate how to build a simple graph and run it on CPU. |
|
This is an example to demonstrate how to build a simple graph and run it on a SYCL device. |
|
This is an example to demonstrate how to build a simple graph and run it on the OpenCL GPU runtime. |
Advanced Graph API Usage#
Example |
Description |
---|---|
This is an example to demonstrate how to build an int8 graph with Graph API and run it on CPU. |
|
This is an example to demonstrate how to build a simple op graph and run it on CPU. |
|
This is an example to demonstrate how to build a simple op graph and run it on GPU. |
Microkernel (uKernel) API Examples#
The oneDNN microkernel API is a low-level abstraction for CPU that provides maximum flexibility by allowing users to maintain full control over threading logic, blocking logic, and code customization with minimal overhead.
Example |
Description |
---|---|
This C++ API example demonstrates how to create and execute a BRGeMM ukernel. |
Running Examples#
Prerequisites and Building Examples#
Before running examples, ensure:
- oneDNN is built from source. Note that examples are built automatically when building oneDNN with -DONEDNN_BUILD_EXAMPLES=ON (enabled by default).
- The environment is set up and the oneDNN libraries are in the path.
Refer to Build from Source for detailed build instructions.
Running Examples#
Most examples accept an optional engine argument (cpu or gpu). If no argument is provided, an example typically defaults to CPU:
Linux/macOS:
# Run on CPU (default)
./examples/getting_started
# Run on CPU explicitly
./examples/getting_started cpu
# Run on GPU (if available)
./examples/getting_started gpu
Windows:
# Run on CPU (default)
examples\getting_started.exe
# Run on CPU explicitly
examples\getting_started.exe cpu
# Run on GPU (if available)
examples\getting_started.exe gpu
Examples output “Example passed on CPU/GPU.” upon successful completion; otherwise they display an error status with a message.
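The optional engine argument described above could be handled along the following lines. This is a hedged sketch in standard C++ only: the `EngineKind` enum and `parse_engine_arg` function are hypothetical names, and the helper that the shipped examples actually use may behave differently (for instance, in how it reports an unknown argument).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an engine kind; not a oneDNN type.
enum class EngineKind { cpu, gpu };

// Map the first command-line argument to an engine kind.
// With no argument, default to CPU, matching the behavior described above.
EngineKind parse_engine_arg(int argc, const char *const *argv) {
    assert(argv != nullptr);
    if (argc < 2) return EngineKind::cpu;
    const std::string arg = argv[1];
    if (arg == "cpu") return EngineKind::cpu;
    if (arg == "gpu") return EngineKind::gpu;
    throw std::invalid_argument("unknown engine kind: " + arg);
}
```

A caller would typically invoke this at the top of `main(int argc, char **argv)` and report the exception message before exiting with a non-zero status.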