Examples and Tutorials

This page provides an overview of oneDNN examples organized by functionality and use case.

Functional API Examples

The Functional API provides access to individual oneDNN primitives.

Fundamental Concepts and API Basics

Example

Description

oneDNN API Basic Workflow Tutorial

This C++ API example demonstrates the basics of the oneDNN programming model.

Memory Format Propagation

This example demonstrates memory format propagation, which is critical for the performance of deep learning applications.

Reorder between CPU and GPU engines

This C++ API example demonstrates programming flow when reordering memory between CPU and GPU engines.

Interoperability with External Runtimes

Example

Description

Getting Started with SYCL Extensions API

This C++ API example demonstrates programming for Intel(R) Processor Graphics with SYCL extensions API in oneDNN.

SYCL USM Example

This C++ API example demonstrates programming for Intel(R) Processor Graphics with the SYCL extensions API in oneDNN, using SYCL Unified Shared Memory (USM).

Getting started on GPU with OpenCL extensions API

This C++ API example demonstrates programming for Intel(R) Processor Graphics with OpenCL* extensions API in oneDNN.

Matrix Multiplication with Different oneDNN Features

Basic Operations:

Example

Description

MatMul Primitive Example

This C++ API example demonstrates how to create and execute a MatMul primitive.

MatMul Tutorial: Comparison with SGEMM

C++ API example demonstrating MatMul as a replacement for SGEMM functions.
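The SGEMM comparison tutorial maps a BLAS-style operation onto MatMul. That operation can be written as a naive loop nest, shown below as a sketch in plain C++ (not the oneDNN API). It assumes row-major, non-transposed, densely packed matrices. BLAS interfaces such as cblas_sgemm also take layout, transpose, and leading-dimension arguments. The function name ref_sgemm is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference SGEMM: C = alpha * A * B + beta * C.
// A is M x K, B is K x N, C is M x N; all row-major and densely packed.
void ref_sgemm(std::size_t M, std::size_t N, std::size_t K, float alpha,
               const std::vector<float>& A, const std::vector<float>& B,
               float beta, std::vector<float>& C) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;  // dot product of row m of A and column n of B
            for (std::size_t k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];
            // As in reference BLAS, beta == 0 means C is overwritten without being read.
            float prev = (beta == 0.0f) ? 0.0f : beta * C[m * N + n];
            C[m * N + n] = alpha * acc + prev;
        }
    }
}
```

A reference like this is also a convenient way to validate a MatMul result on small inputs.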

Quantization Flavors:

Example

Description

Matrix Multiplication with f8 Quantization

C++ API example demonstrating how to use f8_e5m2 and f8_e4m3 data types for MatMul with scaling for quantization.

MatMul Tutorial: Quantization

C++ API example demonstrating how to perform reduced-precision matrix-matrix multiplication using MatMul, and how the accuracy of the result compares with floating-point computation.

MatMul Tutorial: INT8 Inference

C++ API example demonstrating how to use MatMul fused with ReLU for INT8 inference.
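The quantization examples all map floating-point values into a narrower type through a scale factor. The sketch below is library-independent: it is not the oneDNN API and not necessarily the exact scheme each example uses.

It derives a symmetric per-tensor scale from the absolute maximum and round-trips data through s8. Up to rounding, each dequantized value differs from the original by at most half a scale step.

The same amax / max ratio is a common way to choose an f8 scale. For that purpose it uses the largest finite values of the OCP FP8 formats: 448 for E4M3 and 57344 for E5M2. The names per_tensor_scale, quantize_s8, and dequantize_s8 are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest finite magnitudes of some target types.
constexpr float kS8Max = 127.0f;
constexpr float kF8E4M3Max = 448.0f;    // OCP FP8 E4M3
constexpr float kF8E5M2Max = 57344.0f;  // OCP FP8 E5M2

// Symmetric per-tensor scale: maps the largest |x| onto the type's max value.
float per_tensor_scale(const std::vector<float>& x, float type_max) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::fabs(v));
    return amax > 0.0f ? amax / type_max : 1.0f;  // avoid a zero scale
}

// q = clamp(round(x / scale), -127, 127)
std::vector<std::int8_t> quantize_s8(const std::vector<float>& x, float scale) {
    std::vector<std::int8_t> q(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        float r = std::nearbyint(x[i] / scale);
        q[i] = static_cast<std::int8_t>(std::clamp(r, -kS8Max, kS8Max));
    }
    return q;
}

// x' = q * scale; x - x' is the quantization error.
std::vector<float> dequantize_s8(const std::vector<std::int8_t>& q, float scale) {
    std::vector<float> x(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) x[i] = q[i] * scale;
    return x;
}
```

Comparing the output of dequantize_s8 against the original data is one simple way to observe the accuracy loss that the quantization tutorial discusses.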

Advanced Usage:

Example

Description

MatMul with Host Scalar Scale example

This C++ API example demonstrates matrix multiplication (C = alpha * A * B) with a scalar scaling factor residing on the host.

MatMul Primitive with Sparse Memory in COO Format

This C++ API example demonstrates how to create and execute a MatMul primitive that uses a source tensor encoded with the COO sparse encoding.

MatMul Primitive with Sparse Memory in CSR Format

This C++ API example demonstrates how to create and execute a MatMul primitive that uses a source tensor encoded with the CSR sparse encoding.

MatMul Primitive Example

This C++ API example demonstrates how to create and execute a MatMul primitive that uses a weights tensor encoded with the packed sparse encoding.

MatMul Tutorial: Weights Decompression

C++ API example demonstrating how to use MatMul with compressed weights.
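The sketch below makes the COO and CSR encodings used by the sparse MatMul examples concrete. It builds both encodings for a small matrix and computes a sparse-times-dense product, which is the kind of computation a MatMul with a sparse source performs.

It is plain C++ with hypothetical struct and function names, not oneDNN's sparse memory API. The key difference between the formats is that CSR compresses COO's per-nonzero row indices into a row-pointer array of length rows + 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// COO: one (row, col, value) triple per nonzero.
struct Coo {
    std::size_t rows, cols;
    std::vector<int> row_idx, col_idx;
    std::vector<float> values;
};

// CSR: nonzeros of row r occupy positions row_ptr[r] .. row_ptr[r + 1] - 1.
struct Csr {
    std::size_t rows, cols;
    std::vector<int> row_ptr, col_idx;
    std::vector<float> values;
};

// Convert COO (entries in any order) to CSR.
Csr coo_to_csr(const Coo& a) {
    Csr c{a.rows, a.cols, std::vector<int>(a.rows + 1, 0),
          std::vector<int>(a.values.size()), std::vector<float>(a.values.size())};
    for (int r : a.row_idx) ++c.row_ptr[r + 1];  // count nonzeros per row
    for (std::size_t r = 0; r < a.rows; ++r)     // prefix sum gives row starts
        c.row_ptr[r + 1] += c.row_ptr[r];
    std::vector<int> next(c.row_ptr.begin(), c.row_ptr.end() - 1);  // write cursor per row
    for (std::size_t i = 0; i < a.values.size(); ++i) {
        int dst = next[a.row_idx[i]]++;
        c.col_idx[dst] = a.col_idx[i];
        c.values[dst] = a.values[i];
    }
    return c;
}

// dst (rows x N) = A_sparse (rows x cols) * B (cols x N); dense parts row-major.
std::vector<float> csr_matmul(const Csr& a, const std::vector<float>& b, std::size_t N) {
    assert(b.size() == a.cols * N);
    std::vector<float> dst(a.rows * N, 0.0f);
    for (std::size_t r = 0; r < a.rows; ++r)
        for (int i = a.row_ptr[r]; i < a.row_ptr[r + 1]; ++i)
            for (std::size_t n = 0; n < N; ++n)
                dst[r * N + n] += a.values[i] * b[a.col_idx[i] * N + n];
    return dst;
}
```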

Inference and Training

Neural network implementations demonstrating inference and training workflows:

Type

Precision

Mode

Example

Description

CNN

f32

Inference

CNN f32 inference example

This C++ API example demonstrates how to build an AlexNet neural network topology for forward-pass inference.

CNN

int8

Inference

CNN int8 inference example

This C++ API example demonstrates how to run AlexNet’s conv3 and relu3 with the int8 data type.

CNN

f32

Training

CNN f32 training example

This C++ API example demonstrates how to build an AlexNet model for training.

CNN

bf16

Training

CNN bf16 training example

This C++ API example demonstrates how to build an AlexNet model for training using the bfloat16 data type.

RNN

f32

Inference

RNN f32 Inference Example

This C++ API example demonstrates how to build a GNMT model for inference.

RNN

int8

Inference

RNN int8 inference example

This C++ API example demonstrates how to build a GNMT model for inference using the int8 data type.

RNN

f32

Training

RNN f32 training example

This C++ API example demonstrates how to build a GNMT model for training.

Recurrent Neural Networks

Example

Description

Vanilla RNN Primitive Example

This C++ API example demonstrates how to create and execute a Vanilla RNN primitive in forward training propagation mode.

LSTM RNN Primitive Example

This C++ API example demonstrates how to create and execute an LSTM RNN primitive in forward training propagation mode.

Linear-Before-Reset GRU RNN Primitive Example

This C++ API example demonstrates how to create and execute a Linear-Before-Reset GRU RNN primitive in forward training propagation mode.

AUGRU RNN Primitive Example

This C++ API example demonstrates how to create and execute an AUGRU RNN primitive in forward training propagation mode.
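For reference, a vanilla RNN cell computes h_t = f(W · x_t + U · h_{t-1} + b) at each time step. The sketch below is plain C++, not the oneDNN API, and the function name is hypothetical. It runs a single-layer, unidirectional cell over a sequence for one batch element, assuming tanh as the activation f and row-major weights.

The LSTM, GRU, and AUGRU cells in the other examples add gating on top of this basic recurrence.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Single-layer, unidirectional vanilla RNN for one batch element.
// x: T x I inputs, W: H x I input weights, U: H x H recurrent weights,
// b: bias of size H, h0: initial hidden state of size H.
// Returns the T x H sequence of hidden states (row-major).
std::vector<float> vanilla_rnn_forward(std::size_t T, std::size_t I, std::size_t H,
        const std::vector<float>& x, const std::vector<float>& W,
        const std::vector<float>& U, const std::vector<float>& b,
        const std::vector<float>& h0) {
    assert(x.size() == T * I && W.size() == H * I && U.size() == H * H);
    assert(b.size() == H && h0.size() == H);
    std::vector<float> out(T * H);
    std::vector<float> h = h0;  // holds h_{t-1}
    for (std::size_t t = 0; t < T; ++t) {
        for (std::size_t j = 0; j < H; ++j) {
            float acc = b[j];
            for (std::size_t i = 0; i < I; ++i) acc += W[j * I + i] * x[t * I + i];
            for (std::size_t k = 0; k < H; ++k) acc += U[j * H + k] * h[k];
            out[t * H + j] = std::tanh(acc);  // activation f = tanh (assumed)
        }
        // h_t becomes h_{t-1} for the next time step.
        auto first = out.begin() + static_cast<std::ptrdiff_t>(t * H);
        h.assign(first, first + static_cast<std::ptrdiff_t>(H));
    }
    return out;
}
```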

Performance Analysis

The following examples demonstrate techniques for performance measurement:

Example

Description

Matrix Multiplication Performance Example

This C++ example runs a simple matrix multiplication (matmul) performance test using oneDNN.

Performance Profiling Example

This example demonstrates best practices for optimizing application performance with oneDNN.
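A matmul performance test typically times repeated executions and converts the elapsed time into a throughput figure. Multiplying an M x K matrix by a K x N matrix takes 2 · M · N · K floating-point operations: one multiply and one add per term.

The sketch below shows that bookkeeping with std::chrono around an arbitrary callable. time_ms and matmul_gflops are hypothetical helpers, not part of oneDNN.

Real measurements should also discard warm-up runs. On asynchronous devices such as GPUs, they must wait for queued work to finish before stopping the clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Average wall-clock time of `fn` in milliseconds over `runs` executions,
// after `warmup` untimed executions (e.g. to populate caches).
double time_ms(const std::function<void()>& fn, int runs, int warmup = 1) {
    assert(runs > 0 && warmup >= 0);
    for (int i = 0; i < warmup; ++i) fn();
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) fn();
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / runs;
}

// Throughput in GFLOP/s of an (M x K) * (K x N) matmul that took `ms` milliseconds.
double matmul_gflops(std::size_t M, std::size_t N, std::size_t K, double ms) {
    double flops = 2.0 * static_cast<double>(M) * N * K;
    return flops / (ms * 1e-3) / 1e9;
}
```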

Individual Primitives

Convolution Operations:

Example

Description

Convolution Primitive Example

This C++ API example demonstrates how to create and execute a Convolution primitive in forward propagation mode in two configurations: with and without groups.

Deconvolution Primitive Example

This C++ API example demonstrates how to create and execute a Deconvolution primitive in forward propagation mode.
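When you set up the memory for a convolution, the output spatial size follows from the input size, kernel, stride, padding, and dilation. The sketch below uses oneDNN's dilation convention, in which a dilation of 0 denotes a dense (non-dilated) kernel.

It also counts grouped weights, which oneDNN lays out as G x OC/G x IC/G x KH x KW, so both channel counts must divide evenly by the number of groups. The function names are hypothetical, and this is arithmetic only, not the oneDNN API.

```cpp
#include <cassert>
#include <cstdint>

// Output size along one spatial dimension (oneDNN-style dilation: 0 = dense).
std::int64_t conv_out_size(std::int64_t in, std::int64_t kernel, std::int64_t stride,
                           std::int64_t pad_l, std::int64_t pad_r,
                           std::int64_t dilation = 0) {
    assert(stride > 0 && dilation >= 0);
    // Extent covered by the kernel, including the holes introduced by dilation.
    std::int64_t eff_kernel = (kernel - 1) * (dilation + 1) + 1;
    return (in - eff_kernel + pad_l + pad_r) / stride + 1;
}

// Number of weight elements for a convolution with G groups, where each group
// maps IC/G input channels to OC/G output channels (G = 1: no groups).
std::int64_t grouped_weights_size(std::int64_t G, std::int64_t OC, std::int64_t IC,
                                  std::int64_t KH, std::int64_t KW) {
    assert(G > 0 && OC % G == 0 && IC % G == 0);
    return G * (OC / G) * (IC / G) * KH * KW;
}
```

For example, a 227-pixel input with an 11 x 11 kernel, stride 4, and no padding gives (227 - 11) / 4 + 1 = 55 output pixels per spatial dimension. Splitting a convolution into two groups halves its weight count.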

Linear Operations:

Example

Description

Inner Product Primitive Example

This C++ API example demonstrates how to create and execute an Inner Product primitive.

Pooling and Sampling:

Example

Description

Pooling Primitive Example

This C++ API example demonstrates how to create and execute a Pooling primitive in forward training propagation mode.

Resampling Primitive Example

This C++ API example demonstrates how to create and execute a Resampling primitive in forward training propagation mode.

Normalization Primitives:

Example

Description

Batch Normalization Primitive Example

This C++ API example demonstrates how to create and execute a Batch Normalization primitive in forward training propagation mode.

Group Normalization Primitive Example

This C++ API example demonstrates how to create and execute a Group Normalization primitive in forward training propagation mode.

Layer Normalization Primitive Example

This C++ API example demonstrates how to create and execute a Layer Normalization primitive in forward propagation mode.

Local Response Normalization Primitive Example

This C++ API example demonstrates how to create and execute a Local Response Normalization primitive in forward training propagation mode.
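For reference, layer normalization normalizes each vector along the last (channel) dimension using that vector's own mean μ and variance σ². It then applies a per-channel scale γ and shift β: y = γ · (x - μ) / sqrt(σ² + ε) + β. Batch and group normalization compute the same kind of statistics over different sets of elements.

The plain C++ sketch below applies this to a 2D tensor. It is not the oneDNN API, the function name is hypothetical, and the default ε = 1e-5 is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Layer normalization over the last dimension of an N x C row-major tensor.
// gamma and beta are per-channel scale and shift (size C).
std::vector<float> layer_norm(const std::vector<float>& x, std::size_t N, std::size_t C,
                              const std::vector<float>& gamma,
                              const std::vector<float>& beta, float eps = 1e-5f) {
    assert(C > 0 && x.size() == N * C && gamma.size() == C && beta.size() == C);
    std::vector<float> y(x.size());
    for (std::size_t n = 0; n < N; ++n) {
        const float* row = &x[n * C];
        float mean = 0.0f;
        for (std::size_t c = 0; c < C; ++c) mean += row[c];
        mean /= static_cast<float>(C);
        float var = 0.0f;  // biased (population) variance
        for (std::size_t c = 0; c < C; ++c) var += (row[c] - mean) * (row[c] - mean);
        var /= static_cast<float>(C);
        float inv_std = 1.0f / std::sqrt(var + eps);
        for (std::size_t c = 0; c < C; ++c)
            y[n * C + c] = gamma[c] * (row[c] - mean) * inv_std + beta[c];
    }
    return y;
}
```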

Activation Functions:

Example

Description

Element-Wise Primitive Example

This C++ API example demonstrates how to create and execute an Element-wise primitive in forward training propagation mode.

PReLU Primitive Example

This C++ API example demonstrates how to create and execute a PReLU primitive in forward training propagation mode.

Softmax Primitive Example

This C++ API example demonstrates how to create and execute a Softmax primitive in forward training propagation mode.
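Softmax is usually computed in a numerically stable form that subtracts the maximum before exponentiating: softmax(x)_i = exp(x_i - max(x)) / Σ_j exp(x_j - max(x)). This form gives the same result as the direct formula but does not overflow for large inputs.

A plain C++ reference over a single axis follows. The function name is hypothetical, and this is not the oneDNN API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over a 1D vector.
std::vector<float> softmax(const std::vector<float>& x) {
    assert(!x.empty());
    float max_v = *std::max_element(x.begin(), x.end());
    std::vector<float> y(x.size());
    float denom = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - max_v);  // argument <= 0, so the result is <= 1
        denom += y[i];
    }
    for (float& v : y) v /= denom;
    return y;
}
```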

Tensor Operations:

Example

Description

Binary Primitive Example

This C++ API example demonstrates how to create and execute a Binary primitive.

Bnorm u8 by binary post-ops example

The example implements u8 batch normalization as the following sequence of binary operations: binary_sub(src, mean), binary_div(tmp_dst, variance), binary_mul(tmp_dst, scale), binary_add(tmp_dst, shift).

Concat Primitive Example

This C++ API example demonstrates how to create and execute a Concat primitive.

Reduction Primitive Example

This C++ API example demonstrates how to create and execute a Reduction primitive.

Sum Primitive Example

This C++ API example demonstrates how to create and execute a Sum primitive.

Shuffle Primitive Example

This C++ API example demonstrates how to create and execute a Shuffle primitive.
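The four-step chain from the Bnorm u8 example can be checked against the direct batch normalization formula with a plain C++ reference. Standard batch normalization divides by sqrt(variance + ε) rather than by the raw variance, so this sketch assumes the divisor is precomputed that way and names it denom.

The sketch also keeps data in f32 rather than u8 for clarity. The function name is hypothetical, and this is not the oneDNN API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Batch normalization as four per-channel binary operations on an
// N x C x S tensor (S = spatial size), mirroring:
//   tmp = src - mean; tmp = tmp / denom; tmp = tmp * scale; dst = tmp + shift
// where denom[c] is assumed to be sqrt(variance[c] + eps).
std::vector<float> bnorm_via_binary(const std::vector<float>& src, std::size_t N,
        std::size_t C, std::size_t S, const std::vector<float>& mean,
        const std::vector<float>& denom, const std::vector<float>& scale,
        const std::vector<float>& shift) {
    assert(src.size() == N * C * S);
    assert(mean.size() == C && denom.size() == C && scale.size() == C && shift.size() == C);
    std::vector<float> dst(src.size());
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t c = 0; c < C; ++c)
            for (std::size_t s = 0; s < S; ++s) {
                std::size_t i = (n * C + c) * S + s;
                float tmp = src[i] - mean[c];  // binary_sub
                tmp = tmp / denom[c];          // binary_div
                tmp = tmp * scale[c];          // binary_mul
                dst[i] = tmp + shift[c];       // binary_add
            }
    return dst;
}
```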

Memory Transformations:

Example

Description

Reorder Primitive Example

This C++ API example demonstrates how to create and execute a Reorder primitive.
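A reorder copies data from one memory format to another without changing the logical tensor. To show what that means element by element, the plain C++ sketch below converts a 4D tensor from the nchw layout (channels outer, spatial dimensions inner) to nhwc (channels innermost).

oneDNN's Reorder primitive covers this and many other layouts, including blocked ones. The function name here is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorder an N x C x H x W tensor from nchw to nhwc.
std::vector<float> reorder_nchw_to_nhwc(const std::vector<float>& src, std::size_t N,
                                        std::size_t C, std::size_t H, std::size_t W) {
    assert(src.size() == N * C * H * W);
    std::vector<float> dst(src.size());
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t c = 0; c < C; ++c)
            for (std::size_t h = 0; h < H; ++h)
                for (std::size_t w = 0; w < W; ++w) {
                    std::size_t src_off = ((n * C + c) * H + h) * W + w;  // nchw offset
                    std::size_t dst_off = ((n * H + h) * W + w) * C + c;  // nhwc offset
                    dst[dst_off] = src[src_off];
                }
    return dst;
}
```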

C API Examples

Example

Description

Reorder between CPU and GPU engines

This C API example demonstrates programming flow when reordering memory between CPU and GPU engines.

CNN f32 inference example

This C API example demonstrates how to build an AlexNet neural network topology for forward-pass inference.

CNN f32 training example

This C API example demonstrates how to build an AlexNet model for training. The example implements a few layers from the AlexNet model.

Graph API Examples

The Graph API provides an interface for defining computational graphs with optimization and fusion capabilities.

Getting Started with Graph API

Example

Description

Getting started on CPU with Graph API

This example demonstrates how to build a simple graph and run it on CPU.

Getting started with SYCL extensions API and Graph API

This example demonstrates how to build a simple graph and run it on a SYCL device.

Getting started with OpenCL extensions and Graph API

This example demonstrates how to build a simple graph and run it on the OpenCL GPU runtime.

Advanced Graph API Usage

Example

Description

Convolution int8 inference example with Graph API

This example demonstrates how to build an int8 graph with the Graph API and run it on CPU.

Single op partition on CPU

This example demonstrates how to build a simple single-op graph and run it on CPU.

Single op partition on GPU

This example demonstrates how to build a simple single-op graph and run it on GPU.

Microkernel (uKernel) API Examples

The oneDNN microkernel API is a low-level abstraction for CPU that provides maximum flexibility by allowing users to maintain full control over threading logic, blocking logic, and code customization with minimal overhead.

Example

Description

BRGeMM ukernel example

This C++ API example demonstrates how to create and execute a BRGeMM ukernel.
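Conceptually, a batch-reduce GEMM (BRGeMM) accumulates a batch of matrix products into a single output block: C += Σ_i A_i · B_i. This lets a caller-controlled loop hand several blocks of the K dimension to one kernel call.

The plain C++ reference below shows only that math, using row-major blocks passed as pointer arrays. It is not the oneDNN ukernel API, which generates specialized code and may require specific buffer layouts. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Batch-reduce GEMM reference:
//   C (M x N) += sum over i of A[i] (M x K) * B[i] (K x N)
// All blocks are row-major and densely packed.
void ref_brgemm(std::size_t M, std::size_t N, std::size_t K,
                const std::vector<const float*>& A, const std::vector<const float*>& B,
                float* C) {
    assert(A.size() == B.size());
    for (std::size_t i = 0; i < A.size(); ++i)  // batch of (A, B) block pairs
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t n = 0; n < N; ++n) {
                float acc = 0.0f;
                for (std::size_t k = 0; k < K; ++k)
                    acc += A[i][m * K + k] * B[i][k * N + n];
                C[m * N + n] += acc;  // reduce every product into the same C block
            }
}
```

Splitting K into blocks and reducing them this way gives the same result as one larger GEMM, up to floating-point summation order. That equivalence is what makes the batch-reduce form useful for caller-managed blocking.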

Running Examples

Prerequisites and Building Examples

Before running examples, ensure:

  1. oneDNN is built from source. Examples are built automatically when oneDNN is built with -DONEDNN_BUILD_EXAMPLES=ON (the default).

  2. The environment is set up so that the oneDNN libraries are on the library search path.

Refer to Build from Source for detailed build instructions.

Running Examples

Most examples accept an optional engine argument (cpu or gpu). If no argument is provided, most examples default to CPU:

Linux/macOS:

# Run on CPU (default)
./examples/getting_started

# Run on CPU explicitly
./examples/getting_started cpu

# Run on GPU (if available)
./examples/getting_started gpu

Windows:

# Run on CPU (default)
examples\getting_started.exe

# Run on CPU explicitly
examples\getting_started.exe cpu

# Run on GPU (if available)
examples\getting_started.exe gpu

On successful completion, examples print “Example passed on CPU/GPU.”; otherwise, they display an error status and message.