Primitive Attributes: dropout

Introduction

In many DNN and GNN models, dropout is used to improve training results. In some cases this layer can take a significant share of the overall training time. To improve training performance, oneDNN allows the dropout operation to be fused with a primitive's execution.

Implementation

In oneDNN, dropout is a special operation akin to a binary post-op that gets applied to the output values of a primitive right before the post-ops. It depends on a deterministic PRNG (the current implementation uses a variation of the Philox algorithm) and transforms the values as follows:

mask[:] = (PRNG(S, ...) > P)
dst[:] = mask[:] * dst[:] / (1 - P)

where:

  • mask is the output buffer, always of the same dimensions (and usually of the same layout) as dst, but potentially differing from it in data type, which can only be u8. Its values are 0 if the corresponding value in dst got zeroed (a.k.a. dropped out), or 1 otherwise

  • S is the integer seed for the PRNG algorithm

  • P is the probability for any given value to get dropped out, 0 <= P <= 1
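The transform above can be sketched in plain C++ as a reference model (a hypothetical helper, not part of oneDNN; `std::mt19937` stands in for the Philox-based PRNG, so only the masking and 1/(1 - P) scaling match the actual implementation, not the generated values):

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Reference dropout: fills mask with 0/1 survival flags and scales the
// surviving dst values by 1/(1 - p), as in the formulas above.
void dropout_reference(std::vector<float> &dst, std::vector<uint8_t> &mask,
                       int32_t seed, float p) {
    std::mt19937 prng(static_cast<uint32_t>(seed));
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    const float scale = 1.0f / (1.0f - p);
    for (size_t i = 0; i < dst.size(); ++i) {
        // mask[i] is 1 if the value survives, 0 if it is dropped out.
        mask[i] = uniform(prng) > p ? 1 : 0;
        dst[i] = mask[i] ? dst[i] * scale : 0.0f;
    }
}
```

Scaling by 1/(1 - P) keeps the expected value of each output element unchanged, which is why inference can simply skip the operation.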

API

If the dropout operation is specified in the primitive's attributes, the user must pass three additional buffers to the primitive at execution time:

  • DNNL_ARG_ATTR_DROPOUT_MASK: through this ID the user has to pass the mask output buffer

  • DNNL_ARG_ATTR_DROPOUT_PROBABILITY: a single-value f32 input buffer that holds P

  • DNNL_ARG_ATTR_DROPOUT_SEED: a single-value s32 input buffer that holds S
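Putting it together, a usage sketch under the C++ API might look as follows (hedged: this assumes a oneDNN version where `primitive_attr::set_dropout` is available, and the surrounding primitive setup, `eng`, `strm`, `src_m`, `dst_m`, and the primitive `prim` are abbreviated placeholders; consult the oneDNN headers for the authoritative signatures):

```cpp
#include "dnnl.hpp"
using namespace dnnl;

// ... engine eng, stream strm, memories src_m / dst_m, and a primitive
//     built from a primitive descriptor pd are assumed to exist ...

// The mask has the same dimensions as dst but u8 data type.
memory::desc mask_md(pd.dst_desc().get_dims(), memory::data_type::u8,
        memory::format_tag::nchw);  // layout chosen for illustration

primitive_attr attr;
attr.set_dropout(mask_md);  // request dropout fusion for this primitive

// Single-value input buffers holding P (f32) and S (s32).
memory prob_m({{1}, memory::data_type::f32, memory::format_tag::x}, eng);
memory seed_m({{1}, memory::data_type::s32, memory::format_tag::x}, eng);
*static_cast<float *>(prob_m.get_data_handle()) = 0.5f;  // P
*static_cast<int32_t *>(seed_m.get_data_handle()) = 42;  // S

memory mask_m(mask_md, eng);  // mask output buffer

prim.execute(strm, {
    {DNNL_ARG_SRC, src_m},
    {DNNL_ARG_DST, dst_m},
    {DNNL_ARG_ATTR_DROPOUT_MASK, mask_m},
    {DNNL_ARG_ATTR_DROPOUT_PROBABILITY, prob_m},
    {DNNL_ARG_ATTR_DROPOUT_SEED, seed_m},
});
```

Note that the attribute must be set before the primitive descriptor is created, since the fused dropout becomes part of the primitive's compiled kernel.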