
August 27, 2025 at 01:28 AM

```c
void compute_rnn(const RNNoise *model, RNNState *rnn, float *gains, float *vad, const float *input, int arch) {
  float tmp[MAX_NEURONS];  /* output of the first convolution */
  /* Layout: [conv2 output | GRU1 state | GRU2 state | GRU3 state] */
  float cat[CONV2_OUT_SIZE + GRU1_OUT_SIZE + GRU2_OUT_SIZE + GRU3_OUT_SIZE];
  /*for (int i=0;i<INPUT_SIZE;i++) printf("%f ", input[i]);printf("\n");*/
  /* Two convolutions over time; conv1_state and conv2_state keep past inputs between calls. */
  compute_generic_conv1d(&model->conv1, tmp, rnn->conv1_state, input, CONV1_IN_SIZE, ACTIVATION_TANH, arch);
  /* conv2 writes into the first CONV2_OUT_SIZE entries of cat. */
  compute_generic_conv1d(&model->conv2, cat, rnn->conv2_state, tmp, CONV2_IN_SIZE, ACTIVATION_TANH, arch);
  /* Three stacked GRUs, each updating its hidden state in place. */
  compute_generic_gru(&model->gru1_input, &model->gru1_recurrent, rnn->gru1_state, cat, arch);
  compute_generic_gru(&model->gru2_input, &model->gru2_recurrent, rnn->gru2_state, rnn->gru1_state, arch);
  compute_generic_gru(&model->gru3_input, &model->gru3_recurrent, rnn->gru3_state, rnn->gru2_state, arch);
  /* Append the updated GRU states after the conv2 output. */
  RNN_COPY(&cat[CONV2_OUT_SIZE], rnn->gru1_state, GRU1_OUT_SIZE);
  RNN_COPY(&cat[CONV2_OUT_SIZE+GRU1_OUT_SIZE], rnn->gru2_state, GRU2_OUT_SIZE);
  RNN_COPY(&cat[CONV2_OUT_SIZE+GRU1_OUT_SIZE+GRU2_OUT_SIZE], rnn->gru3_state, GRU3_OUT_SIZE);
  /* Both output heads read the full concatenation; sigmoid keeps outputs in (0, 1). */
  compute_generic_dense(&model->dense_out, gains, cat, ACTIVATION_SIGMOID, arch);
  compute_generic_dense(&model->vad_dense, vad, cat, ACTIVATION_SIGMOID, arch);
  /*for (int i=0;i<22;i++) printf("%f ", gains[i]);printf("\n");*/
  /*printf("%f\n", *vad);*/
}
```

This code defines a function compute_rnn that runs one step of the recurrent neural network (RNN) in RNNoise, a noise suppression library for speech. Given the feature vector for one audio frame, it updates the network's internal state and produces per-band gains and a voice activity score. Here's a breakdown of what it does:

Function Inputs

model: A pointer to the RNNoise model structure holding the trained network parameters (weights and biases for the convolutional, GRU, and dense layers). It is declared const, so the function only reads it.
rnn: A pointer to an RNNState structure holding the state that persists between calls: the hidden states of the three GRUs and the memory buffers of the two convolutional layers.
gains: A pointer to an output array that receives one gain per frequency band, each between 0 and 1.
vad: A pointer to a single output value, the Voice Activity Detection (VAD) score. It lies between 0 and 1 and indicates how likely the current frame is to contain speech.
input: A pointer to the feature vector computed from the current audio frame (e.g., per-band spectral features).
arch: An integer identifying the CPU's capabilities (e.g., which SIMD instruction sets are available). It is passed to every layer routine so each one can use an optimized implementation when one exists.
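Passing an integer like arch through every layer call is a common C pattern for runtime CPU dispatch. A value chosen once at startup indexes a table of function pointers, so each kernel can use an optimized variant. The sketch below shows that pattern with a dot product. The names dot_generic, dot_unrolled, DOT_IMPL, and dot are hypothetical and do not come from RNNoise's actual dispatch code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature shared by every implementation. */
typedef float (*dot_fn)(const float *a, const float *b, size_t n);

/* Portable reference implementation. */
static float dot_generic(const float *a, const float *b, size_t n) {
  float sum = 0.0f;
  for (size_t i = 0; i < n; i++) sum += a[i] * b[i];
  return sum;
}

/* Stand-in for a SIMD-optimized variant: four independent partial sums. */
static float dot_unrolled(const float *a, const float *b, size_t n) {
  float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += a[i] * b[i];
    s1 += a[i + 1] * b[i + 1];
    s2 += a[i + 2] * b[i + 2];
    s3 += a[i + 3] * b[i + 3];
  }
  for (; i < n; i++) s0 += a[i] * b[i]; /* leftover elements */
  return (s0 + s1) + (s2 + s3);
}

/* Table indexed by a hypothetical arch value: 0 = generic, 1 = optimized. */
#define ARCH_COUNT 2
static const dot_fn DOT_IMPL[ARCH_COUNT] = {dot_generic, dot_unrolled};

/* Callers only pass arch through, as compute_rnn does for its layers. */
static float dot(const float *a, const float *b, size_t n, int arch) {
  assert(arch >= 0 && arch < ARCH_COUNT);
  return DOT_IMPL[arch](a, b, n);
}
```

Reordering floating-point additions can make optimized and generic results differ in the last bits. Small integer-valued inputs give identical results on both paths.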

Overview of Steps

Temporary Buffers:
- The function declares two stack buffers, tmp and cat.
  - tmp has room for MAX_NEURONS values and holds the output of the first convolutional layer.
  - cat holds the output of the second convolutional layer followed by the three GRU states. This concatenation is the input to both dense layers.
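Convolutional Layers:
- Two 1D convolutional layers with tanh activation are computed using compute_generic_conv1d. Each keeps a memory of past inputs in rnn->conv1_state or rnn->conv2_state, so its output depends on earlier frames as well as the current one.
  - The first convolutional layer takes the input and stores its activations in tmp.
  - The second convolutional layer operates on tmp and stores its activations in the first CONV2_OUT_SIZE entries of cat.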
GRU Layers:
- Three GRU (Gated Recurrent Unit) layers are computed in sequence using compute_generic_gru. Each call updates its hidden state in place, and that state also serves as the layer's output.
  - The first GRU reads the second convolutional layer's output at the start of cat and updates rnn->gru1_state.
  - The second GRU takes the first GRU's state (rnn->gru1_state) as input and updates rnn->gru2_state.
  - The third GRU takes the second GRU's state (rnn->gru2_state) as input and updates rnn->gru3_state.
Concatenating GRU Outputs:
- After all three GRUs have run, the function uses RNN_COPY to copy rnn->gru1_state, rnn->gru2_state, and rnn->gru3_state into cat, directly after the output of the second convolutional layer.
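The copies place each GRU state at a fixed offset, so cat always has the layout [conv2 | gru1 | gru2 | gru3]. The sketch below uses made-up layer sizes. It assumes that RNN_COPY(dst, src, n) copies n floats like memcpy, since the macro's definition is not shown in the source. COPY_FLOATS and fill_cat are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes; the real *_OUT_SIZE values come from the model. */
enum { CONV2_OUT = 2, GRU1_OUT = 3, GRU2_OUT = 2, GRU3_OUT = 1 };
enum { CAT_SIZE = CONV2_OUT + GRU1_OUT + GRU2_OUT + GRU3_OUT };

/* Assumed to behave like RNN_COPY: copy n floats from src to dst. */
#define COPY_FLOATS(dst, src, n) memcpy((dst), (src), (n) * sizeof(*(dst)))

/* conv2 output is already in cat[0..CONV2_OUT); append the three GRU states. */
static void fill_cat(float *cat, const float *g1, const float *g2, const float *g3) {
  COPY_FLOATS(&cat[CONV2_OUT], g1, GRU1_OUT);
  COPY_FLOATS(&cat[CONV2_OUT + GRU1_OUT], g2, GRU2_OUT);
  COPY_FLOATS(&cat[CONV2_OUT + GRU1_OUT + GRU2_OUT], g3, GRU3_OUT);
}
```

In compute_rnn the copies run after all three GRU calls, so the dense layers see this frame's updated states rather than the previous frame's.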
Dense Layers:
- Two dense layers are computed on the concatenated data (cat):
  - The first dense layer (model->dense_out) produces gains, one value per frequency band, using a sigmoid activation.
  - The second dense layer (model->vad_dense) produces the VAD score written to vad, also using a sigmoid activation.

Comments and Logging

The function contains commented-out printf statements for debugging. If uncommented, they would print the input features, the first 22 gains, and the VAD score. As written, they are not compiled.

Purpose

This function processes a feature vector through a series of convolutional, GRU, and dense layers to produce:

Gain values (gains) that scale the frequency bands of the audio signal, attenuating bands dominated by noise.
A voice activity detection score (vad) indicating whether speech is present in the current frame.

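The layer sequence (two convolutions, three stacked GRUs, and two dense output layers) forms the RNNoise denoising network. Because the convolution and GRU states persist in rnn, the function is meant to be called once per frame, in order, with the same RNNState.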
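After compute_rnn returns, the band gains are used to scale the signal's spectrum. The sketch below is simplified: it assumes hypothetical band edges given as FFT-bin indices and holds each band's gain constant across its bins. RNNoise itself interpolates gains between neighboring bands, so this shows the idea rather than its exact code. apply_band_gains is a hypothetical name.

```c
#include <assert.h>

/* Scale spectrum magnitudes by per-band gains.
 * band_edges has n_bands + 1 entries; band b covers bins
 * [band_edges[b], band_edges[b + 1]). The edges are hypothetical. */
static void apply_band_gains(float *mag, const float *gains,
                             const int *band_edges, int n_bands) {
  for (int b = 0; b < n_bands; b++) {
    assert(band_edges[b] <= band_edges[b + 1]);
    for (int k = band_edges[b]; k < band_edges[b + 1]; k++)
      mag[k] *= gains[b]; /* gain near 0 suppresses the band, near 1 keeps it */
  }
}
```

A gain applied to a magnitude scales the corresponding complex bin by the same factor. Multiplying both the real and imaginary parts by the gain is therefore equivalent.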
