
void compute_rnn(const RNNoise *model, RNNState *rnn, float *gains, float *vad,
                 const float *input, int arch)
{
   float tmp[MAX_NEURONS];
   float cat[CONV2_OUT_SIZE + GRU1_OUT_SIZE + GRU2_OUT_SIZE + GRU3_OUT_SIZE];
   /*for (int i=0;i<INPUT_SIZE;i++) printf("%f ", input[i]);printf("\n");*/
   compute_generic_conv1d(&model->conv1, tmp, rnn->conv1_state, input, CONV1_IN_SIZE, ACTIVATION_TANH, arch);
   compute_generic_conv1d(&model->conv2, cat, rnn->conv2_state, tmp, CONV2_IN_SIZE, ACTIVATION_TANH, arch);
   compute_generic_gru(&model->gru1_input, &model->gru1_recurrent, rnn->gru1_state, cat, arch);
   compute_generic_gru(&model->gru2_input, &model->gru2_recurrent, rnn->gru2_state, rnn->gru1_state, arch);
   compute_generic_gru(&model->gru3_input, &model->gru3_recurrent, rnn->gru3_state, rnn->gru2_state, arch);
   RNN_COPY(&cat[CONV2_OUT_SIZE], rnn->gru1_state, GRU1_OUT_SIZE);
   RNN_COPY(&cat[CONV2_OUT_SIZE+GRU1_OUT_SIZE], rnn->gru2_state, GRU2_OUT_SIZE);
   RNN_COPY(&cat[CONV2_OUT_SIZE+GRU1_OUT_SIZE+GRU2_OUT_SIZE], rnn->gru3_state, GRU3_OUT_SIZE);
   compute_generic_dense(&model->dense_out, gains, cat, ACTIVATION_SIGMOID, arch);
   compute_generic_dense(&model->vad_dense, vad, cat, ACTIVATION_SIGMOID, arch);
   /*for (int i=0;i<22;i++) printf("%f ", gains[i]);printf("\n");*/
   /*printf("%f\n", *vad);*/
}

This function, compute_rnn, runs a frame of audio features through a recurrent neural network (RNN) to produce a set of gains and a voice activity detection (VAD) score. It does the following:

Function Input Summary

  • const RNNoise *model: The pre-trained neural network model, containing the weights and configuration of each layer (the two convolutions, the three GRUs, and the dense output layers).
  • RNNState *rnn: This instance's RNN state: the convolution history buffers (conv1_state, conv2_state) and the recurrent states of the three GRUs.
  • float *gains: Output buffer for the computed gains (used downstream for audio enhancement).
  • float *vad: Output location for the voice activity detection result.
  • const float *input: The input feature vector for the current frame.
  • int arch: Selects the architecture-specific code path (e.g., SIMD-optimized kernels) used by the underlying compute functions.
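
For orientation, here is a minimal, hypothetical caller sketch. The gain count (NB_BANDS), the feature layout, and the arch value of 0 are assumptions for illustration; the real constants come from the library's headers and its architecture detection.

void denoise_frame(const RNNoise *model, RNNState *rnn, const float *features)
{
   /* NB_BANDS is an assumed name; the commented-out debug loop in the
      source prints 22 gains, so 22 is used here for illustration. */
   enum { NB_BANDS = 22 };
   float gains[NB_BANDS];
   float vad;
   /* arch = 0 stands in for "generic code path"; the real value would
      come from the library's architecture detection. */
   compute_rnn(model, rnn, gains, &vad, features, 0);
   /* gains[] and vad are now ready for the rest of the pipeline. */
}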

Function Logic and What It Does:

  1. Intermediate Storage Allocation:

    • Two arrays, tmp and cat, are created to hold intermediate layer outputs:
      • tmp: Holds the result of the first convolutional operation (conv1).
      • cat: Holds the concatenated outputs of the second convolution and the three GRU layers, which the final dense layers consume.
  2. First Convolutional Layer:

    • compute_generic_conv1d: Applies a 1D convolution operation on the input using model->conv1, with a tanh activation. The result is stored in tmp.
  3. Second Convolutional Layer:

    • compute_generic_conv1d: Applies a second 1D convolution (model->conv2) to the output of the first, again with a tanh activation. The result is written to the start of cat.
  4. Recurrent Layers (GRUs):

    • Three GRU (Gated Recurrent Unit) layers are sequentially computed:
      • gru1 processes the conv2 output at the start of cat and updates gru1_state.
      • gru2 takes gru1_state as input and updates the gru2_state.
      • gru3 builds further upon gru2_state, updating the gru3_state.
  5. Internal State Concatenation:

    • The cat array is updated to include outputs from the convolution and GRU layers:
      • The gru1, gru2, and gru3 states are copied (via RNN_COPY) into cat immediately after the convolution output (see the concatenation sketch after this list).
  6. Dense Layers for Outputs:

    • Two dense (fully-connected) layers are computed (a minimal sketch of such a layer also follows this list):
      • One (dense_out) processes cat to compute gains (values for enhancing or modifying audio).
      • Another (vad_dense) processes cat to compute vad (a single value likely representing whether speech is detected).
  7. Commented Debugging Code:

    • The function contains commented-out debug lines (e.g., printing input, gains, and vad) to display intermediate results, useful during development or debugging.
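
The concatenation in step 5 is easiest to see as plain memory copies. The following is a sketch only, assuming RNN_COPY behaves like memcpy over floats; the sizes are the model's compile-time constants.

#include <string.h>

/* Layout of cat after all copies (sketch, assuming RNN_COPY ~ memcpy):
   [ conv2 output | gru1 state | gru2 state | gru3 state ] */
static void concat_features(float *cat, int conv2_out,
                            const float *gru1, int n1,
                            const float *gru2, int n2,
                            const float *gru3, int n3)
{
   /* conv2 already filled cat[0 .. conv2_out-1]; append the GRU states. */
   memcpy(&cat[conv2_out],           gru1, n1 * sizeof(float));
   memcpy(&cat[conv2_out + n1],      gru2, n2 * sizeof(float));
   memcpy(&cat[conv2_out + n1 + n2], gru3, n3 * sizeof(float));
}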
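
For step 6, here is a minimal sketch of what a dense layer with a sigmoid activation computes (y = sigmoid(Wx + b)). The real compute_generic_dense takes a layer struct from the model and dispatches to kernels selected by arch; the row-major weight layout below is an assumption for illustration.

#include <math.h>

static void dense_sigmoid(float *out, const float *weights, const float *bias,
                          const float *in, int in_size, int out_size)
{
   for (int i = 0; i < out_size; i++) {
      float sum = bias[i];
      for (int j = 0; j < in_size; j++)
         sum += weights[i * in_size + j] * in[j];  /* assumed row-major weights */
      out[i] = 1.0f / (1.0f + expf(-sum));         /* sigmoid keeps outputs in (0, 1) */
   }
}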

Summary

This function processes input audio data (or audio features) through a series of layers including 1D convolutions, GRUs, and dense layers. It generates two outputs:

  1. gains: A set of per-band gains (the commented-out debug loop prints 22 of them), likely used for audio enhancement (e.g., noise suppression).
  2. vad: A score indicating the presence of speech (Voice Activity Detection).

This function clearly belongs to the processing pipeline of a system like RNNoise for speech enhancement or noise reduction.
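
As a final illustration of how the two outputs might be consumed, here is a hedged sketch that applies the gains to per-band magnitudes and uses the VAD score as a simple gate. The band mapping, the 0.5 threshold, and the extra attenuation are illustrative assumptions, not part of the original function.

static void apply_band_gains(float *band_mag, const float *gains,
                             int nb_bands, float vad)
{
   for (int i = 0; i < nb_bands; i++) {
      /* Each gain is in (0, 1) because of the sigmoid output layer. */
      float g = gains[i];
      /* Illustrative use of the VAD score: attenuate more when speech is unlikely. */
      if (vad < 0.5f)
         g *= 0.5f;
      band_mag[i] *= g;
   }
}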
