The code you provided is part of a denoising library,...

August 29, 2025 at 10:15 AM

float rnnoise_process_frame(DenoiseState *st, float *out, const float *in) {
  int i;
  kiss_fft_cpx X[FREQ_SIZE];
  kiss_fft_cpx P[FREQ_SIZE];
  float x[FRAME_SIZE];
  float Ex[NB_BANDS], Ep[NB_BANDS];
  float Exp[NB_BANDS];
  float features[NB_FEATURES];
  float g[NB_BANDS];
  float gf[FREQ_SIZE]={1};
  float vad_prob = 0;
  int silence;
  static const float a_hp[2] = {-1.99599, 0.99600};
  static const float b_hp[2] = {-2, 1};
  rnn_biquad(x, st->mem_hp_x, in, b_hp, a_hp, FRAME_SIZE);
  silence = rnn_compute_frame_features(st, X, P, Ex, Ep, Exp, features, x);

  if (!silence) {
#if !TRAINING
    compute_rnn(&st->model, &st->rnn, g, &vad_prob, features, st->arch);
#endif
    rnn_pitch_filter(st->delayed_X, st->delayed_P, st->delayed_Ex, st->delayed_Ep, st->delayed_Exp, g);
    for (i=0;i<NB_BANDS;i++) {
      float alpha = .6f;
      /* Cap the decay at 0.6 per frame, corresponding to an RT60 of 135 ms.
         That avoids unnaturally quick attenuation. */
      g[i] = MAX16(g[i], alpha*st->lastg[i]);
      /* Compensate for energy change across frame when computing the threshold gain.
         Avoids leaking noise when energy increases (e.g. transient noise). */
      st->lastg[i] = MIN16(1.f, g[i]*(st->delayed_Ex[i]+1e-3)/(Ex[i]+1e-3));
    }
    interp_band_gain(gf, g);
#if 1
    for (i=0;i<FREQ_SIZE;i++) {
      st->delayed_X[i].r *= gf[i];
      st->delayed_X[i].i *= gf[i];
    }
#endif
  }
  frame_synthesis(st, out, st->delayed_X);

  RNN_COPY(st->delayed_X, X, FREQ_SIZE);
  RNN_COPY(st->delayed_P, P, FREQ_SIZE);
  RNN_COPY(st->delayed_Ex, Ex, NB_BANDS);
  RNN_COPY(st->delayed_Ep, Ep, NB_BANDS);
  RNN_COPY(st->delayed_Exp, Exp, NB_BANDS);
  return vad_prob;
}

The code you provided is part of RNNoise, an open-source noise suppression library that removes background noise from audio signals. Here is a breakdown of what the function float rnnoise_process_frame(DenoiseState *st, float *out, const float *in) does:

Overall Purpose

This function processes a single frame of audio to suppress noise using a combination of signal processing and a recurrent neural network. It takes one frame of audio input (in), writes the denoised result to out, and returns a voice activity probability.

Details of What Happens in the Code

1. High-Pass Filtering

rnn_biquad(x, st->mem_hp_x, in, b_hp, a_hp, FRAME_SIZE);

The first step applies a second-order (biquad) high-pass filter to remove DC offset and very low-frequency content before analysis. b_hp holds the feed-forward coefficients and a_hp the feedback coefficients, and st->mem_hp_x carries the filter state from one frame to the next. The filtered frame is stored in x.
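As a rough illustration of what a biquad call like this does, here is a minimal sketch of a second-order filter in transposed direct form II. It assumes the leading coefficients b0 and a0 are both 1 and left out of the two-element arrays, which matches the shape of a_hp and b_hp above. The name hp_biquad_sketch and the double-precision state are assumptions made for this example; the library's own rnn_biquad may differ in detail.

```c
#include <assert.h>

/* Hypothetical sketch of a second-order (biquad) filter, transposed direct
   form II. Assumes b0 = a0 = 1 are implicit, so b[] and a[] hold only the
   z^-1 and z^-2 coefficients. State is kept in double to limit rounding
   error, since the poles sit very close to the unit circle. */
void hp_biquad_sketch(float *y, double mem[2], const float *x,
                      const float b[2], const float a[2], int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        double xi = x[i];
        double yi = xi + mem[0];                 /* output = input + state */
        mem[0] = mem[1] + (b[0] * xi - a[0] * yi);
        mem[1] = b[1] * xi - a[1] * yi;
        y[i] = (float)yi;
    }
}
```

With b_hp = {-2, 1} the numerator is 1 - 2z^-1 + z^-2, which is exactly zero at DC, so a constant input fed through this sketch decays toward zero at the output.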

2. Feature Extraction & Frame Analysis

silence = rnn_compute_frame_features(st, X, P, Ex, Ep, Exp, features, x);

The function calls rnn_compute_frame_features to:

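Transform the windowed input frame to the frequency domain and store the spectrum in X. In the RNNoise source, P holds the spectrum of a pitch-delayed copy of the signal, which the pitch filter uses later.
Calculate per-band quantities that characterize the signal: Ex is the energy of X in each band, Ep is the energy of P, and Exp is the correlation between X and P.
Extract the feature vector (features) that the neural network takes as input.

If the frame is silent (silence is nonzero), the function skips the whole denoising block and goes straight to synthesis, so no gains are computed or applied and vad_prob keeps its initial value of 0.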
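To make the per-band energy idea concrete, here is a minimal sketch that accumulates energy from FFT bins into bands using triangular weights, so each bin contributes to its two neighbouring bands. The type cpx_sketch, the function name, and the band edges are assumptions made for illustration; the library's own band layout and any scaling of the first and last bands are not shown in the code above.

```c
#include <assert.h>

/* Minimal complex type standing in for kiss_fft_cpx (assumption). */
typedef struct { float r, i; } cpx_sketch;

/* Hypothetical sketch: accumulate per-band energy from FFT bins using
   triangular weights. edges[] holds n_bands increasing bin indices and
   sum[] receives n_bands values. */
void band_energy_sketch(float *sum, const cpx_sketch *X,
                        const int *edges, int n_bands)
{
    int b, j;
    assert(n_bands >= 2);
    for (b = 0; b < n_bands; b++) sum[b] = 0.f;
    for (b = 0; b < n_bands - 1; b++) {
        int size = edges[b + 1] - edges[b];
        for (j = 0; j < size; j++) {
            const cpx_sketch *bin = &X[edges[b] + j];
            float frac = (float)j / size;          /* 0 at the lower edge */
            float e = bin->r * bin->r + bin->i * bin->i;
            sum[b]     += (1.f - frac) * e;        /* share for lower band */
            sum[b + 1] += frac * e;                /* share for upper band */
        }
    }
}
```

A band correlation such as Exp would follow the same loop, with the squared magnitude replaced by the real part of the product of one spectrum's bin with the conjugate of the other's.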

3. Neural Network-Based Noise Suppression

compute_rnn(&st->model, &st->rnn, g, &vad_prob, features, st->arch);

This step runs the pre-trained recurrent neural network on the extracted features (features). The network processes these features to predict:

The gain values (g) for different frequency bands. These are used to suppress noisy bands while preserving speech bands.
The voice activity detection (VAD) probability (vad_prob), which indicates how likely the frame contains speech.

This call is compiled out when the library is built with TRAINING defined to a nonzero value (the #if !TRAINING guard).

4. Noise Filtering (Pitch Filtering & Suppression)

rnn_pitch_filter(..., g);

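This call applies a pitch-based filter to the stored spectrum of the previous frame (st->delayed_X), using the pitch spectrum and band quantities saved alongside it. The aim is to attenuate noise that falls between the harmonics of voiced speech, which per-band gains alone cannot resolve.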

g[i] = MAX16(g[i], alpha * st->lastg[i]);

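This line stops each band gain from falling below 0.6 times the reference gain kept from the previous frame (st->lastg), so attenuation cannot deepen faster than that from one frame to the next; the source comment equates this to an RT60 of 135 ms. The loop then updates st->lastg with the new gain scaled by the ratio of the stored frame's band energy to the current frame's, capped at 1, so that a sudden energy increase such as a transient noise does not leak through.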
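The two statements inside the loop can be sketched as a standalone function. The function and parameter names here are hypothetical, and the sketch uses plain comparisons in place of the MAX16 and MIN16 macros.

```c
#include <assert.h>

/* Hypothetical sketch of the per-band gain smoothing step.
   g[]      : gains for the current frame (modified in place)
   lastg[]  : smoothing state carried between frames (updated)
   e_prev[] : band energies of the stored frame the gains apply to
   e_cur[]  : band energies of the newest frame */
void smooth_gains_sketch(float *g, float *lastg, const float *e_prev,
                         const float *e_cur, int n_bands)
{
    const float alpha = 0.6f;    /* maximum decay per frame, as in the source */
    int i;
    assert(n_bands > 0);
    for (i = 0; i < n_bands; i++) {
        float floor_gain = alpha * lastg[i];
        float ratio;
        if (g[i] < floor_gain) g[i] = floor_gain;        /* lower bound */
        ratio = g[i] * (e_prev[i] + 1e-3f) / (e_cur[i] + 1e-3f);
        lastg[i] = ratio < 1.f ? ratio : 1.f;            /* cap at 1 */
    }
}
```

The 135 ms figure in the source comment can be checked by hand: a factor of 0.6 per frame is 20*log10(0.6), about -4.44 dB, so a 60 dB drop takes roughly 13.5 frames, which is 135 ms if each frame lasts 10 ms (an assumption consistent with the comment).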

5. Gain Interpolation

interp_band_gain(gf, g);

This expands the per-band gains in g (NB_BANDS values) into per-bin gains in gf (FREQ_SIZE values) by interpolating between bands, so the gain changes smoothly across the spectrum rather than stepping at band boundaries.
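One plausible way to do this expansion is linear interpolation between band edges, sketched below. The function name and the edges array are assumptions made for illustration, and the body of the library's interp_band_gain is not shown in the code above.

```c
#include <assert.h>

/* Hypothetical sketch: spread n_bands band gains over individual FFT bins by
   linear interpolation between band edges. edges[] holds n_bands increasing
   bin indices; gf[] must have room for edges[n_bands - 1] + 1 bins. Bins
   below edges[0] are left untouched. */
void interp_gain_sketch(float *gf, const float *g, const int *edges, int n_bands)
{
    int b, j;
    assert(n_bands >= 2);
    for (b = 0; b < n_bands - 1; b++) {
        int size = edges[b + 1] - edges[b];
        for (j = 0; j < size; j++) {
            float frac = (float)j / size;   /* 0 at band b, approaching 1 at band b+1 */
            gf[edges[b] + j] = (1.f - frac) * g[b] + frac * g[b + 1];
        }
    }
    gf[edges[n_bands - 1]] = g[n_bands - 1];  /* last edge takes the last band gain */
}
```

With band gains {1, 0.5, 0} and edges {0, 2, 4}, the per-bin gains come out as 1, 0.75, 0.5, 0.25, 0.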

6. Frequency Domain Modification

for (i=0; i<FREQ_SIZE; i++) {
  st->delayed_X[i].r *= gf[i];
  st->delayed_X[i].i *= gf[i];
}

Here each frequency bin of st->delayed_X is multiplied by its gain in gf. Because the real and imaginary parts are scaled by the same real factor, the magnitude of each bin is reduced while its phase is left unchanged.

7. Signal Reconstruction

frame_synthesis(st, out, st->delayed_X);

The spectrum in st->delayed_X is converted back to the time domain by an inverse transform followed by windowed overlap-add with the tail of the previous frame, and the result is written to out. This step runs for silent frames as well, in which case the stored spectrum is synthesized without any gain applied.

8. State Updates

RNN_COPY(...);

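The current frame's spectrum, pitch spectrum, and band quantities (X, P, Ex, Ep, Exp) are copied into their st->delayed_* counterparts. On the next call these become the frame that is filtered and synthesized, so the gains are always applied to the previous frame's spectrum and the output lags the input by one frame.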
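The store-then-emit pattern behind these copies can be shown with a toy example. Everything in it (the struct, the function name, the frame length, and the single scalar gain) is hypothetical and far simpler than the library's state; it only illustrates why the output is one frame behind the input.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_FRAME 4   /* toy frame length, not the library's FRAME_SIZE */

typedef struct {
    float delayed[SKETCH_FRAME];   /* frame received on the previous call */
} delay_state_sketch;

/* Hypothetical sketch of the one-frame delay pattern: the gain decided while
   looking at the newest frame is applied to the previously stored frame,
   which is then emitted, and the newest frame is stored for the next call.
   out and in must not alias in this sketch. */
void process_delayed_sketch(delay_state_sketch *st, float *out,
                            const float *in, float gain)
{
    int i;
    assert(st && out && in);
    for (i = 0; i < SKETCH_FRAME; i++)
        out[i] = gain * st->delayed[i];
    memcpy(st->delayed, in, sizeof st->delayed);
}
```

Calling this with a zero-initialized state produces a frame of zeros first; the first input frame only appears in the output of the second call.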

9. Return VAD Probability

return vad_prob;

Finally, the function returns the voice activity detection probability, which indicates the likelihood of speech in the frame. It is 0 when the frame was classified as silent, because the network is not run in that case.

In Summary

This function:

Processes a single frame of audio.
Computes features and applies a pre-trained neural network to estimate gains for noise suppression.
Modifies the signal in the frequency domain to suppress noise while preserving speech.
Reconstructs the denoised audio in the time domain.
Maintains internal states for continuous processing.
Returns the VAD probability for potential use by the caller.

Essentially, it is a core part of a real-time noise suppression pipeline, designed to improve audio quality in noisy environments.


Built by @thebuilderjr