This function, `rnn_pitch_filter`, applies a filtering process in the frequency...

August 27, 2025 at 01:36 AM

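void rnn_pitch_filter(kiss_fft_cpx *X, const kiss_fft_cpx *P, const float *Ex, const float *Ep,
                  const float *Exp, const float *g) {
  int i;
  float r[NB_BANDS];              /* per-band pitch gain */
  float rf[FREQ_SIZE] = {0};      /* pitch gain interpolated to every bin */
  float newE[NB_BANDS];           /* per-band energy after adding the pitch */
  float norm[NB_BANDS];           /* per-band energy-restoring factor */
  float normf[FREQ_SIZE]={0};     /* norm interpolated to every bin */
  /* Step 1: compute the pitch gain for each band. */
  for (i=0;i<NB_BANDS;i++) {
#if 0
    /* Disabled variant without the squares and the square root. */
    if (Exp[i]>g[i]) r[i] = 1;
    else r[i] = Exp[i]*(1-g[i])/(.001 + g[i]*(1-Exp[i]));
    r[i] = MIN16(1, MAX16(0, r[i]));
#else
    if (Exp[i]>g[i]) r[i] = 1;
    else r[i] = SQUARE(Exp[i])*(1-SQUARE(g[i]))/(.001 + SQUARE(g[i])*(1-SQUARE(Exp[i])));
    /* Clamp to [0, 1], then take the square root. */
    r[i] = sqrt(MIN16(1, MAX16(0, r[i])));
#endif
    /* Scale by the signal-to-pitch amplitude ratio; 1e-8 avoids division by zero. */
    r[i] *= sqrt(Ex[i]/(1e-8+Ep[i]));
  }
  /* Step 2: spread the band gains over all bins and add the scaled pitch spectrum to X. */
  interp_band_gain(rf, r);
  for (i=0;i<FREQ_SIZE;i++) {
    X[i].r += rf[i]*P[i].r;
    X[i].i += rf[i]*P[i].i;
  }
  /* Step 3: measure the new band energies and derive factors that restore Ex. */
  compute_band_energy(newE, X);
  for (i=0;i<NB_BANDS;i++) {
    norm[i] = sqrt(Ex[i]/(1e-8+newE[i]));
  }
  /* Step 4: apply the interpolated normalization to every bin. */
  interp_band_gain(normf, norm);
  for (i=0;i<FREQ_SIZE;i++) {
    X[i].r *= normf[i];
    X[i].i *= normf[i];
  }
}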

This function, rnn_pitch_filter, filters a signal in the frequency domain: it mixes a pitch spectrum P into the signal spectrum X band by band, then rescales the result so that each band keeps its original energy. Processing of this kind is used in speech processing and noise suppression. Here's a breakdown of what the code does:

Key Steps:

Inputs:
- kiss_fft_cpx *X: The frequency-domain representation of the current signal (complex values, FREQ_SIZE bins). It is modified in place.
- const kiss_fft_cpx *P: The pitch signal in the frequency domain (complex values, FREQ_SIZE bins).
- const float *Ex: The energy of the input signal in each band (NB_BANDS values).
- const float *Ep: The energy of the pitch signal in each band (NB_BANDS values).
- const float *Exp: The per-band correlation between the signal and the pitch. The gain formula computes 1 - Exp[i]^2, so it treats this as a normalized value.
- const float *g: Per-band gains (NB_BANDS values). In this function they act as thresholds: a band whose correlation Exp[i] exceeds g[i] gets the full pitch contribution.
Compute r (band gain factors):
- For each frequency band (NB_BANDS of them), a gain factor r[i] is calculated, which determines how strongly the pitch signal should influence the input signal.
- If the pitch correlation Exp[i] is higher than g[i], r[i] is set to 1 (maximum influence). Otherwise r[i] = Exp[i]^2 * (1 - g[i]^2) / (0.001 + g[i]^2 * (1 - Exp[i]^2)), where the 0.001 term keeps the denominator away from zero. The result is clamped to the range [0, 1] and then square-rooted. The #if 0 branch, a variant without the squares and the square root, is compiled out.
- r[i] is then multiplied by sqrt(Ex[i] / (1e-8 + Ep[i])). Taking the square root turns the ratio of input energy to pitch energy into an amplitude ratio, so the pitch spectrum is scaled to the level of the input in that band. After this step r[i] is no longer limited to [0, 1].
Interpolate Gain for Each Frequency Bin:
- The per-band gain factors r are interpolated over all frequency bins to form rf, representing smooth gain values for every frequency bin.
- This interpolation is done by the interp_band_gain function (not provided in the snippet).
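Since interp_band_gain is not shown, the following is only a guess at the general idea: linear interpolation between band positions. The sizes, the band edge table, and every name prefixed with sketch_ or SKETCH_ are assumptions made for illustration; the real function may use different band edges and a different interpolation rule.

```c
/* Hypothetical sizes and band edges, chosen only for illustration. */
#define SKETCH_NB_BANDS 4
#define SKETCH_FREQ_SIZE 13
static const int sketch_band_edge[SKETCH_NB_BANDS] = {0, 4, 8, 12};

/* Linearly interpolate per-band gains to per-bin gains.
 * A bin sitting on a band edge gets exactly that band's gain;
 * bins between two edges get a weighted blend of the two neighbors. */
static void sketch_interp_band_gain(float *binGain, const float *bandGain) {
  int b, j;
  for (b = 0; b < SKETCH_NB_BANDS - 1; b++) {
    int start = sketch_band_edge[b];
    int width = sketch_band_edge[b + 1] - start;
    for (j = 0; j < width; j++) {
      float frac = (float)j / (float)width;
      binGain[start + j] = (1.0f - frac) * bandGain[b] + frac * bandGain[b + 1];
    }
  }
  /* The last bin sits on the last band edge. */
  binGain[sketch_band_edge[SKETCH_NB_BANDS - 1]] = bandGain[SKETCH_NB_BANDS - 1];
}
```

Interpolating rather than holding each gain constant across its band avoids abrupt gain steps between neighboring bins.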
Modify the Input Signal with the Pitch Signal:
- For every frequency bin i (controlled by FREQ_SIZE), the real and imaginary parts of the input spectrum X[i] are incremented by their corresponding components from the pitch signal P[i], scaled by the interpolated gain rf[i].
Normalize Energy of the Updated Signal:
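- The per-band energy of the modified signal X is computed into newE by compute_band_energy (not provided in the snippet).
- A normalization factor is calculated per band as norm[i] = sqrt(Ex[i] / (1e-8 + newE[i])), the square root of the ratio of original energy to updated energy. The 1e-8 term guards against division by zero.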
- The band-wise normalization factors norm are interpolated across all frequency bins into normf.
Apply Normalization:
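- Each bin of the modified spectrum X is multiplied by the interpolated normalization factor normf[i], applied equally to the real and imaginary parts. This changes the magnitude of each bin but not its phase, and brings the energy of each band back to approximately Ex[i].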

Purpose:

The primary goal of this code is to blend the pitch spectrum P into the input spectrum X, by an amount that depends on the per-band pitch correlation Exp and the gains g, while keeping the per-band energy of X consistent. Here's what it does in context:

- Modifies the signal based on its relationship to a pitch signal, with adaptive gain control (r[i]).
- Ensures smooth transitions by interpolating band gains over individual frequency bins.
- Normalizes the filtered signal to maintain consistent energy levels.

Applications:

This type of processing is commonly used in:

- Speech enhancement: Aligns or emphasizes components of the signal based on pitch information, e.g., enhancing voiced speech.
- Audio source separation: Extracts specific sources corresponding to pitch or harmonic information.
- Noise suppression: Reduces noise by keeping the desired pitch-related components while suppressing others.

Built by @thebuilderjr