This function, `rnn_pitch_filter`, applies a filtering process in the frequency...

August 27, 2025 at 01:36 AM

    void rnn_pitch_filter(kiss_fft_cpx *X, const kiss_fft_cpx *P, const float *Ex,
                          const float *Ep, const float *Exp, const float *g) {
      int i;
      float r[NB_BANDS];
      float rf[FREQ_SIZE] = {0};
      float newE[NB_BANDS];
      float norm[NB_BANDS];
      float normf[FREQ_SIZE]={0};
      for (i=0;i<NB_BANDS;i++)
      {
    #if 0
        if (Exp[i]>g[i]) r[i] = 1;
        else r[i] = Exp[i]*(1-g[i])/(.001 + g[i]*(1-Exp[i]));
        r[i] = MIN16(1, MAX16(0, r[i]));
    #else
        if (Exp[i]>g[i]) r[i] = 1;
        else r[i] = SQUARE(Exp[i])*(1-SQUARE(g[i]))/(.001 + SQUARE(g[i])*(1-SQUARE(Exp[i])));
        r[i] = sqrt(MIN16(1, MAX16(0, r[i])));
    #endif
        r[i] *= sqrt(Ex[i]/(1e-8+Ep[i]));
      }
      interp_band_gain(rf, r);
      for (i=0;i<FREQ_SIZE;i++)
      {
        X[i].r += rf[i]*P[i].r;
        X[i].i += rf[i]*P[i].i;
      }
      compute_band_energy(newE, X);
      for (i=0;i<NB_BANDS;i++)
      {
        norm[i] = sqrt(Ex[i]/(1e-8+newE[i]));
      }
      interp_band_gain(normf, norm);
      for (i=0;i<FREQ_SIZE;i++)
      {
        X[i].r *= normf[i];
        X[i].i *= normf[i];
      }
    }

This function, rnn_pitch_filter, works in the frequency domain: it adds a gain-weighted copy of a pitch spectrum P to the signal spectrum X, then rescales X so that its per-band energy returns to the levels given in Ex. Processing of this kind is typical of speech enhancement and noise suppression. Here's a breakdown of what the code does:

Key Steps:

  1. Inputs:

    • kiss_fft_cpx *X: The current frequency-domain representation of the signal (complex values, FREQ_SIZE bins). It is modified in place and is the function's only output.
    • const kiss_fft_cpx *P: The pitch signal in the frequency domain (also complex values, FREQ_SIZE bins).
    • const float *Ex: Energy of the input signal in each band (NB_BANDS values).
    • const float *Ep: Energy of the pitch signal in each band (NB_BANDS values).
    • const float *Exp: Correlation between the signal and the pitch in each band (NB_BANDS values).
    • const float *g: Per-band gain values (NB_BANDS values). The code uses them both as a threshold against Exp[i] and inside the gain formula.
  2. Compute r (band gain factors):

    • For each frequency band (controlled by NB_BANDS), a gain factor r[i] is calculated, which determines how strongly the pitch signal should influence the input signal.
    • If the pitch correlation Exp[i] exceeds g[i], r[i] is set to 1 (maximum influence). Otherwise the active #else branch computes Exp[i]^2 * (1 - g[i]^2) / (0.001 + g[i]^2 * (1 - Exp[i]^2)), where the 0.001 term guards against a zero denominator. The result is clamped to [0, 1] and then square-rooted. The #if 0 branch, an unsquared variant of the same formula, is compiled out.
    • r[i] is then multiplied by sqrt(Ex[i] / (1e-8 + Ep[i])), which scales the pitch contribution to the level of the input signal in that band. The 1e-8 term guards against division by zero.
  3. Interpolate Gain for Each Frequency Bin:

    • The per-band gain factors r are interpolated over all frequency bins to form rf, representing smooth gain values for every frequency bin.
    • This interpolation is done by the interp_band_gain function (not provided in the snippet).
  4. Modify the Input Signal with the Pitch Signal:

    • For every frequency bin i (controlled by FREQ_SIZE), the real and imaginary parts of the input spectrum X[i] are incremented by their corresponding components from the pitch signal P[i], scaled by the interpolated gain rf[i].
  5. Normalize Energy of the Updated Signal:

    • The energy of the modified signal X is computed into newE across bands.
    • A normalization factor norm[i] is calculated per band, based on the ratio of original energy Ex[i] to the updated energy newE[i].
    • The band-wise normalization factors norm are interpolated across all frequency bins into normf.
  6. Apply Normalization:

    • The updated spectrum X is rescaled by multiplying the real and imaginary components of every bin by the interpolated normalization factor normf[i], which brings each band's energy back toward Ex[i].

Purpose:

The primary goal of this code is to implement a frequency-domain adjustment that mixes the pitch spectrum P into the input spectrum X, weighted by the per-band pitch correlation Exp, while keeping the band energies consistent with Ex. Here's what it does in context:

  • Modifies the signal based on its relationship to a pitch signal, with adaptive gain control (r[i]).
  • Ensures smooth transitions by interpolating band gains over individual frequency bins.
  • Normalizes the filtered signal to maintain consistent energy levels.

Applications:

This type of processing is commonly used in:

  • Speech enhancement: Emphasizes the pitch-related (voiced) components of speech by mixing in a spectrum derived from the pitch signal.
  • Audio source separation: Extracts specific sources corresponding to pitch or harmonic information.
  • Noise suppression: Reduces noise by keeping the desired pitch-related components while suppressing others.