int rnn_compute_frame_features(DenoiseState *st, kiss_fft_cpx *X, kiss_fft_cpx *P,
                               float *Ex, float *Ep, float *Exp, float *features,
                               const float *in) {
  int i;
  float E = 0;
  float Ly[NB_BANDS];
  float p[WINDOW_SIZE];
  float pitch_buf[PITCH_BUF_SIZE>>1];
  int pitch_index;
  float gain;
  float *pre[1];
  float follow, logMax;
  rnn_frame_analysis(st, X, Ex, in);
  RNN_MOVE(st->pitch_buf, &st->pitch_buf[FRAME_SIZE], PITCH_BUF_SIZE-FRAME_SIZE);
  RNN_COPY(&st->pitch_buf[PITCH_BUF_SIZE-FRAME_SIZE], in, FRAME_SIZE);
  pre[0] = &st->pitch_buf[0];
  rnn_pitch_downsample(pre, pitch_buf, PITCH_BUF_SIZE, 1);
  rnn_pitch_search(pitch_buf+(PITCH_MAX_PERIOD>>1), pitch_buf, PITCH_FRAME_SIZE,
                   PITCH_MAX_PERIOD-3*PITCH_MIN_PERIOD, &pitch_index);
  pitch_index = PITCH_MAX_PERIOD-pitch_index;
  gain = rnn_remove_doubling(pitch_buf, PITCH_MAX_PERIOD, PITCH_MIN_PERIOD,
                             PITCH_FRAME_SIZE, &pitch_index,
                             st->last_period, st->last_gain);
  st->last_period = pitch_index;
  st->last_gain = gain;
  for (i=0;i<WINDOW_SIZE;i++)
    p[i] = st->pitch_buf[PITCH_BUF_SIZE-WINDOW_SIZE-pitch_index+i];
  apply_window(p);
  forward_transform(P, p);
  compute_band_energy(Ep, P);
  compute_band_corr(Exp, X, P);
  for (i=0;i<NB_BANDS;i++)
    Exp[i] = Exp[i]/sqrt(.001+Ex[i]*Ep[i]);
  dct(&features[NB_BANDS], Exp);
  features[2*NB_BANDS] = .01*(pitch_index-300);
  logMax = -2;
  follow = -2;
  for (i=0;i<NB_BANDS;i++) {
    Ly[i] = log10(1e-2+Ex[i]);
    Ly[i] = MAX16(logMax-7, MAX16(follow-1.5, Ly[i]));
    logMax = MAX16(logMax, Ly[i]);
    follow = MAX16(follow-1.5, Ly[i]);
    E += Ex[i];
  }
  if (!TRAINING && E < 0.04) {
    /* If there's no audio, avoid messing up the state. */
    RNN_CLEAR(features, NB_FEATURES);
    return 1;
  }
  dct(features, Ly);
  features[0] -= 12;
  features[1] -= 4;
  return TRAINING && E < 0.1;
}

This code is part of a signal processing system, likely within an audio-related application such as speech enhancement, noise reduction, or audio feature extraction for machine learning. Here's a breakdown of what this function does:

Function Purpose

The rnn_compute_frame_features function computes a set of features from one audio frame, intended as input to a neural network (such as a recurrent neural network, RNN). The features combine spectral, pitch-based, and energy-based characteristics of the frame.


Detailed Explanation

Inputs:

  1. DenoiseState *st: State structure holding the pitch buffer, the previous pitch period and gain, and other per-stream analysis state.
  2. kiss_fft_cpx *X: FFT coefficients of the current audio frame (filled in by rnn_frame_analysis).
  3. kiss_fft_cpx *P: Output buffer for the FFT coefficients of the pitch-aligned frame.
  4. float *Ex: Band energies of the input signal, one value per frequency band (filled in by rnn_frame_analysis).
  5. float *Ep: Output buffer for the band energies of the pitch signal.
  6. float *Exp: Output buffer for the normalized band-wise correlation between the input and pitch signals.
  7. float *features: Output feature vector to be computed.
  8. const float *in: Time-domain samples of the current input frame.

Function Workflow:

  1. Initial Spectrum Analysis (rnn_frame_analysis):

    • Analyzes the input audio frame in to compute its FFT coefficients and band energies, which are stored in X and Ex.
  2. Pitch Preprocessing:

    • Shift the pitch buffer (st->pitch_buf) to make room for new audio data.
    • Append the current audio frame to the pitch buffer.
    • Downsample the pitch buffer using rnn_pitch_downsample.
  3. Pitch Detection and Refinement:

    • Perform pitch detection using rnn_pitch_search to find the pitch period.
    • Refine the detected pitch, removing possible errors or doubling effects with rnn_remove_doubling.
  4. Pitch-related Feature Computation:

    • Extract a windowed segment of the pitch buffer corresponding to the detected pitch period.
    • Compute FFT (forward_transform) of this pitch signal.
    • Compute band-wise energy and cross-correlation between the pitch signal and the input signal.
  5. Normalization:

    • Normalize cross-correlation features (Exp) using the energy of both the input (Ex) and pitch signals (Ep) to make them scale-invariant.
  6. Spectral Feature Computation:

    • Compute a Discrete Cosine Transform (DCT) on the normalized cross-correlations to derive decorrelated features.
    • Add pitch information as a feature: a scaled, offset copy of the pitch period (.01*(pitch_index-300)) is stored at features[2*NB_BANDS].
  7. Log Energy Smoothing and Detection:

    • Take the base-10 log of the band energies (Ex) to obtain log-energy features (Ly), while accumulating the total frame energy E.
    • Limit how quickly the log energies may drop, using a running maximum (logMax, floored at logMax-7) and a per-band follower (follow, decaying by 1.5 per band).
  8. Low Energy Handling:

    • If the total energy (E) of the frame is too low (indicating silent audio), the output feature vector is cleared to avoid introducing noisy state interactions.
  9. Final Spectral and Energy Features:

    • Another DCT is applied to Ly for spectral decorrelation, writing the result to the start of the feature vector.
    • Constant offsets are subtracted (12 from features[0], 4 from features[1]) to roughly center these features for the model.
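The log-energy smoothing from step 7 can be sketched in isolation. This is a minimal extract of the loop in the code above; NB_BANDS, the 1e-2 energy floor, and the 1.5/7 follower constants are taken directly from the listing, while the function name smooth_log_energies is illustrative:

#include <math.h>

#define NB_BANDS 22
#define MAX16(a, b) ((a) > (b) ? (a) : (b))

/* Compute smoothed log band energies Ly from band energies Ex.
   Each value is clamped from below by the running maximum minus 7
   and by the previous band's follower minus 1.5, so the log energies
   cannot collapse abruptly between bands. Returns the total frame
   energy E, which the caller uses for the silence check. */
static float smooth_log_energies(const float *Ex, float *Ly) {
  float logMax = -2.f, follow = -2.f, E = 0.f;
  int i;
  for (i = 0; i < NB_BANDS; i++) {
    Ly[i] = log10f(1e-2f + Ex[i]);
    Ly[i] = MAX16(logMax - 7.f, MAX16(follow - 1.5f, Ly[i]));
    logMax = MAX16(logMax, Ly[i]);
    follow = MAX16(follow - 1.5f, Ly[i]);
    E += Ex[i];
  }
  return E;
}

For example, a frame with energy only in band 0 still produces a nonzero Ly[1], because the follower only lets the value fall by 1.5 from the previous band.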

Outputs:

  • The function produces a vector of features (features) that describe properties of the audio frame. These features are often used as input to an RNN for further processing, e.g., denoising or noise classification.
  • It returns:
    • Outside of training: 1 if the frame is nearly silent (E < 0.04); the feature vector is cleared first so the silent frame does not disturb downstream state. Otherwise 0.
    • In training mode: 1 if the frame is low-energy (E < 0.1), else 0, which lets the training pipeline flag or discard near-silent examples.
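The return semantics above can be mirrored by a small helper. This is a sketch of just the energy gating, not library code; the function name frame_energy_flag is illustrative, and the 0.04/0.1 thresholds come from the listing:

/* Mirror of the function's energy gating: at inference (training == 0)
   a frame with total energy below 0.04 is reported as silent, while
   during training the looser 0.1 threshold flags low-energy frames. */
static int frame_energy_flag(float E, int training) {
  if (!training && E < 0.04f)
    return 1;                 /* silent frame: caller should skip it */
  return training && E < 0.1f; /* low-energy training frame */
}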

Key Tasks Achieved by the Code

  1. Extracts spectral, pitch, and energy features.
  2. Implements pitch tracking and refinement.
  3. Smooths and normalizes features for stability and robustness.
  4. Handles silent frames gracefully to avoid artifacts in processing.

This code is common in audio and speech processing pipelines, where robust feature extraction is critical for a neural network's performance.
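As a concrete illustration of the buffer bookkeeping in step 2 of the workflow, the sliding pitch buffer can be sketched with plain memmove/memcpy (RNN_MOVE and RNN_COPY appear to be thin wrappers over these for the element type; the buffer sizes below are illustrative, and push_frame is a hypothetical name):

#include <string.h>

#define PITCH_BUF_SIZE 1728  /* illustrative sizes; the real values */
#define FRAME_SIZE 480       /* come from the library's headers     */

/* Slide the analysis window: drop the oldest FRAME_SIZE samples and
   append the new frame at the end of the buffer. */
static void push_frame(float *pitch_buf, const float *in) {
  memmove(pitch_buf, pitch_buf + FRAME_SIZE,
          (PITCH_BUF_SIZE - FRAME_SIZE) * sizeof(float));
  memcpy(pitch_buf + PITCH_BUF_SIZE - FRAME_SIZE, in,
         FRAME_SIZE * sizeof(float));
}

Because the regions overlap, the shift must use memmove rather than memcpy; the appended frame occupies a disjoint region, so memcpy is safe there.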
