Some common acoustic features used in speech recognition

* Linear prediction coefficients (LPC): Linear prediction analysis starts from the mechanism of human speech production. Modeling the vocal tract as a cascade of short tubes leads to a system transfer function in the form of an all-pole digital filter, so the signal at time n can be estimated as a linear combination of the preceding samples. The LPC coefficients are obtained by minimizing the mean square error between the actual speech samples and the linearly predicted samples (a least-mean-square criterion). Methods for computing LPC include the autocorrelation method (Durbin method), the covariance method, and the lattice method. Fast and effective computation has ensured the wide use of this acoustic feature. Acoustic features similar to this LPC prediction-parameter model include line spectrum pairs (LSP), reflection coefficients, and so on.
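As an illustration of the autocorrelation (Durbin) method named above, here is a minimal NumPy sketch. The function name `lpc_autocorrelation`, the sign convention (the predictor is written as the negative sum of `a[k] * s[n-k]`), and the assumption that the frame has non-zero energy are choices made for this example, not something the text prescribes.

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err): a[0] == 1 and the prediction of s[n] is
    -sum(a[k] * s[n - k] for k in 1..order); err is the residual energy.
    Assumes the frame is not all zeros (illustrative sketch only).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation lags r[0..order] of the (already windowed) frame.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update the coefficients of the previous stage.
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        # The mean square prediction error shrinks at every stage.
        err *= (1.0 - k * k)
    return a, err
```

A typical call might be `a, err = lpc_autocorrelation(frame * np.hamming(len(frame)), order=12)`, where the Hamming window and the order 12 are illustrative values only. The reflection coefficients mentioned above are the `k` values computed at each stage of the recursion.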

* Cepstral coefficients (CEP): Cepstral coefficients can be obtained by homomorphic processing: apply the discrete Fourier transform (DFT) to the speech signal, take the logarithm of the spectrum, and then apply the inverse transform (iDFT). For the LPC cepstrum (LPCCEP), once the linear prediction coefficients of the filter have been obtained, the cepstral coefficients can be computed with a recursive formula. Experiments show that the cepstrum improves the stability of the feature parameters.
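Both routes to the cepstrum can be sketched in a few lines of NumPy. The sketch below assumes the real cepstrum (logarithm of the spectral magnitude, with a small constant added to avoid log(0)) and an all-pole filter 1/A(z) with A(z) = 1 + a1 z^-1 + ... + ap z^-p. The function names and these conventions are assumptions for illustration.

```python
import numpy as np

def real_cepstrum(frame, n_fft=None):
    """DFT -> log magnitude -> inverse DFT (real cepstrum)."""
    frame = np.asarray(frame, dtype=float)
    if n_fft is None:
        n_fft = len(frame)
    spectrum = np.fft.fft(frame, n_fft)
    # Small constant guards against log(0); its value is an arbitrary choice.
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    return np.fft.ifft(log_mag).real

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole filter 1/A(z), by recursion.

    a = [1, a1, ..., ap] are the coefficients of A(z). The recursion is
    c[n] = -a[n] - sum_{k=1}^{n-1} (k/n) * c[k] * a[n-k], with a[m] = 0 for m > p.
    """
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k]
        c[n] = -acc
    return c[1:]
```

The recursion works directly on the LPC coefficients, so no Fourier transform or logarithm is needed for the LPC cepstrum.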

* Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP): Unlike LPC and other features derived from the human speech production mechanism, MFCC and PLP are acoustic features derived from research on the human auditory system. Studies of the auditory mechanism show that when two tones of similar frequency are sounded at the same time, a listener hears only one tone. The critical bandwidth is the bandwidth boundary at which this subjective perception changes abruptly: when the frequency difference between two tones is less than the critical bandwidth, the listener hears the two tones as one, which is known as the masking effect. The Mel scale is one of the scales used to measure this critical bandwidth.

MFCC computation first transforms the time-domain signal into the frequency domain with an FFT, then applies a bank of triangular filters spaced according to the Mel scale to the energy spectrum and takes the logarithm of each filter's output energy, and finally applies the discrete cosine transform (DCT) to the vector formed by the filter outputs, keeping the first n coefficients. PLP still uses the Durbin method to compute the LPC parameters, but it also uses a DCT when computing the autocorrelation parameters.
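The MFCC steps for a single frame can be sketched as follows. This is a simplified illustration, not a reference implementation. It assumes the commonly used Mel formula m = 2595 log10(1 + f/700), a Hamming window, and an unnormalized DCT-II. The default values (26 filters, 13 coefficients, 512-point FFT) are typical choices picked for the example. Pre-emphasis, framing, and liftering are omitted.

```python
import numpy as np

def hz_to_mel(f):
    # Common Mel-scale approximation (assumed here).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers equally spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def mfcc_frame(frame, sample_rate, n_filters=26, n_ceps=13, n_fft=512):
    """MFCCs of one frame: FFT -> Mel filter bank -> log -> DCT."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    # Energy (power) spectrum of the frame.
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    # Energy collected by each triangular filter, then its logarithm.
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_energies = np.log(energies + 1e-10)
    # Unnormalized DCT-II, keeping the first n_ceps coefficients.
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return dct_basis @ log_energies
```

For a 25 ms frame sampled at 16 kHz (400 samples), `mfcc_frame(frame, 16000)` returns a vector of 13 coefficients. With this unnormalized DCT, the first coefficient is the sum of the log filter energies.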