Speech parameterization using the Mel scale Part II. T. Thrasyvoulou and S. Benton

Speech parameterization using the scale Part II T. Thrasyvoulou and S. Benton

Speech Cepstrum procedure Speech Pre-emphasis Framing Windowing For FFT based Cepstrum FFT Cepstrum OR LPC LPC For LPC based Cepstrum

Pre-emphasis/Framing/Windowing -2ms frames Pre-emphasis Framing Windowing H(z)=+az - a=-.95 Used to boost the signal spectrum approximately 2dB/decade Hamming window 2 π n.54.46cos( ) N Reduces the edge effect when taking the FFT Improves spectral estimate accuracy

Speech Cepstrum procedure For FFT based Cepstrum Speech Pre-emphasis Framing Windowing FFT Cepstrum

FFT FFT FFT 2 Frame (Frequency domain) Frame (Time domain) FFT of length N

Speech Cepstrum procedure For LPC based Cepstrum Speech Pre-emphasis Framing Windowing LPC LPC Cepstrum

LPC LP coefficients Frame (Time domain) LPC [a i ] LPC LP coefficients [a i ] Frequency Response Hz () = G M + i i= i az 2 Reduces the spectrum variance Eliminates various source information

Speech Cepstrum procedure Speech Pre-emphasis Framing Windowing For FFT based Cepstrum FFT Cepstrum OR LPC LPC For LPC based Cepstrum

FFT or LPC s(k) Filter Bank scale filter bank.9.8.7 Gain.6.5.4 # samples = # filters in filter bank.3.2. 5 5 2 25 3 35 4 Filter bank

Filter bank frequency table M l ( k) l =,,... L k =,,..., N / 2 l = M ( k) l l =,,..., L k =,,..., N / 2 4KHz Hz l =8 33Hz 4KHz center frequency

Filter bank 2 N/2 points 2 L=2 points L<N Multiply the power spectrum with each of the triangular weighting filters and add the result Perform a weighted averaging procedure around the frequency, similar to a weighted Daniel periodogram

Filter bank Half the FFT size Original Total number of triangular weighing filters (2) N /2 Sl %( ) = SkM ( ) l ( k) l=,,..., L k = Will get the whole range of frequencies but only L samples k th l kf > s N Filter from filter bank Hz

spectrum plots FFT-based Filter Bank LPC-based Filter Bank

spectrum plots Amplitude Amplitude 2.5 2.5.5 3 x -4 FFT based 2 3 4 8 6 4 2 LPC based Normalized Amplitude Normalized Amplitude.8.6.4.2 -FFT based 2 3 4.8.6.4.2 -LPC based BW of the peak expands because of the loss of resolution in scale Peaks that are not located exactly at the center frequency of their corresponding filter may move to center on the frequency e.g. Hz peak Hz 2 3 4 2 3 4

spectrum plots Amplitude 5 x -5 FFT based 4 3 2 Normalized Amplitude.8.6.4.2 -FFT based Loss of resolution Maintains prominent peaks 2 3 4 2 3 4 6 LPC based -LPC based Amplitude 5 4 3 2 Normalized Amplitude.8.6.4.2 2 3 4 2 3 4

spectrum plots Amplitude.25.2.5..5 FFT based Normalized Amplitude.8.6.4.2 -FFT based Small peaks close to large ones are lumped into one main peak 2 3 4 2 3 4 2 LPC based -LPC based Amplitude 5 5 Normalized Amplitude.8.6.4.2 2 3 4 2 3 4

Speech Cepstrum procedure Speech Pre-emphasis Framing Windowing For FFT based Cepstrum FFT Cepstrum OR LPC LPC For LPC based Cepstrum

Speech Cepstrum parameterization Cepstrum Total number of triangular weighing filters (2) Log() DCT DCT Equation: L 2 πi ci ( ) = log( Sm % ( ))cos ( m.5) i=,,... C L m= L # cepstral coefficients desired

Scaled Cepstral Coefficients Scaled Cepstral Coeff.5 FFT based LP C ba s e d Normalized Amplitude -.5-2 4 6 8 2 4 Coefficient number

Why the DCT? The signal is real with mirror symmetry The IFFT requires complex arithmetic The DCT does NOT The DCT implements the same function as the FFT more efficiently by taking advantage of the redundancy in a real signal. The DCT is more efficient computationally

Conclusions LPC approximates speech linearly at all frequencies LPCC are more robust and reliable than LPC alone -scaled LPCC and -scaled FFT-CC are robust and also take into account the psychoacoustic properties of the human auditory system

References [] Wong, E. and Sridharan, S. Comparison of linear prediction cepstrum coefficients and -frequency cepstrum coefficients for language identification, Intelligent Multimedia, Video and Speech Processing Proceedings, pp. 95-98, 2. [2] Molau, S., Pitz, M., Schluter, R. and Ney, H., Computing -frequency cepstral coefficients on the power spectrum, Acoustics, Speech, and Signal Processing Proceedings, Volume:, pp. 73-76, 2. [3] De Lima Araujo, A.M. and Violaro, F., Formant frequency estimation using a -scale LPC algorithm, ITS '98 Proceedings, Volume:, pp. 27-22, 998. [4] Picone, J.W., Signal modeling techniques in speech recognition, Proceedings of the IEEE, Volume: 8, Issue: 9, pp. 25-247 Sep 993. [5] Umesh, S., Cohen, L. and Nelson, D., Frequency warping and the scale, IEEE Signal Processing Letters, Volume: 9, Issue: 3, pp. 4-7, 22. [6] http://www-2.cs.cmu.edu/~mseltzer/sphinxman/