Speech parameterization using the Mel scale Part II. T. Thrasyvoulou and S. Benton

Similar documents
Implementation of High Throughput Radix-16 FFT Processor

Advanced Digital Signal Processing Part 4: DFT and FFT

QUANTIFICATION OF TONAL PENALTIES IN ENVIRONMENTAL NOISE ASSESSMENTS

Laboratory Exercise #6

AN10943 Decoding DTMF tones using M3 DSP library FFT function

PHYS 3152 Methods of Experimental Physics I E5. Frequency Analysis of Electronic Signals

Equipment Required. WaveRunner Zi series Oscilloscope 10:1 High impedance passive probe. 3. Turn off channel 2.

Faster GPS via the Sparse Fourier Transform. Haitham Hassanieh Fadel Adib Dina Katabi Piotr Indyk

Faster GPS via the Sparse Fourier Transform. Haitham Hassanieh Fadel Adib Dina Katabi Piotr Indyk

Room Acoustics, PA Systems and Speech Intelligibility for Voice Alarm Systems using the example of the Cruise Center Steinwerder

FREQUENCY ENHANCEMENT OF DUAL-JUNCTION THERMOCOUPLE PROBES

condition monitoring WP4086 Condition Monitoring System The WP4086 CMS System surveys vibrations with up to 8 external accelerometers

WP4086 Condition Monitoring System

Application Note. Technique description CPB measurements give early fault detection with minimal risk of false alarms

Detection of Alarm Sounds in Noisy Environments

A Comparative Study of Different FFT Architectures for Software Defined Radio

A Novel VLSI Based Pipelined Radix-4 Single-Path Delay Commutator (R4SDC) FFT

mach 5 100kHz Digital Joulemeter: M5-SJ, M5-IJ and M5-PJ Joulemeter Probes

Assignment 5: Fourier Analysis Assignment due 9 April 2012, 11:59 pm

Predicting Narrow-band and Wideband Speech Quality with WB-PESQ and TOSQA

Distributed Rayleigh scatter dynamic strain sensing above the scan rate with optical frequency domain reflectometry

Australian Journal of Basic and Applied Sciences. Leak Detection in MDPE Gas Pipeline using Dual-Tree Complex Wavelet Transform

Separation of the Pyro- and Piezoelectric Response of Electroactive Polymers for Sensor Applications

Real time Video Fire Detection using Spatio-Temporal Consistency Energy

OVERVIEW AND FUTURE DEVELOPMENT OF THE NEUTRON SENSOR SIGNAL SELF-VALIDATION (NSV) PROJECT

Frequency (cycles/earthyear)

A 0.75 Million Point Fourier Transform Chip for Frequency Sparse Signals

Loudspeaker s frequency response was measured in-room, using windowed MLS techniques. The 0.5m set-up and results are presented on the pictures below.

Automatic Lyrics Alignment for Cantonese Popular Music

Experimental Dynamic Behaviour and Pedestrian Excited Vibrations Mitigation at Ceramique Footbridge (Maastricht, NL)

International Journal of Engineering Research-Online A Peer Reviewed International Journal

Thermal Video Analysis for Fire Detection Using Shape Regularity and Intensity Saturation Features

An FPGA Spectrum Sensing Accelerator for Cognitive Radio

by Ray Escoffier September 28, 1987

NFPA Changes

Proceedings of Meetings on Acoustics

Modeling of Ceiling Fan Based on Velocity Measurement for CFD Simulation of Airflow in Large Room

2 A e ( I ( ω t k r)

1 Introduction - The rating level. 3 Subjective or objective analysis? 2 Determination of the adjustment for annoying tones

UWB Stepped-FM Sensor for Home Security

Online Detection of Fire in Video

CRUSH: Cognitive Radio Universal Software Hardware

A Forest Fire Warning Method Based on Fire Dangerous Rating Dan Wang 1, a, Lei Xu 1, b*, Yuanyuan Zhou 1, c, Zhifu Gao 1, d

2018 Voluntary Page and Overlength Article Charges Updated 3/14/18

An FT-NIR Primer. NR800-A-006

Electronic Radiator Valve Acoustic Testing KTP Associate - Shyel Stark

Samsung SDS BMS Ver.2.0. Technical Specification

Measuring Sound Pressure Level with APx

REAL-TIME PROCESSING OF A LONG PERIMETER FIBER OPTIC INTRUSION SYSTEM

Study and Design of Diaphragm Pump Vibration Detection Fault Diagnosis System Based on FFT

ANALYSIS OF OTDR MEASUREMENT DATA WITH WAVELET TRANSFORM. Hüseyin ACAR *

Thermal Performance of Green Roofs: A Parametric Study through Energy Modelling in Different Climates

Sentry G3 Machinery Protection Monitor

Utilization of advanced clutter suppression algorithms for improved standoff detection and identification of radionuclide threats

An Area Efficient 2D Fourier Transform Architecture for FPGA Implementation

AFMG. Design for Speech Intelligibility Using Software Modeling. Stefan Feistel, Bruce C. Olson AHNERT FEISTEL MEDIA GROUP

DSP EC Pro. Advanced, Digital Gamma-Ray Spectrometer for HPGe Detector Systems

Advanced Test Equipment Rentals ATEC (2832)

Benefits of Enhanced Event Analysis in. Mark Miller

CFD Analysis of a 24 Hour Operating Solar Refrigeration Absorption Technology

e+/e- Vertical Beam Dynamics during CHESS Operation-Part II

Video Smoke Detection using Deep Domain Adaptation Enhanced with Synthetic Smoke Images

High Speed Reconfigurable FFT Processor Using Urdhava Thriyambakam

DS-2TD Thermal & Optical Network Bullet Camera

More Precision. thermometer // Non-contact IR temperature sensors

Design and Development of Industrial Pollution Monitoring System using LabVIEW and GSM

Optical Time Domain Reflectometry for the OMEGA EP Laser

ENSC 388: Engineering Thermodynamics and Heat Transfer

ISO INTERNATIONAL STANDARD

Approach to Detecting Forest Fire by Image Processing Captured from IP Cameras 1

Summary of Screen Materials Available

Queen St E & Leslie St Noise Analysis Toronto, Ontario. Toronto Transit Commission Streetcar Department 1900 Yonge Street Toronto, ON M4S 1Z2

Human Position/Height Detection Using Analog Type Pyroelectric Sensors

CHAPTER 7 APPLICATION OF 128 POINT FFT PROCESSOR FOR MIMO OFDM SYSTEMS. Table of Contents

1441. Quantification of the flow noise in household refrigerators

REVIEW ON CONDENSER AIR LEAK TEST

Supporting Information

The Earthquake Monitoring System for Dams of a reservoir in China SUMMARY

Agilent E6000A E6003A Specs Provided by E6000C Mini-OTDR. Technical Data Sheet

TENTH INTERNATIONAL CONGRESS ON SOUND AND VIBRATION, ICSV10 REAL TIME CONTROL OF SOUND PRESSURE AND ENERGY DENSITY IN A MINING VEHICLE CABIN

NEW CD WARP CONTROL SYSTEM FOR THE CORRUGATING INDUSTRY

IR Tracking NERF Sentry Gun. Ryan Alvaro, Emily Dixon, Lauren Klindworth Team 13

Agilent E6012A Specs Provided by E6000C Mini-OTDR. Technical Data Sheet

Automobile Security System Based on Face Recognition Structure Using GSM Network

Sentry G3 Machinery Protection Monitor

Implementing Efficient Split-Radix FFTs in FPGAs

Floating Point Fast Fourier Transform v2.1. User manual

INTERNATIONAL STANDARD

Product data sheet Palas Fidas 200 S

Meindert s HPD315 project

THE combination of the multiple-input multiple-output

1066. A self-adaptive alarm method for tool condition monitoring based on Parzen window estimation

metro B6012 VIBRATIONS SIMPLIFIED... Product brochure Easiest portable vibration analyzer and balancer TECHNOLOGIES PVT. LTD.

Preparation of response measurements for crossover simulation with VituixCAD

VLSI implementation of high speed and high resolution FFT algorithm based on radix 2 for DSP application

A Study on the 2-D Temperature Distribution of the Strip due to Induction Heater

Music Sculptures of Glass and Gold

Oil and Gas Pipeline Monitoring. Oil and Gas Well Monitoring. Power Line Monitoring. Highway Safety Monitoring

Implementation and testing of a model for the calculation of equilibrium between components of a refrigeration installation

How to Use Fire Risk Assessment Tools to Evaluate Performance Based Designs

Transcription:

Speech parameterization using the scale Part II T. Thrasyvoulou and S. Benton

Speech Cepstrum procedure Speech Pre-emphasis Framing Windowing For FFT based Cepstrum FFT Cepstrum OR LPC LPC For LPC based Cepstrum

Speech Cepstrum procedure Speech Pre-emphasis Framing Windowing For FFT based Cepstrum FFT Cepstrum OR LPC LPC For LPC based Cepstrum

Pre-emphasis/Framing/Windowing -2ms frames Pre-emphasis Framing Windowing H(z)=+az - a=-.95 Used to boost the signal spectrum approximately 2dB/decade Hamming window 2 π n.54.46cos( ) N Reduces the edge effect when taking the FFT Improves spectral estimate accuracy

Speech Cepstrum procedure For FFT based Cepstrum Speech Pre-emphasis Framing Windowing FFT Cepstrum

FFT FFT FFT 2 Frame (Frequency domain) Frame (Time domain) FFT of length N

Speech Cepstrum procedure For LPC based Cepstrum Speech Pre-emphasis Framing Windowing LPC LPC Cepstrum

LPC LP coefficients Frame (Time domain) LPC [a i ] LPC LP coefficients [a i ] Frequency Response Hz () = G M + i i= i az 2 Reduces the spectrum variance Eliminates various source information

Speech Cepstrum procedure Speech Pre-emphasis Framing Windowing For FFT based Cepstrum FFT Cepstrum OR LPC LPC For LPC based Cepstrum

FFT or LPC s(k) Filter Bank scale filter bank.9.8.7 Gain.6.5.4 # samples = # filters in filter bank.3.2. 5 5 2 25 3 35 4 Filter bank

Filter bank frequency table M l ( k) l =,,... L k =,,..., N / 2 l = M ( k) l l =,,..., L k =,,..., N / 2 4KHz Hz l =8 33Hz 4KHz center frequency

Filter bank 2 N/2 points 2 L=2 points L<N Multiply the power spectrum with each of the triangular weighting filters and add the result Perform a weighted averaging procedure around the frequency, similar to a weighted Daniel periodogram

Filter bank Half the FFT size Original Total number of triangular weighing filters (2) N /2 Sl %( ) = SkM ( ) l ( k) l=,,..., L k = Will get the whole range of frequencies but only L samples k th l kf > s N Filter from filter bank Hz

spectrum plots FFT-based Filter Bank LPC-based Filter Bank

spectrum plots Amplitude Amplitude 2.5 2.5.5 3 x -4 FFT based 2 3 4 8 6 4 2 LPC based Normalized Amplitude Normalized Amplitude.8.6.4.2 -FFT based 2 3 4.8.6.4.2 -LPC based BW of the peak expands because of the loss of resolution in scale Peaks that are not located exactly at the center frequency of their corresponding filter may move to center on the frequency e.g. Hz peak Hz 2 3 4 2 3 4

spectrum plots Amplitude 5 x -5 FFT based 4 3 2 Normalized Amplitude.8.6.4.2 -FFT based Loss of resolution Maintains prominent peaks 2 3 4 2 3 4 6 LPC based -LPC based Amplitude 5 4 3 2 Normalized Amplitude.8.6.4.2 2 3 4 2 3 4

spectrum plots Amplitude.25.2.5..5 FFT based Normalized Amplitude.8.6.4.2 -FFT based Small peaks close to large ones are lumped into one main peak 2 3 4 2 3 4 2 LPC based -LPC based Amplitude 5 5 Normalized Amplitude.8.6.4.2 2 3 4 2 3 4

Speech Cepstrum procedure Speech Pre-emphasis Framing Windowing For FFT based Cepstrum FFT Cepstrum OR LPC LPC For LPC based Cepstrum

Speech Cepstrum parameterization Cepstrum Total number of triangular weighing filters (2) Log() DCT DCT Equation: L 2 πi ci ( ) = log( Sm % ( ))cos ( m.5) i=,,... C L m= L # cepstral coefficients desired

Scaled Cepstral Coefficients Scaled Cepstral Coeff.5 FFT based LP C ba s e d Normalized Amplitude -.5-2 4 6 8 2 4 Coefficient number

Scaled Cepstral Coefficients Scaled Cepstral Coeff.5 FFT based LP C ba s e d Normalized Amplitude -.5-2 4 6 8 2 4 Coefficient number

Why the DCT? The signal is real with mirror symmetry The IFFT requires complex arithmetic The DCT does NOT The DCT implements the same function as the FFT more efficiently by taking advantage of the redundancy in a real signal. The DCT is more efficient computationally

Conclusions LPC approximates speech linearly at all frequencies LPCC are more robust and reliable than LPC alone -scaled LPCC and -scaled FFT-CC are robust and also take into account the psychoacoustic properties of the human auditory system

References [] Wong, E. and Sridharan, S. Comparison of linear prediction cepstrum coefficients and -frequency cepstrum coefficients for language identification, Intelligent Multimedia, Video and Speech Processing Proceedings, pp. 95-98, 2. [2] Molau, S., Pitz, M., Schluter, R. and Ney, H., Computing -frequency cepstral coefficients on the power spectrum, Acoustics, Speech, and Signal Processing Proceedings, Volume:, pp. 73-76, 2. [3] De Lima Araujo, A.M. and Violaro, F., Formant frequency estimation using a -scale LPC algorithm, ITS '98 Proceedings, Volume:, pp. 27-22, 998. [4] Picone, J.W., Signal modeling techniques in speech recognition, Proceedings of the IEEE, Volume: 8, Issue: 9, pp. 25-247 Sep 993. [5] Umesh, S., Cohen, L. and Nelson, D., Frequency warping and the scale, IEEE Signal Processing Letters, Volume: 9, Issue: 3, pp. 4-7, 22. [6] http://www-2.cs.cmu.edu/~mseltzer/sphinxman/