Research Output
Raw Source and Filter Modelling for Dysarthric Speech Recognition
  Acoustic modelling for automatic dysarthric speech recognition (ADSR) is a challenging task. Data deficiency is a major problem, and the substantial differences between typical and dysarthric speech complicate transfer learning. In this paper, we build acoustic models using the raw magnitude spectra of the source and filter components. The proposed multi-stream model consists of convolutional and recurrent layers and allows the vocal tract and excitation components to be fused at different levels of abstraction, after per-stream pre-processing. We show that such multi-stream processing leverages the two information streams and helps the model normalise speaker attributes and speaking style, which potentially leads to better handling of dysarthric speech with its large inter-speaker and intra-speaker variability. We compare the proposed system with various features, study its training dynamics, explore the usefulness of data augmentation, and provide an interpretation of the learned convolutional filters. On the widely used TORGO dysarthric speech corpus, the proposed approach yields up to 1.7% absolute WER reduction for dysarthric speech compared with the MFCC baseline. Our best model reaches 40.6% and 11.8% WER for dysarthric and typical speech, respectively.
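
  The abstract outlines a multi-stream architecture in which the filter (vocal tract) and source (excitation) magnitude spectra are processed by separate convolutional streams before being fused and passed through recurrent layers. The sketch below illustrates one way such a model could be wired up; it is a minimal PyTorch-style example, and the layer sizes, the fusion point (plain concatenation after the convolutional front-ends), and all names are illustrative assumptions rather than the authors' exact configuration.

  # Minimal two-stream (source + filter) acoustic model sketch.
  # Assumptions: PyTorch, frame-level targets, and concatenation-based fusion;
  # none of these choices are taken from the paper itself.
  import torch
  import torch.nn as nn

  class SourceFilterAM(nn.Module):
      def __init__(self, n_freq_bins=257, n_targets=2000):
          super().__init__()

          # Per-stream convolutional front-ends operating directly on raw
          # magnitude spectra (frequency bins treated as channels).
          def conv_stream():
              return nn.Sequential(
                  nn.Conv1d(n_freq_bins, 128, kernel_size=5, padding=2),
                  nn.ReLU(),
                  nn.Conv1d(128, 128, kernel_size=3, padding=1),
                  nn.ReLU(),
              )

          self.filter_stream = conv_stream()   # vocal tract component
          self.source_stream = conv_stream()   # excitation component

          # Recurrent layers applied after fusing the two streams.
          self.rnn = nn.LSTM(input_size=256, hidden_size=256, num_layers=2,
                             batch_first=True, bidirectional=True)
          self.classifier = nn.Linear(2 * 256, n_targets)

      def forward(self, filter_spec, source_spec):
          # Both inputs: (batch, time, n_freq_bins) magnitude spectra.
          f = self.filter_stream(filter_spec.transpose(1, 2))  # (batch, 128, time)
          s = self.source_stream(source_spec.transpose(1, 2))  # (batch, 128, time)
          fused = torch.cat([f, s], dim=1).transpose(1, 2)     # (batch, time, 256)
          h, _ = self.rnn(fused)
          return self.classifier(h)                            # per-frame scores

  Here the streams are fused by simple concatenation after the convolutional front-ends; since the paper studies fusion at different levels of abstraction, the fusion point would in practice be a design choice to tune rather than a fixed part of the architecture.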

  • Date:

    27 April 2022

  • Publication Status:

    Published

  • Publisher:

    IEEE

  • DOI:

    10.1109/icassp43922.2022.9746553

  • Funders:

    Engineering and Physical Sciences Research Council

Citation

Yue, Z., Loweimi, E., & Cvetkovic, Z. (2022). Raw Source and Filter Modelling for Dysarthric Speech Recognition. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp43922.2022.9746553

Authors

Yue, Z., Loweimi, E., & Cvetkovic, Z.

Keywords

Dysarthric speech recognition, source-filter separation and fusion, multi-stream acoustic modelling
