Research Output
Iterative Speech Enhancement with Transformers
  Enhancing audio quality in audio-visual speech enhancement (AVSE) is a crucial step in improving the performance of speech recognition systems, particularly by integrating visual and auditory data to create more robust and accurate models. This study addresses speech enhancement in audio-only settings, which can serve as a preliminary stage for AVSE applications. The primary goal is to improve the clarity of speech in noisy environments, especially where multiple speakers are present, thereby laying a foundation for more advanced multimodal systems. In our approach, we iteratively feed the output of the SepFormer back into the model across several cycles. This iterative process improves speech quality, as measured by mean opinion scores (MOS), a standard metric for the perceptual quality of speech, with MOS reaching a maximum after five enhancement cycles.
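
  The iterative scheme described in the abstract can be sketched with the SpeechBrain toolkit named in the keywords. The snippet below is a minimal illustration under assumptions, not the authors' implementation: the pretrained checkpoint name, the file paths, the 16 kHz mono input, and the fixed count of five cycles (the point at which the abstract reports MOS peaking) are chosen for the example.

    # Minimal sketch: repeatedly feed a SepFormer enhancement model's output
    # back in as its input. Checkpoint name, paths, and cycle count are
    # illustrative assumptions, not details taken from the paper.
    import torch
    import torchaudio
    from speechbrain.inference.separation import SepformerSeparation  # SpeechBrain >= 1.0

    # Load a pretrained SepFormer enhancement model (assumed checkpoint).
    model = SepformerSeparation.from_hparams(
        source="speechbrain/sepformer-wham16k-enhancement",
        savedir="pretrained_models/sepformer-wham16k-enhancement",
    )

    # Noisy input, assumed mono at 16 kHz; shape [1, time].
    noisy, sample_rate = torchaudio.load("noisy_speech.wav")

    enhanced = noisy
    with torch.no_grad():
        # Five cycles used here as an example, matching the reported MOS peak.
        for _ in range(5):
            # separate_batch takes [batch, time] and returns [batch, time, n_sources].
            est_sources = model.separate_batch(enhanced)
            enhanced = est_sources[:, :, 0]  # keep the single enhanced source

    torchaudio.save("enhanced_speech.wav", enhanced.detach().cpu(), sample_rate)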

  • Date:

    01 September 2024

  • Publication Status:

    Published

  • Publisher:

    ISCA

  • DOI:

    10.21437/avsec.2024-14

  • Funders:

    Engineering and Physical Sciences Research Council

Citation

Nazemi, A., Sami, A., Sami, M., & Hussain, A. (2024, September). Iterative Speech Enhancement with Transformers. Presented at the 3rd COG-MHEAR Workshop on Audio-Visual Speech Enhancement (AVSEC), Kos, Greece.

Authors

A. Nazemi, A. Sami, M. Sami, A. Hussain

Keywords

Speech enhancement, Transformers, SpeechBrain, Iterative Transformer
