Research Output
A Framework for Speech Enhancement based on Audio Signal and Speaker Embeddings
  This study addresses speech enhancement in an audio-only setting. Our proposed framework extracts speaker embeddings and voice signals, then integrates them to synthesise a voice from the extracted data. Although this work is preliminary and relies on predefined male and female embeddings, significant improvements were observed: the quality of the generated voice improved, as reflected by an increase in the Mean Opinion Score (MOS). This improvement indicates that the approach enhances speech clarity and naturalness even without customised speaker-specific embeddings. The study lays the groundwork for future research on integrating unique speaker embeddings to further refine voice generation. Technical challenges prevented the incorporation of individualised embeddings in this phase; ongoing developments are expected to address them.

  • Date:

    01 September 2024

  • Publication Status:

    Published

  • Publisher:

    ISCA

  • DOI:

    10.21437/avsec.2024-13

  • Funders:

    Engineering and Physical Sciences Research Council

Citation

Nazemi, A., Sami, A., Sami, M., & Hussain, A. (2024, September). A Framework for Speech Enhancement based on Audio Signal and Speaker Embeddings. Presented at 3rd COG-MHEAR Workshop on Audio-Visual Speech Enhancement (AVSEC), Kos Island, Greece.

Keywords

Speech enhancement, Transformers, SpeechBrain, Speaker Embeddings, Audio Signal
