A Framework for Speech Enhancement based on Audio Signal and Speaker Embeddings

Research Output

This study addresses the challenge of speech enhancement in an audio-only context. Our proposed framework extracts speaker embeddings and voice signals, then integrates these components to synthesise a voice from the extracted data. Although the work is preliminary and relies on predefined male and female embeddings, significant improvements were observed: the quality of the generated voice improved, as reflected by an increase in the Mean Opinion Score (MOS). This improvement shows the effectiveness of our approach in enhancing speech clarity and naturalness, even without customised speaker-specific embeddings. The current study lays the groundwork for future research aimed at integrating unique speaker embeddings to further refine voice generation. Although technical challenges prevented the incorporation of individualised embeddings in this phase, ongoing developments are expected to address these issues.

Date: 01 September 2024

Publication Status: Published

Publisher: ISCA

DOI: 10.21437/avsec.2024-13

Funders: Engineering and Physical Sciences Research Council

http://researchrepository.napier.ac.uk/output/4289589

Citation

Nazemi, A., Sami, A., Sami, M., & Hussain, A. (2024, September). A Framework for Speech Enhancement based on Audio Signal and Speaker Embeddings. Presented at 3rd COG-MHEAR Workshop on Audio-Visual Speech Enhancement (AVSEC), Kos Island, Greece