Towards multilingual audio-visual speech enhancement in real noisy environments
  Speech enhancement aims to improve the quality and intelligibility of speech degraded by noise in real-world environments. In recent years, researchers have proposed audio-visual speech enhancement models that go beyond traditional audio-only processing to deliver better noise suppression and speech restoration in low-SNR environments with multiple competing background noise sources. However, existing audio-visual speech enhancement methods are language-dependent, as they exploit the correlations between visemes and the uttered speech. In addition, speaker pose variation has been shown to significantly degrade the performance of these models.
This project aims to address these two critical challenges in current audio-visual speech enhancement models. The following research objectives will contribute to this development (a minimal illustrative sketch of the audio-visual fusion concept is given after the objectives).

1. To design a novel multilingual audio-visual (AV) speech enhancement framework exploiting advanced machine learning techniques to address the language dependency of current models.
2. To develop a novel multiview AV speech enhancement framework exploiting image translation and pose-invariant features.
3. Finally, to integrate the two frameworks and critically evaluate the robustness and generalisation of the combined framework in a range of real-world environments (e.g. cafeteria and restaurant) and use cases (e.g. in-car).
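
To make the underlying idea concrete, below is a minimal sketch of mask-based audio-visual speech enhancement in PyTorch. Every module name, dimension, and the fusion strategy itself are illustrative assumptions, not the project's proposed architecture: per-frame lip-region embeddings are projected alongside noisy spectrogram frames, fused by a recurrent layer, and used to predict a time-frequency mask.

```python
# Minimal sketch of audio-visual speech enhancement via spectrogram masking.
# All dimensions, module names, and the fusion strategy are illustrative
# assumptions; they do not describe the project's actual architecture.
import torch
import torch.nn as nn

class AVEnhancer(nn.Module):
    def __init__(self, n_freq=257, visual_dim=512, hidden=256):
        super().__init__()
        self.audio_proj = nn.Linear(n_freq, hidden)       # noisy magnitude spectrogram frames
        self.visual_proj = nn.Linear(visual_dim, hidden)  # per-frame lip-region embeddings
        self.fusion = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.mask_head = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_mag, visual_feat):
        # noisy_mag: (batch, time, n_freq); visual_feat: (batch, time, visual_dim)
        a = self.audio_proj(noisy_mag)
        v = self.visual_proj(visual_feat)
        fused, _ = self.fusion(torch.cat([a, v], dim=-1))
        mask = self.mask_head(fused)   # time-frequency mask in [0, 1]
        return mask * noisy_mag        # enhanced magnitude estimate

# Usage: enhance a 100-frame utterance (batch of 1).
model = AVEnhancer()
noisy = torch.rand(1, 100, 257)
lips = torch.rand(1, 100, 512)
enhanced = model(noisy, lips)
print(enhanced.shape)  # torch.Size([1, 100, 257])
```

In this framing, a multilingual variant (objective 1) would condition the fusion on language-agnostic viseme features, while a multiview variant (objective 2) would replace the frontal lip embeddings with pose-invariant ones derived via image translation.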

  • Start Date:

    17 February 2023

  • End Date:

    16 February 2025

  • Activity Type:

    Externally Funded Research

  • Funder:

    Royal Society

  • Value:

    £12,000

Project Team