Research Output
Deep Neural Network Driven Binaural Audio Visual Speech Separation
  The central auditory pathway exploits the auditory and visual signals received by the ears and eyes to segregate speech from multiple competing noise sources and to help disambiguate phonological ambiguity. In this study, inspired by this unique human ability, we present a deep neural network (DNN) that ingests the binaural sounds received at the two ears, together with the visual frames, to selectively suppress the competing noise sources at each ear. The model exploits the noisy binaural cues and noise-robust visual cues to improve speech intelligibility. Comparative simulation results in terms of objective metrics such as PESQ, STOI, SI-SDR and DBSTOI demonstrate a significant performance improvement of the proposed audio-visual (AV) DNN over the audio-only (A-only) variant of the model. Finally, subjective listening tests with the real noisy AV ASPIRE corpus show the superiority of the proposed AV DNN over state-of-the-art approaches.
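  One of the objective metrics named above, SI-SDR (scale-invariant signal-to-distortion ratio), has a simple closed form: the estimate is projected onto the reference to obtain a target component, and the ratio of target to residual energy is reported in dB. The following numpy sketch illustrates that computation; the function name and signals are illustrative and not taken from the paper's implementation.

  ```python
  import numpy as np

  def si_sdr(reference, estimate, eps=1e-8):
      """Scale-invariant SDR in dB between a reference and an estimated signal."""
      # Zero-mean both signals, as is conventional for SI-SDR.
      reference = reference - reference.mean()
      estimate = estimate - estimate.mean()
      # Project the estimate onto the reference to get the target component.
      alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
      target = alpha * reference
      # Everything orthogonal to the reference counts as distortion.
      noise = estimate - target
      return 10.0 * np.log10((np.dot(target, target) + eps)
                             / (np.dot(noise, noise) + eps))

  rng = np.random.default_rng(0)
  ref = rng.standard_normal(16000)   # hypothetical 1 s clean speech at 16 kHz
  noise = rng.standard_normal(16000)
  est = ref + 0.1 * noise            # hypothetical enhanced output
  print(round(si_sdr(ref, est), 1))  # higher dB = less residual distortion
  ```

  Because of the projection step, rescaling the estimate leaves the score unchanged, which is the property that makes the metric robust to gain differences between systems.
  
  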

  • Date:

    28 September 2020

  • Funders:

    Edinburgh Napier Funded


Gogate, M., Dashtipour, K., Bell, P., & Hussain, A. (2020). Deep Neural Network Driven Binaural Audio Visual Speech Separation. In 2020 International Joint Conference on Neural Networks (IJCNN).
