Research Output
Generating Unambiguous and Diverse Referring Expressions
  Neural Referring Expression Generation (REG) models have shown promising results in generating expressions that uniquely describe visual objects. However, current REG models still lack the ability to produce diverse and unambiguous referring expressions (REs). To address the lack of diversity, we propose generating a set of diverse REs rather than a single one-shot RE. To reduce the ambiguity of referring expressions, we directly optimise non-differentiable test metrics using reinforcement learning (RL), and we show that our approaches achieve better results under multiple different settings. Specifically, we first present a novel RL approach to REG training which, instead of drawing one sample per input, averages over multiple samples to normalise the reward during RL training. Second, we present a REG model that utilises an object attention mechanism, explicitly incorporating information about the target object, and is optimised using our proposed RL approach. Third, we propose a novel transformer model, optimised with RL, that exploits different levels of visual information. Our human evaluation demonstrates the effectiveness of this model: we improve the state-of-the-art results in terms of task success on RefCOCO testA and testB, as well as on RefCOCO+ testA. Finally, we present a thorough comparison of diverse decoding strategies (sampling- and maximisation-based) and how they control the trade-off between quality and diversity.
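The multi-sample reward normalisation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`sample_fn`, `reward_fn`) and the interface are hypothetical, and it shows only the core idea of using the mean reward over several sampled expressions for the same input as the baseline when computing advantages.

```python
def multi_sample_advantages(sample_fn, reward_fn, num_samples=5):
    """Draw several samples for one input and normalise each sample's
    reward by the mean reward over the set (the per-input baseline)."""
    # Draw multiple candidate referring expressions for the same input.
    samples = [sample_fn() for _ in range(num_samples)]
    # Score each sample with a (possibly non-differentiable) reward metric.
    rewards = [reward_fn(s) for s in samples]
    # Use the mean reward over the samples as the baseline.
    baseline = sum(rewards) / len(rewards)
    # Advantage = reward minus baseline; this is what would weight the
    # policy-gradient (REINFORCE-style) update for each sample.
    advantages = [r - baseline for r in rewards]
    return samples, advantages
```

By construction the advantages sum to zero for each input, so samples scoring above the set's average are reinforced and those below it are penalised, which reduces the variance of the gradient estimate compared with a single-sample reward.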

  • Type:

    Article

  • Date:

    31 December 2020

  • Publication Status:

    Published

  • DOI:

    10.1016/j.csl.2020.101184

  • ISSN:

    0885-2308

  • Funders:

    Engineering and Physical Sciences Research Council (EPSRC)

Citation

Panagiaris, N., Hart, E., & Gkatzia, D. (2021). Generating Unambiguous and Diverse Referring Expressions. Computer Speech and Language, 68, Article 101184. https://doi.org/10.1016/j.csl.2020.101184

Keywords

Referring Expression Generation, Natural Language Generation, Neural Models
