Referring Expression Generation of Objects in Real World Cluttered Scenes - School of Computing Seminar Series

A well-studied subtask of Natural Language Generation (NLG) is Referring Expression Generation (REG), the automatic generation of a description of an entity that allows the hearer to successfully identify that entity in a given context [21]. Existing REG algorithms generate Referring Expressions (REs) by replicating examples found in a corpus, with the aim of producing human-like, naturalistic expressions. However, the resulting expressions are often rigid, repeating the same style and content, which people do not find appealing. To overcome these limitations, I propose an alternative framework that focuses on the variability of the produced REs, a fundamental prerequisite for naturalness. The proposed framework is inspired by the recent success of Conditional Generative Adversarial Networks (CGANs). The intuition behind using this model is its "guided" generation process, which conditions the model on additional information (e.g. image regions and text), leading to diverse, natural and informative REs. In addition, the second part of this doctoral project will focus on the comprehension of the generated referring expressions.