CiViL: Common-sense- and Visual-enhanced natural Language generation

  One of the most compelling problems in Artificial Intelligence is to create computational agents capable of interacting in
real-world environments using natural language. Computational agents such as robots can offer multiple benefits to
society: for instance, they can look after the ageing population, act as companions, support skills
training, or provide assistance in public spaces. These are extremely challenging tasks due to their complex
interdisciplinary nature, which spans several fields including Natural Language Generation, engineering, computer
vision, and robotics.
Communication through language is the most vital and natural way of interaction. Humans are able to effectively
communicate with each other using natural language, drawing on common-sense knowledge and making inferences about
other people's backgrounds based on previous interactions with them. At the same time, they can successfully describe
their surroundings, even when encountering unknown entities and objects. For decades, researchers have tried to recreate
the way humans communicate through natural language, and although there have been major breakthroughs in recent years
(such as Apple's Siri or Amazon's Alexa), Natural Language Generation systems still lack the ability to reason, exploit
common-sense knowledge, and utilise multi-modal information from a variety of sources such as knowledge bases,
images, and videos.
This project aims to develop a framework for common-sense- and visually-enhanced Natural Language Generation that
can enable natural real-time communication, and hence effective collaboration, between humans and artificial agents
such as robots. Human-Robot Interaction poses additional challenges to Natural Language
Generation due to the uncertainty arising from dynamic environments and the non-deterministic nature of interaction. For
instance, the viewpoint of a situated robot changes when the robot moves, and with it the robot's representation of the
world. Current state-of-the-art methods, which cannot adapt to changing environments, fail as a result. The
project aims to investigate methods for linking various modalities, taking into account their dynamic nature. To achieve
natural, efficient and intuitive communication capabilities, agents will also need to acquire human-like abilities in
synthesising knowledge and expression. It remains to be explored under which conditions external knowledge bases
(such as Wikipedia) can enhance natural language generation, and whether existing knowledge bases are
useful for this purpose.
Novel ways of integrating multi-modal data for language generation will lead to more robust and efficient interactions and
will have an impact on natural language generation, social robotics, computer vision, and related fields. This might, in turn,
spawn entirely novel applications, such as explaining exact procedures for e-health treatments and enhancing tutoring
systems for educational purposes.

  • Start Date:

    1 April 2020

  • End Date:

    31 March 2023

  • Activity Type:

    Externally Funded Research

  • Funder:

    Engineering and Physical Sciences Research Council

  • Value:

    £280,060

Project Team