Research Output

Evaluating human-machine conversation for appropriateness.

  Evaluation of complex, collaborative dialogue systems is a difficult task. Traditionally, developers have relied upon subjective feedback from the user, and parametrisation over observable metrics. However, both models place some reliance on the notion of a task; that is, the system is helping the user achieve some clearly defined goal, such as booking a flight or completing a banking transaction. It is not clear that such metrics are as useful when dealing with a system that has a more complex task, or even no definable task at all, beyond maintaining and performing a collaborative dialogue. Working within the EU-funded COMPANIONS programme, we investigate the use of appropriateness as a measure of conversation quality, the hypothesis being that good companions need to be good conversational partners. We report initial work in the direction of annotating dialogue for indicators of good conversation, including the annotation and comparison of the output of two generations of the same dialogue system.

  • Date:

    30 April 2010

  • Publication Status:

    Published

  • Publisher:

    European Language Resources Association (ELRA)

  • Library of Congress:

    QA76 Computer software

  • Dewey Decimal Classification:

    004 Data processing & computer science

Citation

Webb, N., Benyon, D., Hansen, P. & Mival, O. (2010). Evaluating human-machine conversation for appropriateness. In Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10). ISBN 2-9517408-6-7

Authors

Webb, N.; Benyon, D.; Hansen, P.; Mival, O.

Keywords

Dialogue; Evaluation methodologies; Usability; User satisfaction
