COMP401, Project 1: Towards simulating listening behaviour

Project Outline

Human body language is a vital part of face-to-face communication, but it is highly complex and not well understood. In conversation we both perform and interpret a wide range of complex behaviours, such as patterns of eye gaze, gestures, posture changes and facial expressions, and we do so largely subconsciously, without a full conscious understanding of their meaning. This makes it difficult to create avatars and animated characters with realistic body language. For example, Vilhjálmsson and Cassell (1998) argue that users can't create realistic body language for avatars in virtual worlds because they don't consciously know the right thing to do. As a result, avatars don't take a useful part in conversation. This project aims to take a step towards solving this problem.

It will look particularly at listening behaviour, also called backchannel feedback. These are the behaviours that listeners perform to support, encourage and give feedback to someone who is speaking, for example head nodding, looking at the speaker and occasional utterances (e.g. "uh-huh"). This behaviour is highly responsive to what the speaker is doing, and it can only be meaningful and appropriate if it responds to the speaker in the right way. An important part of simulating this behaviour is therefore detecting the relevant behaviour of the speaker. The project will perform this detection using the method proposed by Maatman et al. (2005).

You will use the Motion Capture laboratory at Goldsmiths. The first step will be to capture some appropriate test data, which you can then work with from home. You will write a program that receives this data over the OSC (Open Sound Control) network protocol and automatically detects important features using the algorithms proposed by Maatman et al. (2005). You will then test the end result live in the motion capture lab. If all goes well, a final step would be to send the resulting data to an existing piece of software that will animate a character. If the project is successful, the results will be used in further research into how best to animate listener behaviour.
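
To give a feel for the receive-and-detect pipeline described above, here is a minimal Python sketch that uses only the standard library. It decodes a single OSC message (address, type tag string, then big-endian arguments padded to 4-byte boundaries) and applies a toy nod counter to a series of head pitch angles. Several things here are assumptions made for illustration: the address "/head/pitch", the convention that negative pitch means the head is tilted down, the 5 degree threshold, and the function names. In particular, the nod rule is a placeholder and is not one of the detection rules from Maatman et al. (2005), which you should take from the paper itself.

```python
import struct


def _read_padded_string(data, offset):
    """Read a null-terminated OSC string that is padded to a multiple of 4 bytes."""
    end = data.index(b"\x00", offset)
    text = data[offset:end].decode("ascii")
    # Skip the terminator and padding so the next field is 4-byte aligned.
    return text, (end + 4) & ~3


def decode_osc_message(data):
    """Decode one OSC message holding float32, int32 or string arguments."""
    address, offset = _read_padded_string(data, 0)
    type_tags, offset = _read_padded_string(data, offset)
    if not type_tags.startswith(","):
        raise ValueError("missing OSC type tag string")
    args = []
    for tag in type_tags[1:]:
        if tag == "f":
            (value,) = struct.unpack_from(">f", data, offset)
            offset += 4
        elif tag == "i":
            (value,) = struct.unpack_from(">i", data, offset)
            offset += 4
        elif tag == "s":
            value, offset = _read_padded_string(data, offset)
        else:
            raise ValueError(f"unsupported OSC type tag: {tag!r}")
        args.append(value)
    return address, args


def _pad(raw):
    """Append 1 to 4 null bytes so the length becomes a multiple of 4."""
    return raw + b"\x00" * (4 - len(raw) % 4)


def encode_osc_message(address, floats):
    """Build an OSC message of float32 arguments (handy for offline test data)."""
    tags = "," + "f" * len(floats)
    payload = struct.pack(f">{len(floats)}f", *floats)
    return _pad(address.encode("ascii")) + _pad(tags.encode("ascii")) + payload


def count_nods(pitch_degrees, threshold=5.0):
    """Placeholder rule: count dips of head pitch below the starting value.

    A nod is counted when pitch falls at least `threshold` degrees below the
    first sample and then recovers to within half the threshold of it.
    This is an illustrative stand-in, not the Maatman et al. (2005) rules.
    """
    if not pitch_degrees:
        return 0
    baseline = pitch_degrees[0]
    nods = 0
    head_down = False
    for pitch in pitch_degrees:
        if not head_down and pitch <= baseline - threshold:
            head_down = True
        elif head_down and pitch >= baseline - threshold / 2:
            head_down = False
            nods += 1
    return nods


if __name__ == "__main__":
    # Simulate captured data: one hypothetical "/head/pitch" message per frame.
    frames = [0.0, -2.0, -6.0, -8.0, -3.0, 0.0, -1.0, -7.0, -1.0, 0.0]
    packets = [encode_osc_message("/head/pitch", [angle]) for angle in frames]
    pitches = [decode_osc_message(packet)[1][0] for packet in packets]
    print("nods detected:", count_nods(pitches))
```

In a live setting the bytes passed to `decode_osc_message` would come from a UDP socket (for example via `socket.recvfrom`) rather than from `encode_osc_message`. The encoder is included only so that the sketch can be run without the motion capture lab. A real receiver would also need to handle OSC bundles and whichever message layout the capture software actually sends.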

If you are interested, you may optionally investigate further work in this area by Gratch's group. Gratch et al. (2007) ran a user test with an improved version of the algorithm, and Morency et al. (2008) produced a new probabilistic algorithm. Both draw on the work of Ward and Tsukahara (2000). You may use any of these papers in your project, but reading and using them is not required.


Jonathan Gratch, Ning Wang, Jillian Gerten, Edward Fast, and Robin Duffy. Creating rapport with virtual agents. In Catherine Pelachaud, Jean-Claude Martin, Elisabeth André, Gérard Chollet, Kostas Karpouzis, and Danielle Pelé, editors, Intelligent Virtual Agents, 7th International Conference, IVA 2007, Paris, France, September 17-19, 2007, Proceedings, Lecture Notes in Computer Science, pages 125-138. Springer, 2007.

R. M. Maatman, J. Gratch, and S. Marsella. Natural behavior of a listening agent. In T. Panayiotopoulos, J. Gratch, R. Aylett, D. Ballin, P. Olivier, and T. Rist, editors, Intelligent Virtual Agents, 5th International Working Conference, Kos, Greece, September 2005.

Louis-Philippe Morency, Iwan de Kok, and Jonathan Gratch. Predicting listener backchannels: A probabilistic multimodal approach. In Intelligent Virtual Agents, 8th International Conference, IVA 2008, Tokyo, Japan, September 1-3, 2008, Proceedings, Lecture Notes in Computer Science, pages 176-190. Springer, 2008.

H. H. Vilhjálmsson and J. Cassell. BodyChat: Autonomous communicative behaviors in avatars. In Second ACM International Conference on Autonomous Agents, 1998.

Nigel Ward and Wataru Tsukahara. Prosodic features which cue back-channel responses in English and Japanese. Journal of Pragmatics, 32(8):1177-1207, 2000.