Body Language Interaction with Virtual Humans

This is a video of a talk I gave at Queen Mary, University of London, to the Cognitive Science Research Group.

This talk describes a number of research projects aimed at creating natural non-verbal communication between real users of Virtual Reality and animated virtual characters. It will describe how relatively simple state machine models can be highly effective in creating compelling interactive characters, including work with Xueni Pan on the effects of interaction with virtual characters. However, I will also describe how these methods inevitably lose the nuances of embodied human behaviour. I will then describe alternative methods using interactive machine learning to enable people to design a character’s behaviour without coding, and a number of future directions.
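To give a feel for the state machine approach, here is a minimal sketch (not actual project code; the states, thresholds and sensor cues are illustrative assumptions) of how a simple state machine can drive a character's non-verbal responses to a user in VR:

```python
# Illustrative sketch: a virtual character's behavioural state driven by
# simple cues (user proximity and gaze). Real systems use richer states
# and sensing, but the structure is often this simple.

class CharacterStateMachine:
    def __init__(self):
        self.state = "idle"

    def update(self, user_distance, user_gazing_at_character):
        """Choose the next behavioural state from simple sensor cues."""
        if self.state == "idle":
            # greet the user when they come close and look at the character
            if user_distance < 2.0 and user_gazing_at_character:
                self.state = "greet"
        elif self.state == "greet":
            # after greeting, move into engaged conversation
            self.state = "engaged"
        elif self.state == "engaged":
            # disengage if the user walks away
            if user_distance > 3.0:
                self.state = "idle"
        return self.state
```

A state machine like this is easy to author and surprisingly compelling in practice, but, as the talk argues, it inevitably flattens the nuance of real embodied behaviour.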

Social Interaction, Emotion and Body Language in VR: Lessons Learnt from 15 Years of Research

Pan Xueni, Harry Brenton and I will be giving a talk at this year’s DevelopVR conference in London on the 1st December. The talk is called “Social Interaction, Emotion and Body Language in VR: Lessons Learnt from 15 Years of Research”.

Based on our 15 years of VR research, we explain how to create characters whose behaviour feels real and who respond to players. This gives the player a strong illusion of being “together” with another “person” which is not possible without VR.


Embodied Design of Full Bodied Interaction

The second paper I presented at MOCO this year was called Embodied Design of Full Bodied Interaction with virtual humans. It is (probably) my favourite paper from my EPSRC grant “Performance Driven Expressive Virtual Characters”, and it was my chance to talk about some of the stuff that I thought was interesting in the grant (but maybe can’t prove).

Here is an extract that explains some of the ideas:

Non-verbal communication is a vital part of our social interactions. While the often-quoted estimate that only seven percent of human communication is verbal is contested, it is clear that a large part of people’s communication with each other is through gestures, postures, and movements. This is very different from the way that we traditionally communicate with machines. Creating computer systems capable of this type of non-verbal interaction is therefore an important challenge. Interpreting and animating body language is challenging for a number of reasons, but particularly because it is something we do subconsciously: we are often not aware of exactly what we are doing and would not be able to describe it later. Experts in body language (the people we would like to design these games) are not computer scientists but professionals such as actors and choreographers. Their knowledge of body language is embodied: they understand it by physically doing it and often find it hard to describe explicitly in words (see Kirsh for a discussion of embodied cognition in the area of dance). This makes it very hard to translate that knowledge into the explicit, symbolic form needed for computer programming.

The last few years have seen the introduction of new forms of user interface device, such as the Nintendo WiiMote, the Microsoft Kinect and the Sony Move, that go beyond the keyboard and mouse and use body movements as a means of interacting with technology. These devices promise many innovations, but maybe the most profound and exciting was one that appeared as a much-hyped demo prior to the release of the Microsoft Kinect. The Milo demo showed a computer-animated boy interacting with a real woman, replying to her speech and responding to her body language. This example shows the enormous potential of forms of interaction that make use of our natural body movements, including our subconscious body language. However, this demo was never released to the public, showing the important challenges that still remain. While sensing technology and Natural Language Processing have developed considerably in the five years since this demo, there are still major challenges in simulating the nuances of social interaction, and body language in particular. This is very complex work that combines Social Signal Processing with computer animation of body language. Perhaps the greatest challenge is that body language is a tacit skill (Polanyi, 1966), in the sense that we are able to do it without being able to say explicitly what we are doing or how we are doing it; and it is a form of embodied (social) cognition, in which our body and environment play a fundamental role in our process of thought. The physicality of movement and the environment is an integral part of cognition, and so a movement-based interaction is best understood through embodied movement. Kirsh therefore argues that the next generation of interaction techniques should take account of this embodiment, part of a larger trend towards embodiment in interaction design.
This raises an important challenge for designing computational systems because they traditionally must be programmed with explicit rules that are abstract and disembodied (in the sense that body movement is not an innate part of their creation). The problem of representing the embodied, tacit skills of body language and social interaction requires us to develop computational techniques that are very different from the explicit and abstract representations used in computer programming.

In Fiebrink’s evaluation of the Wekinator, a system for designing new gestural musical instruments, one of the participants commented: “With [the Wekinator], it’s possible to create physical sound spaces where the connections between body and sound are the driving force behind the instrument design, and they feel right. … it’s very difficult to create instruments that feel embodied with explicit mapping strategies, while the whole approach of [the Wekinator] … is precisely to create instruments that feel embodied.” This shows that the Wekinator takes a new approach to designing gestural interfaces, one that not only makes design easier but changes the way people think about it: from an explicit focus on features of the movement (e.g. shoulder rotation) to a holistic, embodied view of movement. This approach is called Interactive Machine Learning (IML): the use of machine learning algorithms to design by interactively providing examples of interaction. This “embodied” form of design taps into our natural human understanding of movement, which is itself embodied and implicit. We are able to move and recognize movement effectively but are less able to analyze it into components. IML allows designers to design by moving rather than by analyzing movement.
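The IML workflow can be sketched in a few lines. This is an illustrative stand-in, not the Wekinator's actual algorithm: the designer demonstrates (pose, sound-parameter) pairs by moving, and the system interpolates between them at run time (here with simple inverse-distance weighting), giving a continuous movement-to-sound mapping without any explicit rules:

```python
# Illustrative IML sketch: design a gestural mapping by example, not by
# writing explicit mapping rules. Feature and parameter formats are
# assumptions for the sake of the example.
import math

class GestureInstrument:
    def __init__(self):
        self.examples = []  # list of (pose_features, sound_params) pairs

    def demonstrate(self, pose, params):
        """The designer performs a pose and pairs it with sound parameters."""
        self.examples.append((pose, params))

    def map(self, pose):
        """Blend the recorded sound parameters, weighted by pose similarity."""
        weights = []
        for ex_pose, params in self.examples:
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(ex_pose, pose)))
            if d < 1e-9:
                return list(params)  # exact match: return its parameters
            weights.append((1.0 / d, params))
        total = sum(w for w, _ in weights)
        n = len(weights[0][1])
        return [sum(w * p[i] for w, p in weights) / total for i in range(n)]
```

The designer's whole interaction is "perform, label, test, correct": no movement features are ever named explicitly, which is exactly the embodied quality the participant quote describes.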

This paper presents a first attempt at applying Fiebrink’s method to full body interaction with animated virtual characters, allowing an embodied form of designing by doing, as suggested by Kleinsmith et al. We call this approach Interactive Performance Capture. Performance capture is the process of recording actors’ performances for mapping onto a 3D animation. This brings the nuance of the performance to the animation, but it works only for fixed, pre-recorded animations, not interactive systems. We use interactive machine learning as a way of capturing the interactions between two performers, as well as their movements.
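As a rough sketch of the idea (the data format, feature vectors and clip names are all hypothetical), interactive performance capture records paired moments from two performers, one acting as the user and one as the character, and then at run time replays the character response that was recorded with the most similar user movement:

```python
# Illustrative sketch of interactive performance capture: record paired
# (user movement, character response) moments, then respond to a new
# user movement with the response captured for the closest match.
import math

def capture_pair(pairs, user_features, character_clip):
    """Record one moment of the two-performer interaction."""
    pairs.append((user_features, character_clip))

def respond(pairs, user_features):
    """Play the character clip recorded with the closest user movement."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(pairs, key=lambda p: dist(p[0], user_features))[1]

# Hypothetical captured session: two performers improvise an interaction.
pairs = []
capture_pair(pairs, [0.0, 0.0], "look_away")
capture_pair(pairs, [1.0, 1.0], "make_eye_contact")
```

The point is that the interactive structure, who responds to what, is captured from the performance itself rather than authored as explicit rules.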

Here is the reference and link to the full paper:

Embodied Design of Full Bodied Interaction with virtual humans

Gillies, Marco, Brenton, Harry and Kleinsmith, Andrea. 2015. ‘Embodied Design of Full Bodied Interaction with virtual humans’. In: 2nd International Conference on Movement and Computing. Vancouver, Canada.

Gestural Archeology

I’ve said my final goodbyes to Pisa and the Scuola Normale.

I thought I would do a sneak peek of the work we have done there. I’ve been working on using Baptiste Caramiaux’s fantastic Gesture Variation Follower in the CAVE immersive environment at DreamsLab (with the help of Niccolò Albertini and Andrea Brogni). Our first test was working with archaeologists Riccardo Olivito and Emanuele Taccola. I will say more when the work is finished and published, but we were building on their previous work shown in the video below.

Conceptual models in Interactive Machine Learning

At the end of March I will be going to IUI 2015 to present my paper Applying the CASSM Framework to Improving End User Debugging of Interactive Machine Learning.

This talks about some work I’ve done applying Ann Blandford’s framework for analysing software, Concept-based Analysis of Surface and Structural Misfits (CASSM), to Interactive Machine Learning. It is a framework that looks at user concepts and how they relate to concepts present in the software. A really interesting element is that it separates concepts in the interface from concepts in the system: concepts that are central in the underlying algorithm can be missing from the interface, and concepts in the interface might not be well represented in the functioning of the system. This led me to the idea that for interactive machine learning the learning algorithms used should be well aligned with the users’ concepts of the situation, and those concepts should also be well represented visually in the interface. This should make the system easier to use and, in particular, easier to debug when it goes wrong (because debugging requires a good conceptual model of the system). On this basis I suggested that a nearest neighbour learning algorithm would be well suited to a learning system for full body interaction, because users thought in terms of whole poses, not individual features (which are common concepts in other learning algorithms), and it works with the original training data, which users understand well. It also led us to develop the visualisation you can see in the image above.
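A minimal sketch of the nearest-neighbour idea, with illustrative pose vectors and labels, shows why it aligns with users' concepts: the classifier works on whole poses, and it can report *which* user-supplied training example matched, so debugging stays in terms of poses the user themselves provided:

```python
# Illustrative nearest-neighbour classifier over whole poses. Returning
# the matched training example supports the kind of debugging and
# visualisation the paper discusses: the user sees *their own* example
# that caused the classification.
import math

def classify_pose(pose, training_data):
    """Return (label, matched_example) for the nearest training pose.

    pose: a flat list of joint values (format is an assumption here)
    training_data: list of (pose, label) pairs supplied by the user
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    matched_pose, label = min(training_data,
                              key=lambda ex: distance(ex[0], pose))
    return label, matched_pose

# Hypothetical user-provided examples:
training = [
    ([0.0, 0.0, 0.0], "neutral"),
    ([1.2, 0.1, 0.0], "wave"),
]
label, matched = classify_pose([1.1, 0.0, 0.1], training)
```

Because the system's answer is always traceable to one concrete training pose, a misclassification can be fixed by inspecting, deleting or re-recording that pose, rather than reasoning about abstract features.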

If you are interested, here is the abstract and full reference.

This paper presents an application of the CASSM (Concept-based Analysis of Surface and Structural Misfits) framework to interactive machine learning for a bodily interaction domain. We developed software to enable end users to design full body interaction games involving interaction with a virtual character. The software used a machine learning algorithm to classify postures based on examples provided by users. A longitudinal study showed that training the algorithm was straightforward, but that debugging errors was very challenging. A CASSM analysis showed that there were fundamental mismatches between the users’ concepts and the working of the learning system. This resulted in a new design in which both the learning algorithm and user interface were better aligned with users’ concepts. This work provides an example of how HCI methods can be applied to machine learning in order to improve its usability and provide new insights into its use.

Applying the CASSM Framework to Improving End User Debugging of Interactive Machine Learning

Gillies, Marco, Kleinsmith, Andrea and Brenton, Harry. 2015. ‘Applying the CASSM Framework to Improving End User Debugging of Interactive Machine Learning’. In: ACM Intelligent User Interfaces (IUI). Atlanta, United States.

Virtual character personality influences participant attitudes and behavior

Xueni Pan, Mel Slater and I have just published a new paper in Frontiers: Virtual Environments. It looks at the effect of virtual character personality on participants’ behavioural responses in immersive virtual reality and uses my Piavca animation framework.

From the paper abstract:

We introduce a novel technique for the study of human–virtual character interaction in immersive virtual reality. The human participants verbally administered a standard questionnaire about social anxiety to a virtual female character, which responded to each question through speech and body movements. The purpose was to study the extent to which participants responded differently to characters that exhibited different personalities, even though the verbal content of their answers was always the same. A separate online study provided evidence that our intention to create two different personality types had been successful. In the main between-groups experiment that utilized a Cave system there were 24 male participants, where 12 interacted with a female virtual character portrayed to exhibit shyness and the remaining 12 with an identical but more confident virtual character. Our results indicate that although the content of the verbal responses of both virtual characters was the same, participants showed different subjective and behavioral responses to the two different personalities. In particular participants evaluated the shy character more positively, for example, expressing willingness to spend more time with her. Participants evaluated the confident character more negatively and waited for a significantly longer time to call her back after she had left the scene in order to answer a telephone call. The method whereby participants interviewed the virtual character allowed naturalistic conversation while avoiding the necessity of speech processing and generation, and natural language understanding. It is therefore a useful method for the study of the impact of virtual character personality on participant responses.

The full paper is available online here:

“Social” is the future of VR

There were many interesting things said at the Oculus Connect conference a few weeks ago (which I only saw remotely through the live stream), but the one that caught my attention was that both Michael Abrash and John Carmack said that “Social” was the future of VR:

Social, of course, can mean many things these days, and these statements are probably at least partially motivated by Facebook’s involvement in Oculus. However, as someone who has spent years working on simulating social interactions in virtual reality, I know that this is a really exciting area, and it is great to have industry leaders like Carmack and Abrash backing this up.

Very few people have actually experienced a face-to-face encounter with a life-size virtual human in virtual reality, but those of us who have know that it is one of the most compelling experiences that virtual reality has to offer. If you get it right and the character’s body language responds to you (making eye contact, responding when you move closer), then it creates a sense of social connection which is quite unlike anything that is possible on a screen.

It’s not something I have worked on for quite a while, but with the release of the Oculus Rift I’ve been bullied into revisiting some of my own work. I think it will be far more compelling now just because of the massive improvements in VR technology. Exciting times.

In the meantime, here are a couple of papers I published (with Xueni Pan and Mel Slater) some years ago in that area: