Body Language Interaction with Virtual Humans

This is a video of a talk I gave at Queen Mary, University of London, to the Cognitive Science Research Group.

This talk describes a number of research projects aimed at creating natural non-verbal communication between real users of Virtual Reality and animated virtual characters. It describes how relatively simple state machine models can be highly effective in creating compelling interactive characters, including work with Xueni Pan on the effect of interaction with virtual characters. However, I also describe how these methods inevitably lose the nuances of embodied human behaviour. I then describe alternative methods that use interactive machine learning to enable people to design a character’s behaviour without coding, and a number of future directions.
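The state machine approach mentioned above can be sketched very simply. This is an illustrative example only (the states, events and animation names are invented, not taken from the talk): a table-driven finite-state machine where each state loops a body-language animation and events from the user trigger transitions.

```python
# A minimal finite-state machine for a virtual character's non-verbal
# behaviour. States, events and animation names are illustrative only.

class CharacterFSM:
    # (current state, event) -> next state
    TRANSITIONS = {
        ("idle", "user_approaches"): "greeting",
        ("greeting", "user_speaks"): "listening",
        ("listening", "user_stops_speaking"): "responding",
        ("responding", "response_done"): "listening",
        ("listening", "user_leaves"): "idle",
    }

    # each state loops a body-language animation
    ANIMATIONS = {
        "idle": "neutral_stance",
        "greeting": "wave_and_smile",
        "listening": "lean_in_and_nod",
        "responding": "gesture_while_talking",
    }

    def __init__(self):
        self.state = "idle"

    def handle(self, event):
        # events with no transition from the current state are ignored
        self.state = self.TRANSITIONS.get((self.state, event), self.state)
        return self.ANIMATIONS[self.state]

fsm = CharacterFSM()
print(fsm.handle("user_approaches"))  # wave_and_smile
print(fsm.handle("user_speaks"))      # lean_in_and_nod
print(fsm.handle("unknown_event"))    # lean_in_and_nod (state unchanged)
```

The appeal of this design is its legibility: the whole behaviour is one lookup table, which is exactly why it is effective and also why it flattens out the nuance of real embodied behaviour.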

Human-Centred Machine Learning

We are currently asking for submissions to a workshop on Human-Centred Machine Learning at CHI 2016. The workshop aims to bring together people working on the HCI of machine learning, an emerging field.

If you are interested in finding out more about Human Centred Machine Learning, here is an extract from our proposal:

Statistical machine learning is one of the most successful areas of computer science research in recent decades. It has driven advances in domains from medical and scientific research to the arts. It gives people the ability to create new systems based on example data, for instance creating a face recognition system from a large dataset of face images, rather than by reasoning about what features make something a face and translating that reasoning into program code. This makes it possible to achieve excellent performance on tasks for which it would be very difficult, if not impossible, to describe computational procedures explicitly in code.

In practice, however, machine learning is still a difficult technology to use, requiring an understanding of complex algorithms and working processes, as well as software tools which may have steep learning curves. Patel et al. studied expert programmers working with machine learning and identified a number of difficulties, including treating methods as a “black box” and difficulty interpreting results. Usability challenges inherent in both existing software tools and the learning algorithms themselves (e.g., algorithms may lack a human-understandable means for communicating how decisions are made) restrict who can use machine learning and how. A human-centered approach to machine learning that rethinks algorithms and interfaces to algorithms in terms of human goals, contexts, and ways of working can make machine learning more useful and usable.

Past work also demonstrates ways in which a human-centered perspective leads to new approaches to evaluating, analysing, and understanding machine learning methods (Amershi 2014). For instance, Fiebrink showed that users building gestural control and analysis systems use a range of evaluation criteria when testing trained models, such as decision boundary shape and subjective judgements of misclassification cost. Conventional model evaluation metrics focusing on generalisation accuracy may not capture such criteria, which means that computationally comparing alternative models (e.g., using cross-validation) may be insufficient to identify a suitable model. Users may therefore instead rely on tight action-feedback loops in which they modify model behavior by changing the training data, followed by real-time experimentation with models to evaluate them and inform further modifications. Users may also develop strategies for creating training sets that efficiently guide model behaviour using very few examples (e.g., placing training examples near desired decision boundaries), which results in training sets that may break common theoretical assumptions about data (e.g., that examples are independent and identically distributed). Summarizing related work in a variety of application domains, Amershi et al. enumerate several properties of machine learning systems that can be beneficial to users, such as enabling users to critique learner output, providing information beyond mere example labels, and receiving information about the learner that helped them understand it as more than a “black box.” These criteria are not typically considered when formulating or evaluating learning algorithms in machine learning research.
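The tight action-feedback loop described here can be illustrated with a toy sketch. A 1-nearest-neighbour classifier stands in for whatever learner is being used (for 1-NN, “training” is just storing the examples, so the model can be re-tested immediately after every change); the gesture labels and feature values are invented for illustration:

```python
import math

# Toy 1-nearest-neighbour classifier: adding an example immediately
# changes model behaviour, supporting a tight action-feedback loop.
class OneNN:
    def __init__(self):
        self.examples = []  # list of (feature_vector, label)

    def add_example(self, x, label):
        self.examples.append((x, label))

    def predict(self, x):
        nearest = min(self.examples, key=lambda ex: math.dist(x, ex[0]))
        return nearest[1]

model = OneNN()
# Coarse demonstrations of two gesture classes
model.add_example((0.0, 0.0), "wave")
model.add_example((1.0, 1.0), "point")

# The user tests a borderline input and sees behaviour they don't want...
print(model.predict((0.6, 0.5)))  # point

# ...so they place a corrective example near the desired decision boundary
model.add_example((0.55, 0.5), "wave")
print(model.predict((0.6, 0.5)))  # wave
```

The corrective example is deliberately placed close to where the user wants the decision boundary, which is precisely the kind of non-i.i.d. training set that the paragraph above notes breaks common theoretical assumptions.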

I’ve also included a full reference list at the bottom of the post. If you are interested here is the Call for Papers and you can find the full proposal here.

Saleema Amershi, Maya Cakmak, W. Bradley Knox, and Todd Kulesza. 2014. Power to the people: The role of humans in interactive machine learning. AI Magazine 35, 4 (2014), 105–120.

Saleema Amershi, James Fogarty, and Daniel S. Weld. 2012. Regroup: Interactive machine learning for on-demand group creation in social networks. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’12). 21–30.

Bill Buxton. 2007. Sketching user experiences: Getting the design right and the right design. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Steven P. Dow, Alana Glassco, Jonathan Kass, Melissa Schwarz, Daniel L. Schwartz, and Scott R. Klemmer. 2010. Parallel prototyping leads to better design results, more divergence, and increased self-efficacy. ACM Transactions on Computer-Human Interaction 17, 4 (Dec. 2010), 1–24.

Jerry Alan Fails and Dan R. Olsen Jr. 2003. Interactive machine learning. In Proceedings of the International Conference on Intelligent User Interfaces (IUI ’03). 39–45.

Rebecca Fiebrink. 2011. Real-time human interaction with supervised learning algorithms for music composition and performance. Ph.D. Dissertation. Princeton University, Princeton, NJ, USA.

Andrea Kleinsmith and Marco Gillies. 2013. Customizing by doing for responsive video game characters. International Journal of Human-Computer Studies 71, 7–8 (2013), 775–784. DOI: ijhcs.2013.03.005

Todd Kulesza, Saleema Amershi, Rich Caruana, Danyel Fisher, and Denis Charles. 2014. Structured labeling for facilitating concept evolution in machine learning. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’14). 3075–3084. DOI: 2557238

Kayur Patel, James Fogarty, James A. Landay, and Beverly Harrison. 2008. Investigating statistical machine learning as a tool for software development. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’08). 667–676.

Novel Dramatic and Ludic Tensions

Nicky Donald is in Copenhagen this week at ICIDS (the 8th International Conference on Interactive Digital Storytelling) presenting his latest paper, Novel Dramatic and Ludic Tensions Arising from Mixed Reality Performance as Exemplified in Better Than Life, which is based on Better Than Life, a performance we staged last year with Coney.

The paper presents an analysis, from a theatre theory point of view, of the way the performance was able to support new forms of dramatic tension, centred around asymmetries of knowledge and metalepsis (if, like me, you don’t know what that is, well, you’d better read the paper).

Here is the abstract and the full citation and link are below:

We observe that a Mixed Reality Performance called Better Than Life gave rise to novel dramaturgical and ludic possibilities that have not been observed elsewhere. Mixed Reality Performance is an emergent genre that takes many forms, in this case a live experience for a small group of physical participants (PP) and a larger group of online participants (OP). Both groups were offered individual and collective interactions that altered the narrative in real time. A mixed methodology approach to data generated during the performance has identified two key moments where both physical and online participant groups are split into many subgroups by ongoing live events. These events cause tensions that affect the trajectories of participants that make up their experience. Drawing on literary, theatre, cinema and digital game criticism we suggest that the possibilities for engagement in Mixed Reality Performance are exponentially greater than those available to previous media.

Donald, Nicky and Gillies, Marco. 2015. ‘Novel Dramatic and Ludic Tensions Arising from Mixed Reality Performance as Exemplified in Better Than Life’. In: International Conference on Interactive Digital Storytelling. Copenhagen, Denmark.

Better Than Life Final Report

The final report for our Nesta Digital R&D Fund for the Arts project “Better Than Life” has now been published. You can find it here:

This was a collaborative project with Annette Mees and colleagues at Coney and also the (sadly departed) live streaming platform ShowCaster.

This is the executive summary of the report:

We are in a period of significant change. The interconnectivity that the web offers and the quick rise of pervasive media have changed how we communicate with each other, how we access information, and how we experience news, stories and the world.

These changes have had a deep impact on storytellers of all kinds. The tools we use to tell tales are evolving, becoming more modular and tailored, more participatory and more engaging than just the printed word or the moving image. These new forms of digitally-enabled storytelling move beyond reinterpreting a text for radio or screen. We need to find new structures, and new relationships with audiences.

Better Than Life, led by Coney, an immersive theatre company that specialises in creating new forms of responsive playing theatre, brought together an extraordinary multidisciplinary team involving award-winning interactive theatre makers, digital broadcasters, developers, multi-platform creatives, academics, VR experts, a magician and many more.

We wanted to create a project that focused, in particular, on how live performance fits into the landscape of this terra nova. The aim was to see how to create an event for a large online audience that combined digital connectivity and interactivity with the liveness and shared experience of theatre.

In particular, we wished to understand what kinds of agency and control audiences might want and enjoy when engaging with this new form of live performance, and we set up a system that allowed both audiences – in the live space and online – to participate in and comment upon the show in several new ways. A total of eight public rehearsals and performances took place in June 2014, with over 300 people taking part either in the live space or online. At the end of the R&D process there emerged a narrative of a new medium. The material in the R&D wasn’t normal theatre, it wasn’t quite broadcast and it wasn’t a game. It was a cultural experience that built on the live storytelling and visceral nature of theatre, but combined it with the social interaction of MMOs (massively multiplayer online role-playing games) and the delivery infrastructure of online broadcast.

The show was held at a ‘secret’ location in London, with 12 people attending and entering the fictional world of the “Positive Vision Movement” (PVM). In the live space, the audience promenaded through the storyworld of the PVM, following three actors, playing, solving puzzles, chatting, debating and witnessing magic as they went.

Online, people spoke and instructed characters, found commentary, spoke to each other, made choices and switched camera views at will. At points, the online audience could even take control of lighting in the space in order to create specific atmospheres, or shine light on a particular place or person.

In every show the audiences were monitored carefully, questioned at various stages within the show and, in some cases, interviewed in depth about the experience.

Interestingly, interactivity – the ability to ‘take control’ of a situation, make a decision about plot or performance or change the mood through lighting or sound – was not rated as highly, by either audience, as the opportunities to socialise and engage with each other.

Data suggests that the online audience, in particular, enjoyed the ability to form strong social bonds with each other, and that they favoured elements of the show in which they were able to connect and communicate directly with performers in the show.

This would suggest that this new kind of hybridised, digitally-driven storytelling and play environment is seen, first and foremost, as an opportunity to connect with others in a theatrical context – interacting with each other more as one might at a music festival or a house party. This is not, then, simply theatre with an online component bolted on.

For the three R&D partners, the project was also a great ‘social’ success in terms of what we learned from each other. The project genuinely worked within the gaps of the knowledge overlaps between Coney, Goldsmiths and Showcaster, and we pushed each other to deliver a project with as many interesting new features as we could cram into one production space.

Better Than Life explored what is possible – and proved that hybridised models of entertainment and performance can open up experiences to audiences that genuinely span beyond the geographic boundaries of a single location or building.

Emotional and Functional Challenge in Core and Avant-garde Games

I’m watching Tom Cole give his talk “Emotional and Functional Challenge in Core and Avant-garde Games” at CHI Play 2015. He looked at game reviews to understand the difference between mainstream games and more “avant-garde” games.

You can read the abstract here and get the full paper below:

Digital games are a wide, diverse and fast-developing art form, and it is important to analyse games that are pushing the medium forward to see what design lessons can be learned. However, there are no established criteria to determine which games show these more progressive qualities. Grounded theory methodology was used to analyse the language used in game reviews by critics of both ‘core gamer’ titles and those titles with more avant-garde properties. This showed there were two kinds of challenge being discussed: emotional and functional, which appear to be, at least partially, mutually exclusive. Reviews of ‘core’ and ‘avant-garde’ games had different measures of purchase value, primary emotions, and modalities of language used to discuss the role of audiovisual qualities. Emotional challenge, ambiguity and solitude are suggested as useful devices for eliciting emotion from the player and for use in developing more ‘avant-garde’ games, as well as providing a basis for further lines of inquiry.

Emotional and Functional Challenge in Core and Avant-garde Games

Cole, Tom, Cairns, Paul and Gillies, Marco. 2015. ‘Emotional and Functional Challenge in Core and Avant-garde Games’. In: CHI Play 2015. London, United Kingdom.

Embodied Design of Full Bodied Interaction

The second paper I presented at MOCO this year was called Embodied Design of Full Bodied Interaction with Virtual Humans. It is (probably) my favourite paper from my EPSRC grant “Performance Driven Expressive Virtual Characters” and it was my chance to talk about some of the things that I thought were interesting in the grant (but maybe can’t prove).

Here is an extract that explains some of the ideas:

Non-verbal communication is a vital part of our social interactions. While the often-quoted estimate that only seven percent of human communication is verbal is contested, it is clear that a large part of people’s communication with each other is through gestures, postures, and movements. This is very different from the way that we traditionally communicate with machines. Creating computer systems capable of this type of non-verbal interaction is therefore an important challenge. Interpreting and animating body language is challenging for a number of reasons, but particularly because it is something we do subconsciously: we are often not aware of what exactly we are doing and would not be able to describe it later. Experts in body language (the people we would like to be designing the game) are not computer scientists but professionals such as actors and choreographers. Their knowledge of body language is embodied: they understand it by physically doing it and often find it hard to describe explicitly in words (see Kirsh for a discussion of embodied cognition in the area of dance). This makes it very hard for them to translate it into the explicit, symbolic form needed for computer programming.

The last few years have seen the introduction of new forms of user interface device, such as the Nintendo WiiMote, the Microsoft Kinect and the Sony Move, that go beyond the keyboard and mouse and use body movements as a means of interacting with technology. These devices promise many innovations, but maybe the most profound and exciting was one that appeared as a much-hyped demo prior to the release of the Microsoft Kinect. The Milo demo showed a computer-animated boy interacting with a real woman, replying to her speech and responding to her body language. This example shows the enormous potential for forms of interaction that make use of our natural body movements, including our subconscious body language. However, this demo was never released to the public, showing the important challenges that still remain. While sensing technology and Natural Language Processing have developed considerably in the five years since this demo, there are still major challenges in simulating the nuances of social interaction, and body language in particular. This is very complex work that combines Social Signal Processing with computer animation of body language. Perhaps the greatest challenge is that body language is a tacit skill (Polanyi 1966), in the sense that we are able to do it without being able to explicitly say what we are doing or how we are doing it; and it is a form of embodied (social) cognition in which our body and environment play a fundamental role in our process of thought. The physicality of movement and the environment is an integral part of cognition, and so movement-based interaction is best understood through embodied movement. Kirsh therefore argues that the next generation of interaction techniques should take account of this embodiment, part of a larger trend towards embodiment in interaction design.
This raises an important challenge for designing computational systems because they traditionally must be programmed with explicit rules that are abstract and disembodied (in the sense that body movement is not an innate part of their creation). The problem of representing the embodied, tacit skills of body language and social interaction requires us to develop computational techniques that are very different from the explicit and abstract representations used in computer programming.

In Fiebrink’s evaluation of the Wekinator, a system for designing new gestural musical instruments, one of the participants commented: “With [the Wekinator], it’s possible to create physical sound spaces where the connections between body and sound are the driving force behind the instrument design, and they feel right. … it’s very difficult to create instruments that feel embodied with explicit mapping strategies, while the whole approach of [the Wekinator] … is precisely to create instruments that feel embodied.” This shows that the Wekinator uses a new approach to designing gestural interfaces, one that not only makes them easier to design but changes the way people think about designing: from an explicit focus on features of the movement (e.g. shoulder rotation) to a holistic, embodied view of movement. This approach is called Interactive Machine Learning (IML): the use of machine learning algorithms to design by interactively providing examples of interaction. This “embodied” form of design taps into our natural human understanding of movement, which is itself embodied and implicit. We are able to move and recognize movement effectively but less able to analyze it into components. IML allows designers to design by moving rather than by analyzing movement.
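A minimal sketch of this style of IML mapping, in the spirit of (but much simpler than) the Wekinator: the designer demonstrates a few pose-to-sound examples, and a k-nearest-neighbour regression interpolates between them at run time. The pose features and sound parameters are invented for illustration:

```python
import math

# Toy Wekinator-style mapping: regression from a body-pose feature
# vector to continuous sound-synthesis parameters, learned purely
# from demonstrated examples. Names and values are illustrative.

def knn_regress(examples, x, k=2):
    """Average the output parameters of the k nearest demonstrations."""
    nearest = sorted(examples, key=lambda ex: math.dist(x, ex[0]))[:k]
    dim = len(nearest[0][1])
    return tuple(sum(ex[1][i] for ex in nearest) / len(nearest)
                 for i in range(dim))

# Designer demonstrates by moving: pose features -> (pitch, volume)
demos = [
    ((0.0, 0.0), (100.0, 0.2)),   # arms down  -> low, quiet
    ((1.0, 0.0), (400.0, 0.5)),   # one arm up -> mid, louder
    ((1.0, 1.0), (800.0, 1.0)),   # both up    -> high, loud
]

# A new pose blends the two nearest demonstrations
print(knn_regress(demos, (1.0, 0.5)))  # (600.0, 0.75)
```

The point of the sketch is that the designer never writes a mapping rule: the whole behaviour comes from demonstrated examples, so design happens by moving rather than by analyzing movement.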

This paper presents a first attempt at applying Fiebrink’s method to full-body interaction with animated virtual characters, allowing an embodied form of designing by doing, as suggested by Kleinsmith et al. We call this approach Interactive Performance Capture. Performance capture is the process of recording actors’ performances for mapping onto a 3D animation. This is able to bring the nuance of the performance to the animation, but it works only for static animations, not interactive systems. We use interactive machine learning as a way of capturing the interactions between two performers, as well as their movements.

Here is the reference and link to the full paper:

Embodied Design of Full Bodied Interaction with virtual humans

Gillies, Marco, Brenton, Harry and Kleinsmith, Andrea. 2015. ‘Embodied Design of Full Bodied Interaction with virtual humans’. In: 2nd International Conference on Movement and Computing. Vancouver, Canada.

Sketches vs Skeletons

Last month I was in Vancouver at the fantastic MOCO workshop presenting a couple of papers.

The first was called Sketches vs Skeletons: video annotation can capture what motion capture cannot. It was the outcome of a study we did as part of the Praise project about using technology to give feedback to music learners about their movements and postures. It taught us a lot about the holistic, complex nature of movement, but also about research, being wrong, and how to stop being wrong.

We initially made what seemed to be the obvious choice of using motion capture and created a prototype (technology probe) using the Kinect, but when we worked with music teachers we discovered that not only was the Kinect not sufficient (not particularly surprising), but the whole premise of using skeletal motion capture was misguided.

This really showed us the value of rapid prototyping: we were not particularly attached to our prototype and could recover from it quickly.

Anyway, here is the abstract:

Good posture is vital to successful musical performance and music teachers spend a considerable amount of effort on improving their students’ performance.
This paper presents a user study to evaluate a skeletal motion capture system (based on the Microsoft Kinect) for supporting teachers in giving feedback on learner musicians’ posture and movement. The study identified a number of problems with skeletal motion capture that are likely to make it unsuitable for this type of feedback: glitches in the capture reduce trust in the system, particularly as the motion data is removed from other contextual cues that could help judge whether it is correct or not; automated feedback can fail to account for the diversity of playing styles required by learners of different physical proportions; and, most importantly, the skeleton representation leaves out many cues that are required to detect posture problems in all but the most elementary beginners. The study also included a participatory design stage which resulted in a radically redesigned prototype, which replaced skeletal motion capture with an interface that allows teachers and learners to sketch on video with the support of computer vision tracking.

and this is the full reference:

Sketches vs Skeletons: Video Annotation Can Capture What Motion Capture Cannot

Gillies, Marco, Brenton, Harry, Yee-King, Matthew, Grimalt-Reynes, Andreu and d’Inverno, Mark. 2015. ‘Sketches vs Skeletons: Video Annotation Can Capture What Motion Capture Cannot’. In: Proceedings of the 2nd International Workshop on Movement and Computing. Vancouver, Canada.