Embodied Design of Full Bodied Interaction

The second paper I presented at MOCO this year was called Embodied Design of Full Bodied Interaction with virtual humans. It comes from my EPSRC grant “Performance Driven Expressive Virtual Characters”, and it was my chance to talk about some of the stuff in the grant that I thought was interesting (but maybe can’t prove).

Here is an extract that explains some of the ideas:

Non-verbal communication is a vital part of our social interactions. While the often quoted estimate that only seven percent of human communication is verbal is contested, it is clear that a large part of people’s communication with each other is through gestures, postures, and movements. This is very different from the way that we traditionally communicate with machines. Creating computer systems capable of this type of non-verbal interaction is therefore an important challenge. Interpreting and animating body language is challenging for a number of reasons, but particularly because it is something we do subconsciously: we are often not aware of exactly what we are doing and would not be able to describe it afterwards. Experts in body language (the people we would like to have designing these interactions) are not computer scientists but professionals such as actors and choreographers. Their knowledge of body language is embodied: they understand it by physically doing it and often find it hard to describe explicitly in words (see Kirsh for a discussion of embodied cognition in the area of dance). This makes it very hard for people to translate it into the explicit, symbolic form needed for computer programming.

The last few years have seen the introduction of new user interface devices, such as the Nintendo WiiMote, the Microsoft Kinect and the Sony Move, that go beyond the keyboard and mouse and use body movements as a means of interacting with technology. These devices promise many innovations, but maybe the most profound and exciting was one that appeared as a much-hyped demo prior to the release of the Microsoft Kinect. The Milo demo showed a computer-animated boy interacting with a real woman, replying to her speech and responding to her body language. This example shows the enormous potential of forms of interaction that make use of our natural body movements, including our subconscious body language. However, this demo was never released to the public, showing the important challenges that still remain. While sensing technology and Natural Language Processing have developed considerably in the 5 years since this demo, there are still major challenges in simulating the nuances of social interaction, and body language in particular. This is very complex work that combines Social Signal Processing with computer animation of body language. Perhaps the greatest challenge is that body language is a tacit skill \cite{Polanyi1966}, in the sense that we are able to do it without being able to explicitly say what we are doing or how we are doing it; and it is a form of embodied (social) cognition in which our body and environment play a fundamental role in our process of thought. The physicality of movement and the environment is an integral part of cognition, and so a movement-based interaction is best understood through embodied movement. Kirsh therefore argues that the next generation of interaction techniques should take account of this embodiment, part of a larger trend towards embodiment in interaction design.
This raises an important challenge for designing computational systems because they traditionally must be programmed with explicit rules that are abstract and disembodied (in the sense that body movement is not an innate part of their creation). The problem of representing the embodied, tacit skills of body language and social interaction requires us to develop computational techniques that are very different from the explicit and abstract representations used in computer programming.

In Fiebrink’s evaluation of the Wekinator, a system for designing new gestural musical instruments, one of the participants commented: “With [the Wekinator], it’s possible to create physical sound spaces where the connections between body and sound are the driving force behind the instrument design, and they feel right. … it’s very difficult to create instruments that feel embodied with explicit mapping strategies, while the whole approach of [the Wekinator] … is precisely to create instruments that feel embodied.” This shows that the Wekinator takes a new approach to designing gestural interfaces, one that not only makes them easier to design but changes the way people think about designing: from an explicit focus on features of the movement (e.g. shoulder rotation) to a holistic, embodied view of movement. This approach is called Interactive Machine Learning (IML): the use of machine learning algorithms to design by interactively providing examples of interaction. This “embodied” form of design taps into our natural human understanding of movement, which is itself embodied and implicit. We are able to move and recognize movement effectively but are less able to analyze it into components. IML allows designers to design by moving rather than by analyzing movement.
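To make the IML idea concrete, here is a minimal sketch of the interactive loop: the designer acts out labelled example movements, the system trains on them immediately, and the designer tests the result and adds corrections by performing more examples. The pose vectors, labels, and `IMLClassifier` class are all invented for illustration; they are not from the Wekinator or the paper.

```python
import math

class IMLClassifier:
    """1-nearest-neighbour classifier, retrained on every new example."""
    def __init__(self):
        self.examples = []  # list of (pose_vector, label) pairs

    def add_example(self, pose, label):
        # "Training" is simply remembering the demonstration.
        self.examples.append((pose, label))

    def classify(self, pose):
        # Return the label of the closest stored example.
        _, label = min(self.examples,
                       key=lambda ex: math.dist(ex[0], pose))
        return label

# The designer demonstrates two poses (joint angles, made up here).
clf = IMLClassifier()
clf.add_example([0.0, 0.1, 0.0], "arms_down")
clf.add_example([1.5, 1.4, 1.6], "arms_up")

# Testing a new movement: if the result feels wrong, the designer
# performs another example rather than editing movement features.
print(clf.classify([1.4, 1.5, 1.5]))  # arms_up
```

The point of the sketch is the workflow, not the algorithm: the designer never names a feature like “shoulder rotation”; they only move.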

This paper presents a first attempt at applying Fiebrink’s method to full body interaction with animated virtual characters, allowing an embodied form of designing by doing as suggested by Kleinsmith et al. We call this approach Interactive Performance Capture. Performance capture is the process of recording actors’ performances for mapping into a 3D animation. This is able to bring the nuance of the performance to the animation, but it works only for static animations, not interactive systems. We use interactive machine learning as a way of capturing the interactions between two performers, as well as their movements.

Here is the reference and link to the full paper:

Embodied Design of Full Bodied Interaction with virtual humans

Gillies, Marco, Brenton, Harry and Kleinsmith, Andrea. 2015. ‘Embodied Design of Full Bodied Interaction with virtual humans’. In: 2nd International Conference on Movement and Computing. Vancouver, Canada.

Gestural Archeology

I’ve said my final goodbyes to Pisa and the Scuola Normale.

I thought I would give a sneak peek of the work we have done there. I’ve been working on using Baptiste Caramiaux‘s fantastic Gesture Variation Follower in the CAVE immersive environment at DreamsLab (with the help of Niccolò Albertini and Andrea Brogni). Our first test was working with archaeologists Riccardo Olivito and Emanuele Taccola. I will say more when the work is finished and published, but we were building on their previous work shown in the video below.

Conceptual models in Interactive Machine Learning

At the end of March I will be going to IUI 2015 to present my paper Applying the CASSM Framework to Improving End User Debugging of Interactive Machine Learning.

This talks about some work I’ve done applying Ann Blandford’s framework for analysing software, Concept-based Analysis of Surface and Structural Misfits (CASSM), to Interactive Machine Learning. It is a framework that looks at user concepts and how they relate to concepts present in the software. A really interesting element is that it separates concepts in the interface from concepts in the system: concepts that are central to the underlying algorithm can be missing from the interface, and concepts in the interface might not be well represented in the functioning of the system. This led me to the idea that, for interactive machine learning, the learning algorithms used should be well aligned with users’ concepts of the situation, and they should also be well represented visually in the interface. This should make the system easier to use and in particular easier to debug when it goes wrong (because debugging requires a good conceptual model of the system). On this basis I suggested that a nearest neighbour learning algorithm would be well suited to a learning system for full body interaction, because users thought in terms of whole poses, not individual features (which are common concepts in other learning algorithms), and because it works with the original training data, which users understand well. It also led us to develop the visualisation you can see in the image above.
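One way to see why nearest neighbour suits this conceptual model is that every prediction can be explained by pointing at the specific training pose that caused it, which the user can inspect (or delete) when debugging. Here is a small sketch of that idea; the pose vectors and labels are invented for illustration and are not the system described in the paper.

```python
import math

# Each training example is a whole-body pose vector with a label,
# exactly as the user demonstrated it (no hand-picked features).
training = [
    ([0.0, 0.0, 0.2], "crouch"),
    ([1.6, 1.5, 1.4], "jump"),
    ([0.8, 0.1, 0.7], "lean"),
]

def classify_with_evidence(pose):
    """Return the predicted label plus the example that explains it."""
    best = min(training, key=lambda ex: math.dist(ex[0], pose))
    return best[1], best[0]

label, evidence = classify_with_evidence([1.5, 1.6, 1.3])
print(label)     # jump
print(evidence)  # the training pose the system matched against
```

Because the “evidence” is an actual demonstrated pose rather than an abstract weight or feature, it can be shown directly in a visualisation, matching the way users already think about the system.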

If you are interested, here is the abstract and full reference.

This paper presents an application of the CASSM (Concept-based Analysis of Surface and Structural Misfits) framework to interactive machine learning for a bodily interaction domain. We developed software to enable end users to design full body interaction games involving interaction with a virtual character. The software used a machine learning algorithm to classify postures based on examples provided by users. A longitudinal study showed that training the algorithm was straightforward, but that debugging errors was very challenging. A CASSM analysis showed that there were fundamental mismatches between the users’ concepts and the working of the learning system. This resulted in a new design in which both the learning algorithm and user interface were better aligned with users’ concepts. This work provides an example of how HCI methods can be applied to machine learning in order to improve its usability and provide new insights into its use.

Applying the CASSM Framework to Improving End User Debugging of Interactive Machine Learning

Gillies, Marco, Kleinsmith, Andrea and Brenton, Harry. 2015. ‘Applying the CASSM Framework to Improving End User Debugging of Interactive Machine Learning’. In: ACM Intelligent User Interfaces (IUI). Atlanta, United States.

What is natural about “Natural User Interfaces”?

I’ve recently had a paper published in Mark Bishop and Andrew Martin’s excellent volume Contemporary Sensorimotor Theory. I thought I would post an extract in which I use sensorimotor theory to think through some of the issues raised by Donald Norman’s insightful critique Natural User Interfaces are not natural.

The type of interaction I have been describing [in the rest of the paper] has been marketed by Microsoft and others as “Natural User Interfaces”: interfaces that are claimed to be so natural that they do not need to be learned. The logic behind this phrase is that, because body movements come naturally to us, a body movement interface will be natural. This idea has been criticised by many people, most notably by Norman in his article Natural User Interfaces are not natural, in which he argues that bodily interfaces can suffer from many problems associated with traditional interfaces (such as the difficulty of remembering gestures) as well as new problems (the ephemerality of gestures and lack of visual feedback). So is there value in the intuition that bodily interfaces are natural, and if so what is that value and why is it often not seen in existing interfaces?

I would argue that there is a fundamental difference in the nature of bodily interfaces and traditional interfaces. Jacob et al. propose that a variety of new forms of interaction, including bodily interaction, are successful because they leverage a different set of our pre-existing skills from traditional GUIs. While a graphical user interface leverages our skills in manipulating external visual and symbolic representations, bodily interfaces leverage skills related to body and environmental awareness: the skills that enable us to move and act in the world. Similarly, Dourish proposes that we analyse interaction in terms of embodiment, which he defines as: “the property of our engagement with the world that allows us to make it meaningful”. This leads him to define Embodied Interaction as “the creation, manipulation, and sharing of meaning through engaged interaction with artefacts”. While he applies this definition to both traditional and new forms of interaction, the nature of this engaged interaction is very different in bodily interfaces. Following Jacob we could say that, in a successful bodily interface, this engaged interaction can be the same form of engagement we have with our bodies and environment in our daily lives, and we can therefore re-use our existing skills that enable us to engage with the world.

If we take a non-representational, sensorimotor view of perception and action, these skills are very different from the skills of a traditional interface involving manipulation of representations. This view allows us to keep the intuition that bodily interfaces are different from graphical user interfaces and explain what is meant by natural in the phrase “natural user interface” (the so-called natural skills are non-representational sensorimotor skills), while also allowing us to be critical of the claims of bodily interfaces. Natural user interfaces, on this view, are only natural if they take account of the non-representational, sensorimotor nature of our body movement skills. Body movement interfaces that are just extensions of a symbolic, representational interface are merely a more physically tiring version of a GUI.

A good example of this is gestural interaction. A common implementation of this form of interface is to have a number of pre-defined gestures that can be mapped to actions in the interface. This is one of the types of interface that Norman criticises. When done badly there is a fairly arbitrary mapping between a symbolic gesture and a symbolic action. Users’ body movements are used as part of a representation manipulation task. There is nothing wrong with this per se but it does not live up to the hype of natural user interfaces and is not much different from a traditional GUI. In fact, as Norman notes, it can be worse, as users do not have a visual cue to remind them which gestures they should be performing. This makes it closer to a textual command line interface where users must remember obscure commands with no visual prompts. Gestural user interfaces do not have to be like this.

These problems can be avoided if we think of gestural interfaces as tapping sensorimotor skills, not representation manipulation skills. For example, the work of Bevilacqua et al. uses gesture to control music. In this work, gestures are tracked continuously rather than being simply recognised at the end of the gesture. This allows users to continuously control the production of sound throughout the time they are performing the gesture, rather than triggering the gesture at the end. This seemingly simple difference transforms the task from representation manipulation (producing a symbolic gesture and expecting a discrete response) to a tight sensorimotor loop in which the auditory feedback can influence movement which in turn controls the audio. A more familiar example of this form of continuous feedback is the touch screen “pinch to zoom” gesture developed for the iPhone. In this gesture an image resizes dynamically and continuously in response to the users’ fingers moving together and apart. This continuous feedback and interaction enables a sensorimotor loop that can leverage our real world movement skills.
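The difference between discrete recognition and continuous feedback can be sketched in a few lines. Here the pinch-to-zoom example is the model: instead of firing one event when a gesture completes, the interface responds on every tracked frame. The finger positions and the `zoom_factor` function are invented for illustration.

```python
import math

def zoom_factor(p1, p2, start_distance):
    """Scale by the ratio of current to initial finger spread."""
    return math.dist(p1, p2) / start_distance

# Fingers start 100 pixels apart.
start = math.dist((100, 100), (200, 100))

# Each tracked frame updates the image immediately, closing the
# sensorimotor loop between movement and visual feedback.
frames = [((100, 100), (200, 100)),
          ((90, 100), (210, 100)),
          ((80, 100), (220, 100))]
for p1, p2 in frames:
    print(round(zoom_factor(p1, p2, start), 2))
# prints 1.0, then 1.2, then 1.4 as the fingers move apart
```

A discrete version of the same interaction would wait for the fingers to stop and then emit a single “zoom in” event, severing exactly the feedback loop that makes the continuous version feel natural.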

A second feature of Bevilacqua et al.’s system is that it allows users to easily define their own gestures, and they do so by acting out those gestures while listening to the music to be controlled. I will come back to this feature in more detail later, but for now we can note that it means that gestures are not limited to a set of pre-defined symbolic gestures. Users can define movements that feel natural to them for controlling a particular type of music. What does “natural” mean in this context? Again, it means that the user already has a learnt sensorimotor mapping between the music and a movement (for example a certain way of tapping the hands in response to a beat).

This is the full article:

Gillies, Marco and Kleinsmith, Andrea. 2014. Non-representational Interaction Design. In: Mark (J. M.) Bishop and Andrew Martin, eds. Contemporary Sensorimotor Theory. Switzerland: Springer International Publishing, pp. 201-208. ISBN 978-3-319-05106-2


Bruno Zamborlin, whose PhD I have been supervising together with IRCAM in Paris, is now releasing Mogees, a device and app that can turn any object into a musical instrument. See their website for more details:


Performance Driven Expressive Virtual Characters

Creating believable, expressive interactive characters is one of the great, and largely unsolved, technical challenges of interactive media. Human-like characters appear throughout interactive media, virtual worlds and games and are vital to the social and narrative aspects of these media, but they rarely have the psychological depth of expression found in other media. This proposal is for research into a new approach to creating interactive characters. It identifies the central problem of current methods as the fact that creating the interactive behaviour, or Artificial Intelligence (AI), of a character is still primarily a programming task, and therefore in the hands of people with a technical rather than an artistic training. Our hypothesis is that actors’ artistic understanding of human behaviour will bring an individuality, subtlety and nuance to the character that would be difficult to create in hand-authored models. This will help interactive media represent more nuanced social interaction, thus broadening their range of application.
The proposed research will use information from an actor’s performance to determine the parameters of a character’s behaviour software. We will use Motion Capture to record an actor interacting with another person. The recorded data will be used as input to a machine learning algorithm that will infer the parameters of a behavioural control model for the character. This model will then be used to control a real time animated character in interaction with a person. The interaction will be a full body interaction involving motion tracking of posture and/or gestures, and voice input.

This project is funded by the EPSRC under the first grant theme (project number EP/H02977X/1).

Gillies, Marco, 2009. Learning Finite State Machine Controllers from Motion Capture Data. IEEE transactions on computational intelligence and AI in games, 1 (1). pp. 63-72. ISSN 1943-068X

Gillies, Marco and Pan, Xueni and Slater, Mel and Shawe-Taylor, John, 2008. Responsive Listening Behavior. Computer Animation and Virtual Worlds, 19 (5). pp. 579-589. ISSN 1546-4261

Fluid Gesture Interaction Design

The GIDE gesture design interface

Fluid gesture interaction design: applications of continuous recognition for the design of modern gestural interfaces, Bruno Zamborlin’s new paper (that I helped on) is about to be published, you can access it on the Goldsmiths Repository:


The paper is based on Frédéric Bevilacqua’s Gesture Following algorithm for continuous gesture recognition (which Bruno worked on), but in this work we really looked carefully at the HCI of gesture interface design. If you want people to design good gesture interfaces it isn’t enough to have good gesture recognition software: you need design tools that support them in doing so. In particular you need to support them in tweaking parameters to get optimal performance, and help them know what to do when things don’t work as expected. Bruno showed in this paper how real time visual and auditory feedback about the recognition process can help people design better interfaces more quickly.

Kinect can open up a new world of games customization

The Microsoft Kinect is the device that has promised to change the way we play games and interact with computers by making real time motion tracking possible on commodity hardware, but its potential doesn’t stop there. We’ve been exploring how it can massively expand the way players can customise their games.

Customisation is a big part of modern gaming, particularly in Massively Multiplayer Online games, where players customise their avatars to develop an individual identity within the game, and communicate that identity to other players. Up to now customisation has mostly been about changing how characters look, but that is only one aspect of what makes a character unique. How a character moves is also very important. Even more fundamentally, we could customise how characters respond to events in the game, what game developers call Artificial Intelligence. Up to now customising these would involve complex animation and programming, skills that ordinary players don’t have. With Andrea Kleinsmith, I’ve been exploring how motion tracking like the Kinect can make customising animation and AI easy. Players can use their own movements to make the animations for their characters. AI is harder, but we’ve been looking at how to use machine learning to build AI customisation tools. Rather than having to program the AI, players can act out examples of behaviour using motion capture or a Kinect, and our machine learning algorithms can infer AI rules to control the character.
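A toy sketch of this “customising by doing” idea: the player acts out how their avatar should respond to game events, and the system infers a simple event-to-animation rule table from those demonstrations. The event names, animation names, and majority-vote inference are all invented for illustration; the actual system uses more sophisticated machine learning.

```python
from collections import Counter, defaultdict

# Each demonstration pairs a game event with the response the
# player acted out (here as named animations, for simplicity).
demonstrations = [
    ("win_point", "fist_pump"),
    ("win_point", "fist_pump"),
    ("lose_point", "head_drop"),
    ("win_point", "jump"),
]

# Infer one rule per event: the response demonstrated most often.
by_event = defaultdict(Counter)
for event, animation in demonstrations:
    by_event[event][animation] += 1
rules = {e: c.most_common(1)[0][0] for e, c in by_event.items()}

print(rules["win_point"])   # fist_pump
print(rules["lose_point"])  # head_drop
```

The key point is the interaction model rather than the learning: the player never writes a rule; they perform examples, and inconsistent demonstrations are resolved by the algorithm.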

We’ve recently had a paper published in the International Journal of Human-Computer Studies that describes a study we did that allowed players to customise their avatars’ behaviour when they win or lose a point in a 3D version of the classic video game Pong. You can see it here:

Kleinsmith, Andrea and Gillies, Marco. 2013. Customizing by Doing for Responsive Video Game Characters. International Journal of Human-Computer Studies, 71(7), pp. 775-784. ISSN 1071-5819