Human-Centred Machine Learning

We are currently asking for submissions to a workshop on Human-Centred Machine Learning at CHI 2016. The workshop aims to bring together people working on the HCI of machine learning, an emerging field.

If you are interested in finding out more about Human-Centred Machine Learning, here is an extract from our proposal:

Statistical machine learning is one of the most successful areas of computer science research in recent decades. It has driven advances in domains from medical and scientific research to the arts. It gives people the ability to create new systems based on example data, for instance creating a face recognition system from a large dataset of face images, rather than by reasoning about what features make something a face and translating that reasoning into program code. This makes it possible to provide excellent performance on tasks for which it would be very difficult, if not impossible, to describe computational procedures explicitly in code.

In practice, however, machine learning is still a difficult technology to use, requiring an understanding of complex algorithms and working processes, as well as software tools which may have steep learning curves. Patel et al. studied expert programmers working with machine learning and identified a number of difficulties, including treating methods as a “black box” and difficulty interpreting results. Usability challenges inherent in both existing software tools and the learning algorithms themselves (e.g., algorithms may lack a human-understandable means for communicating how decisions are made) restrict who can use machine learning and how. A human-centered approach to machine learning that rethinks algorithms and interfaces to algorithms in terms of human goals, contexts, and ways of working can make machine learning more useful and usable.

Past work also demonstrates ways in which a human-centered perspective leads to new approaches to evaluating, analysing, and understanding machine learning methods (Amershi 2014). For instance, Fiebrink showed that users building gestural control and analysis systems use a range of evaluation criteria when testing trained models, such as decision boundary shape and subjective judgements of misclassification cost. Conventional model evaluation metrics focusing on generalisation accuracy may not capture such criteria, which means that computationally comparing alternative models (e.g., using cross-validation) may be insufficient to identify a suitable model. Users may therefore instead rely on tight action-feedback loops in which they modify model behavior by changing the training data, followed by real-time experimentation with models to evaluate them and inform further modifications. Users may also develop strategies for creating training sets that efficiently guide model behaviour using very few examples (e.g., placing training examples near desired decision boundaries), which results in training sets that may break common theoretical assumptions about data (e.g., that examples are independent and identically distributed). Summarizing related work in a variety of application domains, Amershi et al. enumerate several properties of machine learning systems that can be beneficial to users, such as enabling users to critique learner output, to provide information beyond mere example labels, and to receive information about the learner that helps them understand it as more than a “black box.” These criteria are not typically considered when formulating or evaluating learning algorithms in machine learning research.
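
As a rough illustration of why accuracy alone may not capture a user's judgement of misclassification cost, here is a minimal Python sketch. The labels, the two sets of predictions and the cost values are all made up for the example; they are not taken from Fiebrink's study or from any real system.

```python
def accuracy(truth, predicted):
    """Fraction of examples where the prediction matches the true label."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def total_cost(truth, predicted, cost):
    """Sum a user-chosen cost for every (true label, predicted label) pair."""
    return sum(cost.get((t, p), 0) for t, p in zip(truth, predicted))


# Hypothetical labels for a gestural instrument: should a sound play or stop?
truth = ["play", "play", "stop", "stop"]
model_a = ["play", "stop", "stop", "stop"]  # misses one intended "play"
model_b = ["play", "play", "stop", "play"]  # triggers one unwanted "play"

# Assumed subjective costs: an unwanted sound is judged five times worse
# than a missed one. Correct predictions cost nothing.
cost = {("play", "stop"): 1, ("stop", "play"): 5}

print(accuracy(truth, model_a), accuracy(truth, model_b))
print(total_cost(truth, model_a, cost), total_cost(truth, model_b, cost))
```

Both models get the same accuracy here, but under the assumed costs the user would clearly prefer the first one. A purely accuracy-based comparison would not show that difference.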

I’ve also included a full reference list at the bottom of the post. If you are interested, here is the Call for Papers and you can find the full proposal here.

Saleema Amershi, Maya Cakmak, W. Bradley Knox, and Todd Kulesza. 2014. Power to the people: The role of humans in interactive machine learning. AI Magazine 35, 4 (2014), 105–120.

Saleema Amershi, James Fogarty, and Daniel S. Weld. 2012. Regroup: Interactive machine learning for on-demand group creation in social networks. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’12). 21–30. DOI:

Bill Buxton. 2007. Sketching user experiences: Getting the design right and the right design. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Steven P. Dow, Alana Glassco, Jonathan Kass, Melissa Schwarz, Daniel L. Schwartz, and Scott R. Klemmer. 2010. Parallel prototyping leads to better design results, more divergence, and increased self-efficacy. ACM Transactions on Computer-Human Interaction 17, 4 (Dec. 2010), 1–24. DOI:

Jerry Alan Fails and Dan R. Olsen Jr. 2003. Interactive machine learning. In Proceedings of the International Conference on Intelligent User Interfaces (IUI ’03). 39–45. DOI:

Rebecca Fiebrink. 2011. Real-time human interaction with supervised learning algorithms for music composition and performance. Ph.D. Dissertation. Princeton University, Princeton, NJ, USA.

Andrea Kleinsmith and Marco Gillies. 2013. Customizing by doing for responsive video game characters. International Journal of Human-Computer Studies 71, 7–8 (2013), 775–784. DOI: ijhcs.2013.03.005

Todd Kulesza, Saleema Amershi, Rich Caruana, Danyel Fisher, and Denis Charles. 2014. Structured labeling for facilitating concept evolution in machine learning. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’14). 3075–3084. DOI: 2557238

Kayur Patel, James Fogarty, James A. Landay, and Beverly Harrison. 2008. Investigating statistical machine learning as a tool for software development. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’08). 667–676. DOI:

Embodied Design of Full Bodied Interaction

The second paper I presented at MOCO this year was called Embodied Design of Full Bodied Interaction with virtual humans. It is (probably) my paper from my EPSRC grant “Performance Driven Expressive Virtual Characters” and it was my chance to talk about some of the stuff that I thought was interesting in the grant (but maybe can’t prove).

Here is an extract that explains some of the ideas:

Non-verbal communication is a vital part of our social interactions. While the often quoted estimate that only seven percent of human communication is verbal is contested, it is clear that a large part of people’s communication with each other is through gestures, postures, and movements. This is very different from the way that we traditionally communicate with machines. Creating computer systems capable of this type of non-verbal interaction is therefore an important challenge. Interpreting and animating body language is challenging for a number of reasons, but particularly because it is something we do subconsciously and we are often not aware of what exactly we are doing and would not be able to describe it later. Experts in body language (the people we would like to design the game) are not computer scientists but professionals such as actors and choreographers. Their knowledge of body language is embodied: they understand it by physically doing it and often find it hard to explicitly describe it in words (see Kirsh for a discussion of embodied cognition in the area of dance). This makes it very hard for people to translate it into the explicit, symbolic form needed for computer programming.

The last few years have seen the introduction of new forms of user interface device, such as the Nintendo WiiMote, the Microsoft Kinect and the Sony Move, that go beyond the keyboard and mouse and use body movements as a means of interacting with technology. These devices promise many innovations, but maybe the most profound and exciting was one that appeared as a much hyped demo prior to the release of the Microsoft Kinect. The Milo demo showed a computer animated boy interacting with a real woman, replying to her speech and responding to her body language. This example shows the enormous potential for forms of interaction that make use of our natural body movements, including our subconscious body language. However, this demo was never released to the public, showing the important challenges that still remain. While sensing technology and Natural Language Processing have developed considerably in the 5 years since this demo, there are still major challenges in simulating the nuances of social interaction, and body language in particular. This is very complex work that combines Social Signal Processing with computer animation of body language. Perhaps the greatest challenge is that body language is a tacit skill (Polanyi 1966), in the sense that we are able to do it without being able to explicitly say what we are doing or how we are doing it; and it is a form of embodied (social) cognition in which our body and environment play a fundamental role in our process of thought. The physicality of movement and the environment is an integral part of cognition and so a movement-based interaction is best understood through embodied movement. Kirsh therefore argues that the next generation of interaction techniques should take account of this embodiment, part of a larger trend towards embodiment in interaction design.

This raises an important challenge for designing computational systems because they traditionally must be programmed with explicit rules that are abstract and disembodied (in the sense that body movement is not an innate part of their creation). The problem of representing the embodied, tacit skills of body language and social interaction requires us to develop computational techniques that are very different from the explicit and abstract representations used in computer programming.

In Fiebrink’s evaluation of the Wekinator, a system for designing new gestural musical instruments, one of the participants commented: “With [the Wekinator], it’s possible to create physical sound spaces where the connections between body and sound are the driving force behind the instrument design, and they feel right. … it’s very difficult to create instruments that feel embodied with explicit mapping strategies, while the whole approach of [the Wekinator] … is precisely to create instruments that feel embodied.” This shows that the Wekinator uses a new approach to designing gestural interfaces that not only makes it easier to design but changes the way people think about designing, from an explicit focus on features of the movement (e.g. shoulder rotation) to a holistic, embodied view of movement. This approach is called Interactive Machine Learning (IML): the use of machine learning algorithms to design by interactively providing examples of interaction. This “embodied” form of design taps into our natural human understanding of movement which is itself embodied and implicit. We are able to move and recognize movement effectively but less able to analyze it into components. IML allows designers to design by moving rather than by analyzing movement.

This paper presents a first attempt at applying Fiebrink’s method to full body interaction with animated virtual characters, allowing an embodied form of designing by doing as suggested by Kleinsmith et al. We call this approach Interactive Performance Capture. Performance capture is the process of recording actors’ performances for mapping into a 3D animation. This is able to bring the nuance of the performance to the animation, but it works for static animations, not interactive systems. We use interactive machine learning as a way of capturing the interactions between two performers, as well as their movements.

Here is the reference and link to the full paper:

Embodied Design of Full Bodied Interaction with virtual humans

Gillies, Marco, Brenton, Harry and Kleinsmith, Andrea. 2015. ‘Embodied Design of Full Bodied Interaction with virtual humans’. In: 2nd International Conference on Movement and Computing. Vancouver, Canada.

Conceptual models in Interactive Machine Learning

At the end of March I will be going to IUI 2015 to present my paper Applying the CASSM Framework to Improving End User Debugging of Interactive Machine Learning.

This talks about some work I’ve done applying Ann Blandford’s framework for analysing software, Concept-based Analysis of Surface and Structural Misfits (CASSM), to Interactive Machine Learning. It is a framework that looks at user concepts and how they relate to concepts present in the software. A really interesting element is that it separates concepts in the interface from concepts in the system, so concepts that are central in the underlying algorithm can be missing from the interface and concepts in the interface might not be well represented in the functioning of the system. This led me to the idea that for interactive machine learning the learning algorithms used should be well aligned to the users’ concepts of the situation and they should also be well represented visually in the interface. This should make the system easier to use and in particular easier to debug when it goes wrong (because debugging requires a good conceptual model of the system). In order to do this I suggested a nearest neighbour learning algorithm would be well suited to a learning system for full body interaction because users thought in terms of whole poses, not individual features (which are common concepts in other learning algorithms) and it works with the original training data, which users understand well. It also led us to develop the visualisation you can see in the image above.
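
To make the nearest neighbour idea a bit more concrete, here is a minimal Python sketch. It is not the code from the paper: the pose representation (a flat list of joint coordinates), the labels and the function names are all assumptions made for illustration.

```python
import math


def pose_distance(a, b):
    """Euclidean distance between two whole poses (flat lists of joint coordinates)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_example(pose, examples):
    """Return (index, label, distance) of the training example closest to `pose`."""
    best = min(range(len(examples)),
               key=lambda i: pose_distance(pose, examples[i][0]))
    example_pose, label = examples[best]
    return best, label, pose_distance(pose, example_pose)


# Hypothetical training set: each entry is (whole pose, label given by the user).
examples = [
    ([0.0, 1.0, 0.0, 1.0], "arms_up"),
    ([0.0, -1.0, 0.0, -1.0], "arms_down"),
    ([1.0, 0.0, -1.0, 0.0], "arms_out"),
]

index, label, distance = nearest_example([0.1, 0.9, -0.1, 0.8], examples)
print(index, label, round(distance, 3))
```

The useful property for debugging is that the classifier can point back at the specific training pose that caused a decision (the `index` here), so an interface can show that original example to the user rather than an abstract feature weight.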

If you are interested, here is the abstract and full reference.

This paper presents an application of the CASSM (Concept-based Analysis of Surface and Structural Misfits) framework to interactive machine learning for a bodily interaction domain. We developed software to enable end users to design full body interaction games involving interaction with a virtual character. The software used a machine learning algorithm to classify postures based on examples provided by users. A longitudinal study showed that training the algorithm was straightforward, but that debugging errors was very challenging. A CASSM analysis showed that there were fundamental mismatches between the users’ concepts and the working of the learning system. This resulted in a new design in which both the learning algorithm and user interface were better aligned with users’ concepts. This work provides an example of how HCI methods can be applied to machine learning in order to improve its usability and provide new insights into its use.

Applying the CASSM Framework to Improving End User Debugging of Interactive Machine Learning

Gillies, Marco, Kleinsmith, Andrea and Brenton, Harry. 2015. ‘Applying the CASSM Framework to Improving End User Debugging of Interactive Machine Learning’. In: ACM Intelligent User Interfaces (IUI). Atlanta, United States.

What is natural about “Natural User Interfaces”?

I’ve recently had a paper published in Mark Bishop and Andrew Martin’s excellent volume Contemporary Sensorimotor Theory. I thought I would post an extract in which I use sensorimotor theory to think through some of the issues raised by Donald Norman’s insightful critique Natural User Interfaces are not natural.

The type of interaction I have been describing [in the rest of the paper] has been marketed by Microsoft and others as “Natural User Interfaces”: interfaces that are claimed to be so natural that they do not need to be learned. The logic behind this phrase is that, because body movements come naturally to us, a body movement interface will be natural. This idea has been criticised by many people, most notably by Norman in his article Natural User Interfaces are not natural in which he argues that bodily interfaces can suffer from many problems associated with traditional interfaces (such as the difficulty of remembering gestures) as well as new problems (the ephemerality of gestures and lack of visual feedback). So is there value in the intuition that bodily interfaces are natural, and if so what is that value and why is it often not seen in existing interfaces?

I would argue that there is a fundamental difference in the nature of bodily interfaces and traditional interfaces. Jacob et al. propose that a variety of new forms of interaction, including bodily interaction, are successful because they leverage a different set of our pre-existing skills from traditional GUIs. While a graphical user interface leverages our skills in manipulating external visual and symbolic representations, bodily interfaces leverage skills related to body and environmental awareness: the skills that enable us to move and act in the world. Similarly, Dourish proposes that we analyse interaction in terms of embodiment which he defines as: “the property of our engagement with the world that allows us to make it meaningful”. This leads him to define Embodied Interaction as “the creation, manipulation, and sharing of meaning through engaged interaction with artefacts”. While he applies this definition to both traditional and new forms of interaction, the nature of this engaged interaction is very different in bodily interfaces. Following Jacob we could say that, in a successful bodily interface, this engaged interaction can be the same form of engagement we have with our bodies and environment in our daily lives and we can therefore re-use our existing skills that enable us to engage with the world.

If we take a non-representational, sensorimotor view of perception and action these skills are very different from the skills of a traditional interface involving manipulation of representations. This view allows us to keep the intuition that bodily interfaces are different from graphical user interfaces and explain what is meant by natural in the phrase “natural user interface” (the so-called natural skills are non-representational sensorimotor skills), while also allowing us to be critical of the claims of bodily interfaces. Natural user interfaces, on this view, are only natural if they take account of the non-representational, sensorimotor nature of our body movement skills. Body movement interfaces which are just extensions of a symbolic, representational interface are just a more physically tiring version of a GUI.

A good example of this is gestural interaction. A common implementation of this form of interface is to have a number of pre-defined gestures that can be mapped to actions in the interface. This is one of the types of interface that Norman criticises. When done badly there is a fairly arbitrary mapping between a symbolic gesture and a symbolic action. Users’ body movements are used as part of a representation manipulation task. There is nothing wrong with this per se but it does not live up to the hype of natural user interfaces and is not much different from a traditional GUI. In fact, as Norman notes, it can be worse, as users do not have a visual cue to remind them which gestures they should be performing. This makes it closer to a textual command line interface where users must remember obscure commands with no visual prompts. Gestural user interfaces do not have to be like this.

These problems can be avoided if we think of gestural interfaces as tapping sensorimotor skills, not representation manipulation skills. For example, the work of Bevilacqua et al. uses gesture to control music. In this work, gestures are tracked continuously rather than being simply recognised at the end of the gesture. This allows users to continuously control the production of sound throughout the time they are performing the gesture, rather than triggering the gesture at the end. This seemingly simple difference transforms the task from representation manipulation (producing a symbolic gesture and expecting a discrete response) to a tight sensorimotor loop in which the auditory feedback can influence movement which in turn controls the audio. A more familiar example of this form of continuous feedback is the touch screen “pinch to zoom” gesture developed for the iPhone. In this gesture an image resizes dynamically and continuously in response to the user’s fingers moving together and apart. This continuous feedback and interaction enables a sensorimotor loop that can leverage our real world movement skills.
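
The difference between continuous tracking and end-of-gesture recognition can be sketched in a few lines of Python. This is a deliberately simplified stand-in (a windowed closest-match search over a one-dimensional template), not the algorithm used by Bevilacqua et al.; the template, the samples and the `follow` function are all made up for illustration.

```python
def follow(template, samples, window=3):
    """Yield an estimated progress through the gesture, in [0, 1], for every sample."""
    position = 0
    last = len(template) - 1
    for sample in samples:
        # Only look a short way ahead of the current position, so the
        # estimate can move forward through the template but never backwards.
        end = min(position + window, last)
        position = min(range(position, end + 1),
                       key=lambda i: abs(template[i] - sample))
        yield position / last


# Hypothetical recorded gesture (e.g. hand height over time) and a live performance of it.
template = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
samples = [0.05, 0.25, 0.35, 0.65, 0.95]

progress = list(follow(template, samples))
for value in progress:
    # In a real system this value could drive a sound parameter on every frame.
    print(value)
```

A discrete recogniser would produce a single event once the last sample had arrived. Here there is an output for every incoming sample, which is what makes a feedback loop between sound and movement possible.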

A second feature of Bevilacqua et al.’s system is that it allows users to easily define their own gestures and they do so by acting out those gestures while listening to the music to be controlled. I will come back to this feature in more detail later, but for now we can note that it means that gestures are not limited to a set of pre-defined symbolic gestures. Users can define movements that feel natural to them for controlling a particular type of music. What does “natural” mean in this context? Again, it means that the user already has a learnt sensorimotor mapping between the music and a movement (for example a certain way of tapping the hands in response to a beat).

This is the full article:

Gillies, Marco and Kleinsmith, Andrea. 2014. Non-representational Interaction Design. In: Mark (J. M.) Bishop and Andrew Martin, eds. Contemporary Sensorimotor Theory. 15 Switzerland: Springer International Publishing, pp. 201-208. ISBN 978-3-319-05106-2

Performance Driven Expressive Virtual Characters

Creating believable, expressive interactive characters is one of the great, and largely unsolved, technical challenges of interactive media. Human-like characters appear throughout interactive media, virtual worlds and games and are vital to the social and narrative aspects of these media, but they rarely have the psychological depth of expression found in other media. This proposal is for the development of research into a new approach to creating interactive characters which identifies the central problem of current methods as being the fact that creating the interactive behaviour, or Artificial Intelligence (AI), of a character is still primarily a programming task, and therefore in the hands of people with a technical rather than an artistic training. Our hypothesis is that the actors’ artistic understanding of human behaviour will bring an individuality, subtlety and nuance to the character that it would be difficult to create in hand authored models. This will help interactive media represent more nuanced social interaction, thus broadening their range of application.
The proposed research will use information from an actor’s performance to determine the parameters of a character’s behaviour software. We will use Motion Capture to record an actor interacting with another person. The recorded data will be used as input to a machine learning algorithm that will infer the parameters of a behavioural control model for the character. This model will then be used to control a real time animated character in interaction with a person. The interaction will be a full body interaction involving motion tracking of posture and/or gestures, and voice input.

This project is funded by the EPSRC under the first grant theme (project number EP/H02977X/1).

Gillies, Marco, 2009. Learning Finite State Machine Controllers from Motion Capture Data. IEEE Transactions on Computational Intelligence and AI in Games, 1 (1). pp. 63-72. ISSN 1943-068X

Gillies, Marco and Pan, Xueni and Slater, Mel and Shawe-Taylor, John, 2008. Responsive Listening Behavior. Computer Animation and Virtual Worlds, 19 (5). pp. 579-589. ISSN 1546-4261

Fluid Gesture Interaction Design

The GIDE gesture design interface

Bruno Zamborlin’s new paper (that I helped on), Fluid gesture interaction design: applications of continuous recognition for the design of modern gestural interfaces, is about to be published. You can access it on the Goldsmiths Repository:

The paper is based on Frédéric Bevilacqua’s Gesture Following algorithm for continuous gesture recognition (which Bruno worked on), but in this work we really looked carefully at the HCI of gesture interface design. If you want people to design good gesture interfaces it isn’t enough to have good gesture recognition software, you need design tools that support them in doing so. In particular you need to support them in tweaking parameters to get optimal performance and help them know what to do when things don’t work as expected. Bruno showed in this paper how real time visual and auditory feedback about the recognition process can help people design better interfaces more quickly.

Data Driven Interactive Virtual Characters

This work is about using data from people’s movements to learn AI methods for controlling the behaviour of animated characters. It pioneered the use of machine learning for virtual characters, particularly for AI aspects (as opposed to animation, where it was already quite common).

One of the key aspects of this work was the development of two person capture techniques which allowed us to capture not only the behaviour of the character, but also the behaviour of the person the character was interacting with. By synchronising the two data sets, we were able to learn a model that could generate responses from the character. We now call this technique “Interactive Performance Capture”.
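
As a toy illustration of what learning from two synchronised data sets can mean, here is a small Python sketch. It is much simpler than the models we actually used: it just counts which character behaviour co-occurs with each behaviour of the other person. The behaviour labels and function names are invented for the example.

```python
from collections import Counter, defaultdict


def learn_responses(person_track, character_track):
    """Count which character behaviour co-occurs with each person behaviour."""
    if len(person_track) != len(character_track):
        raise ValueError("tracks must be synchronised frame for frame")
    table = defaultdict(Counter)
    for person, character in zip(person_track, character_track):
        table[person][character] += 1
    return table


def respond(table, person_behaviour, default="idle"):
    """Pick the response seen most often alongside the given person behaviour."""
    counts = table.get(person_behaviour)
    if not counts:
        return default  # nothing was captured for this behaviour
    return counts.most_common(1)[0][0]


# Hypothetical labelled frames from a two person capture session.
person_track = ["talking", "talking", "silent", "talking", "silent"]
character_track = ["nod", "nod", "look_away", "lean_in", "look_away"]

model = learn_responses(person_track, character_track)
print(respond(model, "talking"), respond(model, "silent"), respond(model, "waving"))
```

The key point is the pairing: without the second, synchronised track there would be nothing to condition the character's behaviour on, only a recording of the character moving on its own.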

Kinect can open up a new world of games customization

The Microsoft Kinect is the device that has promised to change the way we play games and interact with computers by making real time motion tracking possible on commodity hardware, but its potential doesn’t stop there. We’ve been exploring how it can massively expand the way players can customise their games.

Customisation is a big part of modern gaming, particularly in Massively Multiplayer Online games, where players customise their avatars to develop an individual identity within the game, and communicate that identity to other players. Up to now customisation has mostly been about changing how characters look, but that is only one aspect of what makes a character unique. How a character moves is also very important. Even more fundamentally we could customise how characters respond to events in the game, what game developers call Artificial Intelligence. Up to now customising these would involve complex animation and programming, skills that ordinary players don’t have. With Andrea Kleinsmith, I’ve been exploring how motion tracking devices like the Kinect can make customising animation and AI easy. Players can use their own movements to make the animations for the characters. AI is harder, but we’ve been looking at how to use machine learning to build AI customisation tools. Rather than having to program the AI, players can act out examples of behaviour using motion capture or a Kinect, and our machine learning algorithms can infer AI rules to control the character.

We’ve recently had a paper published in the International Journal of Human-Computer Studies that describes a study we did that allowed players to customise their avatars’ behaviour when they win or lose a point in a 3D version of the classic video game Pong. You can see it here:

Kleinsmith, Andrea and Gillies, Marco. 2013. Customizing by Doing for Responsive Video Game Characters. International Journal of Human-Computer Studies, 71(7), pp. 775-784. ISSN 1071-5819