Groundbreaking video communication brings groups together

Have you ever noticed how webcams and front-facing cameras on mobile phones never catch us at a flattering angle? Do you yearn to be able to view more than just a face on the monitor with friends and family? Would it not be great to interact as if you were there with them in person?


Since February 2008, the ‘Narrative Interactive Media’ (NIM) group from the Department of Computing – along with esteemed industry partners (including BT, Alcatel, Philips and board game manufacturer Ravensburger) – has been examining the potential of video technology in supporting group-to-group communication. An €18m project, TA2 looks at how video technology could improve social relationships and connect groups across different locations.

More cameras, better interaction
Indeed, the limitations of single-camera communication are self-explanatory. Whether you’re on a PC, laptop or tablet, you’re pretty much rooted to the spot – you have to place yourself in the frame of the camera. If you wander out of view, you’re out of view. The setup of the cameras in TA2 provides, in addition to the standard front-facing camera, auxiliary cameras that zoom and move to follow the action.
Incorporating best practice from the world of TV and film production, TA2’s cameras transmit images that imitate how human attention would direct the experience. It is in this ‘communication orchestration’ that Goldsmiths brought expertise to the project.

Marian Ursu, who leads the NIM group, explains: “Obviously neither the cameras nor the editing can be human-controlled because they have to work as we interact. Also, we have to create a slick communication environment where everyone in the room is for all intents and purposes an active part of the interaction.” This requires some degree of intelligence to be embedded into the system so that orchestration decisions can be taken automatically. “You can think of this as the brain of the system,” says Marian, “and we lead the project’s research at this end.”

Michael Frantzis, an expert in video narration and researcher in the NIM group, explains how this works. “If this were film or TV production we would have a cameraman and a director, but we do not have that luxury. Instead, primitive spatial audio and visual information is used as a basis for the automatic inference of information on, among other things, who is in frame, when they are talking, who is talking to whom, keywords in their speech and their visual focus of attention.”

“We call these social or conversation cues,” adds Martin Groen, a Computer Scientist in Artificial Intelligence turned Psychologist, whose role in the team is to identify and define these cues. This information is in turn interpreted and transformed into decisions regarding camera choices and screen editing.
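
As a rough illustration of how cues of this kind might be turned into a camera choice, here is a minimal Python sketch. It is not TA2's orchestration software: the cue fields (`speaker`, `in_frame`), the camera names and the single rule used (show the speaker in the tightest available shot, and stay on the current camera when it is already one of the tightest) are all assumptions invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cues:
    """Hypothetical bundle of conversation cues for one moment in time."""
    speaker: Optional[str] = None                 # who is talking, if anyone
    in_frame: dict = field(default_factory=dict)  # camera name -> set of people visible


def choose_camera(cues: Cues, current: str, wide: str = "front") -> str:
    """Pick a camera using one illustrative rule (an assumption, not TA2's logic)."""
    # Nobody is talking: fall back to the wide front-facing shot.
    if cues.speaker is None:
        return wide

    # Cameras that currently show the speaker.
    candidates = [c for c, people in cues.in_frame.items() if cues.speaker in people]
    if not candidates:
        return wide

    # Prefer the tightest shot, i.e. the one with the fewest people in frame.
    fewest = min(len(cues.in_frame[c]) for c in candidates)
    tightest = [c for c in candidates if len(cues.in_frame[c]) == fewest]

    # Avoid a needless cut if the current camera is already one of the tightest.
    return current if current in tightest else tightest[0]


# Invented example: Ann is speaking and is visible on two cameras.
cues = Cues(
    speaker="Ann",
    in_frame={"front": {"Ann", "Bob"}, "aux1": {"Ann"}, "aux2": {"Bob"}},
)
print(choose_camera(cues, current="front"))  # aux1
```

A real system would have to weigh many more cues, such as who is being addressed and where people are looking, and would need to avoid cutting too often; the sketch only shows the general shape of a cue-to-decision step.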

Developing software that can carry out such processes is an extremely demanding task, and NIM has five Computer Science Researchers dedicated to it: Manolis Falelakis, Pedro Torres, Spiros Michalakopoulos, Notis Gasparis and Vilmos Zsombori. They are exploring ways in which knowledge can be expressed and processed by computers, and ways of doing so quickly and reliably enough to be effective for communication mediation.

More than just a chat
For anyone wondering why board game designers Ravensburger were listed as a corporate partner of TA2, things are about to become clear. “TA2 is not about just having a cup of tea and a chat,” explains Marian. “We thought that when people get together normally they have activities to engage with, such as sharing pictures or playing games.”

Indeed, the experimental trials saw two groups of participants split between two locations – the NIM Lab, in Richard Hoggart Building, and the Goldsmiths Digital Studios, in the Ben Pimlott Building – with the groups battling it out in a game of Pictionary. Each team was made up of two people, one in each room, with the opposition trying to distract the person communicating the picture to their teammate.

There were three sessions for each group, each 30 minutes long: one with a fixed front-facing camera only, one with an orchestrated video (the editing was done by humans for these trials), and one where the camera editing was random, but respected the rhythm of the human editing. Thirty-nine Goldsmiths students participated in these trials over three days.
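
One way to picture the third condition is to keep the timing of every cut from the human edit while choosing the camera at random. The Python sketch below illustrates that idea only; the edit-list format, the camera names, the timings and the rule that each cut must switch to a different camera are assumptions made for this example, not details reported from the trials.

```python
import random


def randomise_cameras(human_edit, cameras, seed=None):
    """Keep every cut time from the human edit but pick cameras at random.

    human_edit: list of (cut_time_in_seconds, camera) pairs, in time order.
    cameras: names of all available cameras (at least two are needed).
    """
    rng = random.Random(seed)
    result = []
    previous = None
    for cut_time, _human_choice in human_edit:
        # Assumption: a cut should change the picture, so exclude the previous camera.
        options = [c for c in cameras if c != previous]
        previous = rng.choice(options)
        result.append((cut_time, previous))
    return result


# Hypothetical edit list; the times and camera names are invented.
human_edit = [(0.0, "front"), (4.2, "aux1"), (9.8, "front"), (12.5, "aux2")]
random_edit = randomise_cameras(human_edit, ["front", "aux1", "aux2"], seed=1)
print(random_edit)
```

The cut times in `random_edit` match those in `human_edit` exactly, so the rhythm of the edit is preserved while the choice of shot carries no meaning.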

Orchestration trials produce a world-first
This was the first time that end-user trials of orchestrated multi-camera video communication between social groups in separate locations had been conducted.

Quantitative and qualitative measures were used to assess the experience. The quantitative measure – the number of accurate and inaccurate guesses analysed in the context of the overall number of turns – brought noteworthy results, showing that there were significantly more accurate guesses, and significantly fewer incorrect guesses, in the orchestrated trial than in the other two conditions. “This means the information flowed better with orchestration,” says Marian.

Marian was less enthusiastic, but still very optimistic, regarding the results from the qualitative measure – the Independent Television Commission / Sense of Presence Inventory – a validated questionnaire assessing the subjective experience of participants, developed by the Department of Psychology’s i2 group. The questionnaire, although it showed no statistically significant differences between the three conditions, indicated a slight preference among participants for the fixed-camera condition. Marian says: “It was a shame the orchestration didn’t come out as significantly better. As someone from the team put it: ‘Orchestration is good for you even if you don’t like it!’ And we are confident that with some immediate improvements to the technology we will get there in the next set of trials.”

Taking it to the next level
The NIM Group is confident that the groundbreaking work undertaken in the project provides a solid theoretical and empirical foundation for improving social relationships and connecting groups across different locations.

Indeed, TA2’s findings will provide the basis for the group’s next EU collaborative project, entitled ‘VConect’ – Video Communication for Networked Communities – set to get underway in December this year.

Marian sets the scene: “VConect builds upon two of the most significant achievements of the current Internet: video conferencing and social networks.

“It will make social networks as flexible and engaging as chatting face-to-face to a group of friends. It will allow us to see what’s really happening, to know who is hurting and who is laughing. It will allow us to see the real drama, let us be part of the most rowdy crowds or talk quietly to a lonely friend.”