[humaine news] JMUI Special issue: Real-time affect analysis and interpretation in virtual agents and robots is now PUBLISHED!!!
Ginevra Castellano
ginevra at dcs.qmul.ac.uk
Wed Mar 24 11:06:07 GMT 2010
JOURNAL ON MULTIMODAL USER INTERFACES
Volume 3, Issues 1-2, pages 1-153, March 2010
Special issue: Real-time affect analysis and interpretation: closing the
affective loop in virtual agents and robots
Guest Editors: Ginevra Castellano, Kostas Karpouzis, Christopher Peters
and Jean-Claude Martin
http://www.springerlink.com/content/q2080w072713/?p=86740fad4fe94dcab29705938d923c39&pi=0
EDITORIAL
"Special issue on real-time affect analysis and interpretation: closing
the affective loop in virtual agents and robots"
Ginevra Castellano, Kostas Karpouzis, Christopher Peters and Jean-Claude
Martin
Pages 1-3
ARTICLES
"On-line emotion recognition in a 3-D activation-valence-time continuum
using acoustic and linguistic cues"
Florian Eyben, Martin Wöllmer, Alex Graves, Björn Schuller, Ellen
Douglas-Cowie and Roddy Cowie
Pages 7-19
Abstract
For many applications of emotion recognition, such as virtual agents,
the system must select responses while the user is speaking. This
requires reliable on-line recognition of the user’s affect. However, most
emotion recognition systems are based on turnwise processing. We present
a novel approach to on-line emotion recognition from speech using Long
Short-Term Memory Recurrent Neural Networks. Emotion is recognised
frame-wise in a two-dimensional valence-activation continuum. In
contrast to current state-of-the-art approaches, recognition is performed
on low-level signal frames, similar to those used for speech recognition.
No statistical functionals are applied to low-level feature contours.
Framing at a higher level is therefore unnecessary and regression
outputs can be produced in real-time for every low-level input frame. We
also investigate the benefits of including linguistic features on the
signal frame level obtained by a keyword spotter.
"Student mental state inference from unintentional body gestures using
dynamic Bayesian networks"
Abdul Rehman Abbasi, Matthew N. Dailey, Nitin V. Afzulpurkar and Takeaki
Uno
Pages 21-31
Abstract
Applications that interact with humans would benefit from knowing the
intentions or mental states of their users. However, mental state
prediction is not only uncertain but also context dependent. In this
paper, we present a dynamic Bayesian network model of the temporal
evolution of students’ mental states and causal associations between
mental states and body gestures in context. Our approach is to convert
sensory descriptions of student gestures into semantic descriptions of
their mental states in a classroom lecture situation. At model learning
time, we use expectation maximization (EM) to estimate model parameters
from partly labeled training data, and at run time, we use the junction
tree algorithm to infer mental states from body gesture evidence. A
maximum a posteriori classifier evaluated with leave-one-out cross
validation on labeled data from 11 students obtains a generalization
accuracy of 97.4% over cases where the student reported a definite
mental state, and 83.2% when we include cases where the student reported
no mental state. Experimental results demonstrate the validity of our
approach. Future work will explore utilization of the model in real-time
intelligent tutoring systems.
"Multimodal emotion recognition in speech-based interaction using facial
expression, body gesture and acoustic analysis"
Loic Kessous, Ginevra Castellano and George Caridakis
Pages 33-48
Abstract
In this paper a study on multimodal automatic emotion recognition during
a speech-based interaction is presented. A database was constructed
consisting of people pronouncing a sentence in a scenario where they
interacted with an agent using speech. Ten people pronounced a sentence
corresponding to a command while making 8 different emotional
expressions. Gender was equally represented, with speakers of several
different native languages including French, German, Greek and Italian.
Facial expression, gesture and acoustic analysis of speech were used to
extract features relevant to emotion. For the automatic classification
of unimodal data, bimodal data and multimodal data, a system based on a
Bayesian classifier was used. After performing an automatic
classification of each modality, the different modalities were combined
using a multimodal approach. Fusion of the modalities at the feature
level (before running the classifier) and at the results level
(combining the results of the classifiers for each modality) were compared.
Fusing the multimodal data resulted in a large increase in the
recognition rates in comparison to the unimodal systems: the multimodal
approach increased the recognition rate by more than 10% when compared
to the most successful unimodal system. Bimodal emotion recognition
based on all combinations of the modalities (i.e., ‘face-gesture’,
‘face-speech’ and ‘gesture-speech’) was also investigated. The results
show that the best pairing is ‘gesture-speech’. Using all three
modalities resulted in a 3.3% classification improvement over the best
bimodal results.
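
The two fusion strategies compared in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract says a Bayesian classifier was used but gives no details, so a Gaussian naive Bayes model stands in for it, the data are synthetic, the decision-level combination uses a simple average of per-modality posteriors, and all function and variable names are hypothetical.

```python
import math
import random

MODALITIES = ("face", "gesture", "speech")  # hypothetical modality names

def fit_gnb(X, y):
    """Fit a Gaussian naive Bayes model: class prior, per-feature mean and variance."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(c) / n for c in cols]
        # Small constant keeps variances strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / n + 1e-6
                     for c, m in zip(cols, means)]
        model[label] = (n / len(X), means, variances)
    return model

def posteriors(model, x):
    """Normalised class posteriors for one feature vector."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        s = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            s -= 0.5 * math.log(2 * math.pi * s2) + (v - m) ** 2 / (2 * s2)
        scores[label] = s
    top = max(scores.values())
    exp = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exp.values())
    return {k: v / total for k, v in exp.items()}

def concat(sample):
    """Feature-level fusion: join the feature vectors of all modalities."""
    return [v for modality in sample for v in modality]

def train(samples, labels):
    """Return one model on concatenated features and one model per modality."""
    feature_model = fit_gnb([concat(s) for s in samples], labels)
    modality_models = [fit_gnb([s[i] for s in samples], labels)
                       for i in range(len(MODALITIES))]
    return feature_model, modality_models

def predict_feature_level(feature_model, sample):
    post = posteriors(feature_model, concat(sample))
    return max(post, key=post.get)

def predict_decision_level(modality_models, sample):
    """Decision-level fusion: average the posteriors of the per-modality models."""
    combined = {}
    for model, x in zip(modality_models, sample):
        for label, p in posteriors(model, x).items():
            combined[label] = combined.get(label, 0.0) + p / len(modality_models)
    return max(combined, key=combined.get)

# Synthetic two-class data: three modalities with two features each.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
samples = [[[rng.gauss(3.0 * lab, 1.0) for _ in range(2)] for _ in MODALITIES]
           for lab in labels]
feature_model, modality_models = train(samples, labels)
```

Calling predict_feature_level(feature_model, sample) and predict_decision_level(modality_models, sample) on the same sample gives the two fused decisions side by side. The synthetic data say nothing about the recognition rates reported in the paper.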
"Multimodal user’s affective state analysis in naturalistic interaction"
George Caridakis, Kostas Karpouzis, Manolis Wallace, Loic Kessous and
Noam Amir
Pages 49-66
Abstract
Affective and human-centered computing have attracted an abundance of
attention during the past years, mainly due to the abundance of
environments and applications able to exploit and adapt to multimodal
input from the users. The combination of facial expressions with prosody
information allows us to capture the users’ emotional state in an
unintrusive manner, relying on the best performing modality in cases
where one modality suffers from noise or bad sensing conditions. In this
paper, we describe a multi-cue, dynamic approach to detect emotion in
naturalistic video sequences, where input is taken from nearly real
world situations, contrary to controlled recording conditions of
audiovisual material. Recognition is performed via a recurrent neural
network, whose short term memory and approximation capabilities cater
for modeling dynamic events in facial and prosodic expressivity. This
approach also differs from existing work in that it models user
expressivity using a dimensional representation, instead of detecting
discrete ‘universal emotions’, which are scarce in everyday
human-machine interaction. The algorithm is deployed on an audiovisual
database which was recorded simulating human-human discourse and,
therefore, contains less extreme expressivity and subtle variations of a
number of emotion labels. Results show that in turns lasting more than a
few frames, recognition rates rise to 98%.
"From expressive gesture to sound - The development of an embodied
mapping trajectory inside a musical interface"
Pieter-Jan Maes, Marc Leman, Micheline Lesaffre, Michiel Demey and Dirk
Moelants
Pages 67-78
Abstract
This paper contributes to the development of a multimodal, musical tool
that extends the natural action range of the human body to communicate
expressiveness into the virtual music domain. The core of this musical
tool consists of a low cost, highly functional computational model
developed upon the Max/MSP platform that (1) captures real-time movement
of the human body into a 3D coordinate system on the basis of the
orientation output of any type of inertial sensor system that is
OSC-compatible, (2) extracts low-level movement features that specify the
amount of contraction/expansion as a measure of how a subject uses the
surrounding space, (3) recognizes these movement features as being
expressive gestures, and (4) creates a mapping trajectory between these
expressive gestures and the sound synthesis process of adding harmonically
related voices to an originally monophonic voice. The concern for a
user-oriented and intuitive mapping strategy was thereby of central
importance. This was achieved by conducting an empirical experiment
based on theoretical concepts from the embodied music cognition
paradigm. Based on empirical evidence, this paper proposes a mapping
trajectory that facilitates the interaction between a musician and his
instrument, the artistic collaboration between (multimedia) artists and
the communication of expressiveness in a social, musical context.
"The mental ingredients of bitterness"
Isabella Poggi and Francesca D’Errico
Pages 79-86
Abstract
In view of multimodal interfaces capable of a detailed representation of
the User’s possible emotions, the paper analyses bitterness in terms of
its mental ingredients, the beliefs and goals represented in the mind of
a person when feeling an emotion. Bitterness is a negative emotion in
between anger and sadness: like anger, it is caused by a sense of
injustice, but also entails a sense of impotence which makes it similar
to sadness. Often caused by betrayal, it comes from the disappointment
of an expectation from oneself or others with whom one is affectively
involved, or from a disproportion between commitment and actual results.
The ingredients found in a pilot study were tested through qualitative
analysis of a further questionnaire, which confirmed the ingredients
hypothesized, further revealing the different nature of bitterness
across ages and across types of work.
"Affect recognition for interactive companions: Challenges and design in
real world scenarios"
Ginevra Castellano, Iolanda Leite, André Pereira, Carlos Martinho, Ana
Paiva and Peter W. McOwan
Pages 89-98
Abstract
Affect sensitivity is an important requirement for artificial companions
to be capable of engaging in social interaction with human users. This
paper provides a general overview of some of the issues arising from the
design of an affect recognition framework for artificial companions.
Limitations and challenges are discussed with respect to other
capabilities of companions and a real world scenario where an iCat robot
plays chess with children is presented. In this scenario, affective
states that a robot companion should be able to recognise are identified
and the non-verbal behaviours that are affected by the occurrence of
these states in the children are investigated. The experimental results
aim to provide the foundation for the design of an affect recognition
system for a game companion: in this interaction scenario children tend
to look at the iCat and smile more when they experience a positive
feeling and they are engaged with the iCat.
"When my robot smiles at me - Enabling human-robot rapport via real-time
head gesture mimicry"
Laurel D. Riek, Philip C. Paul and Peter Robinson
Pages 99-108
Abstract
People use imitation to encourage each other during conversation. We
have conducted an experiment to investigate how imitation by a robot
affects people’s perceptions of their conversation with it. The robot
operated in one of three ways: full head gesture mimicking, partial head
gesture mimicking (nodding), and non-mimicking (blinking). Participants
rated how satisfied they were with the interaction. We hypothesized that
participants in the full head gesture condition would rate their
interaction the most positively, followed by the partial and
non-mimicking conditions. We also performed gesture analysis to see if
any differences existed between groups, and did find that men made
significantly more gestures than women while interacting with the robot.
Finally, we interviewed participants to try to gain additional
insight into their feelings of rapport with the robot, which revealed a
number of valuable insights.
"Communication of musical expression by means of mobile robot gestures"
Birgitta Burger and Roberto Bresin
Pages 109-118
Abstract
We developed a robotic system that can behave in an emotional way. A
3-wheeled simple robot with limited degrees of freedom was designed. Our
goal was to make the robot display emotions in music performance by
performing expressive movements. These movements have been compiled and
programmed based on literature about emotion in music, musicians’
movements in expressive performances, and object shapes that convey
different emotional intentions. The emotions happiness, anger, and
sadness have been implemented in this way. General results from
behavioral experiments show that emotional intentions can be
synthesized, displayed and communicated by an artificial creature, also
in constrained circumstances.
"Investigating shared attention with a virtual agent using a gaze-based
interface"
Christopher Peters, Stylianos Asteriadis and Kostas Karpouzis
Pages 119-130
Abstract
This paper investigates the use of a gaze-based interface for testing
simple shared attention behaviours during an interaction scenario with a
virtual agent. The interface is non-intrusive, operating in real-time
using a standard web-camera for input, monitoring users’ head directions
and processing them in real-time for resolution to screen coordinates.
We use the interface to investigate user perception of the agent’s
behaviour during a shared attention scenario. Our aim is to elaborate
important factors to be considered when constructing engagement models
that must account not only for behaviour in isolation, but also for the
context of the interaction, as is the case during shared attention
situations.
"HMM modeling of user engagement in advice-giving dialogues"
Nicole Novielli
Pages 131-140
Abstract
This research aims at defining a real-time probabilistic model of user’s
engagement in advice-giving dialogues. We propose an approach based on
Hidden Markov Models (HMMs) to describe the differences in the dialogue
pattern due to the different levels of engagement experienced by the
users. We train our HMM models on a corpus of natural dialogues with an
Embodied Conversational Agent (ECA) in the domain of healthy-eating. The
dialogues are coded in terms of Dialogue Acts associated with each system
or user move. Results are quite encouraging: HMMs are a powerful
formalism for describing the differences in the dialogue patterns due
to the different levels of engagement of users, and they can be
successfully employed in real-time detection of user engagement. However,
the HMM learning process shows a lack of robustness when using
low-dimensional and skewed corpora. Therefore, we plan a further
validation of our approach with larger corpora in the near future.
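
One common way to use HMMs for this kind of classification is to keep one model per engagement level and pick the model under which a dialogue-act sequence is most likely. The sketch below illustrates that idea only. It is not the author's model: the dialogue-act inventory, the two engagement levels, the number of hidden states and every probability are hypothetical values chosen for illustration, whereas the paper learns its models from a corpus.

```python
import math

# Hypothetical dialogue-act codes: 0 = QUESTION, 1 = STATEMENT, 2 = ACKNOWLEDGE
# Each model is (start, trans, emit) for a two-state discrete HMM.
MODELS = {
    "high": ([0.6, 0.4],
             [[0.7, 0.3], [0.4, 0.6]],
             [[0.6, 0.3, 0.1], [0.3, 0.5, 0.2]]),
    "low": ([0.5, 0.5],
            [[0.8, 0.2], [0.3, 0.7]],
            [[0.1, 0.2, 0.7], [0.2, 0.2, 0.6]]),
}

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of a symbol sequence under a discrete HMM (scaled forward algorithm)."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate the state distribution one step, then weight by the emission.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)  # probability of obs[t] given the earlier observations
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return loglik

def classify(obs):
    """Return the engagement level whose HMM assigns the sequence the highest likelihood."""
    return max(MODELS, key=lambda name: forward_loglik(obs, *MODELS[name]))
```

Because the forward pass processes one symbol at a time, the running likelihoods can in principle be updated after every dialogue move, which fits the real-time use described in the abstract.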
"Natural interaction with a virtual guide in a virtual environment - A
multimodal dialogue system"
Dennis Hofs, Mariët Theune and Rieks op den Akker
Pages 141-153
Abstract
This paper describes the Virtual Guide, a multimodal dialogue system
represented by an embodied conversational agent that can help users to
find their way in a virtual environment, while adapting its affective
linguistic style to that of the user. We discuss the modular
architecture of the system, and describe the entire loop from multimodal
input analysis to multimodal output generation. We also describe how the
Virtual Guide detects the level of politeness of the user’s utterances
in real-time during the dialogue and aligns its own language to that of
the user, using different politeness strategies. Finally, we report on
our first user tests, and discuss some potential extensions to improve
the system.
--
Dr. Ginevra Castellano
Postdoctoral Research Assistant
Department of Computer Science
School of Electronic Engineering and Computer Science
Queen Mary University of London
Mile End Road
London E1 4NS
Telephone: +44 (0)20 7882 3234
Email: ginevra at dcs.qmul.ac.uk
http://www.eecs.qmul.ac.uk/~ginevra/