Introduction

The ILHAIRE project will last 36 months. It will gather data on laughter using high-quality sound, video, and facial and upper-body motion capture. Database collection will be grounded in psychological foundations, and the data will be used to validate computational and theoretical models for the analysis and synthesis of audio-visual laughter. Dialogue systems between humans and avatars will be developed, and studies will be conducted to capture the qualitative experience evoked by a laughing avatar.
- Work Package 1
- Work Package 2
- Work Package 3
- Work Package 4
- Work Package 5
- Work Package 6
- Work Package 7
Incremental Database

WP1 provides the resources required by most of the other work packages, assembling existing resources and generating new ones to create an incremental database. Initially, existing resources containing audio-visual records of laughter will be assembled. These will be used to construct an annotated database of multimodal records of laughter in naturalistic interactions, incorporating a system of labels that distinguishes the main kinds and functions of laughter represented in it.
WP1 will also use focus groups and large-scale questionnaires to identify a range of material that reliably makes different kinds of people laugh. In the early stages this will focus on material in English; at a later stage it will be extended to other languages and cultures. The laughter-inducing material will also inform experiments designed to induce and record naturalistic laughter using both audio-visual and motion capture recording techniques.
Multimodal Analysis and Recognition of Laughter

WP2 concerns multimodal analysis and modelling of laughter, covering both hilarious and conversational (social) laughter. Its goals are:
- to infer the sequences of phonemes and facial action units of laughs;
- to define a set of expressive gesture features for analysing gesture during laughter;
- to develop novel fusion algorithms, based on the current signal context, for integrating information from the auditory, facial, and gestural channels;
- to automatically detect and classify laughter based on this integrated multimodal information.
Moreover, WP2 aims to investigate the influence of culture and gender to improve laughter detection. The models and techniques developed here will form the basis for the adaptive models employed in laughter synthesis. The work is organised in steps, following a spiral research and development approach designed to converge towards the final results: a first step uses existing material to produce baseline methods; these methods are then refined for hilarious laughter and finally integrated for specific analyses of social laughter.
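To make the fusion idea concrete, here is a minimal late-fusion sketch. The channel names, scores, and weights are hypothetical stand-ins, not the project's actual algorithms: each modality's classifier emits a laughter probability, and a context-dependent weighting (e.g. down-weighting an unreliable channel) combines them into one decision.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-channel laughter probabilities."""
    total = sum(weights.values())
    return sum(scores[ch] * weights[ch] for ch in scores) / total

def detect_laughter(scores, weights, threshold=0.5):
    """Classify the current frame as laughter if the fused score passes a threshold."""
    return fuse_scores(scores, weights) >= threshold

# Hypothetical frame: audio strongly suggests laughter, face moderately,
# gesture weakly; audio is weighted highest in this context.
scores = {"audio": 0.9, "facial": 0.6, "gesture": 0.3}
weights = {"audio": 0.5, "facial": 0.3, "gesture": 0.2}
is_laugh = detect_laughter(scores, weights)  # fused score 0.69 -> True
```

In practice the weights themselves would be learned and adapted to the signal context rather than fixed as above.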
Multimodal Generation and Synthesis of Laughter

WP3 deals with the development of models for the generation and synthesis of audiovisual laughter.
The task of the Generation stage is to describe laughter episodes at the behavioural level (audio structure, body posture, facial action unit sequences), so as to make them appropriate in a given conversational context. The input of the Generation stage is provided by the Dialogue module (WP4), which makes decisions on the timing, duration, and style of laughter as a function of the application scenario and according to user interaction.
The Synthesis stage then produces the acoustic and visual representation of laughter from this behavioural description. Audio synthesis will be based on statistical parametric synthesis (HMMs), adapted to the specifically inarticulate nature of laughter. Visual synthesis is based on Finite State Machines in which states are mostly associated with body, head, or facial postures, and transitions provide natural-looking movements from and to these postures.
Both stages rely heavily on the annotated laughter databases obtained from WP1 and analysed in WP2: first, for training the HMM models involved in audio synthesis and for providing a large collection of examples of visual laughter to choose from for visual synthesis; second, for copy-synthesis experiments that check each stage separately.
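The Finite State Machine idea behind visual synthesis can be sketched as follows. The states and events below are hypothetical illustrations; the real states and transitions would be derived from the annotated databases.

```python
class PostureFSM:
    """Minimal FSM: states are postures, transitions are triggered by events."""

    def __init__(self, transitions, start):
        self.transitions = transitions  # state -> {event: next_state}
        self.state = start

    def step(self, event):
        nxt = self.transitions.get(self.state, {}).get(event)
        if nxt is None:
            raise ValueError(f"no transition from {self.state!r} on {event!r}")
        self.state = nxt
        return self.state

# Hypothetical posture states and laughter events for illustration only.
transitions = {
    "neutral": {"laugh_onset": "smile"},
    "smile": {"intensify": "open_mouth_laugh", "decay": "neutral"},
    "open_mouth_laugh": {"decay": "smile"},
}

fsm = PostureFSM(transitions, "neutral")
fsm.step("laugh_onset")  # -> "smile"
fsm.step("intensify")    # -> "open_mouth_laugh"
```

In the full system, each transition would additionally drive an animation segment producing a natural-looking movement between the two postures, rather than an instantaneous state change.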
Laughter-enabled Dialogue Modelling

The objective of WP4 is to design an adaptive, data-driven multimodal dialogue management system for achieving natural and user-friendly interactions integrating laughter. The dialogue management task will focus on non-linguistic information to generate acceptable and appropriate types of laughter at appropriate times. Machine learning methods such as reinforcement learning will serve to optimise the interaction strategy and to decide the most appropriate moments at which to generate laughter. WP4 will first use existing data and data collected in WP1 to generate baseline dialogue strategies, and will later use inputs from WP2 to adapt the strategy online. It will provide information to the laughter synthesis system developed in WP3.
More specifically, the goals of this WP are to:
- Develop and evaluate a robust and adaptive dialogue manager capable of integrating information from the multimodal analysis of the user inputs and laughter recognition (WP2);
- Develop data-driven Reinforcement Learning methods able to handle large state spaces for optimising dialogue management, based on the current context, user inputs, and laughter gathered from WP2 and data collected in WP1;
- Develop imitation-based algorithms, based on rules or on learning from data, to artificially expand data sets and to mimic human users.
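The reinforcement-learning idea above can be illustrated with a deliberately tiny tabular sketch. The states, actions, and reward values here are hypothetical stand-ins; the actual dialogue manager would operate over large state spaces built from WP1 data and WP2 recognition outputs.

```python
import random

STATES = ["user_laughing", "user_neutral"]
ACTIONS = ["laugh", "stay_silent"]

def reward(state, action):
    # Hypothetical reward: joining in when the user laughs is rewarded,
    # laughing out of context is penalised.
    if state == "user_laughing":
        return 1.0 if action == "laugh" else 0.0
    return -1.0 if action == "laugh" else 0.2

def train(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy one-step value learning over a toy state space."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice(STATES)
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)  # explore
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])  # exploit
        # One-step update; a full dialogue MDP would also bootstrap from
        # the value of the next state.
        q[(s, a)] += alpha * (reward(s, a) - q[(s, a)])
    return q

q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
# Learned policy: laugh along when the user laughs, otherwise stay silent.
```

The real system faces the harder problems the bullets above name: large state spaces, online adaptation, and data augmentation via user simulation.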
Psychological Foundations of Laughter

WP5 will lay the psychological foundations of laughter. The goal is to understand the factors affecting the conveyance and perception of laughter expressions. Psychologically appropriate methods (quantitative and qualitative) for assessing affective and cognitive responses to laughing avatars will be developed.
In experimental settings, personality characteristics, such as trait cheerfulness or dispositions to being laughed at and ridicule (e.g., gelotophobia, the fear of being laughed at) will be considered, as they predispose individuals to different responses to laughter and laughter-related stimuli. It is expected that persons with different personalities will respond differently to avatars and their laughter.
Knowledge of these differences will help identify the factors that make an avatar's laughter contagious and positively valued (sound, facial expression, intensity). A further goal is to identify a model of mimicry, counter-mimicry, and emotional contagion in multimodal responses, also in light of cultural differences.
Integration and Evaluation

This WP has two main tasks: one related to the integration of the various technologies developed within ILHAIRE, and one regarding the evaluation of the integrated system and of its components.
The first task concerns integrating the various components developed within the ILHAIRE project (laughter recognition, dialogue manager, laughing conversational model, contagion model, etc.). System integration will proceed in three phases, one per year.
Regarding the evaluation studies, we will verify the contribution of expressive features in laughter through experimental procedures. Congruent and incongruent combinations of facial expressions and body movements arising during acted and spontaneous laughter will be shown to participants.
Both quantitative and qualitative approaches (e.g., semi-structured interviews, effect on task performance) will be used to evaluate the level of emotional contagion triggered by the avatar's laughter expressions. The results will be used to refine the emotional contagion probabilistic model driving the ECA.
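As a rough illustration of what a contagion model driving the ECA might look like, here is a minimal update rule. Its form, parameters, and values are assumptions for illustration, not the project's model: the avatar's laughter intensity nudges the estimated probability that the user is amused, with relaxation toward a baseline when the stimulus fades.

```python
def contagion_step(p_amused, avatar_intensity, gain=0.3, decay=0.1, baseline=0.2):
    """One update of a hypothetical contagion estimate, kept in [0, 1]."""
    # Avatar laughter pushes the estimate up, proportionally to the room left.
    p = p_amused + gain * avatar_intensity * (1.0 - p_amused)
    # Without stimulus, the estimate relaxes toward a resting baseline.
    p += decay * (baseline - p)
    return min(1.0, max(0.0, p))
```

Evaluation data of the kind described above (interviews, task-performance effects) would be what refines such a model's structure and parameters.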
The WP will also establish the dimensions under which the evaluation will be conducted, and perform those evaluations, in particular in terms of acceptability, believability, added-value and impact.
Dissemination and Exploitation

This work package includes the creation of this website for the dissemination of project results. Other communication channels will also be used, through participation in conferences and the creation of showcases such as a laughter authenticity detector, a hilarious contagious laughter machine, and a laughter-driven virtual agent. These showcases will be inspired by the evaluation WP.
Other actions will be carried out to maximise the reuse potential of project knowledge and software, in connection with other projects of the partners.
- University of Mons
- University of Augsburg
- Università degli Studi di Genova
- University College London
- Queen's University Belfast
- University of Zurich